15 09 2010
Generating data for tables in table per class inheritance tree using data generation plans
I am using Visual Studio database project and data generation plan to populate my database with random data. In my database there are some tables for table per class inheritance mapping and there are some additional steps required to get these tables filled with data correctly. In this posting I will describe the process of generating data for table per class inheritance tables and give you some hints how to make this data valid for O/R-mappers.
Table per class inheritance mapping
Let’s start with simple class diagram. If you are not familiar with party abstraction you can read more about it from my blog posting Modeling people and organizations: Class Party.
In database I have table for each class shown in diagram. Company and Person share primary key with Party. Party defines primary key and if class is of type Person then table Person has row with same primary key as row in table Party. So the connections are 1:1 relationships between primary keys of tables. Here is the fragment of my database diagram that illustrates parties.
This inheritance strategy is supported by NHibernate and Entity Framework but not by LINQ to SQL. You can read more about inheritance mapping from article Mapping Objects to Relational Databases: O/R Mapping In Detail by Scott W. Ambler. Also I strongly recommend reading his books Agile Database Techniques: Effective Strategies for the Agile Software Developer and Building Object Applications That Work.
Data generation plan
I need a lot of data to test my database and see how mappers perform. I use data generation plans to simplify this task. Although data generation plan makes good work it has no knowledge or idea about my mapping strategy and therefore I need some data fixing when new data is generated. The following screenshot shows plan with not so much data.
I tried different tricks to get data generated correctly but I failed at every attempt. The problem is – there will be primary keys in party table that exist in both tables – persons and companies. On object level it means that object has two types at same time and two identities. It is not possible on object level and mappers throw errors when they discover this situation. Also I cannot allow rows in party table that have no related rows in persons or companies table – remember, we cannot instantiate abstract classes.
To avoid instantiating abstract classes I let generate data to persons and companies tables and it is not problem if data is duplicated. I wrote simple stored procedure that deletes 50% rows from persons table and all the rows from companies table that are represented in persons table. This way I have 10000 rows in both tables but their identities are different. Here is my stored procedure.
CREATE PROCEDURE CleanData
SET NOCOUNT ON;
-- Delete parties that have no rows in person and company tables
party_id not in (select party_id from person)
and party_id not in (select party_id from company)
DECLARE @i INTEGER
DECLARE @sql NVARCHAR(MAX)
-- Delete 50% of rows from person
SELECT @i = COUNT(*) / 2 FROM person
SELECT @sql = N'DELETE FROM person WHERE party_id IN (SELECT TOP ' +
CAST(@i AS NVARCHAR(MAX)) + ' party_id FROM person)'
EXEC sp_executesql @sql
-- Delete rows from company that also exist in person table
party_id in (select party_id from person)
After generating new data to database I run this stored procedure to avoid errors in mappings.
Data generation plans are powerful tools to generate data to databases so it is possible to test your application in close to real-life situations. Data generation plans are flexible. Still they have some lacking support for object relational mapping needs but as we saw it was very easy to get over this problem. Same procedure can be extended or generalized for more than one cleaning situation.