Generating data for tables in table per class inheritance tree using data generation plans

I am using Visual Studio database project and data generation plan to populate my database with random data. In my database there are some tables for table per class inheritance mapping and there are some additional steps required to get these tables filled with data correctly. In this posting I will describe the process of generating data for table per class inheritance tables and give you some hints how to make this data valid for O/R-mappers.

Table per class inheritance mapping

Let’s start with simple class diagram. If you are not familiar with party abstraction you can read more about it from my blog posting Modeling people and organizations: Class Party.

In database I have table for each class shown in diagram. Company and Person share primary key with Party. Party defines primary key and if class is of type Person then table Person has row with same primary key as row in table Party. So the connections are 1:1 relationships between primary keys of tables. Here is the fragment of my database diagram that illustrates parties.

This inheritance strategy is supported by NHibernate and Entity Framework but not by LINQ to SQL. You can read more about inheritance mapping from article Mapping Objects to Relational Databases: O/R Mapping In Detail by Scott W. Ambler. Also I strongly recommend reading his books Agile Database Techniques: Effective Strategies for the Agile Software Developer and Building Object Applications That Work.

Data generation plan

I need a lot of data to test my database and see how mappers perform. I use data generation plans to simplify this task. Although data generation plan makes good work it has no knowledge or idea about my mapping strategy and therefore I need some data fixing when new data is generated. The following screenshot shows plan with not so much data.

I tried different tricks to get data generated correctly but I failed at every attempt. The problem is – there will be primary keys in party table that exist in both tables – persons and companies. On object level it means that object has two types at same time and two identities. It is not possible on object level and mappers throw errors when they discover this situation. Also I cannot allow rows in party table that have no related rows in persons or companies table – remember, we cannot instantiate abstract classes.

Cleaning data

To avoid instantiating abstract classes I let generate data to persons and companies tables and it is not problem if data is duplicated. I wrote simple stored procedure that deletes 50% rows from persons table and all the rows from companies table that are represented in persons table. This way I have 10000 rows in both tables but their identities are different. Here is my stored procedure.

CREATE PROCEDURE CleanData
AS
BEGIN
 SET NOCOUNT ON;
 
 -- Delete parties that have no rows in person and company tables
 delete
 from 
  party
 where 
  party_id not in (select party_id from person)
  and party_id not in (select party_id from company)
 
 DECLARE @i INTEGER
 DECLARE @sql NVARCHAR(MAX)
 
 -- Delete 50% of rows from person
 SELECT @i = COUNT(*) / 2 FROM person 
 SELECT @sql = N'DELETE FROM person WHERE party_id IN (SELECT TOP ' + 
                       CAST(@i AS NVARCHAR(MAX)) + ' party_id FROM person)'
 EXEC sp_executesql @sql
 
 -- Delete rows from company that also exist in person table
 delete 
 from
  company
 where
  party_id in (select party_id from person)
END

After generating new data to database I run this stored procedure to avoid errors in mappings.

Conclusion

Data generation plans are powerful tools to generate data to databases so it is possible to test your application in close to real-life situations. Data generation plans are flexible. Still they have some lacking support for object relational mapping needs but as we saw it was very easy to get over this problem. Same procedure can be extended or generalized for more than one cleaning situation.

Unity, Castle Windsor, StructureMap, Ninject - who has best performance? »

« Using timer based Unit of Work and Command classes to measure repositories performance

Categories: Data platform Visual Studio

Building web API apps on ASP.NET Core 2 and VB.NET
ASP.NET Core 2 will come with full support for VB.NET. In current preview we can…
MSSQL: Finding weekend days in month
One of my applications displays tables with some calculations about users working time. This is…
Readable fluent queries with Entity Framework Core
After my first experiments with Query Specification pattern on Entity Framework Core I came to…
Hosting web application on Azure
Coming to cloud with web application can be scary if it’s the first time you…
MSSQL & NHibernate – mapping week numbers to properties
In one of my applications I have to use week numbers as properties of business…