Wednesday, August 4, 2010

Entity Framework 4.0 and DB versioning

I've recently tried VS 2010 on a new commercial project (no more labs and investigating spherical horse in vacuum:). The application uses DB extensively, there're many various business entities and EF 4.0 was chosen to be used as an ORM solution and DAL for it .

In fact EF 4.0 introduces various new features (either comparing to previous version of EF or to other OR Mappers). That makes development easier and code cleaner:

Having many improvements in EF 4.0 I still didn't find a feature that could be very convenient in enterprise development: the capability to version the DB, generated from entity model. Regarding code generation capabilities of EF 4.0 they are the same as in EF 3.5: generate DB scripts from model OR generate model using an existing DB.

On most of my projects (so as on the current) there were no full understanding of how the data model will look in the end. A convenient work flow is that we have a set of user stories for the current milestone and create all the data structures for them. When a new milestone comes - new user stories, changes to the existing data model, new entities etc. When you have a DB for a great application with many data in the DB you can't simply drop and recreate it every time there're changes to data model. That's why DB versioning approach involving bunches of SQL scripts is usually used. The traditional workflow for DB versioning with ORM used is the following (I used it with nHibernate, LinqToSql and some other ORM):
  1. Maintain DB via a set of change and create scripts
  2. When data model is to be changed - add a new change script (when a table or relation is to be changed) or a create scripts (some tables are added)
  3. Run scripts against DB server (either use some special versioning tool that may track which scripts are to be run or a simple approach with DBVersion table)
  4. If necessary - also manually run some migration scripts if changes require that
  5. Sync DB changes with O/R mapping in the code, adjust entity classes
As you may see, the approach desribed above is technically 'DB first' (I used Visio to draw the schema and generate scripts) . But with the EF I've decided to use the Entity Model designer to create the data model and use it for DB creation (model first approach). The question was how may I version the DB in this case. After I googled for a while, played with VS I didn't find any feature in EF 4.0 that could provided me with the necessary functionality in place. On the other hand VS 2010 presented a new and very powerful DB project.

Just a few words about VS 2010 DB projects. In a new version the approach to maintaing DB's has significantly changed. At a first glance you may see numerous SQL Script files that are spread over numerous folders under the project and decide that all of that is just a repository for scripts that are then directly executed against a DB:

But the fact is that all these scripts are not for executing against DB. They serve to describe internal DB structure within the project only. Each table, each key, any DB object has it's own script file with simple SQL create statement. You may also use a graphical schema editor to change the DB structure - all scripts will be changed accordingly. And here comes the power of the DB project: when you build the DB project against a specific DB on a server the build system creates the internal representation of the DB in the project, compares it with the target DB and dynamically creates a scripts with all necessary SQL statements . Executing this script leads to creating and changing the target DB and databases in the project and on a server are synchronized. You may find more info on new DB project here.

To cut long story short I've decided to use EF generation tool and VS 2010 DB project together and that's what I came to:

At first
  1. Create the first version of Entity Model and generate SQL script for it
  2. Run the script and create DB
  3. Create a new DB project and generate it's contents using existing DB (created at previous step)
During further development
  1. Change data model via EF designer (add/delete entities, associations, new fields etc.)
  2. Generate scripts for the entire model
  3. Manually sync changes with the DB project: you may either extract some parts of the generated scripts and copy past them, manually fix scripts within DB project, use Schema editor
  4. Build and deploy the DB project to the used DB (Deploy action for DB project generates script and automatically executes it against DB if you check the corresponding option in the DB project properties window)
On the one hand this approach is not fully automated (you can't just change the model and click a button to have DB updated), on the other hand it gives good versioning and deployment capabilities (no more need to dig in hundreds of SQL scripts), also very simple rules for keeping the DB and Data Model synchronized.

NOTE #1. For continuous integration scenario:
I've faced a problem with building and publishing VS 2010 DB projects on a build server with no VS installed. The bunch was the following: SVN, CC.NET, nAnt, msbuild. The problem is that you can't do that. Even having .NET 4.0 SDK (I've use the latest available version in July) you'll not be able to publish and deploy the VS 2010 DB project without having VS 2010 installed.

NOTE #2. For those who's trying to change column name in a table via script in DB project :)
Use schema editor, this will not lead to dropping the column and creating a new one, it'll be only renamed.

Hope this post may help others.