One of the main things we had learned was the importance of designing for automation. Probably the single most valuable automation related thing we did was automate the scripting of our database. We are using FluentMigrator to define migrations and psake with sqlcmd to create and drop database instances. I'm not all that crazy about FluentMigrator (its syntax is overly verbose, it's code seems overly complex) but it has worked very well for us so far.
We use this tool in a few different circumstances:
- To apply other people's changes to our local dev databases
- To migrate our QA environment
- To spin up from scratch (and tear down) automated test environments
- To deploy database changes to production
Database migrations have made working with our code base so dramatically easy, it's really amazing. You can spin up a new environment with one command in seconds. Incorporate other devs changes without a second thought.
Migration tools pay most of their attention to schema changes, but data gets largely ignored. So much so that we had to rough in our own approach to dealing with data, which works, but also sucks. And there doesn't seem to be any clear recommendations for data. There are some categories of data I'm concerned with:
- List Data: these are values that are usually displayed in drop downs and are either not user generated, or there is a default set we want to seed (ex: states, name suffixes, record types, etc)
- Configuration Data: this is data that is required for the system to work, (ex: server uris, an email address to send from, labels to display in the app, etc)
There are only two approaches to dealing with seed that I'm aware of:
- Run the schema migrations, then insert all the seed data
- Insert the seed data inside the migrations
At first blush #1 seemed easier, so it's the approach we originally took, but it has some drawbacks. The first challenge is to avoid duplicates. You can do that with IF NOT EXISTS(...) statements, or by inserting Primary Keys and ignoring PK violation errors. The second is dealing with schema changes to tables with seed data.
For example, suppose you write a migration to create a table, and you write insert statements to add seed data to it. You have already run your migrations and seeded the database. Now you write a new migration which adds a new not null column to the table. This migration will fail because the table already has data from the seeds and the new column requires a value. In this situation, you're hosed. You have to write the migration to add data into that column before making the column not allow nulls (or use a default value). Effectively you have to duplicate the seed data for that column in both the migration and the seed data file. You need it in both places in order to support creating the database from scratch and migrating an existing database.
#2 seems to have it's own downsides too. There is no one place that shows you all the data that will be seeded in a given table, it's spread throughout the migrations. It precludes you from having a schema.rb type of system (unless that system could somehow be smart enough to include data). That point is somewhat academic at this point though, because FluentMigrator doesn't have anything like this.
However, #2 is much more flexible and can accommodate anything you could ever dream of doing with data (that SQL would support of course). And I feel that pure #2 would be better than a #1/#2 hybrid, because it's just more straight forward.
And now I'd like to ask you, do you know of a good way of dealing with seed data? Are there any existing tools or strategies out there that you could link me to?