This is about me and my quest for, er... greater things in life? Good food, good drinks, friends and family and my eternal quest to figure out what I want to do when I grow up. (hint: it's probably going to involve code)

Tuesday, July 29, 2008

Exploratory and evolutionary data schema design

When I was at the SD West conference last March in Santa Clara, I listened to Scott Ambler's talk about databases and the need to make them evolve in the same manner we write code: by refactoring them. Most programmers know about Agile methodologies and how they apply to our code, but not a whole lot of people know how to apply those methods to database design.

Your data schema will change. This is as certain as death and taxes (although the income tax is really only about 100 years old!) You design your project and your data schema, you implement and deploy your solution and... boom!, you need to change something. When your database design needs to change and you have code in production, or your tables are being accessed by many other applications, you can't just rename columns or delete tables. Scott, in his book Refactoring Databases: Evolutionary Database Design, talks about how to implement changes to your schemas without affecting current production code. 


During his talk at SD West, Scott mentioned that data problems in the United States is costing us about 600 billion dollars! I truly believe that with proper evolutionary database design practices in place, we can bring that value down quite a bit, perhaps making some money in the process. After all, $600 billion seems like quite a reasonable market to me! 

Unfortunately, we currently lack tools to version our database schemas. The only one I've come across so far is from Ruby on Rails and is called Migrations, which is really useful. You basically edit a file for each new revision of your schema that tells Migrations what to do to get your schema to the next version number. Then, should you want to go from version 2 to 8, Migrations will do that for you automatically.

But I need something more powerful. I work with two other developers and we might each have a different version of a database design on our desktops in order to isolate, or sandbox, our development. Once we are ready, we apply our changes to the integration server. This means that four different computers will have different schema versions. If I integrate first, and then Joe integrates, what happens? I added a column A to a table and he removed the table! We have a conflict. What if he added a column B and I added a column C to the same table? We need to merge.

Wait, that sounds a lot like source control! Yep, and Migrations doesn't do that. I want to create a tool that handles database schemas the way source control handles source code, with commits, branches and merges in a way that allows programmers to experiment with their designs. Exploratory and evolutionary data schema design; we'll see where this takes me!

0 comments: