Tuesday, September 28, 2010

Questioning ORM Assumptions

I come from a stored procedure background to data access, with output parameters and datatables strewn throughout C# code.  I have "recently" been learning ORMs (specifically NHibernate, and Entity Framework).  I've done some prototyping with both, and used NHibernate on a few projects.

In 1-1 situations I have found it is incredibly nice to not have to write SQL, or virtually any data access code at all.  In situations that required some form of mapping (components, inheritance, etc.) it's also very nice, though things become more brittle and error-prone.  In fact, even in 1-1 situations, I've been surprised by how brittle NH mappings are.  Change just about anything on your entity and you're likely to break your mapping somehow.  But that seems to be a price worth paying to avoid writing SQL and manual mapping code.

However, I've recently been questioning some of the features ORMs bring.  I think most people would consider these features absolute requirements of an ORM.  However, I'm beginning to doubt how valuable they really are.  Perhaps some of this is in reality more trouble than it's worth?

Unit of Work

The first pattern I have some issues with is the Unit of Work pattern.  This is the pattern used by ORMs to allow you to get a bunch of objects from the ORM, make any changes you want, and then just tell the ORM to save.  The ORM figures out what you changed, and takes care of it.  There are two major benefits to this pattern:
1. You don't have to manually keep track of all the objects you changed in order to save them.  The ORM will just know what you changed, and make sure it gets persisted.
2. You don't have to concern yourself with the order things get saved in.  The ORM will automatically figure it out for you.

My first issue with this pattern is that it is not very intuitive.  You have to tell the ORM about new objects, and you have to tell it to delete objects, but you don't have to tell it to update objects.  And, in fact, you don't have to tell it about ALL new objects, as it will automatically insert some of them depending on how your mappings and objects are set up (Parent/Child relationships, for example).  It tends to be further confused by the APIs frequently used.  For example, a lot of people use a Repository/Unit of Work pattern to hide NHibernate's session object.
var crypto = BookRepo.GetByTitle( "Cryptonomicon" );
crypto.Rating = 5;

var ender = new Book { Title = "Ender's Game", Author = "Orson Scott Card" };
BookRepo.Add( ender );

uow.Save();
What happens at BookRepo.Add( ender )?  Does that issue an Insert to the database?  Is the crypto.Rating update saved?  And where the heck did this uow object come from and what relationship does it have with the BookRepo?!  If you know this pattern, you're probably so used to it that it doesn't seem strange.  But when you step back from it, I think you'll agree this is a pretty bizarre API.
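To make those questions concrete, here is a minimal sketch of how a repository and a unit of work might cooperate behind that API.  Everything in it (the UnitOfWork class, RegisterLoaded, RegisterNew, the Statements log standing in for the database) is hypothetical and invented for illustration; it is not how NHibernate or Entity Framework are actually implemented.  Under these assumptions, Add only records intent, and both the insert and the rating update go out when Save is called.

```csharp
using System;
using System.Collections.Generic;

public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
    public int Rating { get; set; }
}

// Hypothetical unit of work: it records intent and only "talks to the database" on Save.
public class UnitOfWork
{
    private readonly List<Book> added = new List<Book>();
    // Snapshot of each tracked book's rating, used for dirty checking.
    private readonly Dictionary<Book, int> snapshots = new Dictionary<Book, int>();

    // Stand-in for the database: a log of the statements that would be issued.
    public List<string> Statements { get; } = new List<string>();

    public void RegisterLoaded(Book book) { snapshots[book] = book.Rating; }
    public void RegisterNew(Book book) { added.Add(book); }

    public void Save()
    {
        // New objects become inserts and are tracked from now on.
        foreach (var book in added)
        {
            Statements.Add("INSERT " + book.Title);
            snapshots[book] = book.Rating;
        }
        added.Clear();

        // Tracked objects whose rating differs from the snapshot become updates.
        foreach (var book in new List<Book>(snapshots.Keys))
        {
            if (snapshots[book] != book.Rating)
            {
                Statements.Add("UPDATE " + book.Title);
                snapshots[book] = book.Rating;
            }
        }
    }
}

// Hypothetical repository that hides the unit of work from the caller.
public class BookRepository
{
    private readonly UnitOfWork uow;
    public BookRepository(UnitOfWork uow) { this.uow = uow; }

    public Book GetByTitle(string title)
    {
        uow.Statements.Add("SELECT " + title);              // pretend query
        var book = new Book { Title = title, Rating = 3 };  // pretend row
        uow.RegisterLoaded(book);
        return book;
    }

    public void Add(Book book) { uow.RegisterNew(book); }   // no database call here
}

public static class UnitOfWorkDemo
{
    public static List<string> Run()
    {
        var uow = new UnitOfWork();
        var bookRepo = new BookRepository(uow);

        var crypto = bookRepo.GetByTitle("Cryptonomicon");
        crypto.Rating = 5;

        var ender = new Book { Title = "Ender's Game", Author = "Orson Scott Card" };
        bookRepo.Add(ender);

        uow.Save();
        return uow.Statements;
    }

    public static void Main()
    {
        foreach (var statement in Run()) Console.WriteLine(statement);
    }
}
```

In this sketch the only thing tying BookRepo to uow is the constructor argument, which is exactly the relationship the calling code never gets to see.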

Truth be told, some of this confusion is actually due to the Repository pattern.  You are supposed to think of a Repository as an in memory collection of objects.  The persistence is under the covers magic.  If you're writing an application where persistence is one of the primary concerns, I always thought it was kind of stupid to adopt a pattern which tries to pretend that persistence isn't happening...

But back to Unit of Work, the second issue I have is a certain loss of control.  It is very easy for you to write code using a unit of work and then have no idea what is actually being saved to the database when you issue the Save command.  To me, that's a really scary thing.  Now, to be fair, if you find yourself with code like that, it's probably really bad code.  But that doesn't change the fact that this pattern almost encourages it.  There is something nice about ActiveRecord's approach of calling Save on each entity you want to save to the database.  You're certainly gaining back control.

My last issue isn't really that big of a deal, but it still bothers me a bit.  The Unit of Work pattern couples the way you make changes to the transactions that are used to save them.  In other words, you can't change object A and object B, then save A in one transaction and B in another.  Instead, you'd have to change A, save it, change B, save it.  Like I said, this is a minor quibble, but it demonstrates again the assumptions made by the UoW pattern, which steal some of your control.
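For contrast, here is a rough sketch of the explicit alternative: change A and B first, then save each one in its own transaction.  The AccountStore class and its Log are hypothetical stand-ins for a real data access layer, not an existing library API.

```csharp
using System;
using System.Collections.Generic;

public class Account
{
    public string Name { get; set; }
    public int Balance { get; set; }
}

// Hypothetical gateway where every Save call is its own transaction.
public class AccountStore
{
    // Stand-in for the database: a log of what would be sent.
    public List<string> Log { get; } = new List<string>();

    public void Save(Account account)
    {
        Log.Add("BEGIN");
        Log.Add("UPDATE " + account.Name + " SET Balance = " + account.Balance);
        Log.Add("COMMIT");
    }
}

public static class ExplicitSaveDemo
{
    public static List<string> Run()
    {
        var store = new AccountStore();
        var a = new Account { Name = "A", Balance = 10 };
        var b = new Account { Name = "B", Balance = 20 };

        // Make both changes up front...
        a.Balance = 15;
        b.Balance = 25;

        // ...then decide, object by object, what gets saved and when.
        store.Save(a);
        store.Save(b);
        return store.Log;
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
    }
}
```

The cost is the one the Unit of Work pattern exists to remove: the caller has to remember which objects changed and in what order to save them.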

None of these issues are all that serious.  But I still believe that Unit of Work is a very awkward way of dealing with your objects and persistence.

Lazy Loading

ORMs use Lazy Loading to combat the "Object Web" problem.    The object web problem arises when you have entities that reference other entities that reference other entities that reference other entities that ...  How do you load a single object in that web without loading the ENTIRE web?  Lazy Loading solves the problem by not loading all the references up front.  It instead loads them only when you ask for them.

NHibernate and Entity Framework use some pretty advanced and somewhat scary "dynamic proxy" techniques to accomplish this.  Basically they inherit from your class at run time and change the implementation of your reference properties so they can intercept when they are accessed.  There are some scenarios where this dynamic inheritance can cause you problems, but by and large it works and you can pretend it's not even happening.
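The idea can be sketched by writing such a subclass by hand.  A real ORM generates the equivalent at run time; the BookProxy class, its loader delegate, and the LoadCount counter below are assumptions made up for this illustration.

```csharp
using System;

public class Author
{
    public string Name { get; set; }
}

public class Book
{
    public string Title { get; set; }

    // Virtual so that a subclass can intercept access to the reference.
    public virtual Author Author { get; set; }
}

// Hand-written version of the subclass an ORM would generate at run time.
public class BookProxy : Book
{
    private readonly Func<Author> loadAuthor;
    private bool loaded;

    // Counts how many times the loader actually ran.
    public int LoadCount { get; private set; }

    public BookProxy(Func<Author> loadAuthor)
    {
        this.loadAuthor = loadAuthor;
    }

    public override Author Author
    {
        get
        {
            if (!loaded)
            {
                // First access: this is where the hidden query would run.
                base.Author = loadAuthor();
                loaded = true;
                LoadCount++;
            }
            return base.Author;
        }
        set
        {
            base.Author = value;
            loaded = true;
        }
    }
}

public static class LazyProxyDemo
{
    public static BookProxy Create()
    {
        var proxy = new BookProxy(() => new Author { Name = "Neal Stephenson" });
        proxy.Title = "Cryptonomicon";
        return proxy;
    }

    public static void Main()
    {
        Book book = Create();                   // callers only see a Book
        Console.WriteLine(book.Author.Name);    // triggers the load
        Console.WriteLine(book.Author.Name);    // served from the loaded reference
        Console.WriteLine(((BookProxy)book).LoadCount);
    }
}
```

Note that the property has to be virtual for the interception to work, which is one of the ways this technique leaks into your entity classes.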

Lazy loading as a technique is very valuable.  But I think ORMs depend on it too heavily.  The problem with Lazy Loading is performance.  It's easy to write code that looks like it executes a single query to the database, but in reality ends up executing 10 or more.  At the extreme you have the N+1 select problem.  Once again, it boils down to trying to pretend the data access isn't happening.
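Here is a small sketch of the N+1 select problem, using a hypothetical FakeDb class that just counts round trips.  The class, its method names, and the sample data are invented for the illustration.  With three books, the lazy-style loop costs one query for the list plus one per book, while the joined version costs one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BookRow
{
    public string Title { get; set; }
    public int AuthorId { get; set; }
}

// Hypothetical stand-in for a database that counts round trips.
public class FakeDb
{
    private readonly Dictionary<int, string> authors = new Dictionary<int, string>
    {
        { 1, "Neal Stephenson" },
        { 2, "Orson Scott Card" }
    };

    private readonly List<BookRow> books = new List<BookRow>
    {
        new BookRow { Title = "Cryptonomicon", AuthorId = 1 },
        new BookRow { Title = "Snow Crash", AuthorId = 1 },
        new BookRow { Title = "Ender's Game", AuthorId = 2 }
    };

    public int QueryCount { get; private set; }

    public List<BookRow> SelectBooks()
    {
        QueryCount++;
        return books.ToList();
    }

    public string SelectAuthorName(int authorId)
    {
        QueryCount++;
        return authors[authorId];
    }

    public List<string> SelectBooksJoinedWithAuthors()
    {
        QueryCount++;
        return books.Select(b => b.Title + " by " + authors[b.AuthorId]).ToList();
    }
}

public static class NPlusOneDemo
{
    // Mimics lazy loading: one query for the list, then one more per book.
    public static List<string> LazyStyle(FakeDb db)
    {
        var lines = new List<string>();
        foreach (var book in db.SelectBooks())
        {
            // With an ORM this would look like a plain property access (book.Author.Name).
            lines.Add(book.Title + " by " + db.SelectAuthorName(book.AuthorId));
        }
        return lines;
    }

    // Fetches everything needed in a single round trip.
    public static List<string> EagerStyle(FakeDb db)
    {
        return db.SelectBooksJoinedWithAuthors();
    }

    public static void Main()
    {
        var lazyDb = new FakeDb();
        LazyStyle(lazyDb);
        Console.WriteLine("Lazy queries: " + lazyDb.QueryCount);

        var eagerDb = new FakeDb();
        EagerStyle(eagerDb);
        Console.WriteLine("Eager queries: " + eagerDb.QueryCount);
    }
}
```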

DDD's solution to the Object Web problem is Aggregates.  An Aggregate is a group of entities.  The assumption is that when you load an Aggregate, all its members will be loaded.  If you want to access another aggregate, then you have to query for it.  This cleanly defines when you can use an object traversal, and when you need to execute a query.  Basically, it forces you to remove some of the links in your object web.
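As a rough sketch of that boundary, assume an Order aggregate that owns its lines and refers to its Customer only by id.  Holding an id instead of an object reference is one common way to cut the link, not the only one, and the classes, repositories, and hard-coded data here are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OrderLine
{
    public string Product { get; set; }
    public int Quantity { get; set; }
}

// Aggregate root: loading an Order always brings its lines with it.
public class Order
{
    public int Id { get; set; }

    // The customer is a different aggregate, so only its id is held here.
    public int CustomerId { get; set; }

    public List<OrderLine> Lines { get; } = new List<OrderLine>();

    public int TotalQuantity()
    {
        return Lines.Sum(line => line.Quantity);
    }
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical repositories returning hard-coded data in place of real queries.
public class OrderRepository
{
    public Order GetById(int id)
    {
        var order = new Order { Id = id, CustomerId = 42 };
        order.Lines.Add(new OrderLine { Product = "Cryptonomicon", Quantity = 2 });
        order.Lines.Add(new OrderLine { Product = "Ender's Game", Quantity = 1 });
        return order;
    }
}

public class CustomerRepository
{
    public Customer GetById(int id)
    {
        return new Customer { Id = id, Name = "Customer " + id };
    }
}

public static class AggregateDemo
{
    public static string Describe()
    {
        var order = new OrderRepository().GetById(1);

        // Inside the aggregate: plain object traversal, no further queries.
        var total = order.TotalQuantity();

        // Crossing to another aggregate: an explicit, visible query.
        var customer = new CustomerRepository().GetById(order.CustomerId);

        return customer.Name + " ordered " + total + " items";
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```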

By making Lazy Loading so easy, ORMs kind of encourage you to build large object webs.  Entity Framework does this in particular, because its designer will automatically make your objects mimic the database if you use the db-first approach and drag and drop your tables into the designer.  That means you will have every association and every direction included in your model.

While I don't have a problem with Lazy Loading, I do have a problem with using it too much.  This is the main reason why you read so much about people "profiling" their ORM applications and discovering crazy performance problems.  Personally, I'd rather put some thought into how I'm going to get my data from the persistence store up front than have to come back after the fact and waste tons of time trying to find all the areas where my app is executing a crazy number of queries needlessly.

Object Caching

NHibernate and Entity Framework keep a cache of the objects they load.  So if you ask for the same object twice, they'll be sure to give you the same instance of the object both times.  This prevents you from having two different versions of the same object in memory at the same time.  If you think about that for a while, I'm sure you'll come up with all kinds of horror scenarios you could get into if you had two representations of the same object.

But I think this is an example of the ORM protecting me from myself too much; it's just not that important of a feature.  Instead it adds more magic that makes the data access of my application even harder to understand.  One time when I say GetById( 1 ), it issues a select.  But the next time it doesn't.  So if I actually wanted it to (to get the latest data, for example), I now have to call Refresh()...
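The behavior described above can be sketched with a dictionary keyed by id.  The Session class below is a hypothetical stand-in written for this example, not NHibernate's session, and SelectCount only simulates the number of selects that would be issued.

```csharp
using System;
using System.Collections.Generic;

public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
}

// Hypothetical session that caches loaded objects by id.
public class Session
{
    private readonly Dictionary<int, Book> cache = new Dictionary<int, Book>();

    // Counts the selects that would have gone to the database.
    public int SelectCount { get; private set; }

    private string SelectTitle(int id)
    {
        SelectCount++;
        return "Book " + id;    // pretend row data
    }

    public Book GetById(int id)
    {
        Book book;
        if (cache.TryGetValue(id, out book))
        {
            return book;        // cache hit: same instance, no select
        }

        book = new Book { Id = id, Title = SelectTitle(id) };
        cache[id] = book;
        return book;
    }

    // Forces a new select and overwrites the cached instance's state.
    public void Refresh(Book book)
    {
        book.Title = SelectTitle(book.Id);
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        var session = new Session();

        var first = session.GetById(1);     // issues a select
        var second = session.GetById(1);    // does not
        Console.WriteLine(ReferenceEquals(first, second));
        Console.WriteLine(session.SelectCount);

        session.Refresh(first);             // the only way to force another select
        Console.WriteLine(session.SelectCount);
    }
}
```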

Wrap Up

I got into all this because I didn't want to write SQL and I didn't want to write manual mappings.  I certainly got that.  But I also got Unit of Work, Lazy Loading, and Implicit Caching.  None of which I actually NEED and certainly never wanted.  And many of which actually create more problems than I had before!

Some Active Record implementations manage to fix these issues.  But I have concerns with using Active Record on DDD like code.  The main concern is that I want to model my domain, not my database.  The other big concern is I prefer keeping query definitions out of the entities, as it doesn't feel like their responsibility.

Now I'm not claiming any of these issues are a deal breaker to using NHibernate or Entity Framework or other ORMs.  But on the other hand, it doesn't feel like these patterns are the best possible approach.  I suspect there are alternative ways of thinking about Object Relational Mapping which may have some subtle effects on how we code data access and lead to better applications, developed more efficiently.  For now though, I'm settling for NHibernate.

7 comments:

  1. I find that NHibernate makes me think harder about my data access, but in return it frees me from busy work and gives me less code.

    Since I like thinking much more than I like writing stored procedures and dumb data access code, it's a worthwhile tradeoff for me.

    Also, I don't think you're giving enough credit to all the things you get for free with an ORM. Caching, almost free. Optimistic concurrency, almost free. Utility to update your db schema from your mappings, free. Complicated patterns like Hi/Lo ids and batch fetching, free. It's worth the learning curve and extra mental work, IMO.

  2. I'm with Gabe on this one. I think the problem you are encountering is that many ORMs don't have a low barrier to entry for apps with a simpler data model, or access and persistence patterns. And the ones that do, such as Active Record, require your design to be subservient to their model, rather than vice versa.

    For simple use cases, you're correct, much of the stuff you get is of little true value, and the things you DO want may not be worth the hassle. The only real value proposition there is avoidance of SQL (a useful DSL in my opinion), and I'm not convinced that results in a net reduction of implementation effort... Unless you just have no skill with SQL or relational modeling.

    I haven't seen a good story in the ORM space for transitioning from a simple use case to a more complex one you might see in a mature system. Most ORMs just aim at one extreme or the other.

    Maybe the end answer is that we just need to be satisfied with the fact that the problems differ a lot between the two extremes. Maybe we need to just accept that transitioning from one to the other is inherently an *architectural* challenge, rather than just an implementation detail as it's often proposed the ideal case should be.

  3. basically ORMs, some more (e.g. NHibernate) than others, are a very leaky abstraction

  4. Thanks for the comments guys!

    I'm not trying to advocate throwing the baby out with the bath water and abandoning ORMs and all the advanced features they bring altogether (though my focus on mapping and SQL in the post kind of makes it sound that way...).

    Instead I'm more wondering out loud if some of the patterns used by ORMs are really the best way to think about data access.

    For example, I could imagine a system that doesn't use Unit of Work, uses limited lazy loading, and doesn't bother with in-memory caching but DOES integrate with 2nd level caches (like memcached), handles optimistic concurrency, and is capable of batching reads and writes.

    I agree that what we have now not only works, but also has a lot of benefits. I just think it's adopted some patterns that hinder how efficient and easy to work with it could be.

  5. Have you looked at myBatis.NET? http://www.mybatis.org/dotnet.html It's a pretty good compromise between NHibernate and writing your own SQL statements.

  6. Great thoughts for sure. We spent close to a year trying to switch from a brittle old data access layer to NHibernate. We are using Sybase ASE so our ORM options were limited.

    We spent close to a year trying to implement the beast. We encountered a lot of the same friction that you did. We ultimately stumbled upon http://bltoolkit.net/. It has been such a freeing experience. Everything is so fast and simple. I know what's going on at all times.

    Everyone needs to forge their own path, but this has been a boon for us.
