Tuesday, June 30, 2009

SQL Deadlocks: More with child data

In last week's post, SQL Performance: Child data, I wandered through an issue involving caching data about a parent table's child data. In that post I talked about writing SQL that would save a foreign key on the parent table to the most recent child record. This is very simple and would look like this:

begin tran TxExample

begin try

    declare @NewChildId int

    -- insert the new child row
    insert into Child ( blah, blah, blah ) values ( @blah, @blah, @blah )

    select @NewChildId = SCOPE_IDENTITY()

    -- cache the newest child on the parent
    update Parent set CurrentChildId = @NewChildId where ParentId = @ParentId

    if @@TRANCOUNT > 0 commit tran TxExample

end try
begin catch

    if @@TRANCOUNT > 0 rollback tran

end catch

This is relatively straightforward. It inserts the child, then updates the parent's cached data. Those two operations are wrapped in a transaction and a try/catch to ensure that if anything fails for any reason, both statements are rolled back. This ensures data integrity.

And now it's time to talk about deadlocks. This code is susceptible to deadlocks. As a relatively contrived but nonetheless possible example, suppose the following SQL could also be run:

begin tran TxExample2

update Parent set blah = @blah where ParentId = @ParentId


select * from Child where ChildParentId = @ParentId

commit tran

If these two queries were to run at the same time, operating on the same ParentId, and SQL Server were to context switch them at just the right moment, they would deadlock. Specifically, if the first query completed its insert statement (the insert into Child) and SQL Server then switched to the second query, we would deadlock.

This is because when the second query tries to select from the Child table, it has to wait: the first query has inserted a new row, and SQL Server's default isolation level is read committed, which means dirty data will not be read. Instead, the select waits for the data to be committed. So it's going to sit there, waiting for the first query to commit.

This isn't a deadlock yet. The deadlock happens when SQL Server switches back to the first query and attempts to execute the update on the parent. When it does this, it will try to obtain an exclusive lock on that parent row, but it won't be able to because the second query already has an exclusive lock from its update. So it will wait for the second query to commit.

The first query is now waiting for the second query which is waiting for the first query and you have yourself a deadlock.
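
To watch this happen, the interleaving can be forced by hand. What follows is only a sketch under several assumptions: a Parent table with ParentId, CurrentChildId, and a hypothetical Note column; a Child table with an identity key and a ChildParentId column; an existing parent row with ParentId = 1; and the default locking form of read committed (READ_COMMITTED_SNAPSHOT off). The waitfor delay stands in for the unlucky context switch. Run the first batch in one connection and, within ten seconds, the second batch in another connection.

```sql
-- Connection 1: query #1, paused between its two statements
declare @ParentId int, @NewChildId int
set @ParentId = 1              -- assumed existing parent row

begin tran TxExample

insert into Child ( ChildParentId ) values ( @ParentId )
select @NewChildId = SCOPE_IDENTITY()

waitfor delay '00:00:10'       -- simulates the context switch

-- blocks here: connection 2 holds an exclusive lock on the parent row
update Parent set CurrentChildId = @NewChildId where ParentId = @ParentId

commit tran TxExample
go

-- Connection 2: query #2, started while connection 1 is in its delay
declare @ParentId int
set @ParentId = 1

begin tran TxExample2

-- Note is a hypothetical column used only for this illustration
update Parent set Note = 'changed' where ParentId = @ParentId

-- blocks here: the new child row has not been committed yet
select * from Child where ChildParentId = @ParentId

commit tran TxExample2
```

Once both connections are blocked, SQL Server should pick one of them as the deadlock victim, roll its transaction back, and raise error 1205 in that connection; the other connection then completes.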

Before we fix it, we should ask ourselves "is this a big deal?" The answer is, it depends, but in general yes. If all your SQL is small, all your transactions complete quickly, and you don't have very many users banging on the system concurrently, then you probably won't see any deadlocks. But unless you can guarantee that all those conditions will remain the same, you have to be at least a little worried. And if those conditions don't apply to you, you definitely have to be worried.

So how do we fix it? First thing we could do is to commit the transaction in the second query before executing the select. If this is possible, then it's a good idea. You want your transaction to commit as quickly as possible and you want to touch as few objects as you can while in the transaction. That said, there are plenty of reasons why you might not be able to commit the transaction after the update. For example, maybe you're reading the child data because you need it to perform another update, and those two updates have to be in the same transaction. In that case, there is nothing you can do to fix query #2.
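
When nothing after the select needs to be in the same transaction, the rearranged query #2 might look like the sketch below. The procedure name, its parameters, and the varchar type of the blah column are assumptions made only so the example is complete.

```sql
-- Hypothetical wrapper procedure; blah is the placeholder column from query #2
create procedure UpdateParentThenReadChildren
    @ParentId int,
    @blah varchar(50)
as
begin
    begin tran TxExample2

    update Parent set blah = @blah where ParentId = @ParentId

    -- commit first, so the exclusive lock on the parent row is released
    commit tran TxExample2

    -- the read may still wait on query #1, but it no longer holds
    -- anything that query #1 needs, so the wait cannot become a deadlock
    select * from Child where ChildParentId = @ParentId
end
```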

But even if you could fix query #2, someone could someday come along and write query #3, which would introduce the same problem again. So what we really need to do is fix query #1. The way we do that is by having query #1 obtain a shared lock on all the resources we know it will need to touch, immediately at the top of the query.

Add this code after the BEGIN TRY:

set transaction isolation level repeatable read
select ParentId from Parent where ParentId = @ParentId
set transaction isolation level read committed

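With this code in place, query #1 holds a shared lock on the parent row from the start: under repeatable read, the lock taken by the select is kept until the transaction ends, even after the isolation level is switched back to read committed. Query #2 will not be able to execute its update until query #1 completes, which prevents the deadlock and saves the day!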
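
Put together, query #1 with the up-front lock might look like the following sketch. The procedure name and parameter are hypothetical, and the Child insert is reduced to the one column this post names (ChildParentId); a real table would have more.

```sql
-- Hypothetical procedure name and parameter, for illustration only
create procedure AddChildAndCacheOnParent
    @ParentId int
as
begin
    declare @NewChildId int

    begin tran TxExample
    begin try
        -- Take a shared lock on the parent row up front. Because it is
        -- acquired under repeatable read, it is held until the transaction
        -- ends, even after the level is set back to read committed.
        set transaction isolation level repeatable read
        select ParentId from Parent where ParentId = @ParentId
        set transaction isolation level read committed

        insert into Child ( ChildParentId ) values ( @ParentId )
        select @NewChildId = SCOPE_IDENTITY()

        update Parent set CurrentChildId = @NewChildId where ParentId = @ParentId

        if @@TRANCOUNT > 0 commit tran TxExample
    end try
    begin catch
        if @@TRANCOUNT > 0 rollback tran
    end catch
end
```

One caveat worth testing rather than taking on faith: two copies of this procedure running for the same parent can both take the shared lock and then both wait to convert it to an exclusive lock, which is a deadlock of its own. The updlock table hint on the select is a commonly used alternative for that case, but it is a different technique from the one described here.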

This example was simple, but the deadlock was still subtle and hard to see. The problem only gets harder to spot as your SQL gets more complicated, and your SQL will get more complicated in direct relation to how complicated your data schema is. So you really have to be on the lookout for this issue.

Before I wrap this up, I should mention that if you need to lock more than just one row in one table at the top of your query (the way we locked a single row in query #1), life can get interesting. If the tables you are locking are all related, you can lock them by inner joining to them. If they're unrelated, you can't join from one to the next, so you need to execute separate select statements. And if two queries need to lock the same records in two unrelated tables, but they lock them in different orders (A, B vs. B, A), you can end up with a deadlock! For these cases you have to resort to what you learned in your operating systems class: always lock all your resources in the same order. Good luck with that.
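
As a sketch of both cases, assume Child is related to Parent through ChildParentId, and that Account and Warehouse are two unrelated tables. Those two table names, their key columns, and the variables are made up for illustration. Related rows can be locked with one joined select; unrelated rows need one select each, and every query that locks both has to use the same order.

```sql
declare @ParentId int, @AccountId int, @WarehouseId int
set @ParentId = 1
set @AccountId = 1
set @WarehouseId = 1

begin tran TxLockOrder

set transaction isolation level repeatable read

-- Related tables: one joined select locks the parent row and its child rows
select p.ParentId, c.ChildParentId
from Parent p
inner join Child c on c.ChildParentId = p.ParentId
where p.ParentId = @ParentId

-- Unrelated tables: separate selects, always Account first, then Warehouse
select AccountId from Account where AccountId = @AccountId
select WarehouseId from Warehouse where WarehouseId = @WarehouseId

set transaction isolation level read committed

-- the inserts and updates that need these rows would go here

commit tran TxLockOrder
```

The order itself is arbitrary. What matters is that every query touching both Account and Warehouse follows the same one.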

I'll leave you with some rules of thumb, which apply to most cases but, of course, not all:
  1. Keep your transactions as small as possible by touching as few objects as possible
  2. Keep your transactions as fast as possible: if you have a query that can execute on n records in a single transaction, where n is unbounded, you are likely to find yourself in a world of hurt
  3. Obtain shared locks on everything your transaction will eventually require exclusive locks on before you acquire any other locks
  4. If you need to do any reads that don't need to be repeatable, do them before you obtain any shared or exclusive locks (this is really just in keeping with #2)
  5. If you set the transaction isolation level to repeatable read, make sure you're setting it back to read committed (even if it's the last line of your query, this will make sure triggers don't execute in repeatable read)

Now, all of this has been learned by trial and error, experimentation, and a lot of reading. If you know of other ways around these issues, or if you have a different take, I definitely want to hear about it.
