Wednesday, April 11, 2007

Memory Leaks, Garbage Collection, and .NET

I have frequently read and been told that .NET (and Java) do not have memory leaks. Of course, this depends on the definition of a memory leak.

One definition says a memory leak occurs when a program allocates memory from the operating system but never gives it back. This is pretty clear, so I'm going to stick with it.

Why does .NET not have memory leaks? It has a garbage collector. The garbage collector checks your memory for you every so often to determine if it is still referenced anywhere. If it is, its assumed you're using it. If it isn't, you can't possibly be using it, and the memory is freed and given back to the system.

Therefore, languages with garbage collectors clearly can't have memory leaks, right? I'm going to show you why I disagree with that statement.

Lets look at a simple example in C#. Suppose you've written a class which acts as a DataTable cache. This will be a static class. The purpose of this class is to store DataTables which will be used by various controls throughout the application. If a DataTable gets updated, this class will update all the controls that use that DataTable.
public static class DataTableCache
{
Dictionary dtCache = new Dictionary();
Dictionary controlStore = new Dictionary();

public static void AddDataTable( string key, DataTable table, Control ctl )
{
if ( dtCache.ContainsKey( key ) )
{
dtCache[key] = table;
// update controls using key, if any exist yet
}
else
{
dtCache.Add( key, table );
// update controls using key, if any exist yet
}
AddControl( ctl, key );
}

public static void AddControl( Control ctl, string key )
{
controlStore.Add( ctl, key );
}

public static DataTable LookupDataTable( string key )
{
if ( dtCache.ContainsKey( key ) )
return dtCache[key];
else
return null;
}
}

I'm keeping this example as simple as possible, but it is modeled on real code. There is a "memory leak" problem here. Suppose you create a Form, f. On that form you add a ComboBox, cb. You obtain a DataTable which you use as cb's DataSource and you add them both to the DataTableCache. When f is closed it will be disposed along with all of its controls, including cb. However, because cb is still in the controlStore dictionary it will never be freed by the garbage collector. It has been disposed, but it is still referenced by our static DataTableCache.

No, don't worry, I'm not claiming this qualifies as a memory leak. This is simply programmer error. The programmer should have accounted for this problem. There are two obvious ways to fix this. The first might be to add a RemoveControl method and call that when the form closes. The second, and better approach, is to register the control's Disposed event in DataTableCache and remove it from the dictionary when that event handler is fired. And while we're at it, we'll also make the event handler check to see if any other controls are still using the disposed control's key after it is removed. If none are, we'll remove the DataTable too.

AddControl would now look like:

public static void AddControl( Control ctl, string key )
{
controlStore.Add( ctl, key );
ctl.Disposed += new EventHandler( ctl_Disposed );
}

And the event handler would look like:
private void ctl_Disposed( object sender, EventArgs e )
{
string key = controlStore[(Control)sender];
controlStore.Remove( (Control)sender );
if ( !controlStore.ContainsValue( key ) )
dtCache.Remove( key );
}

Problem solved. No more memory leak.
Actually, wrong. We still have a memory leak. The DataTable will be disposed and freed by the garbage collector, but the Control wont be. If you don't believe me, try it out yourself. You can use a memory profiler to see that the combo box remains in memory no matter how many times you run the garbage collector.

This is the memory leak! No where in our code do we have a reference to the combo box, so why isn't the garbage collector working? Well, it turns out its because we registered the Disposed event. Any time you register an event you get this little intermediate object behind the scenes which holds a reference to the object you registered the event on and the object you registered the event from. So if you have ever registered an event between two objects, where one of them had a longer lifetime than the other, then you have a memory leak. And if one of those objects is static, as in this example, then the memory wont be given back to the system until the application is closed.

How serious is this? It depends on the circumstances, but most of the time it will be pretty serious. In our example its very likely that the Form, f, had registered the ValueChanged event of the ComboBox, cb. That's a very common need. Because of this, both cb and f will never be freed. And imagine how many controls could potentially be on f. Now none of them can be freed either.

How do we solve this? Its very easy, we just modify the Disposed event handler as follows:
private void ctl_Disposed( object sender, EventArgs e )
{
string key = controlStore[(Control)sender];
controlStore.Remove( (Control)sender );
((Control)key).Disposed -= new EventHandler( ctl_Disposed );
if ( !controlStore.ContainsValue( key ) )
dtCache.Remove( key );
}

I consider this a memory leak because 1) its hard to consider this programmer error since its actually the .NET framework which contains the references and 2) this problem is very very hard to remember to avoid. This means you really do need to use a memory profiler on your applications before releasing them. And therefore I claim that C#, at least, does suffer from memory leaks.

I haven't had a chance to test this on any other languages. I'd be very interested to hear if the same is true in Java, Python, and Ruby for example.

7 comments:

  1. Im going to go out on a limb and say the flaw i see with this series of examples is that common design standards say that a static class should not have state. By using a static class as a collection of other object, the creation and destruction of which occur outside of the control of the class you are giving it state it shouldnt be given.

    Regardless i know this happens, and i believe this is the same thing you mentioned a while back on the BForumn which i found interesting enough to take it to our architects here. FWIW they looked into it and found some places where it did indeed decrease the memory footprint.

    ReplyDelete
  2. I have to admit, I've never heard that design standard. It certainly makes sense though. Most static classes serve more of a utility function. Thanks for pointing that out.

    That being said, this pattern is used quite frequently. In fact, Infragistic's Win Form controls use it internally and therefore suffer heavily from this problem. I should write a post about that...

    Of course, this problem still exists even without static classes. As long as one object has a longer life time than another and events are being registered between them, you'll get these memory leaks.

    ReplyDelete
  3. I really cant speak to infragistics win forms stuff. It might be quite good.

    However, we stopped using infragistics web forms stuff months ago because, to put it kindly, it is complete and utter shit.

    Their lack of any concern for CSS standards led to a million headaches when we tried to customize or attach javascript events to their controls. They use these layers of nested html controls with their own really freaking weird custom javascript that doesnt work well.

    I think i either read the static classes dont have state in "The Pragmatic Programmer" or "Code Complete", my first instict is Code Complete because TPP isnt really about low level development concepts, its about how to be a truley professional programmer.

    ReplyDelete
  4. Well,

    I would say the static class design specification is a myth. The question here is what is a singleton? mmmm.... I would say that is equivalent to a static class with state. So lets go back to bad microsoft design.

    ReplyDelete
  5. Ed, if you don't unregister your events, it isn't Microsoft's fault. It would be a really bad design if it *did* unregister events automatically in the garbage collector, as that would produce non-deterministic behavior (because you don't know when the gc is gonna run, so you wouldn't know when the event will be unregistered.) Non-determinism is bad. Language/platform designers generally try to avoid it. Furthermore, what if you had an object that only needed to do anything when some event occurred? It would not seem unreasonable for the only reference to that object to be a delegate registered with some other object's event. In Java, this is the standard way to create event listeners, largely due to Java's lack of function pointers or delegates.

    ReplyDelete
  6. Clarification on that last part: I meant that using objects that are only referenced as event subscribers is the standard way of handling events in Java, not that this process used delegates (because Java doesn't have delegates.)

    ReplyDelete
  7. its a good understandable blog which helps me a lot to solve my memory leak issue in .net :)

    ReplyDelete