Thursday, April 30, 2009

What Mercurial Can't Do: Merge by Changeset

UPDATE 12/1/2011: Mercurial CAN do this!  You used to need to enable the transplant extension.  But since v2.0 it's included out of the box in the form of the graft command.  I should also note that the problems I mention with this approach in TFS do not exist in Mercurial.  So, basically, go use the graft command and don't read this old post (unless you want to read about how bad TFS is).


Mercurial is a distributed source control system that works very well on Windows and has great Windows shell integration with TortoiseHg.

TFS is a centralized behemoth that does source control but also integrates (usually poorly) with every product Microsoft has ever released (not including Bob).

I've used both of these, but I've used TFS much more extensively. I recently started looking into what it would take to switch from TFS to Mercurial and was rather surprised to find a couple of things that TFS can do that Mercurial cannot.

The first of these is the ability to do a merge by changeset. In TFS, say you create some branches as follows:
  1. Create a new TFS project called Project
  2. Check in some source at $\Project\Source
  3. Branch that source to $\Project\NewBranch
  4. Do 3 checkins to $\Project\NewBranch
Suppose you're not completely done with whatever it is you're working on, so you can't merge all three changesets back to $\Project\Source just yet. But let's say your second changeset includes a bug fix to a file that you won't be changing as part of your "main" work in the other changesets. Maybe you want to go ahead and merge this bug fix back to $\Project\Source right now.

In TFS you could do this very easily.
  1. In Source Control Explorer, go to $\Project\NewBranch
  2. Right click and select "merge"
  3. Change to the "Merge by selected changeset" radio button
  4. Make sure the target is $\Project\Source
  5. Click next
  6. On the next page, select only the second changeset
  7. Click next and the merge is performed
Ta-Da! You just merged changeset #2 without merging changeset #1 or #3.

You cannot do this in Mercurial, at least, not with a "merge" operation. The only way to accomplish the same type of thing would be to create a patch out of NewBranch and apply it to Source using the hg export and hg import commands.

So the big question is, why can't you do this in Mercurial? The answer goes to the heart of what makes Mercurial so different from TFS. The first thing to realize is that Mercurial does not have "branch lines."

In TFS, when you branch code, TFS knows that one is the parent of the other, and you can only merge across that branch line. This means you can't do merges between two siblings. For example, in B -> A <- C, you can't merge B to C. You can only merge along the branch lines (unless you do a baseless merge, which doesn't really count). In Mercurial, the normal way of creating a branch is to simply clone the repository, which means you have a full copy of the entire history of the repo. The image to the right shows what an hg pull would look like when bringing changes from a cloned repository into the original repository.

"A" represents the starting point. "B" represents the first change from the repository we're pulling in. What you can see here is that when you pull changes in Mercurial, a temporary "branch" is created containing all the changesets from NewBranch in parallel with any changesets from Source.

"C" is the only actual merge, in the traditional way we think of merging, because it actually brings all the changes together. This is beautiful in its simplicity because until you get to C you don't have to do any work. Each changeset represents what changed from the parent, so you just import all the changesets and associate them with the correct parent. Then only at the very end do you have to do any merges.

A merge in TFS is not so clever. Basically all TFS does is figure out every file that changed on either side, and do a 3-way merge on each in turn, resulting in a new changeset. The upside of this is we can select a single changeset, ignore all the changesets around it, and do a merge.

In Mercurial, you can't pick just one changeset and merge. You have to merge all the changesets before it too because that's the definition of how a merge works in Mercurial. The upside of this is that all the changesets are preserved.
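
To make that concrete, here is a minimal Python sketch of why a merge drags in the earlier changesets. It is not Mercurial's code; the history, the changeset names (A, s1, n1, n2, n3), and both function names are made up for illustration. The only idea taken from above is that each changeset records its parent, so merging a changeset means taking everything it descends from.

```python
# Hypothetical history: each changeset maps to the list of its parents.
PARENTS = {
    "A": [],        # common starting point
    "s1": ["A"],    # a changeset made in Source
    "n1": ["A"],    # first changeset in NewBranch
    "n2": ["n1"],   # second changeset in NewBranch (the bug fix)
    "n3": ["n2"],   # third changeset in NewBranch
}


def ancestors(rev, parents):
    """Return rev plus every changeset reachable through parent links."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def brought_in_by_merge(target, source, parents):
    """Changesets that merging `source` into `target` adds to target's history."""
    return ancestors(source, parents) - ancestors(target, parents)


# Asking for only the bug fix (n2) still pulls in n1, because n1 is its parent.
print(sorted(brought_in_by_merge("s1", "n2", PARENTS)))
```

In this toy history there is no way to ask for n2 by itself: the set that comes back always includes n1.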

For example, say you want to know who added a certain file. In Mercurial, you'll be able to figure this out regardless of what "branch" (cloned repository) it was added in. In TFS, you're screwed because the file will be added in a "merge" changeset. The merge may not have been done by the same person who added the file (in fact, it usually won't be), so to find out who added it, you have to manually follow the branches and inspect the history on each in turn. The same is true (and worse, actually) if you want to know who updated a line in a certain file.

Sadly for me, we're constantly "de-tangling" our changes by doing merges by changeset. But let's think about that for a minute. Is merging a single changeset even a sane thing to do? It turns out, not so much, because it's possible for this to result in a broken state. Here's how:
  1. Joe Bob adds a new file "hippo.cs" and updates the C# project file
  2. Joe Smith adds a different new file "giraffe.cs" and updates the C# project file
  3. Joe Smith merges his changeset and ONLY his changeset up
  4. The result of the merge does not compile. The error is "Can not find file 'hippo.cs'"
WTF!? Why is it looking for hippo.cs?! We didn't merge that changeset! How does it even know about hippo.cs? It's because Joe Bob and Joe Smith both changed the C# project file. When Joe Smith changed it, Joe Bob's changes were already in it. So Joe Smith's final C# project file includes Joe Bob's "hippo.cs" file. But when Joe Smith did a merge, he didn't include Joe Bob's changeset, so the hippo.cs file didn't get merged. And now you're broken.
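
Here is a rough Python model of that failure. It assumes, as described above, that merging a single changeset ends up taking the branch's whole version of each file the changeset touched (the target's project file is unchanged since the branch, so there is nothing to merge it against). The names main.cs and Project.csproj, and both helper functions, are hypothetical; this is a sketch of the behavior, not of how TFS is implemented.

```python
# A source tree is modeled as {file name: content}.
# The project file's content is simply the list of files it references.
source = {
    "main.cs": "// existing code",
    "Project.csproj": ["main.cs"],
}

# State of the branch after Joe Bob's changeset and then Joe Smith's changeset.
branch_after_joe_smith = {
    "main.cs": "// existing code",
    "hippo.cs": "// added by Joe Bob",
    "giraffe.cs": "// added by Joe Smith",
    "Project.csproj": ["main.cs", "hippo.cs", "giraffe.cs"],
}

# Files touched by Joe Smith's changeset only.
touched_by_joe_smith = ["giraffe.cs", "Project.csproj"]


def merge_selected_changeset(target, branch_state, touched):
    """Copy the branch's version of each touched file into a copy of target."""
    result = dict(target)
    for name in touched:
        result[name] = branch_state[name]
    return result


def missing_references(tree):
    """Files the project file references that are not present in the tree."""
    return [name for name in tree["Project.csproj"] if name not in tree]


merged = merge_selected_changeset(source, branch_after_joe_smith, touched_by_joe_smith)
print(missing_references(merged))
```

The merged tree has giraffe.cs and a project file that mentions hippo.cs, but no hippo.cs, which is exactly the broken build from the list above.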

This happens like all the freaking time with project files (which are the bane of branches and merges). But fortunately it's easy to fix. Just remove the missing files from the project file. But I think you can probably see that if this happened to any file other than the project file, like a real source file, you'd be in a world of hurt.

I'm actually STUNNED, given how much effort TFS puts into protecting the users from themselves, that it allows you to merge selected changesets in this way! But it does. And it's a feature that Mercurial just can't match, even if it is a feature that can lead to trouble. But maybe that's a good thing.

13 comments:

  1. git has a cherry-pick command which does exactly this - apply one commit from another branch.

  2. @Gabe Cool, thanks. I looked at it and from what I can tell it looks like it's basically automating the import/export process. Which is really all TFS does with its standard merge anyway. Mercurial does not seem to have a similar shortcut command.

  3. Is the fact that Mercurial doesn't support this feature in TFS such a big problem? I think the problem is actually in trying to use TFS workflows with a different tool.

    If I were in the situation where I had a cloned repository that I was working on a new feature for, but also had to fix a bug, I'd clone a repository just for the bug fix. This would result in tons of "branches," but since they're rather cheap with Mercurial, does that really matter? That at least feels more correct to me.

    I understand that there might be a situation, though, where you need to apply something buried in the changesets in Mercurial, and that an import/export would be required. ( At least occasionally. ) Using tons of clones might help prevent that need.

  4. @Toby You're absolutely right, and that's the recommended development style for Mercurial: http://www.selenic.com/mercurial/wiki/index.cgi/WorkingPractices

    But like you pointed out, occasionally you'll forget, or not think you need a new repo, etc, and that's when you'll need to import/export

  5. Interesting article. I have never worked with Mercurial or TFS.

    I really like the Git workflow for creating fast/cheap branches for each feature/bug. If you're in a situation where you can break your development workflow into small focused user stories, the branches don't get too large before they are merged back in.

    The problem I am still trying to figure out is a good way to manage ever-changing customer requirements for what is going to go into a release/deliverable. If we have separate branches/change-sets for different features but some build on each other ... and then the customer requests to push A, B and D without C. Well, then you're stuck b/c C was built and tested with A and B already done. Even if C isn't dependent on A and B, it hasn't been tested without the other two.

  6. The command you're looking for in Mercurial to do this is "transplant." It's a standard extension distributed with Mercurial, but you have to enable it in your hgrc.

    hg transplant --merge -r 123 128

    will merge revisions 123 and 128 (and only those revisions) into the current branch.

  7. @SER I looked at that transplant extension. It appears that that command actually modifies the history. I don't want to modify the history, I just want to get some changes and check them in as the new tip (like the git cherry-pick command).

    @Ben I think when that situation arises, the only thing you can do is say "NO! It's too late now, you have to wait for C."

  8. I don't think that this is a problem with Mercurial, but with the workflow being considered.

    I'm currently getting to grips with the new workflows needed for Mercurial myself, but this is what I would do:

    If B contains both a bug fix and new functionality, then you need to split these up.

    The best way to fix a bug is to clone the repository, update the clone to the revision where the bug was introduced, fix it there and commit that change, introducing a new head.

    Now, you have a branch which is common to both left and right branches, so you can easily merge it into either.

    If you add these merges to your existing ones, you get the following:

    C Fixed3
    |\/ \
    |/\ \
    3 4 \
    | | \
    | FixedB |
    | | | /
    2 B | /
    | / |/
    |/ /
    1 /
    | /
    A Fixed0
    |/
    0

    Sorry about the ascii representation, but it's the best I could manage. *8')

    Here you can see that because the bug fix is before the common ancestor to the left and right branches, you can happily merge in the bug fix to either branch, even before you are ready to merge the right branch into the left.

    Take care,

    Mark..........

  9. Thanks for the comment Mark. I totally agree with you.

    Creating a new branch (possibly at an earlier revision if needed) is the right way to go. Mercurial even recommends creating a branch for each distinct piece of work you may need to do.

    As long as you're very disciplined and always remember to do this, you'll be fine.

    Unfortunately, you can't count on everyone to be disciplined all the time. That's when the patch approach may be necessary. But like I concluded in the post, I think Mercurial actually gets this right. If someone messes up, you have a way out: create a patch. This may not be easy, but you messed up, so it's not important that it be easy.

  10. A tool will never replace the developer's good sense. As long as a human is involved, you will always have to deal with PEBKAC. And import/export provides a method to deal with that category of error. ;-D

  11. Is it for sure that the result of the merge is as described (i.e. including a reference to hippo.cs in the project-file)?

    It contradicts e.g. the description on http://weblogs.asp.net/dmckinstry/archive/2006/07/03/Understanding-ChangeSets-and-Merge-with-Team-Foundation-Server.aspx and what I'd 'hope' Team System would (sanely) do.

  12. Steffen,
    Yes, it's for sure.

    The article you've linked makes it sound like TFS is smart enough to only merge the changes made in the specific changeset you're merging, and not any changes made in previous changesets. Unfortunately, that's simply not how it works.

    Maybe TFS DOES track only the differences under the covers, but when it does a merge it clearly doesn't use that information. It just does a "dumb" compare between the two files.

    That's why you get this behavior. It's easy to try it out. Just:
    1. create a new project
    2. branch it
    3. add a new file & check in
    4. add another file and check in
    5. merge only the last changeset

    You'll see the prj file includes both files, but the first file is missing on the drive.

  13. Transplant doesn't modify history, it does exactly what you want, it merges a copy of a specific changeset from one branch to another.
