Monday, March 19, 2012

Rebasing Is For Liars

Rebasing is very popular in the Git community.  Of course, I'm primarily a Mercurial guy, but we have rebasing too.  It's a built-in extension; you just have to turn it on.

What is rebasing?  The typical scenario goes something like this:
  1. You make changes and commit one or more changesets
  2. Meanwhile, other people have committed changes
  3. You pull down those changes
  4. But instead of merging, you rebase
  5. Which detaches your changes from history and reapplies them after the changes you pulled in
People like this because it keeps the history linear, avoiding "merge bubbles."  And certainly linear history is much easier to understand.  

But I have a problem with rebasing: it's lying.  Understanding the context that changes were made in can be very useful, but rebasing rewrites the history, changing the parent pointers, and thereby changing the context.  Lying about what the code looked like when you changed it.
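
To make the parent-pointer point concrete, here is a minimal sketch in Python. It is a toy model, not how Git or Mercurial actually store history: the `Commit` class, the `merge` and `rebase` helpers, and the shortened SHA-1 ids are all hypothetical names invented for illustration. The only idea it borrows from the real tools is that a changeset's identity depends on its parents.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Toy changeset: a message plus the ids of its parents (hypothetical model)."""
    message: str
    parents: tuple = ()

    @property
    def id(self) -> str:
        # The id depends on the parents, so changing a parent changes the id.
        data = "|".join(self.parents) + "\n" + self.message
        return hashlib.sha1(data.encode()).hexdigest()[:8]


def merge(mine: Commit, theirs: Commit) -> Commit:
    """Record a merge: both existing changesets are left exactly as they were."""
    return Commit("merge", (mine.id, theirs.id))


def rebase(commits: list, new_base: Commit) -> list:
    """Reapply commits (oldest first) on top of new_base and return the copies."""
    rebased = []
    parent = new_base
    for commit in commits:
        copy = Commit(commit.message, (parent.id,))
        rebased.append(copy)
        parent = copy
    return rebased


base = Commit("initial")
mine = Commit("adds feature xyz", (base.id,))   # what I wrote, on top of base
theirs = Commit("fixes bug abc", (base.id,))    # what someone else committed meanwhile

merged = merge(mine, theirs)        # history keeps my original parent
rebased = rebase([mine], theirs)[0]  # same message, different parent and id

print("original parent:", mine.parents)
print("rebased parent: ", rebased.parents)
print("same changeset? ", rebased.id == mine.id)
```

In this model the merge keeps `mine` pointing at `base`, which is what the code really looked like when the change was written. The rebased copy carries the same message but claims `theirs` as its parent, and so it is a different changeset with a different id.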

That said, I still use rebase.  But only when my changes are small or inconsequential and I know that the consequences of lying about what the code looked like when I made those changes won't matter at all.  And in those cases, it's nice to reorder the history to be sequential because it does limit the conceptual overhead of understanding those kinds of changes.  But in general, I prefer to see the merges simply because they accurately represent what really happened.

4 comments:

  1. I love linear history. In my usage patterns, if I think context is important, a feature branch is the cure. If the branch is important, give it a name. Don't force it to be anonymous.

  2. At first I agreed. Then I didn't. Now, I do again, but with a small caveat ...

    The term "rebase" is overloaded in many posts to mean the specific command in git, a placeholder for the capability to change "history", and a branching/integration strategy that leverages the prior meanings to some end. It might be nice to discuss them independently.

    The command: Other than the changing semantics when using --onto being frustrating to remember, I have no opinion.

    The capability: This has never bothered me at a "lying about what happened" level, but I can easily see how a single incident could knock over that line in the sand. I don't have experience with Hg (other than your great talk on why GitHub is better than it ;-) ), but using Git every day I tend to look at commits more as individuals with a pedigree of sorts. Moving individual clumps of codebase deltas and metadata around at will is a power that is infrequently invaluable and the rest of the time unneeded. Working on teams with rapid pair/story switching and parallel work has pushed me to think of it less as history and more as the steps needed to get my codebase to a particular point.

    The branching/merging/integration pattern: For a brief time I really enjoyed leveraging a branching and merging strategy that left me with "perfect" branching between levels (several stories are branched off of an MMF branch, which is branched off of a demo branch, which is branched off of a production or release branch). After a short time I felt that the cost of maintaining this perfect visualization of our larger team process was not in line with the value. We still have the same team process for managing in-progress and done work at various levels, but we have changed to strict merging (git command with --no-ff and messages describing the reason for the merge, e.g. finished:peer-reviewed by __ OR demoed on mm/dd/yyyy). If you look at our code base, though, it is often a mess of merge lines. Once you get used to it, the 0 cost of merging up and down to get the right set of code changes outweighs the need for a pretty visualization.

    my 2 cents

  3. Ben: I'm only referring to the technique of modifying history by changing the parent of one of your commits. It's lying because you are saying "this is what I wish the world had looked like when I wrote this code", instead of leaving it alone to accurately reflect what the world really did look like. I don't mean to say that it is inherently bad, but it is lying.

    I like your comment about thinking of the changesets as deltas that you apply to get you from one place to another. That mindset has an interesting implication for the naming of changeset commits:

    "adds feature xyz" vs. "added feature xyz"

    Instead of describing what YOU did to the code, you describe what the CHANGESET will do when applied. I like that. I bet that style of thinking would encourage a more deliberate approach to making changes.

  4. Fair enough.

    My tolerance for lying would also be impacted by the size* of the commits in question. Moving around smaller, focused commits that have good messages describing changeset delta intentions (like you mentioned) may be less prone to being time-relevant and therefore less painful to lie about.

    *size being some magical mental combination of factors including true textual delta size (lines changed), number of files, number of concepts and responsibilities, etc. This is obviously driven not only by development style, but also by team process, the task at hand, experience, and the tech stack being used (big IDE-based tech stacks often inflate the number of touch points for small concepts).
