Mercurial vs Git

Thu Jan 20 05:35:26 EST 2011

On Thu, Jan 20, 2011 at 5:56 AM, John Bailey <rekkanoryo at rekkanoryo.org> wrote:
> On 01/19/2011 06:17 PM, Felipe Contreras wrote:
>> This is why it's possible in the linux kernel to have hundreds of maintainer
>> repositories, each one with dozens of branches and patches coming from the
>> branches of even more developers. Branches are merely a utility to identify
>> commits.
>
> That's great for projects that operate in that manner, like the kernel.  For our
> practices, I think a more "traditional" branching model where a branch is more
> than just a floating reference is more appropriate.

Why? How exactly would a floating reference affect your project negatively?

> Now, for some clarification, in the branch flow you described there, you get
> nearly identical behavior with monotone if Richard's branch is not included in
> the branch pattern pushed to a different database.  You end up with the
> revisions because they're needed as part of the history, but you don't get the
> branch certs because the branch's name didn't match the push pattern.  For a
> concrete example, if you look carefully in our monotone database, you'll find a
> number of revisions from the maemo guy that have no branch certs.

And that demonstrates one of the many design inconsistencies in this model.

 1) commits can be part of no branch, even though they are parents of
a commit in the main branch

A
| \
B C

A (master), B (master), C (nowhere)

 2) branches can follow weird patterns

A (master)
|
B (test)
|
C (master)
|
D (test)

 3) branches can be scattered

A B *
| /
C  F *
| /
D E
| /
G *

B, F, G = (test)

The only thing that prevents those inconsistencies is the front-end
app (if it even does). With the git model, those things are impossible.

And you would see which branch the commits came from:
Merge branch 'rlaager-foo'
https://github.com/ecoffey/pidgin-illustration/commit/6fc76c5bc9ac0929f7fd1e2e2d2fcb2840d394e1
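
To make the difference concrete, here is a rough sketch in Python. It is
a toy model, not how git or monotone actually store anything. It reads
the first graph above as A being a merge whose parents are B and C, and
the names PARENTS, LABELS, ancestors() and labelled() are made up for
the example.

```python
# Toy model: each commit maps to the list of its parents.
# First graph from this message, read as: A is a merge of B and C.
PARENTS = {"A": ["B", "C"], "B": [], "C": []}

# Label model (hypothetical data): membership is stored per commit.
LABELS = {"A": "master", "B": "master"}  # C carries no branch label


def ancestors(parents, head):
    """Return head plus every commit reachable through parent links."""
    seen = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def labelled(labels, branch):
    """Return the commits explicitly labelled with the given branch."""
    return {commit for commit, name in labels.items() if name == branch}


# Label model: C is on no branch although it is a parent of A.
on_master_by_label = labelled(LABELS, "master")

# Reference model: master is a pointer to A, so the branch is
# whatever is reachable from A, and C cannot be left out.
on_master_by_ref = ancestors(PARENTS, "A")

print(sorted(on_master_by_label))
print(sorted(on_master_by_ref))
```

In the label model nothing stops C from ending up with no branch at
all. In the reference model the branch is derived from the graph, so a
parent of a commit on master is on master by construction.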

>> I haven't seen anyone discuss this fact, except John Bailey who called the git
>> implementation 'braindead' without going into any details.
>
> Perhaps "braindead" was a poor choice of words.  But I don't like the idea of
> the "branch" being a floating reference to the head of a portion of the revision
> graph.
>
> Our branches generally are larger feature branches, where (in my opinion) it's
> more useful to always have those revisions tied to a given branch.  It's not the
> immutability of the branch that I'm going for here, more the fact that *each*
> revision on the branch is *explicitly* on the branch, not implicitly by walking
> the history graph if the branch ref is present.

Yes, but why? The git model would represent exactly the same thing,
unless you want a representation in which not all the commits that are
ancestors of a branch head are part of the branch. In that case it's
not what people call a branch, but more like "commit labels", and I
still don't see why you would prefer those.

>> == graft points ==
>>
>> I saw a proposal from John Bailey to have a structure like:
>>
>> libgnt
>> libpurple
>> finch
>>  |-- libgnt
>>  |-- libpurple
>>  `-- po
>> pidgin
>>  |-- libpurple
>>  `-- po
>> po
>>
>> The disadvantage, he claimed, is that the history would have to start from zero
>> since otherwise each one of these repos would have to be 215 MB.
>
> Note that this wasn't directly a proposal.  It was mostly intended as an
> illustration in response to Sadrul's question about how the subrepos worked.
>
> The size of each repo being 215 MB was assuming we wanted to retain history in
> *all* the repos, which I admittedly should have made clear in the first place.
> For some sections of the tree, for example the 'po' directory, I'm not sure the
> history has any intrinsic value.  Where the history doesn't have enough value
> for us to care about, throwing the history away and starting over is trivial and
> would reduce the size of the collection of repos.

It was clear to me; that's why I said the history would have to start
from zero, or else each repo would be 215 MB.

> Also, I'm not an expert on the 'hg convert' tool, so it's possible there's
> additional functionality therein that I don't know about that could assist in,
> or completely handle, slicing and dicing the history appropriately.

I don't see how that would be possible, or beneficial. If you go back
to, say, v2.7.9, you would want to see what is currently in mtn, not
some sliced, uncompilable collection of files.

>> Not with git.
>>
>> My proposal would be the following. Convert the whole history to git, then,
>> make a new release, say 2.8 where the directories are split (libpurple, pidgin,
>> etc.). That repository would be the legacy one, then, start new separate
>> repositories that start from scratch.
>
> As a concept, this particular proposal has merit.  A lot of merit, in my
> opinion.  I'd propose that if we want to move away from the single-repo model we
> make 2.8.x the end of the line for both one-repo and 2.x.  Then for 3.0.0 we
> create whatever arrangement of repositories we feel is appropriate (I'd say
> libpurple, libgnt, finch, pidgin, and po if we don't split the po's into each
> repo) with all the subrepo glue we want/need and start anew, referring to the
> old repo if we need history.
>
>> Then, for the people that want to have the full history, they can set up a
>> graft-point[3] and voilà; the full history would be available on each one of the
>> repos.
>
> This is an interesting feature that some may find useful.  That said, the Adium
> guys have done pretty well without such functionality in their new "each branch
> is a repo" model.  It does make me wonder how difficult writing such an
> extension for mercurial would be, though.

No idea. Perhaps one already exists, but my intuition tells me it's
not possible with an extension; the core would have to support it.
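
For anyone unfamiliar with the idea, a graft locally overrides the
recorded parents of a commit, so the root of a new repository can be
made to point at the last commit of the legacy one. A rough sketch in
Python follows. It is a toy model, not git's implementation, and the
commit names (L1, L2, N1, N2) and the history() helper are made up for
the example.

```python
# Hypothetical commits: L1 <- L2 is the legacy repository,
# N1 <- N2 is the new repository, whose root N1 has no parents.
PARENTS = {"L1": [], "L2": ["L1"], "N1": [], "N2": ["N1"]}

# A graft replaces the recorded parents of one commit, locally only.
GRAFTS = {"N1": ["L2"]}


def history(head, parents, grafts=None):
    """Walk first-parent history from head, honouring any grafts."""
    grafts = grafts or {}
    log = []
    commit = head
    while commit is not None:
        log.append(commit)
        # A grafted commit uses the override instead of its own parents.
        commit_parents = grafts.get(commit, parents[commit])
        commit = commit_parents[0] if commit_parents else None
    return log


print(history("N2", PARENTS))          # new repository only
print(history("N2", PARENTS, GRAFTS))  # continues into the legacy history
```

Without the graft the walk stops at the new root. With it, the walk
continues into the legacy commits, while the stored commits themselves
are left unchanged.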

>> In conclusion, everything that can be done with Mercurial can be done with
>> Git, but not the other way around (at least I'm pretty sure about the
>> branches). Therefore, the sensible choice IMO is Git.
>
> Perhaps git seems reasonable to you.  Whether it's reasonable for us or not
> remains to be seen.  The recent votes to come in are making our decision lean
> toward hg.  If the consensus is to move to hg, that's where we'll go.  Likewise,
> if the consensus is git, that's where we'll go, as much as it displeases me.

Sure. I haven't tallied the votes closely, but from what I recall most
of the votes were git/hg (slightly favoring one or the other) or
don't care. I don't recall anyone saying a definite no to git.

Cheers.

-- 
Felipe Contreras