Monotone analysis

Felipe Contreras felipe.contreras at gmail.com
Tue Jul 8 20:32:06 EDT 2008


On Wed, Jul 9, 2008 at 1:56 AM, Richard Laager <rlaager at wiktel.com> wrote:
> On Tue, 2008-07-08 at 23:17 +0300, Felipe Contreras wrote:
>> A cert is basically metadata signed by some person. It could be anything, it's
>> a key string, and a string value, that's it.
>
> This opens up some really interesting workflow possibilities that
> wouldn't exist otherwise. That said, I'll stipulate that nobody seems to
> be interested in doing these things in the real world right now, which
> is really sad.
>
> I have a great example of where this would be useful, though: Right now,
> the kernel developers are doing Signed-Off-By markers (and the like) in
> commits. Monotone has built-in support for this (with real crypto
> backing it, even). It's also possible for someone like Linus to
> configure Monotone to ignore all revisions that aren't signed by his
> Lieutenants.

I don't see how revisions can be "ignored", but Linus can already do
that by grepping the logs.

There's no need for integrated cryptography because Linus is
responsible for verifying the identity of his lieutenants. It could be
by ssh, a patch sent through email and verified with PGP. It's a
network of trust, as Linus points out; the only way to do real
security.

>> there is no sane way to traverse a branch
>> but to search through the whole repository for the commits.
>
> I don't know what you mean by "traverse a branch".

Go from the head of the branch to the tails, commit by commit.

In git you have one head commit, and then you just visit the parents recursively.
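
To make "traverse a branch" concrete, here is a minimal sketch in Python. The history is a made-up in-memory dictionary, and the names PARENTS and walk_branch are invented for illustration; this is not how git itself is implemented, only the shape of the walk.

```python
from collections import deque

# Hypothetical toy history: each commit id maps to its list of parent ids.
PARENTS = {
    "D": ["B", "C"],  # merge commit with two parents
    "B": ["A"],
    "C": ["A"],
    "A": [],          # root commit, the "tail" of the branch
}

def walk_branch(head, parents=PARENTS):
    """Yield every commit reachable from head, each exactly once."""
    seen = {head}
    queue = deque([head])
    while queue:
        commit = queue.popleft()
        yield commit
        for parent in parents[commit]:
            if parent not in seen:  # a merge can reach the same ancestor twice
                seen.add(parent)
                queue.append(parent)

print(list(walk_branch("D")))  # ['D', 'B', 'C', 'A']
```

The walk only ever touches commits reachable from the head, so its cost depends on the size of that branch's history rather than on the size of the whole repository.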

>> A similar thing happens with commits between certain date ranges.
>> Since dates are just certs, finding revisions between certain ranges
>> requires checking all the revisions.
>
> It involves checking all the certs, which can be indexed (it's a SQL
> database, after all): SELECT * FROM certs WHERE ...

Exactly, but then if you want to get the commit message the query must
be different, especially since you can have more than one commit
message.
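
A rough sketch of the kind of query this implies, using Python's sqlite3 module. The certs table layout, the index name, and the sample rows are all assumptions made up for this example; they are not Monotone's actual schema.

```python
import sqlite3

# Hypothetical cert table: one row per (revision, cert name, value, signer).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE certs (revision TEXT, name TEXT, value TEXT, signer TEXT);
    CREATE INDEX certs_by_name_value ON certs (name, value);
""")
db.executemany("INSERT INTO certs VALUES (?, ?, ?, ?)", [
    ("r1", "date", "2008-07-01T10:00:00", "john"),
    ("r1", "changelog", "Fix crash", "john"),
    ("r1", "changelog", "Fix crash on login", "steven"),
    ("r2", "date", "2008-07-08T12:00:00", "steven"),
    ("r2", "changelog", "Add feature", "steven"),
    ("r3", "date", "2008-08-01T09:00:00", "john"),
    ("r3", "changelog", "Later work", "john"),
])

def changelogs_between(start, end):
    """Return (revision, signer, changelog) for revisions dated in [start, end]."""
    rows = db.execute("""
        SELECT d.revision, c.signer, c.value
        FROM certs AS d
        JOIN certs AS c
          ON c.revision = d.revision AND c.name = 'changelog'
        WHERE d.name = 'date' AND d.value BETWEEN ? AND ?
        ORDER BY d.value, c.signer
    """, (start, end))
    return rows.fetchall()

for row in changelogs_between("2008-07-01", "2008-07-31"):
    print(row)
```

Finding the date certs is a single indexed lookup, but getting the messages needs a self-join, and r1 comes back twice because two people attached a changelog to it.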

>> It's not efficient.
>
> I'm curious how this looks in Git.

It's the same, except you traverse only the branch (not all the
commits), and all the relevant information is readily available.

>> Another curious thing about certs is that you can actually set many different
>> changelogs. Each developer can set a different message for the same revision.
>>
>> It's confusing, and annoying.
>
> This is an excellent feature. It allows you to go back and add more
> information after the fact.

Or remove information, so the integrity is never really ensured.

In git you can also alter the information, but if you do that, then
it's effectively creating new commits, which is exactly what you would
want. Otherwise how do you know that you are looking at the same
information when the sha1s are the same but the information might not be?
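
A small Python sketch of why that is. It hashes a commit the way git does, as the sha1 of a "commit <size>\0" header followed by the commit text. The helper names, people, timestamps, and messages are made up for the example; only the object layout follows git.

```python
import hashlib

# Id of git's empty tree, used here so the example needs no real files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def git_object_id(kind, body):
    """sha1 over '<kind> <size>\\0' plus the object body, as git does."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parents, author, committer, message):
    """Build the text of a commit object (illustrative helper)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return ("\n".join(lines) + "\n").encode()

# Same tree (same code), same author, but different committer, date, message.
author = "Alice <alice@example.com> 1215500000 +0000"
by_john = commit_body(EMPTY_TREE, [], author,
                      "John <john@example.com> 1215561600 +0000",
                      "Apply patch from Alice")
by_steven = commit_body(EMPTY_TREE, [], author,
                        "Steven <steven@example.com> 1215565200 +0000",
                        "Fix the thing (patch by Alice)")

id_john = git_object_id("commit", by_john)
id_steven = git_object_id("commit", by_steven)
print(id_john != id_steven)                          # True: different commits
print(git_object_id("commit", by_john) == id_john)   # True: same input, same id
```

Because the committer, date, and message are part of the hashed text, changing any of them yields a different id, so two equal ids always mean equal information.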

>> In other DSCMS (git, bzr, hg) a revision is not complete without committer,
>> author, date, and message. So you can't have more than one of these
>> values or the revision would be different.
>
> How does git deal with this case?
>
> Imagine a branch with two heads.
> A
> |\
> | \
> |  \
> |  |
> B  C

There is only one head per branch.

> Two people each merge them:
> A
> |\
> | \
> |  \
> |  |
> B  C
>  \ /
>  D
>
> Then they push their changes. In Monotone, they both created the same
> revision D with a set of certs, so this collapses nicely. Does git lead
> to two new heads? (Or, since you can't have two heads in git, does it
> force a merge?)

Let's suppose that 'john' and 'steven' both apply a patch somebody
submitted. The two commits would have the same code, but the
information would be different (different committer, different date,
different message), so effectively these are two different commits.

These commits would be on different branches: 'john/master' and
'steven/master'. Assuming there's one central repository and both have
access to it, let's say 'steven' pushes the commit first, then when
'john' tries to do the same there will be a merge conflict, which
would be resolved, but would create an empty commit. Then 'john'
realizes his commit is no longer valid and just drops it.

In this case git is doing the right thing; telling exactly what is
happening, and letting you handle the conflicts.

>> In other DSCMS (git, bzr) revisions are stored individually, and if they are
>> similar they are compressed. That achieves the same goal as just storing
>> deltas; saving space. But also achieves the goal of efficiency on different
>> operations.
>
> This makes it more expensive to show a diff between two adjacent
> revisions, a common operation. That said, from what I hear, git's
> compression these days is SO GOOD that it achieves huge wins here.

Only if there are massive changes which is usually not the case, and
even then it's quite fast.

>> In Bazaar each branch is developed separately. That means there's only one
>> head on each branch
>
> Imagine a branch like this:
>
> A -> B -> C -> D
>
> If I have two checkouts from that and I make a different commit from
> each, what happens? In Monotone, I get two heads. Is BZR/git/hg/whatever
> going to stop me from committing the second change until I merge it up
> with the new head? If so, I think this is a fundamental design flaw
> because I can't easily preserve the state of the code BEFORE a risky
> merge.

Each checkout is a different branch, you can continue the development
as much as you want and merge whenever you want.

I think you are confused by what a 'commit' means in other DSCMs. In
monotone, a local commit is going to spread across all repositories as
soon as you sync. In other DSCMs a local commit is just that; you
decide if you want to distribute it, or keep it local.

>> So each branch is simple; you can traverse it by looking at the parents of the
>> head of the branch.
>
> If this is your definition of traversing a branch, you can do the same
> thing in Monotone, so I don't understand your concern above.

No, in mtn a branch can have completely unrelated commits; not
connected by any sequence of commits. While it probably doesn't happen
in real life, the database design allows it.

So you don't traverse branches in mtn, you just dump all the commits in it.
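
A toy Python model of what I mean, assuming branch membership is nothing more than a cert attached to each revision. The revision ids, the branch name, and the function names are invented for the sketch; this is not Monotone code.

```python
# Hypothetical revision graph: X is a root with no relation to A, B or C.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "X": []}

# Hypothetical branch certs: revision id -> set of branch names attached to it.
BRANCH_CERTS = {
    "A": {"example.main"},
    "B": {"example.main"},
    "C": {"example.main"},
    "X": {"example.main"},
}

def revisions_in(branch):
    """'Dump' the branch: every revision carrying a cert for it."""
    return sorted(r for r, names in BRANCH_CERTS.items() if branch in names)

def ancestors(rev):
    """All revisions reachable from rev through parent links, excluding rev."""
    found, stack = set(), list(PARENTS[rev])
    while stack:
        r = stack.pop()
        if r not in found:
            found.add(r)
            stack.extend(PARENTS[r])
    return found

def heads(branch):
    """Branch members that are not an ancestor of any other branch member."""
    members = set(revisions_in(branch))
    covered = set()
    for rev in members:
        covered |= ancestors(rev)
    return sorted(members - covered)

print(revisions_in("example.main"))  # ['A', 'B', 'C', 'X']
print(heads("example.main"))         # ['B', 'C', 'X']
```

Nothing in this model requires the members to be connected, so the branch ends up with three heads, one of them (X) unrelated to the rest, and listing it means scanning the certs rather than following parents from a single head.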

>> In Git, each branch has only one head. So it's similar to bzr; branches are
>> simple.
>
> Same question as above. How do you prevent multiple heads from
> happening?

Multiple branches.

>> = Popularity =
>
>> The most popular DSCMs right now are git and bzr, with hg not so far away.
>> Choosing anything else doesn't seem to be a wise choice.
>
> Right. I think this is also a potentially compelling reason to switch.
>
> On Wed, 2008-07-09 at 01:05 +0300, Felipe Contreras wrote:
>> Keith Packard understood this, and chose git knowing that the
>> fundamentals were right [1].
>
> Half of that document explains why a DVCS is better than a CVCS, so
> that's not really a difference here.

The important point is that the internal design matters.

> Monotone has a much more flexible representation (with certs being
> generic). Here your argument cuts against you. If you want to do
> something new with git, you have to write support for another
> "column" (using SQL terminology) in your "table", whereas with Monotone,
> you just create the cert. For example, let's say you had Signed-Off-By
> (which you don't) and now you want to add Reviewed-By...
>
> "Files containing object data are never modified. Once written, every
> file is read-only from that point forward."
>
> s/Files containing/ and this applies to Monotone. In fact, that's where
> it gets its name. There's always the danger that the database could be
> corrupted. Basically, here we're comparing the potential for corrupting
> a single file database vs. a directory tree of files. In all reality, I
> don't think it's that big of a difference, but I do tend to trust ext3
> more than sqlite. Of course, with this being about *distributed* version
> control systems, you end up with backups all around the world regardless
> of which system you choose.

Except it's very easy to check the integrity in git: the sha1 assures
you that the information hasn't been altered. All the dates,
committers, sign-offs, and changelogs are OK if the sha1 is the same
one that you had.

It's funny that git is better at doing what mtn set out to do, but
after all Linus got inspiration from it.

> However, if the files are write-once, then you must end up with at least
> one file per revision (except, perhaps for imports where you could batch
> up the existing revisions). This leads to the criticism being applied
> against SVN: "The FSFS backend places one file per revision in a single
> directory; a test import of Mozilla generated hundreds of thousands of
> files in this directory, causing performance to plummet as more
> revisions were imported." How does git do write-once files without
> having one file per revision?

That's one of the things that I personally find most clever about git;
each file has a calculated sha1, so if the file hasn't changed,
the sha1 is the same, so there's no need to store it again, and if you
do, nothing would change anyway.

It's like hardlinks, with sha1s.
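
A minimal sketch of the idea in Python: a store keyed by the sha1 of the content, where the id is computed the way git hashes a file ("blob <size>\0" followed by the bytes). The ObjectStore class and its method names are invented for illustration, and the sketch keeps everything in a dictionary instead of on disk.

```python
import hashlib

def blob_id(data):
    """sha1 over 'blob <size>\\0' plus the content, as git does for files."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()

class ObjectStore:
    """Toy write-once store: objects are keyed by the hash of their content."""

    def __init__(self):
        self.objects = {}

    def put(self, data):
        oid = blob_id(data)
        # Storing the same content again is a no-op: same id, same bytes.
        self.objects.setdefault(oid, data)
        return oid

    def verify(self, oid):
        """Recompute the hash; a mismatch means the stored bytes were altered."""
        return blob_id(self.objects[oid]) == oid

store = ObjectStore()
first = store.put(b"hello world\n")
second = store.put(b"hello world\n")   # unchanged file in a later revision
print(first)                 # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(first == second)       # True
print(len(store.objects))    # 1
print(store.verify(first))   # True
```

An unchanged file costs nothing extra in a new revision, and the same id doubles as the integrity check: if the bytes behind an id ever change, recomputing the hash exposes it.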

>> In any case; the important thing as you say is the cost of switching.
>> I bet switching to mtn was painful, but that doesn't mean switching to
>> git would be
>
> Converting the repository from SVN was painful, which wouldn't be
> incurred again. Learning DVCS concepts was some work, which also
> wouldn't be duplicated.
>
> I don't think that switching would be all that difficult and I like the
> idea of choosing something that contributors would be more familiar with
> (as well as something that has good hosting opportunities so said
> contributors can really take advantage of the branch-sharing workflow).
> I think the really killer application here is something like Launchpad,
> with integrated code hosting, code review, bug tracking, etc.

Indeed, although Launchpad is closed source.

> If we decide to switch, I'd like to wait just a little longer before we
> do it to see how git vs. bzr is going to turn out so we don't end up
> switching twice.

Right, I think those two are sensible options, mtn is not.

-- 
Felipe Contreras

