Monotone analysis

Tue Jul 8 16:17:52 EDT 2008

Hi everybody,

This is an attempt to shed light on some deficiencies of Monotone and on how
other DSCMs handle the same things. I'm not trying to make you switch your
DSCM, just presenting my findings.

In order to do that I'll explain what Monotone is and how it works internally.

Some people argue that with a good SCM you don't need to know the details of
the implementation, but quite often understanding the details helps to explain
why some operations are not efficient and never will be.

= Overview =

Monotone is a DSCM that has been around for some time, and it's distinctive
because it values correctness over optimization.

Despite its long existence, very few projects use it. The most important ones
are OpenEmbedded and Pidgin.

= Certs =

In order to understand most of the issues in Monotone you need to understand
what a cert is.

A cert is basically metadata signed by some person. It can be anything: a key
string and a value string, that's it. A person can set multiple keys with
different values, and two people can set the same key.

Some common certs are: author, date, branch, changelog, tag, comment.
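
As a rough illustration, a cert can be modelled as a small record attached to
a revision. This is only a sketch: the field names, the signers (alice, bob)
and the revision id r1 are made up for the example and are not Monotone's
actual data model.

```python
from collections import namedtuple

# Hypothetical model: a cert is a (key, value) pair that a signer
# attaches to a revision.
Cert = namedtuple("Cert", ["revision", "signer", "key", "value"])

certs = [
    Cert("r1", "alice", "author", "alice@example.com"),
    Cert("r1", "alice", "changelog", "Fix crash on login"),
    Cert("r1", "bob", "changelog", "Login fix, needs review"),
    Cert("r1", "bob", "comment", "Tested on Windows"),
]

def values_for(revision, key):
    """Return every value signed for this key on this revision."""
    return [c.value for c in certs if c.revision == revision and c.key == key]

# Two signers set the same key, so the revision ends up with two values.
print(values_for("r1", "changelog"))
```

Nothing in this model stops two signers from giving the same key different
values, which is the root of the issues described in the next sections.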

== Branches ==

Branches are just certs. That means that a revision can have a multitude of
branches, according to different developers.

For example, John can say revision 476eb8 is in branch im.john, while Leo can
say it's in im.experimental.

That also means ce5b79 can be on im.experimental too and be completely
unrelated to 476eb8.

This is important; completely unrelated revisions can be on the same branch.
This design affects performance, as there is no sane way to traverse a branch
other than searching through the whole repository for its commits. And even
then, it would be tricky.

Even if you manage to get all the revisions of a branch, representing them can
be tricky because of this design.

== Dates ==

A similar thing happens with commits within a date range. Since dates are just
certs, finding the revisions in a given range requires checking all the
revisions.

It's not efficient.

== Changelog ==

Another curious thing about certs is that you can actually set many different
changelogs. Each developer can set a different message for the same revision.

It's confusing, and annoying.

== Solution ==

In other DSCMs (git, bzr, hg) a revision is not complete without committer,
author, date, and message. So you can't have more than one of these values, or
the revision would be a different one.

This makes things easier.
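
A minimal sketch of the idea, loosely modelled on how git derives a commit id
from its content plus metadata. The payload format and the function name are
invented for illustration; this is not git's real object format.

```python
import hashlib

def revision_id(tree, parents, author, committer, date, message):
    """Hash the content together with its metadata (toy format)."""
    payload = "\n".join(
        [tree, ",".join(parents), author, committer, date, message]
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

a = revision_id("tree1", ["p1"], "John", "John", "2008-07-08", "Fix crash")
b = revision_id("tree1", ["p1"], "John", "John", "2008-07-08", "Other message")

# Same tree and parents, different message: a different revision.
print(a != b)
```

Because the metadata is part of what gets hashed, a second changelog or a
second author cannot be attached to the same revision afterwards.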

= Storage =

The way each revision is stored is crucial; it defines how many operations
behave.

In Monotone, revisions are stored as deltas from the previous one.

This means operations like checkout take a lot of time, especially as
revisions accumulate. In order to get the last revision you need to get all
the previous ones.

Also, if you try to compare (diff) two revisions that are far apart, you have
to accumulate the differences of all the revisions in between; that is also
expensive.
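
The following toy model shows why a chain of deltas makes checkout cost grow
with history length. It follows the scheme as described above and is not
Monotone's actual storage code; the delta format (path to new content, None
for a deletion) is an assumption made for the example.

```python
# Each entry is a delta against the previous revision: path -> new content,
# with None meaning the file was deleted. (Hypothetical format.)
deltas = [
    {"README": "v1"},
    {"main.c": "int main() {}"},
    {"README": "v2"},
    {"main.c": None, "app.c": "int main() { return 0; }"},
]

def checkout(n):
    """Rebuild revision n by replaying deltas 0..n.

    Returns the resulting tree and the number of deltas applied.
    """
    tree = {}
    for delta in deltas[: n + 1]:
        for path, content in delta.items():
            if content is None:
                tree.pop(path, None)
            else:
                tree[path] = content
    return tree, n + 1

tree, applied = checkout(3)
print(tree, applied)
```

Getting the newest revision means applying every delta in the chain, so the
work is proportional to the number of revisions rather than to the size of the
tree.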

== Solution ==

In other DSCMs (git, bzr) revisions are stored individually, and if they are
similar they are compressed. That achieves the same goal as storing only
deltas, which is saving space, but it also keeps other operations efficient.
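
A very small sketch of storing each object on its own, keyed by its hash. The
store, put and get names are hypothetical, and this version only compresses
each object individually with zlib; the compression across similar objects
mentioned above is left out.

```python
import hashlib
import zlib

store = {}  # hypothetical object store: hash -> compressed bytes

def put(content: bytes) -> str:
    """Store one object under the SHA-1 of its content."""
    key = hashlib.sha1(content).hexdigest()
    store[key] = zlib.compress(content)
    return key

def get(key: str) -> bytes:
    """Fetch any object directly, without touching its predecessors."""
    return zlib.decompress(store[key])

old = put(b"hello world\n" * 100)
new = put(b"hello world\n" * 100 + b"one more line\n")
print(get(new).endswith(b"one more line\n"))
```

Any revision can be read with a single lookup, no matter how many revisions
came before it.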

= Workflow =

Most DSCMs rely on the fact that branches are cheap. That allows for a branch
for each developer, topic branches, and temporary branches.

== Bazaar ==

In Bazaar each branch is developed separately. That means there's only one
head on each branch, and each developer has their own branch, and often
multiple branches.

So each branch is simple; you can traverse it by looking at the parents of the
head of the branch.
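
Walking a branch from its head can be sketched like this. The history below is
invented, and the traversal is a plain graph walk, not bzr's actual
implementation.

```python
# Hypothetical history: revision id -> list of parent ids.
parents = {
    "r4": ["r3"],
    "r3": ["r2", "x1"],  # a merge
    "x1": ["r1"],
    "r2": ["r1"],
    "r1": [],
}

def history(head):
    """Return every revision reachable from head, visiting each one once."""
    seen, order, stack = set(), [], [head]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        order.append(rev)
        stack.extend(parents[rev])
    return order

print(history("r4"))
```

Only revisions that belong to the branch are ever touched; the rest of the
repository is never scanned.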

Creating and destroying branches is as easy as creating and destroying
directories.

Moreover, thanks to bzr's extensibility it's possible to have temporary
branches and use them properly with the rebase plugin.

== Git ==

In Git, each branch has only one head. So it's similar to bzr; branches are
simple.

A repository can have a multitude of branches, which are simply pointers to a
commit. That means you can have as many branches as you like, some only local,
some only remote, and some linked.
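
Conceptually, that makes a branch nothing more than a name mapped to a commit
id. The sketch below uses a plain dictionary and made-up helper names; it
illustrates the idea and is not git's code.

```python
# Hypothetical refs table: branch name -> commit id.
refs = {"master": "c3"}

def create_branch(name, start):
    """A new branch is just another name for an existing commit."""
    refs[name] = refs[start]

def commit(branch, new_id):
    """Committing moves the branch pointer to the new commit."""
    refs[branch] = new_id

def delete_branch(name):
    """Deleting a branch removes the name, not the commits."""
    del refs[name]

create_branch("topic", "master")
commit("topic", "c4")
print(refs)
```

Creating or deleting a branch only touches one entry, which is why branches
are cheap enough to use for every topic.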

= Popularity =

The SCM used by a project is a key issue. It restricts or allows different
workflows, and makes it easier or harder for new contributors to get started.

The more popular the SCM of choice is, the higher the chances that a new
contributor is already familiar with it, and so the higher the chances of
getting new contributors up to speed quickly.

Also, the more popular the SCM, the more tools are available for it. For
example: GUIs, web services, hosting facilities, project tracking plugins
(Trac), etc.

The most popular DSCMs right now are git and bzr, with hg not far behind.
Choosing anything else doesn't seem wise.

= Git recommendation =

In some conversations in #pidgin some developers (elb, deryini) made their
dislike for git apparent. Apparently some people think that only the Linux
Kernel guys are using git, and that they are doing things the wrong way. Linux
being probably the most successful open source software project, I find that
arrogant and unfounded; their workflow is much more efficient than anybody
else's.

They achieve 3.7 commits per hour while always staying stable; that wouldn't
be possible without a decent DSCM.

Perhaps these projects are wrong too[1]:

 * Linux Kernel
 * X.org
 * Debian (build-tools, debpool, pbuilder)
 * Fedora
 * GNU autoconf, automake, coreutils
 * Compiz Fusion
 * PackageKit
 * ALSA
 * Cairo
 * HAL
 * D-Bus
 * Mesa3D
 * OLPC
 * Wine
 * fluxbox
 * Yum
 * Ruby on Rails
 * Samba
 * KVM
 * VLC
 * x264

PulseAudio switched recently, and GStreamer is on the way too.

Both KDE and GNOME have git clones of their svn repositories, and are
discussing a move to a DSCM, with git as the strongest option.

The only big issue with git in the past was Windows support, but that's not an
issue anymore with msysgit[2].

There are many GUIs, web services, hosting sites, a plugin for Trac,
screencasts, blogs, tutorials, etc. There is a reason for that; git is awesome.

I'm not interested in silly discussions; if you have doubts about git, I'll
try to answer them.

Best regards.

[1] http://git.or.cz/gitwiki/GitProjects
[2] http://code.google.com/p/msysgit/

-- 
Felipe Contreras