Nov 03, 2012
 

I’ve just resumed work on the Compressed Pristines feature for Subversion (a separate post on that later). I thought it’d be interesting to first write a quick summary of building Subversion from source with as little effort as possible.

Prerequisites

Ubuntu doesn’t come with all the necessary development tools preinstalled, so first we need to install everything required to build Subversion. One easy trick is to ask apt-get to do the hard work for us. But first, let’s make sure we have an up-to-date system:

sudo apt-get update
sudo apt-get upgrade

Now, let’s get all the relevant dependencies:
sudo apt-get build-dep subversion

On a clean install of Ubuntu 12.10 64-bit, the above installs 158 new packages. These include build-essential, autoconf and autotools-dev. Most of the rest are libraries, languages and frameworks that Subversion depends on in one way or another. Apache, for example, is necessary to build the server modules. Similarly, the Python and Ruby bindings depend on their respective runtimes.

After downloading, unpacking and installing a few hundred megabytes, we’re finally ready.

Getting the Code

Since we’d need Subversion itself to check out the Subversion sources, we’ll download a source archive instead. The project download page lists the recommended release links. From there, we grab the tar.gz or tar.bz2 of our choice. This time around, it’s 1.7.7 for me.


wget http://apache.mirror.nexicom.net/subversion/subversion-1.7.7.tar.bz2
tar -xjvf subversion-1.7.7.tar.bz2
cd subversion-1.7.7

It’s best to do this in /usr/local/src, as that’s the conventional location for user-built sources. For convenience, we can run ln -s -t $HOME/ /usr/local/src to create a symbolic link in our home directory called src that points to /usr/local/src. One inconvenience of /usr/local/src is that we must be superuser to write to it.
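To see exactly what that ln invocation does, here it is rehearsed against a throwaway directory tree (the mktemp paths stand in for the real /usr/local/src and home directory, so this is safe to run anywhere):

```shell
# Rehearse the symlink trick in a throwaway tree instead of the
# real /usr/local/src and $HOME.
demo="$(mktemp -d)"
mkdir -p "$demo/usr/local/src" "$demo/home"

# Same shape as the command above: -t names the directory that
# receives the link; the link is named after the target's
# basename ("src").
ln -s -t "$demo/home/" "$demo/usr/local/src"

readlink "$demo/home/src"   # prints the path ending in /usr/local/src
```

The -t (--target-directory) form is handy because the link name is derived automatically from the target.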

Configuring the Source

Once we have the source code extracted, we need to prepare it. Since Subversion is an Apache Software Foundation project, it naturally uses the Apache Portable Runtime (APR) library, which has been the cornerstone of the Apache server and many other high-availability projects. To fetch all the core libraries that Subversion depends on, there is a very convenient shell script in the root of the sources, aptly named get-deps.sh. Once APR, serf and the other libs are downloaded and extracted, we need to prep them for configuration. The following does all of that quite simply:


./get-deps.sh
cd apr/; ./buildconf; cd ..
cd apr-util/; ./buildconf; cd ..
cd apr-util/xml/expat/; ./buildconf.sh; cd ../../..
./autogen.sh

Now we can finally configure.


./configure

Everything went as planned, except for this harmless warning:

configure: WARNING: we have configured without BDB filesystem support

You don't seem to have Berkeley DB version 4.0.14 or newer
installed and linked to APR-UTIL.  We have created Makefile which will build
Subversion without support for the Berkeley DB back-end.  You can find the
latest version of Berkeley DB here:
  http://www.oracle.com/technology/software/products/berkeley-db/index.html

Build and Install

And finally:

make && make check

And if all goes well, the last line should read:

SUMMARY: All tests successful.

Now, let’s install our shiny new Subversion build:

sudo make install

Alternative/Minimalistic Configure

If we don’t want or need all those dependencies and would rather have a fast, client-only build, without Apache and whatnot, here is a trimmed-down configure, make and check that yields a minimal working client.

./configure --disable-mod-activation --without-gssapi --without-apxs --without-berkeley-db --without-serf --without-swig --without-ctypesgen --without-kwallet --without-gnome-keyring --disable-javahl --disable-keychain -C

make mkdir-init external-all fsmod-lib ramod-lib lib bin

make check TESTS="`echo subversion/tests/cmdline/{basic_tests.py,merge_tests.py}`"

sudo make install

$ svn --version
svn, version 1.7.7 (r1393599)
   compiled Nov  3 2012, 13:53:08

Copyright (C) 2012 The Apache Software Foundation.
This software consists of contributions made by many people; see the NOTICE
file for more information.
Subversion is open source software, see http://subversion.apache.org/

The following repository access (RA) modules are available:

* ra_neon : Module for accessing a repository via WebDAV protocol using Neon.
  - handles 'http' scheme
  - handles 'https' scheme
* ra_svn : Module for accessing a repository using the svn network protocol.
  - with Cyrus SASL authentication
  - handles 'svn' scheme
* ra_local : Module for accessing a repository on local disk.
  - handles 'file' scheme
* ra_serf : Module for accessing a repository via WebDAV protocol using serf.
  - handles 'http' scheme
  - handles 'https' scheme

Summary

Building Subversion can be a real pain, with all the dependencies and configuration options. With the right steps, however, it’s actually a breeze. To summarize, here is the full set of commands we executed, in one chunk:

sudo apt-get update
sudo apt-get upgrade

sudo apt-get build-dep subversion

wget http://apache.mirror.nexicom.net/subversion/subversion-1.7.7.tar.bz2
tar -xjvf subversion-1.7.7.tar.bz2
cd subversion-1.7.7

./get-deps.sh
cd apr/; ./buildconf; cd ..
cd apr-util/; ./buildconf; cd ..
cd apr-util/xml/expat/; ./buildconf.sh; cd ../../..
./autogen.sh

./configure
make && make check
sudo make install
Jun 26, 2011
 

Every so often a piece of technology comes along and changes everything. Once we experience this new way of doing things, we can no longer understand how we survived without it. After we sent our very first emails, walking to the post office to drop off mail seemed unearthly. And who’d go back from an IDE to a plain text editor?


Git1 didn’t seem like the answer to my needs. I’ve been using Subversion (SVN) since 2006, and I’ve been a very happy camper indeed. Before that I used CVS and, although I was inexperienced with Version Control Systems (VCS), it was a major improvement over MS Source Safe (which I had used for almost 6 years before that.) I use SVN at home and at work. I’ve grown so used to and dependent on version control that I use SVN for my documents and other files, not just code. But Git? Why would I need Git?

When Git came onto the scene there were already some Distributed VCSs (DVCS) around (as opposed to centralized VCSs, such as CVS and SVN.) But Linus made an impression with his Google Talk. I wanted to try this new piece of technology regardless of my needs. It was just too tasty to pass up. At the first opportunity, I installed the core tools and Git Extensions to ease my way with some visual feedback (I learn by reading and experimenting.)

Now that I’ve played around with Git for a while, and I’ve successfully moved some of my projects from SVN to Git, I can share my experience. Here is why I use Git even when not working with a team (where it’s infinitely more useful.)

Commit Often, Commit Many

Commits with half a dozen unrelated changes are no strangers to us. A developer might add a new function, refactor another and rename an interface member all in the same change-set. This is counter-productive, because reviewing such unrelated code changes is made artificially harder than necessary. But if the review unit is the commit unit, then developers combine multiple changes to reduce overhead and push them onto their colleagues. This is unfortunate, because the code should evolve in the best way possible, uninfluenced by artificial external forces such as tooling nuances. Beyond reviewing, combined commits cause much headache and lost productivity when we need to go back in time to find a specific line of code, roll back or merge. But what if the changes really were related? What if we need to make a few KLOCs of change for the code to even build successfully? The centralized VCS would recommend a branch. But unless the sub-project is long-term, branching is yet another overhead that developers try to avoid.

With Git, these problems are no more, thanks to local commits. With local commits, one can (and should) commit as often as possible. The change log entry need no longer be anything more than a single sentence. The changes aren’t reflected anywhere until we decide to push them to the server. There is no longer a distinction between major changes and minor changes. All changes can be subdivided as much as necessary. No longer does one need to keep local backups2, create personal branches or make every change visible company-wide or publicly. Local commits give you a full-fledged VCS that introduces no new or extra work. When we’re done, we just update the repository with one push command.
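To sketch what this looks like in practice, here is a throwaway repository with a few small, hypothetical commits; nothing here is visible anywhere until an eventual git push:

```shell
# Throwaway repository, just for the demo.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" config user.email "demo@example.com"
git -C "$repo" config user.name "Demo"

# Three small, self-contained local commits; none of this
# touches any server.
echo "int add(int a, int b);" > "$repo/math.h"
git -C "$repo" add math.h
git -C "$repo" commit -qm "Declare add()"

echo "int add(int a, int b) { return a + b; }" > "$repo/math.c"
git -C "$repo" add math.c
git -C "$repo" commit -qm "Implement add()"

echo "TODO: overflow checks" > "$repo/NOTES"
git -C "$repo" add NOTES
git -C "$repo" commit -qm "Note remaining overflow work"

# Each change-set stays tiny; the log reads as one sentence per step.
git -C "$repo" rev-list --count HEAD   # prints 3
```

When the work is ready, a single `git push` publishes the whole series in one go.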

If you need to keep some piece of code around, but don’t wish to send it for review and commit it, you’d normally have to copy it somewhere. With local commits, you can simply commit it, with a relevant commit log. In a subsequent change-set, you can delete it, with the full guarantee that you can retrieve it from Git later. Since this is done locally, no one is complaining and no one needs to review it. The code will be forever preserved in the repository when we push it. Later, when we resurrect it, it will be reviewed as it becomes part of the current code. Indeed, with local commits you can experiment with great freedom, with both the advantages of version control and the subsequent preservation of your bits in the repository for posterity.
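A minimal sketch of that keep-then-delete pattern (the repository and file names are made up for the demo):

```shell
# Throwaway repository; file names are illustrative.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" config user.email "demo@example.com"
git -C "$repo" config user.name "Demo"

# Commit the experimental code so Git preserves it...
echo "old experimental code" > "$repo/experiment.c"
git -C "$repo" add experiment.c
git -C "$repo" commit -qm "Keep experiment around for posterity"

# ...then delete it in a follow-up change-set.
git -C "$repo" rm -q experiment.c
git -C "$repo" commit -qm "Remove experiment from the build"

# Resurrect it any time from the commit before the deletion.
git -C "$repo" show "HEAD^:experiment.c"   # prints the preserved code
```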

Notice that all this applies equally well to private projects, single-developer public projects and multi-developer projects. The organizational advantages only become more valuable the more participants there are.

Easy Merging

Even with local commits, sooner or later we’ll need to branch off and work on a parallel line of code. And if our project is useful to anyone, the branches will diverge faster than you can checkout. Merging is the currency of branching, and anyone who’s tried it knows it’s more often than not painful. This is typically because what’s being merged are the tips/heads of the branches in question. These two incarnations of our code become harder to reconcile the more changes they’ve experienced in their separate lives.

But any VCS by definition has full history, which can be leveraged to improve merging. So why is this a Git advantage? Git has two things going for it. First and foremost, it has the full history locally. That’s right: your working copy (WC) is not a copy of what you checked out; it’s a clone of the repository. So while a centralized VCS can take advantage of the repository’s history, for Git this information is readily available in your WC. Second, with local commits the commit unit is typically very small, which helps merging quite a bit, as the tool can have higher confidence about where lines moved and what changed into what.
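A toy illustration of the three-way merge at work: both branches edit the same hypothetical file in non-overlapping places, and Git reconciles them using the common ancestor it finds in local history:

```shell
# Throwaway repository with one file edited on two branches.
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" config user.email "demo@example.com"
git -C "$repo" config user.name "Demo"

printf 'one\ntwo\nthree\nfour\nfive\n' > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -qm "Base version"

# A feature branch edits the last line...
git -C "$repo" checkout -qb feature
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$repo/file.txt"
git -C "$repo" commit -qam "Uppercase the tail"

# ...while the original branch edits the first line.
git -C "$repo" checkout -q -
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$repo/file.txt"
git -C "$repo" commit -qam "Uppercase the head"

# The three-way merge finds the common ancestor locally and
# combines both edits without conflict.
git -C "$repo" merge -q --no-edit feature
cat "$repo/file.txt"   # first line ONE, last line FIVE
```

No server round-trip was needed at any point; the ancestor commit was sitting in the clone all along.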

Overall, merging with Git is otherworldly. So far, no centralized VCS can even match the accuracy of Git’s merge output.

Explicit Exclusion

With Source Safe, CVS and SVN it’s not rare to get broken builds because of missing files. After some point in a project’s life, adding new files becomes sporadic. It’s common to forget to add the new files to the VCS, only to be reminded by colleagues and broken-build emails, to the humiliation of the developer who missed the files, of course. If reviews are mandatory, then fixing this error involves at least one other developer, who needs to sign off on the new patch before committing.

This problem arises from the fact that with these traditional, centralized VCSs, files are excluded implicitly (by default) and they are opted-in when necessary. With Git, the opposite is the case: everything under the root is included by default, exclusion is the exception. This sounds very trivial, but the consequences are anything but. Not only does this save time and avoid embarrassing mistakes, but it’s also more natural. Virtually always a file within the project tree is a file necessary for the project. The exceptions are few indeed. If you think about it, most of the exceptions are files generated by tools. These are excluded by file extension and folder names in the ignore file (.gitignore for Git.) Rarely do we add any files that shouldn’t be stored and tracked by the VCS. If it’s not automatically generated during build, then it should be in the repository.
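A small sketch of the include-by-default behavior (throwaway repository; the ignore patterns are just examples of tool-generated artifacts):

```shell
# Throwaway repository; the ignore patterns are illustrative.
repo="$(mktemp -d)"
git -C "$repo" init -q

# Exclude only generated artifacts; everything else is in by default.
printf '*.o\nbuild/\n' > "$repo/.gitignore"

touch "$repo/main.c" "$repo/main.o"
mkdir "$repo/build"
touch "$repo/build/app"

# Only the source file and the ignore file itself show up as
# untracked; main.o and build/ are silently excluded.
git -C "$repo" status --porcelain --untracked-files=all
```

A new source file dropped anywhere in the tree is immediately flagged by git status, so it can’t be silently forgotten.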

Conclusion

Git is a paradigm shift in version control. It’s not just a set of new features; it’s a different way of organizing change-sets, and by extension writing code. Git gives us better automation and tooling, and at the same time it encourages healthy and useful practices. In fact, the features outlined above make good use of Git’s distributed architecture. So it’s no coincidence that it’s so useful even for a single-user project.

If you’re using SVN, consider copying a repository over to Git using git-svn and playing around. Git can synchronize with SVN until you decide to abandon one or the other. In addition, GitHub is a great online repository. As a learning tool, consider forking any of its countless projects and playing around.

1 Git has no exclusive monopoly on the discussed advantages; however, I’m reviewing my experience with Git in particular. Hg, Bazaar and others will have to wait for another time.
2 Here I’m concerned with backing up code that we don’t want to discard yet, but don’t want to commit either. Data backup is still necessary.
