Version Control

This lecture introduces version control, which helps people collaborate and keeps track of who changed what, and when.

Requires: nothing

Introduces: version control

Motivating Questions:

  1. How can my colleagues and I edit shared documents safely and efficiently?
  2. How can I keep track of who made what changes to a document, and when, safely and reliably?
  3. What are the limitations of version control systems, and what are the alternatives?

Problem: Writing a Paper With Several Other People

Your lab has been working with groups at two other universities to analyze anomalies in the trajectories of deep space probes, and the time has come to write a paper summarizing your findings. The last one produced by this collaboration was 35 pages long, and had 15 authors, 20 figures, and 400 references. It took six weeks to write, half of which was spent tracking down and reconciling bits and pieces that had gone astray in email or been overwritten accidentally. Everyone would like to find a less painful way to get this one written; since you weren’t able to attend the organizational meeting, you’ve been put in charge of figuring out how.

Lectures:

  1. Introduction (pdf, ppt)
    • Version control systems
    • Why version control is useful
    • Shortcomings
  2. Basic Operation (pdf, ppt)
    • Checking out a repository
    • Viewing the change log
    • Adding, changing, and committing files
    • Updating
  3. Handling Conflicts (pdf, ppt)
    • Merging of differences
    • Manual conflict resolution
  4. Rolling Back (pdf, ppt)
    • Reversing local changes
    • Reversing committed changes
    • Overview of branching and merging
  5. The Command Line (pdf, ppt)
    • Why use the command line?
    • Updating and committing
    • Resolving conflicts
    • Undoing changes

Exercises:

Frequently Asked Questions

  1. I would like to work on several different computers. All have internet access, but none are “networked”. Where do I put my repository, and how do I access it? Should I put it on a portable hard drive and carry it around with me?

     The best thing to do is set up a repository on an external server, so that you can access it from any computer with an internet connection. You should talk to your go-to “computer person” at your university about getting a repository set up. They will provide you with a URL for your repository, and then you can access it the way we showed you in the Version Control lectures.

  1. Aronne
    September 20th, 2010 at 04:29 | #1

    Some comments from one of the ‘beta-testers’ at Hacker Within:

    Overall this was a nice clear explanation of how the process works. Trying to explain branching at the end might be too much to fit the short video – perhaps make that a separate video with a more detailed explanation? I’m not completely sold on using the GUI client for this; I personally do not use one (except for displaying diffs). The GUI is a bit hard to see at the video resolution. Also, two extra points I think are worth mentioning: First, svn revert can be destructive – if you revert changes that you actually did want, I think they are totally lost unless you have a file system backup that happened to save them at the right time. Second, a successful automatic merge doesn’t necessarily mean everything is OK in a source code file – the two edits could still conflict at run time, for example by causing runtime errors.
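The point about automatic merges can be made concrete with a small sketch. It is not how Subversion merges files; it is a deliberately simplified line-by-line three-way merge that assumes both edits keep the same number of lines. The function names (merge_lines, runs_cleanly) and the sample program are hypothetical, chosen only for illustration. Each edit runs fine on its own, the merge reports no conflicts, and yet the merged program fails.

```python
def merge_lines(base, ours, theirs):
    """Simplified three-way merge for edits that keep the line count.

    Returns (merged_lines, conflict_indices). A real merge tool also
    handles inserted and deleted lines; this sketch does not.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed the same line differently
            conflicts.append(i)
            merged.append(o)  # placeholder; a person must resolve this
    return merged, conflicts


def runs_cleanly(lines):
    """Execute the program text and report whether it ran without a TypeError."""
    try:
        exec("\n".join(lines), {})
        return True
    except TypeError:
        return False


BASE = [
    "def area(width, height):",
    "    return width * height",
    "",
    "small = area(2, 3)",
    "large = small",
]

# Edit A: add a required 'scale' parameter and update the existing call.
OURS = [
    "def area(width, height, scale):",
    "    return width * height * scale",
    "",
    "small = area(2, 3, 1)",
    "large = small",
]

# Edit B: add a second call, written against the old two-argument signature.
THEIRS = [
    "def area(width, height):",
    "    return width * height",
    "",
    "small = area(2, 3)",
    "large = area(20, 30)",
]

merged, conflicts = merge_lines(BASE, OURS, THEIRS)
print("textual conflicts:", conflicts)
print("edit A runs:", runs_cleanly(OURS))
print("edit B runs:", runs_cleanly(THEIRS))
print("merged result runs:", runs_cleanly(merged))
```

The edits touch different lines, so the merge is textually clean, but the merged file calls the new three-argument function with two arguments. Rerunning your tests after every update or merge is the usual safeguard.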

  2. October 4th, 2010 at 20:55 | #2

    I found this lecture clear and easy to follow (I knew SVN already). I also use it on the command line, but I don’t know which is best to show. A GUI may be useful for someone starting with SVN. Someone using the command line should be able to figure out how the GUI examples translate to the command line. It may be useful to mention this, and how to find command names and options.

  3. james
    October 29th, 2010 at 16:15 | #3

    I think there are good arguments for using version control even in isolation, when not sharing code with others, which might be important for converting a fair number of scientists to version control. Maybe this could be added to the motivation by someone more qualified than I?

  4. January 6th, 2011 at 14:55 | #4

    I agree with James – I used Subversion during my PhD work to store experimental data, papers and my thesis. I tend to use Git now as it is so easy to set up new repositories, share them without a centralized server and keep them in sync. Working in physics and chemistry it is sad to see how rarely version control is used, even in quite large software projects. Thanks for putting together these screencasts, I hope it encourages more people in science to use version control.

  5. Nick
    January 12th, 2011 at 03:46 | #5

    Marcus – Could you expand on how you used it? I am a lonely PhD student writing code for computational fluid dynamics simulations. I have one computer, and I am the only one who uses it. I am puzzled as to where to put the repository. Is there some reason that it can’t be on my own local machine?

    P.S. Did I miss somewhere in these lectures how to create the repository?

    I have never used version control before, but it seems that it would help me.

  6. Henry
    January 16th, 2011 at 04:43 | #6

    @Nick
    You can put the repository on your local machine. BUT I always keep my “personal” repository (which only I use) on a different offsite computer. This adds an additional level of backing up of my important code, papers, etc. There are numerous inexpensive server services out there within reach of even a grad student’s budget. You really don’t want to have to rewrite even part of your thesis code or the thesis itself if your building burns down with your computer in it!

  7. January 16th, 2011 at 14:32 | #7

    I completely agree with @Henry and I recommend Beanstalk for this sort of situation. They have a “trial” version, which is free forever for a single user, setting things up is trivially easy, and the web interface for looking through commits is a great added bonus.

  8. January 16th, 2011 at 14:34 | #8

    Oops, forgot the “http://”, here’s the correct link to Beanstalk

  9. Ben Racine
    March 16th, 2011 at 23:27 | #9

    I think that Git (or Mercurial) are worth mentioning. I think their lower overhead in setting up makes people more likely to start and keep using these tools. Just my two cents though, purely opinion.

    • Greg Wilson
      March 17th, 2011 at 14:31 | #10

      In our experience, Git and Mercurial are no easier for single-user setup (you can create a Subversion repository on your local hard drive with a single command using the ‘file://’ protocol), and novices seem to find their “eventual consistency” model harder to follow than the “master copy” model that Subversion and other systems use. One example of this is revision numbers: SVN gives people a strict linear ordering of events, while the hashes of distributed VCS’s don’t. I agree that Git/Hg/whatever have momentum on their side, and deliver benefits to experienced users, but by definition, our users aren’t (yet) experienced :-)
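For readers asking how to create a repository on their own machine, the single-command setup mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard Subversion command-line tools (svnadmin create and svn checkout) are installed; the directory names and the helper local_repo_commands are hypothetical. The script only builds and prints the commands, so nothing is created until you run them yourself.

```python
import shlex
from pathlib import Path


def local_repo_commands(repo_dir, working_dir):
    """Build the commands for a single-user repository on the local disk.

    repo_dir    -- where the repository itself will live (hypothetical path)
    working_dir -- where the working copy will be checked out (hypothetical path)
    """
    repo_dir = Path(repo_dir).resolve()
    repo_url = repo_dir.as_uri()  # turns an absolute path into a file:// URL
    return [
        # The one command that creates the repository.
        ["svnadmin", "create", str(repo_dir)],
        # A second command to get a working copy to edit and commit from.
        ["svn", "checkout", repo_url, str(working_dir)],
    ]


commands = local_repo_commands("/home/nick/repos/thesis", "/home/nick/thesis")
for command in commands:
    print(shlex.join(command))
```

As noted earlier in the thread, a repository on the same disk as your working copy is not a backup, so it is worth copying it somewhere offsite as well.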