Teaching basic lab skills
for research computing

Fork, Merge, and Share

As George Pólya said, sometimes the best way to solve a problem is to solve a more general one. In that spirit, this post was originally going to be about the mechanics of helping thousands of people a year (which is the first of our five-year goals). After I got feedback from a few people on early drafts, though, it morphed into a discussion of something that I hope you'll find more interesting [1].

Let's start with version control. As this potted history explains, what really made version control invaluable wasn't its "infinite undo". Instead, it was the ability to merge things, which meant that many people could work independently and then bring what they'd done together when it made sense to do so. CVS was the first system built on this model, but newer systems built on it, like Mercurial and Git, have pushed the idea even further. With them, there is no "master copy"; instead, every copy is a peer of every other, so anyone can merge with anyone at any time. Yes, it can be chaotic, but the last couple of years have shown that the benefits (particularly the increased freedom to tinker that this model supports) outweigh the risks.

GitHub is the poster child for this. Like SourceForge before it, GitHub allows anyone to create a repository for an open project. Crucially, though, it also makes it easy for people to clone projects, make changes, and then offer those changes back to the original author. This was always possible with earlier systems, but GitHub has made it routine. And when I said "open project", I didn't just mean software: there are books being developed through GitHub as well. Admittedly, most are on technical topics, but there's no reason the model couldn't be used for other kinds of content [2].

Could it be used for learning materials? In other words, would it be possible to create a "GitHub for education"? Right now, I think the answer is "no", because today's learning content formats make merging hard. PowerPoint remains the tool (and format) most commonly used for individual lessons, but there aren't good open tools to merge PowerPoint files [3]. As a result, if someone takes the Software Carpentry lecture on regular expressions, moves a few slides around, and cleans up a few examples, it can take me almost as long to merge their changes back into my copy as it would take me to make those changes myself.

Shifting from micro to macro, the closest thing we have to a standard format for lessons is SCORM, but it's as clumsy and expensive to work with as SOAP. What's more, to the best of my knowledge there aren't any tools out there to help people find differences between two SCORM packages, much less merge them. And having the kind of metadata that's in SCORM really does matter if we want to reach lots of people. There's more to teaching than putting facts in front of people; when it's done well, teaching is about organizing those facts into a coherent narrative so that learners can see how they fit together. To use open source software as an analogy once again, learning plans are like architectural documentation: you don't have to have it, but people will find it a lot easier to understand, use, adapt, and contribute to your project if you do.

Whatever a "GitHub for education" would look like, it would not be yet another repository of open learning materials. There are lots of those already, but almost all their content is write-once-and-upload: their creators seem to be thinking in terms of reuse rather than collaboration. Sites like the Khan Academy and P2PU don't fill this gap either. Both are free, but the first isn't open (I can't hack its content), and the second is about setting up courses rather than sharing course content in a reusable, remixable way.

And that, I think, is going to be the key to reaching our goal of helping thousands of people a year. Research has shown that blended learning (the combination of traditional synchronous classroom instruction with its online asynchronous counterpart) works better than either on its own. For Software Carpentry, that would mean combining intensive two- or three-day workshops with weeks of slower, self-directed exploration [4]. Since every group's needs will be slightly different, we need to make it easy for people to clone material (each other's as well as ours), customize it, and then share those changes. The third step, sharing, is currently missing, which is why this project's bus factor is still 1. We don't have the resources to build the tools, hub, and community that would solve this problem, but other interested parties do. As I said at the outset, maybe the way to solve Software Carpentry's problem is to solve one that's more general...

[1] And less despondent. It's hard to talk about the online teaching tools that are available today without sinking into an epic grump of nearly Scottish magnitude.

[2] This description makes GitHub sound a lot like some weird kind of wiki. It certainly does share some of the social aspects of things like Wikipedia, but version control works a lot better for complex content (like source code or high-quality learning materials).

[3] An attempt to get such tools built as part of Google Summer of Code (GSoC) 2011 led nowhere; there are some closed-source options, but those are mostly aimed at Word and Excel.

[4] Combined with desktop sharing and crowdsourced assessment, but those are subjects for a future post.
