Fork, Merge, and Share

As George Pólya said, sometimes the best way to solve a problem is to solve a more general one. In that spirit, this post was originally going to be about the mechanics of helping thousands of people a year (which is the first of our five-year goals). After getting feedback from a few people on early drafts, though, it has morphed into a discussion of something that I hope you’ll find more interesting [1].

Let’s start with version control. As this potted history explains, what really made version control invaluable wasn’t its “infinite undo”. Instead, it was the ability to merge things, which meant that many people could work independently and then bring what they’d done together when it made sense to do so. CVS was the first system built on this model, but its latest incarnations, like Mercurial and Git, have pushed the idea even further. With them, there is no “master copy”; instead, every copy is a peer of every other, so that anyone can merge with anyone at any time. Yes, it can be chaotic, but the last couple of years have proven that the benefits—particularly the increased freedom to tinker that this model supports—outweigh the risks.

GitHub is the poster child for this. Like SourceForge before it, GitHub allows anyone to create a repository for an open project. Crucially, though, it also makes it easy for people to clone projects, make changes, and then offer those changes back to the author of the original. This was always possible with earlier systems, but GitHub has made it routine. And when I said “open project”, I didn’t just mean software: there are books being developed through GitHub as well. Admittedly, most are on technical topics, but there’s no reason the model couldn’t be used for other kinds of content [2].

Could it be used for learning materials? I.e., would it be possible to create a “GitHub for education”? Right now, I think the answer is “no”, because today’s learning content formats make merging hard. PowerPoint remains the tool (and format) most commonly used for individual lessons, but there aren’t good open tools to merge PowerPoint files [3]. As a result, if someone takes the Software Carpentry lecture on regular expressions, moves a few slides around, and cleans up a few examples, it can take me almost as long to merge their changes back into my copy as it would take me to make those changes myself.
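To make the contrast concrete, here is a minimal sketch of what line-oriented tools can already do when a lesson is stored as plain text. The slide contents and the helper name changed_lines are made up for illustration; only Python's standard difflib module is assumed. Given my copy of a slide and someone else's revision, the diff reports just the lines that differ, which is the information a merge needs. Nothing comparable is available for a PowerPoint file, which these tools see as an opaque blob.

```python
import difflib

# Hypothetical plain-text version of one slide (my copy).
original = [
    "# Regular Expressions",
    "- A pattern describes a set of strings",
    "- `.` matches any single character",
    "- `*` repeats the previous item",
]

# Someone else's copy: one bullet moved, one example cleaned up.
revised = [
    "# Regular Expressions",
    "- A pattern describes a set of strings",
    "- `*` repeats the previous item zero or more times",
    "- `.` matches any single character",
]


def changed_lines(old, new):
    """Return only the added ('+') and removed ('-') lines between two versions."""
    diff = list(difflib.unified_diff(old, new, fromfile="mine",
                                     tofile="theirs", lineterm="", n=0))
    # Skip the two file-header lines and the '@@' hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


if __name__ == "__main__":
    for line in changed_lines(original, revised):
        print(line)
```

The unchanged title and first bullet never show up in the result, so a reviewer only has to look at what actually moved or changed.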

Shifting from micro to macro, the closest thing we have to a standard format for lessons is SCORM, but it’s as clumsy and expensive to work with as SOAP. What’s more, to the best of my knowledge there aren’t any tools out there to help people find differences between two SCORM packages, much less merge them. And having the kind of metadata that’s in SCORM really does matter if we want to reach lots of people. There’s more to teaching than putting facts in front of people; when it’s done well, teaching is about organizing those facts into a coherent narrative so that learners can see how the facts fit together. Using open source software as an analogy once again, learning plans are like architectural documentation; you don’t have to have it, but people will find it a lot easier to understand, use, adapt, and contribute to your project if you do.

Whatever a “GitHub for education” would look like, it would not be yet another repository of open learning materials. There are lots of those already, but almost all their content is write-once-and-upload, i.e., they seem to be thinking in terms of re-use rather than collaboration. Sites like the Khan Academy and P2PU don’t do this either: both are free, but the first isn’t open (I can’t hack their content), and the second is about setting up courses, rather than sharing course content in a reusable, remixable way.

And that, I think, is going to be the key to reaching our goal of helping thousands of people a year. Research has shown that blended learning—the combination of traditional synchronous classroom instruction with its online asynchronous counterpart—works better than either on its own. Its concrete realization for Software Carpentry would be to combine intensive two- or three-day workshops with weeks of slower self-directed exploration [4]. Since every group’s needs will be slightly different, we need to make it easy for people to clone material (each other’s as well as ours), customize it, and then share those changes. The third is currently missing, which is why this project’s bus factor is still 1. We don’t have the resources to build the tools, hub, and community that would solve this problem, but other interested parties do. As I said at the outset, maybe the way to solve Software Carpentry’s problem is to solve one that’s more general…

[1] And less despondent. It’s hard to talk about the online teaching tools that are available today without sinking into an epic grump of nearly Scottish magnitude.

[2] This description makes GitHub sound a lot like some weird kind of wiki. It certainly does share some of the social aspects of things like Wikipedia, but version control works a lot better for complex content (like source code or high-quality learning materials).

[3] An attempt to get some built as part of GSoC 2011 led nowhere; there are some closed source options, but those are mostly aimed at Word and Excel.

[4] Combined with desktop sharing and crowdsourced assessment, but those are subjects for a future post.

  1. Jon Pipitone
    December 31st, 2011 at 07:15 | #1

    You’ve identified the “merging problem” as central to ramping up SWC’s reach. This seems like progress of a sort.

    But why is the merging problem so central, again? Is it because SWC is to be some sort of authoritative source for teaching materials? It’s what you seem to implicitly suggest in your post (or else, why the need to merge at all?), but you don’t come out and say it and I think you need to.

    The reason I keep harping on the need to clarify your vision is because I really don’t yet have a clear idea for what form SWC will actually take. I don’t think you do either. You’ve identified the need to improve programming competence, identified your goals of measurably helping thousands of scientists in five years and no longer being the sole person responsible for the project.

    Now, what activities will those involved in Software Carpentry actually carry out? Is it to be an authoritative resource for teaching materials curated by a community (much in the way Wikipedia is)? Or is it a social networking site of sorts to coordinate and establish campus groups dedicated to co-learning computer skills? Or is it a train-the-trainers initiative whereby a core group of “experts” are deployed to campuses to run week-long intensives? Or…

    You see? I don’t think this is an easy question to answer yet because there are a bunch of options you’ve partially considered. But, I fear that if we skip answering this question (the “what”), and jump to answering the “how”, we’ll end up in a muddle. We’ll have no way to assess the suitability of any of the bits of mechanics we consider (what would we be assessing their suitability /for/?)

    My other concern is that, even if the merging problem turns out to be a relevant hindrance to the aims of this project (say we do decide to be an authoritative source)… the merging problem is so deep and ill-defined that, despite it maybe being the right problem to solve (whatever /that/ means), I have no reason to believe the payoff from tackling it will arrive in any practical amount of time. If we were to make solving this problem the sole purpose of SWC, I’d worry we’d miss out on the opportunity to help scientists in so many other, more direct ways.

  2. January 2nd, 2012 at 21:24 | #2

    Quite the coincidence: I just put my course on GitHub this weekend, and I speak to your concerns here: http://reagle.org/joseph/blog/career/teaching/fork-merge-share

  3. January 5th, 2012 at 20:38 | #3

    @Joseph Reagle I think the key phrase in your blog post is “most every _text_ I author”. If your slides are primarily textual, Markdown or something similar works fine. But you can’t mix diagrams and text — really mix them the way you would on a whiteboard (or in PowerPoint), not just put them side by side with a PNG or SVG here, and text there. I think it’s a tradeoff between ease of management and richness of presentation; I’d like to have both.
