Teaching basic lab skills
for research computing

Grant Proposal

Rebuilding the Software Carpentry course is too big to do in anyone's spare time, so I've submitted a small funding application (included below). If you know any venues that might welcome a similar proposal, I'd be grateful for pointers.


A Self-Paced Software Skills Bootcamp for Research Scientists

Proposed Dates and Venue

Trial run to take place online in July 2010.

Type of Activity

Online learning community built around a web-based self-paced course.

Executive Summary

The aim of this work is to convert a highly successful three-week training course into an online learning community where graduate students and researchers in science and engineering can learn, improve, and share the fundamental software development skills that are directly relevant to their work. Participants will learn how to create, use, and share software that does what they need reliably and efficiently, so that they can spend more time doing leading-edge science.

Academic/Scientific Objectives

Background

It is commonplace to observe that computers are as important to modern science as test tubes and whiteboards. What is pointed out less often is how poorly scientists use them. From high school onward, scientists are expected to calibrate their equipment and take careful notes when doing experiments. When those same scientists use computers for simulations and data analysis, however, they have very different standards: many do not keep track of software and parameters accurately enough to be able to reproduce their results, and have no idea how reliable their software is.

To date, scientists have not worried about these issues because there has been no incentive for them to do so. Journal or conference reviewers rarely ask how (or whether) code was tested, and grant reviewers rarely ask how much of the time spent writing software was well spent.

As the pace of discovery accelerates, however, there is increasing pressure for scientists to build things right and quickly. A wealth of empirical software engineering research over the past 30 years has demonstrated that the best (in fact, the only) way to improve quality and productivity is to improve the way in which software is built. As in manufacturing and medicine, investments here repay themselves several times over because mistakes are more expensive to fix than to prevent. This realization is at the heart of modern software development processes, but to date, these have only been adopted by a small minority of scientists.

Prior Work

Since 1997, I have been teaching software engineering to scientists and engineers at national laboratories, companies, and universities in the US and Canada. My aim has not been to turn them into professional programmers, but rather to equip them with the skills they need to design, build, maintain, and use software effectively in their research. The materials for my course, which have been available under an open license since August 2006, have been viewed by over 135,000 distinct visitors from 70 countries, and have been used at universities and companies in both Americas, Europe, and the Far East. Topics covered include:

  • Version control
  • Basic object-oriented design
  • Automated builds
  • Unit testing and test-driven development
  • Basic scripting
  • Agile development processes
  • Maintenance and integration
  • Working with plain text, XML, and relational data

Thanks to sponsorship from MITACS and Cybera, this material was offered to graduate students from several Canadian universities as a condensed three-week "crash course" in July 2009. For the first time, the course was run in a distributed fashion: half the students taking part were in Toronto, while the other half were in Edmonton, and lectures were webcast in both directions. Students did not collaborate directly on programming projects during the course, but all took part in the same interactive question and answer sessions after each lecture.

While there were a few technical hiccups, response from participants was extremely positive. As the letters of support attached to this proposal show, both they and their supervisors felt that this training would make them significantly more productive, and allow them to tackle problems that were previously out of reach.

Proposal

My long-term goal is to bring this kind of training to the widest possible audience. Building on this past summer's success, and on the experiences of colleagues who have begun to teach online, I believe the time has come to create a self-paced version of this material that would use video lectures and screencasts to present the material, and Web 2.0 collaboration tools to foster an online learning community around it. This would allow students to focus on topics that were most directly relevant to their needs, and to absorb material at whatever pace suited them best, but at the same time give them somewhere to turn with questions. It would also indirectly help foster ties between young Canadian researchers in science and engineering without requiring them to take three weeks out of their lives (something that participants singled out as their major complaint about the course in the post mortem held on July 31).

The proposed schedule for this work is:

  • May'10-Jun'10: Update the existing course material and create the first six video lectures. Based on colleagues' experiences, I estimate between a 20:1 and 50:1 ratio of production time to content length during this trial period.
  • Jul'10: Make these initial lectures available online to students who agree to participate in follow-up interviews to identify areas for improvement, and to help bootstrap the online learning community. As many studies have shown, such communities are far more likely to take off if they are seeded with some initial content.
  • Aug'10-Sep'10: Analyze interviews, make improvements to initial lectures, and draw up a detailed proposal for full implementation to submit to MITACS and other agencies.

To jumpstart the online learning community, the lecture notes and examples will be converted to MediaWiki format (the same one used in Wikipedia). The first wave of students will correct, clarify, and extend these notes under the supervision of a course instructor; they, and the instructor, will then provide feedback on changes proposed by students doing the course remotely at their own pace. This will in turn be combined with user-contributed video content of the kind hosted at ShowMeDo.com, which already features some screencasts of Software Carpentry material, and with the by-now-usual mix of online forums and collaborative link curation.
