Archive

Archive for November, 2011

Three Short Thoughts

November 29th, 2011 2 comments
  1. A BBC article titled “Coding – the New Latin” resonated: Latin was the language of learned discourse in the formative years of modern science, but not something most people spoke day-to-day. I think that’s a good model for computing in the sciences; like statistics, it requires familiarity, not expertise.
  2. Jorge Aranda’s review of Codermetrics talks about the limitations to quantification in software engineering. I’ve said before that we need to measure how much time is lost due to poor computing skills in order to get people to take this kind of training seriously, but I’m very conscious of just how much measurement can’t tell us.
  3. Another thought-provoking post by Cameron Neylon asks what it’s reasonable to expect from scientists and their software. I’m sure he’d enjoy hearing what you think…

Categories: Opinion Tags:

Building a Bibliography

November 25th, 2011 1 comment

With help from several of our regular readers, we have assembled a bibliography of research related to software engineering and computational science to go with our recommended reading list for students. We hope you find it useful, and we would welcome corrections and additions.

2011-11-27: bibliography updated.

Categories: Research Tags:

Knowledge of the Second Kind

November 19th, 2011 No comments

Over the last three years, a group of students has quietly been converting snacks and enthusiasm into scientists who can program.

The Hacker Within is a student club at the University of Wisconsin – Madison that came about when a number of nuclear engineering graduate students needed a forum to exchange tools and share best practices for their increasingly software-intensive research. The success that followed provides an example of an educational model that has fostered necessary software skills among science and engineering graduate students.

Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.

–Samuel Johnson

Since 2008, we’ve met every other week for an hour to discuss some useful computing tool (and eat snacks). We cover a broad range of topics, from peer-taught fundamental skills to more technical invited talks. The meetings attract students mostly from engineering, biology, and physics, but also have regular members from less predictable fields, such as psychology and limnology (the study of lakes!). We also pool the skills of our members to teach three- and four-day intensive, example-driven bootcamps. These attempt to impart fundamental programming skills in C++, UNIX, and Python, and draw heavily on curriculum inspired by Software Carpentry. The bootcamps have received great praise from attendees (who hail from a staggering array of disciplines, see this cool chart).

This bootcamp model has an advantage over a traditional course: the time-intensive nature of scientific coursework limits the feasibility of a formal curriculum in software skills for scientists. That is to say, even if the right course were offered (and what would that be, exactly?), the scientific curriculum leaves no room for a software development course (or worse, many). For this reason, students in scientific disciplines typically lack the software skills they need to conduct computational research effectively, but are unwilling or unable to invest time in formal training.

The current state of affairs in academic research is often one in which students and researchers are programming in a vacuum, teaching themselves computational tools unfamiliar to peers in their fields, and then using those tools to do their ‘peer reviewed’ research. This toxic situation demands a real change in the way we educate students in preparation for scientific computing.

The Hacker Within community model has the potential to alleviate this situation in any institution that has a few individuals to spearhead it. A few snacks and some enthusiasm can replace a disconnected collection of researchers scattered across disciplines with an inter-departmental forum in which those researchers can find and share knowledge efficiently with their peers.

Categories: Uncategorized Tags:

Accessible to All?

November 18th, 2011 2 comments

I just posted an article on my personal blog about the (in)accessibility of online educational material (including Software Carpentry’s). As I said there, there aren’t any easy answers, but if we do find funding to keep this project going, I’d like to find ways to make our content easier for everyone to use.

Categories: Opinion Tags:

Quantifying Installation Costs

November 18th, 2011 No comments

A few months ago, I tried to quantify the cost of poor software skills. A recent post from Adam Klein gives us a good excuse to try to do something similar for the cost of installing software. In his post, Klein describes the 17 steps he went through to set up a Python data hacking environment on a new machine. If we assume that each step has a 5% chance of failing for some reason (packages have moved on, the compiler isn’t exactly the same version as Klein was using, whatever), then the chances of the whole process working are (1 - 0.05)^17, or roughly 42%. In other words, his process will fail to work the first time for over half of the people who try it. In some cases, they’ll be able to figure out why, fix the problem, and move on, but in many others they won’t. As I said earlier this week on my personal blog, we’ve taken something that may or may not be intrinsically hard (programming), and made it much harder by burying it under layer upon layer of grief. The end result is that when a scientist sits down to try something new, s/he has no way of knowing whether it will take an hour, a day, or forever. It’s hard to build a career on top of that kind of uncertainty…
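
For anyone who wants to check that arithmetic or try other numbers, here is a minimal sketch in Python. It assumes, as the estimate above does, that every step fails independently and with the same probability; the function name is made up for illustration, and the 5% figure is a guess rather than a measurement.

```python
def chance_all_steps_succeed(steps: int, failure_rate: float) -> float:
    """Probability that every step works, assuming independent,
    equally likely failures (an assumption, not measured data)."""
    return (1.0 - failure_rate) ** steps


# 17 steps, each with an assumed 5% chance of failing.
p_success = chance_all_steps_succeed(17, 0.05)
print(f"works first time:  {p_success:.0%}")
print(f"fails first time:  {1.0 - p_success:.0%}")
```

The estimate is quite sensitive to the per-step guess: changing 0.05 to 0.02 or 0.10 in the call moves the result a long way, which is one more reason to want real data.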

Categories: Opinion Tags:

Show Me the Data

November 18th, 2011 6 comments

I got mail from a colleague at a prominent US university yesterday saying (in part, and elided to protect the guilty):

…the graduate student representative to the curriculum committee reported that the students did not want a scientific computing course, that they would instead figure it out themselves…. How does one respond to statements like this…that have…basically frozen skill levels? The options I see are formal (“in curriculum”) training, bootcamps and /workshops, and letting them “figure it out themselves”. Are there arguments about the successes of each?

There are certainly arguments: the problem is, there’s practically no data. After 14 years, the conclusion I’ve reached is that we will be ignored until we do empirical field studies to show people just how many potential research hours are being wasted due to inadequate computational skills. Surveys won’t tell us: we need to get someone out in the field to shadow grad students for a few weeks, watching what they actually do and how they do it, so that we can compare the median with the 90th percentile (or 75th, or whatever). I estimate it would take one person 4-5 months to do a preliminary version, and then another 15-20 researcher-months to collect enough data to show senior faculty just how bad things are. Of course, many would ignore the results (just look at how many doctors smoke), but I’d like to think it would change at least a few minds, and I frankly don’t know what else will.

We know that such studies are possible, but I haven’t found anyone willing to fund one in this particular area: I asked NSERC—Canada’s equivalent of the NSF—twice in the three and a half years I was a professor; they said “no” both times, and I’ve had no more success elsewhere. As scientists, shouldn’t we study the effectiveness of training just as rigorously as we’d study the effectiveness of a new treatment for diabetes? And if we’re not going to do that, shouldn’t we stop calling ourselves scientists?

Categories: Opinion, Research Tags:

Clearing Up Code

November 14th, 2011 1 comment

The November/December 2011 issue of IEEE Software has a good article by the Climate Code Foundation’s Nick Barnes and David Jones titled “Clear Climate Code: Rewriting Legacy Science Software for Clarity”. In it, they describe how and why they rewrote a program used to calculate and compare global surface temperatures based on historical data. The original had been attacked by climate change denialists, first because it wasn’t publicly available, and then because it was tangled and hard to run. Their rewrite produced something smaller, faster, and much easier to understand; most importantly, though, it validated the results of the initial program.

Which made me wonder: what scientific program would you most like to see rewritten? To keep the question realistic, it has to be something small enough that two good programmers could do it in six months or less. What would you like rebuilt, and what do you think the benefit would be?

Categories: Opinion Tags:

Surviving the Tsunami

November 14th, 2011 No comments

The October 2011 issue of ACM Queue features an article by Bruce Berriman and Steven Groom titled “How Will Astronomy Archives Survive the Data Tsunami?” The figures are scary: astronomers already have a petabyte of publicly available data, and are adding half a petabyte per year, a rate which will increase dramatically as new instruments come online. The only way to avoid this all becoming write-only is to bet on emerging technologies, from general-purpose GPUs to cloud computing. The problem, of course, is that “emerging” usually means “flaky”, both because the tools haven’t had time to mature, and because we, their users, don’t have the years of experience needed to know how best to use them. (As far as I’m concerned, we’re still trying to figure out how best to use object-oriented programming in science, and we’ve been at it for thirty years…)

But here’s the good news. Instead of just the usual perfunctory nod toward education and training, Berriman and Groom put a spotlight on it:

An archive model that includes processing of data on servers local to the data will have profound implications for end users, who generally lack the skills not only to manage and maintain software, but also to develop software that is environment-agnostic and scalable to large data sets. Zeeya Merali [...] and Igor Chilingarian and Ivan Zolotukhin [...] have made compelling cases that self-teaching of software development is the root cause of this phenomenon…

Berriman and Groom go on to recommend that we “…make software engineering a mandatory part of graduate education, with a demonstration of competency as part of the formal requirements for graduation.” As I’ve discussed before, there’s little chance of this happening in the short or medium term: everyone’s curriculum is already over-full, and senior professors who only know what they taught themselves a generation ago are unlikely to push aside core courses in stellar dynamics or planetary physics to make room for version control and design patterns. What we can do, I think, is make resources like Software Carpentry more usable, and implement some sort of badging system to give students recognition for having completed the training themselves, and for passing it on to others (which would in turn encourage the formation of self-help groups like the University of Wisconsin’s Hacker Within). All we need is funding for a couple of people for a couple of years…

Categories: Noticed Tags:

Successful Bootcamp

November 11th, 2011 No comments

Our 2011 Software Carpentry bootcamp, hosted with help from The Hacker Within and SciNet, was a huge success. We hosted 25 students for two very full days of hands-on introductions to Python, the shell, Nose tests, SVN, and SQLite.

So what did we learn? We’re still waiting on participant feedback, but a few things come to mind.

First, wow, how did we screw software installation up so very badly? Most of our technical problems came from inconsistencies between various Python builds. Some students installed NumPy separately from Python and ran into version mismatches. The demo packages were missing from some people’s IPython installations (including my own!). And that doesn’t even mention Cygwin. We got by thanks to the help of the wonderful volunteers who were able to sort out most of the problems, but it’s hard to pitch a product that is that difficult to get up and running. Imagine if we tried to publish papers as inaccessible as our code.

Second, this bootcamp was a success because we got people using software on their machines from the very start. People have the programs, they have example code, and they are ready to use the things we taught them. It’s no small thing to meld good exercises with lectures, especially when people get confused or ask questions and fall behind. It takes a team of people who can jump in and help participants through error messages, lost connections, typos, and bugs.

So in the end, it was all about the people. Thanks to Jonathan Dursi, Jonathan Deber, Keven Brown, David Wolever, Katy Huff, Orion Buske, and Greg Wilson for making this bootcamp a success.

Here’s picture proof!

Categories: Community, University of Toronto Tags:

The Best vs. the Good

November 8th, 2011 No comments

Cameron Neylon recently posted another thought-provoking piece, this one titled “Building the perfect data repository…or the one that might get used”. In it, he talks about why big institutional efforts to create scientific data repositories have mostly failed to take off, and points at simpler grassroots efforts that scientists might actually adopt because they immediately and obviously solve problems that scientists actually realize they have. One that he points to is DataStage, “a secure personalized ‘local’ file management environment for use at the research group level, appearing as a mapped drive on the user’s PC.” Another is If This Then That, which is the simplest “dataflow” tool you could imagine, and which I’ve been using regularly since it burst on the scene a couple of months ago.

Categories: Noticed Tags: