Doing the Math

Let’s do some math. Suppose that working through the Software Carpentry course takes the average scientist five full-time weeks. It doesn’t matter whether that’s one five-week marathon, or whether the time is spread out over several months; the cost is still roughly 10% of the scientist’s annual salary (if you’re thinking like an administrator) or 10% of their annual published output (if you’re thinking like the scientist herself). How big a difference does it have to make to her productivity to be worthwhile?

Well, the net present value of n annual payments of an amount C at an interest rate of i is $P = C \left(1 - (1+i)^{-n}\right) / i$. If we assume our scientist only keeps doing research for another 10 years after taking the course (which I hope is pessimistic), and a discount rate of 20% (which I also hope is pessimistic), then the present value works out to 4.2 times the annual savings. Doing a little long division, that means this training only has to improve the scientist’s productivity by 2.4% in order to pay for itself. That works out to just under an hour per week during those ten years; anything above that is money (or time) in the bank.
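For anyone who wants to check the arithmetic, here is a minimal sketch of the break-even calculation. The function name `annuity_factor` and the 40-hour working week are assumptions made for illustration; the 10% course cost, 10-year horizon, and 20% rate come from the text.

```python
def annuity_factor(rate: float, years: int) -> float:
    """Present value of 1 unit paid at the end of each year for `years` years."""
    return (1 - (1 + rate) ** -years) / rate


# Figures taken from the post.
course_cost = 0.10      # course costs roughly 10% of one year's output
rate = 0.20             # pessimistic discount rate
years = 10              # pessimistic remaining research career

# Assumption for illustration only: a 40-hour working week.
hours_in_week = 40

factor = annuity_factor(rate, years)
break_even = course_cost / factor          # required fractional productivity gain
hours_per_week = break_even * hours_in_week

print(f"annuity factor:      {factor:.2f}")
print(f"break-even gain:     {break_even:.1%}")
print(f"hours saved per week: {hours_per_week:.2f}")
```

Changing `rate` or `years` shows how sensitive the break-even point is to those two guesses.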

Now suppose the feedback we get from former students is right, and that this training saves them a day per week or more. Let’s assume the average scientist (whatever that means) costs $75,000 a year. (That’s a lot more than a graduate student, but a lot less than the fully-loaded cost of someone in an industrial lab.) Twenty percent of their time (about $15,000 a year) over the same ten years, at the same 20% discount rate, works out to roughly $63,000; at a more realistic discount rate of 10%, it’s roughly $92,000. That’s roughly a ten-fold return on $7,500 (five weeks of their time right now at the same annual salary).
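The dollar figures can be checked the same way. This is a rough sketch under the post's own assumptions ($75,000 a year, one day a week saved, ten years); the names `present_value` and `results` are made up for the example.

```python
def present_value(annual_amount: float, rate: float, years: int) -> float:
    """Present value of `annual_amount` received at the end of each year."""
    return annual_amount * (1 - (1 + rate) ** -years) / rate


salary = 75_000                 # assumed annual cost of the scientist
annual_saving = 0.20 * salary   # one day per week saved
course_cost = 0.10 * salary     # five weeks, taken as roughly 10% of a year
years = 10

# Map each discount rate to (present value, return relative to course cost).
results = {}
for rate in (0.20, 0.10):
    pv = present_value(annual_saving, rate, years)
    results[rate] = (pv, pv / course_cost)
    print(f"discount rate {rate:.0%}: present value ${pv:,.0f}, "
          f"return {pv / course_cost:.1f}x")
```

The return comes out somewhat below ten-fold at the pessimistic rate and somewhat above it at the more realistic one.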

So my question is, why do scientists, who are certainly supposed to be able to do basic math, ignore this? More to the point, why do the people who organize conferences on “e-science” persist in ignoring two facts:

  1. The biggest bottleneck for the overwhelming majority of scientists (90% or more if you believe our 2008-09 survey) is development time, not CPU cycles. Faster machines can improve turnaround times a bit, but mastering a few basic skills will make a much bigger difference.
  2. Even those scientists who really need supercomputers to do their work would get more done faster if they were wasting less time copying files around, repeating tasks manually, and reinventing sundry wheels. They are trying to solve two open problems at once: whatever is intrinsic to their science, and high-performance parallel programming. Tackling the latter without a solid foundation is like trying to drive an F1 race car on the highway before you’ve learned to change lanes in a family car. I know from personal experience that the crash and burn rate is comparable…

I will believe that computational science is finally outgrowing its “toys for boys” mentality when I see an e-science conference that focuses on process and skills: on how scientists develop software at the moment-by-moment, week-by-week, and year-by-year scales. I will believe that people really care about advancing science, rather than about the bragging rights that come from having the world’s biggest X or its fastest Y, when supercomputer centers start requiring courses on software design, version control, and testing as prerequisites to courses on GPUs and MPI. I’ll believe it when journals like Nature and Computing in Science & Engineering require every paper they publish to devote a section to how (and how well) the code used in the paper was tested.

And I’ll believe in Santa Claus when I see him up on my roof saying, “Ho ho ho.” What I won’t do is take bets on which will happen first.

  1. James Howison
    June 20th, 2011 at 15:58 | #1

    Bravo, Greg, Bravo. I think this captures the argument very well. I might be tempted to add in the multiplicative effects of working together with others: if one is dependent on others for components, and they are slow, then even those who have made the investment of time are slowed down. Moreover still, if the depended upon component has to have re-work done to it, these dependencies can become nasty and reciprocal.

    Teaching people to understand existing code is also an important component, because if that is hard then people re-implement rather than adapt (or just use). That’s another NPV calculation, but the startup investment cost can be reduced by teaching people how to read existing code bases.

    But neither really adds that much to your already excellent argument. Thanks!

  2. June 20th, 2011 at 17:46 | #2

    What percentage of scientists will go off and get a job in computing? You need to factor that in.

    • Greg Wilson
      June 20th, 2011 at 18:06 | #3

      How does that factor in? I’m assuming that these scientists and engineers keep doing science and engineering, and that we’re trying to make them more productive at that, rather than re-training them. Or are you thinking, “If we teach them how to program, they’ll quit doing science and engineering and go build web sites/be sys admins for a living?” If the latter, that actually hasn’t been our experience: the more computational skills they have, the more successful S&E’s are at S&E, so if anything, they’re less likely to switch fields. (But I have no hard data to back up either that claim or its reverse.)

  3. June 20th, 2011 at 19:27 | #4

    I am reflecting on my PhD of 15 years ago on the molecular systematics of the genus Rhododendron and the fact that I have been pretty much full time in IT for the past 10 years because I developed some software skills.

    I guess it depends how long people’s contracts are. If they have tenure or a ‘real’ job then they won’t walk, but as the conversion rate to tenured posts is not that great, assuming that someone who isn’t tenured will still be in science in ten years is a big assumption. Basically it is a pyramid: lots of undergrads, fewer masters, fewer doctoral students, fewer post-docs, very few tenured professors. I am not sure what the profile of your students is, but if it includes the people lower down the pyramid, the chances are they will be out of science within 10 years no matter how good they are – just because of the numbers involved – and this will mess up your sums. Also, they may be more likely to go if they have transferable skills to sell rather than hang on to do yet another post-doc.

    I don’t mean to be cynical. I think what you are doing is great. Keep it up.

  4. Greg Wilson
    June 20th, 2011 at 19:30 | #5

    @Roger Hyam Thanks, but I guess I’m still puzzled: if grad students are going to switch into IT of some kind, doesn’t that make the course even more cost-effective (for them, if not their supervisors)?

  5. June 20th, 2011 at 19:31 | #6

    p.s. As your strap line is “Helping scientists make better software since 1997” you should be ideally placed to check this. What % of your students from 2001 are still working as scientists?

  6. Greg Wilson
    June 20th, 2011 at 19:35 | #7

    @Roger Hyam Any stats I could collect would only be meaningful if people’s email addresses stayed active and if they responded to random inquiries about courses they took 10 years ago. I think there’s significant sampling skew in both… :-)
