Archive for June, 2011

It Will Never Work in Theory

June 29th, 2011 No comments

Inspired in part by Lambda the Ultimate, which reports on what’s new in programming language research, Jorge Aranda and I have started a new blog called “It Will Never Work in Theory” to bring you the latest results in empirical studies of software engineering. The first posts discuss:

  • Rahman and Devanbu’s “Ownership, Experience, and Defects: A Fine-Grained Study of Authorship”, which found that code worked on by one developer (rather than many) is more often implicated in defects, but that a developer’s experience with a particular file (rather than the project in general) reduces defect rates.
  • Stolee and Elbaum’s “Refactoring Pipe-like Mashups for End-User Programmers”, which applies the “code smells” meme to Yahoo! Pipes (and by implication shows that refactoring ideas can be applied to other end-user programming systems).
  • Mockus’s “Organizational Volatility and its Effects on Software”, which found that an influx of newcomers into a project doesn’t increase fault rates (since they’re usually given simple tasks to start with), but that organizational change can still account for about 20% of faults.

Our aim in starting this blog is to continue the work begun in Making Software: to let practitioners know what researchers have discovered, and what kinds of questions they can answer, and to give researchers feedback on what’s useful, what isn’t, and what they ought to look at next. We look forward to your feedback.

Categories: Noticed Tags:

Michael Nielsen Talks About Open Science in San Francisco on June 29

June 22nd, 2011 No comments

As per his blog post, the inimitable [1] Michael Nielsen will be talking about “Why the net doesn’t work for science—and how to fix it” next Wednesday in San Francisco. It’s sure to be both informative and enjoyable—hope you can make it.

[1] Well, you try to imitate an Australian quantum physicist turned open science advocate…

Categories: Noticed Tags:

Doing the Math

June 20th, 2011 7 comments

Let’s do some math. Suppose that working through the Software Carpentry course takes the average scientist five full-time weeks. It doesn’t matter whether that’s one five-week marathon, or whether the time is spread out over several months; the cost is still roughly 10% of the scientist’s annual salary (if you’re thinking like an administrator) or 10% of their annual published output (if you’re thinking like the scientist herself). How big a difference does it have to make to her productivity to be worthwhile?

Well, the net present value of n annual payments of an amount C with an interest rate of i is P = C(1 - (1+i)^(-n))/i. If we assume our scientist only keeps doing research for another 10 years after taking the course (which I hope is pessimistic), and a discount rate of 20% (which I also hope is pessimistic), then the present value works out to 4.2 times the annual savings. Doing a little long division, that means this training only has to improve the scientist’s productivity by 2.4% in order to pay for itself. That works out to just under an hour per week during those ten years; anything above that is money (or time) in the bank.
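For anyone who wants to check the arithmetic, here is a minimal Python sketch of the break-even calculation. The function name is made up for illustration, and the 40-hour working week is an assumption that the post does not state.

```python
# Sketch of the break-even arithmetic; the 40-hour week is an assumption.

def annuity_factor(rate, years):
    """Present value of one unit paid at the end of each of `years` years."""
    return (1 - (1 + rate) ** -years) / rate

factor = annuity_factor(rate=0.20, years=10)   # about 4.2
upfront_cost = 0.10                            # five weeks, taken as 10% of one year
break_even = upfront_cost / factor             # annual productivity gain needed
hours_per_week = break_even * 40               # assumed 40-hour working week

print(f"annuity factor:  {factor:.2f}")
print(f"break-even gain: {break_even:.1%}")
print(f"hours per week:  {hours_per_week:.2f}")
```

Changing `rate` or `years` shows how sensitive the break-even point is to those two guesses.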

Now suppose the feedback we get from former students is right, and that this training saves them a day per week or more. Let’s assume the average scientist (whatever that means) costs $75,000 a year. (That’s a lot more than a graduate student, but a lot less than the fully-loaded cost of someone in an industrial lab.) 20% of their time over the same ten years, at the same 20% discount rate, works out to roughly $63,000; at a more realistic discount rate of 10%, it’s roughly $92,000. That’s roughly a ten-fold return on $7500 (five weeks of their time right now at the same annual salary).
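The dollar figures can be checked the same way. This is only a sketch: the helper name is hypothetical, and it assumes the savings arrive as one lump at the end of each year, which is what the annuity formula implies.

```python
# Sketch: present value of saving one day per week for ten years.

def present_value(annual_amount, rate, years):
    """Discounted value of `annual_amount` received at the end of each year."""
    return annual_amount * (1 - (1 + rate) ** -years) / rate

salary = 75_000
annual_saving = 0.20 * salary      # one day in five
training_cost = 0.10 * salary      # five weeks, taken as 10% of a year

pv_at_20 = present_value(annual_saving, 0.20, 10)
pv_at_10 = present_value(annual_saving, 0.10, 10)

for rate, pv in ((0.20, pv_at_20), (0.10, pv_at_10)):
    print(f"discount rate {rate:.0%}: ${pv:,.0f} "
          f"({pv / training_cost:.1f}x the cost of training)")
```

The return lands on either side of ten-fold depending on which discount rate you believe.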

So my question is, why do scientists, who are certainly supposed to be able to do basic math, ignore this? More to the point, why do the people who organize conferences on “e-science” persist in ignoring two facts:

  1. The biggest bottleneck for the overwhelming majority of scientists (90% or more if you believe our 2008-09 survey) is development time, not CPU cycles. Faster machines can improve turnaround times a bit, but mastering a few basic skills will make a much bigger difference.
  2. Even those scientists who really need supercomputers to do their work would get more done faster if they were wasting less time copying files around, repeating tasks manually, and reinventing sundry wheels. They are trying to solve two open problems at once: whatever is intrinsic to their science, and high-performance parallel programming. Tackling the latter without a solid foundation is like trying to drive an F1 race car on the highway before you’ve learned to change lanes in a family car. I know from personal experience that the crash and burn rate is comparable…

I will believe that computational science is finally outgrowing its “toys for boys” mentality when I see an e-science conference that focuses on process and skills: on how scientists develop software at the moment-by-moment, week-by-week, and year-by-year scales. I will believe that people really care about advancing science, rather than about the bragging rights that come from having the world’s biggest X or its fastest Y, when supercomputer centers start requiring courses on software design, version control, and testing as prerequisites to courses on GPUs and MPI. I’ll believe it when journals like Nature and Computing in Science & Engineering require every paper they publish to devote a section to how (and how well) the code used in the paper was tested.

And I’ll believe in Santa Claus when I see him up on my roof saying, “Ho ho ho.” What I won’t do is take bets on which will happen first.

Categories: Opinion Tags:

Health Informatics Resources

June 18th, 2011 No comments

Via William Hopper, a list of online healthcare informatics resources that might be of interest to some readers.  If you have others, I’m sure he’d enjoy hearing from you.

Categories: Noticed Tags:

New Episode: MATLAB Structs and Cell Arrays

June 15th, 2011 No comments

The title says it all: thanks to the tireless Tommy Guy, we have a new episode on MATLAB structs and cell arrays.

Categories: Content, Version 4.1 Tags:

A New Look

June 14th, 2011 2 comments

I’m fond of the Software Carpentry logo, but the blue-to-white color fade is difficult to print on coffee mugs, and impossible to embroider on shirts. Thanks to the talented Veronica Wong, we have a new one:

We’ll be converting things over piece by piece as we rebuild the website over the summer.

Categories: Version 5.0 Tags:

Audio Processing in Python

June 10th, 2011 No comments

Thanks to Becky Stewart, we now have a 12-minute episode on audio processing in Python. We hope you find it useful—as always, feedback is very welcome.

Categories: Content, Version 4 Tags:

Practical Computing for Everyone (not just biologists)

June 7th, 2011 No comments
Steven Haddock and Casey Dunn:
Practical Computing for Biologists.
Sinauer Associates, 2010, 0878933913.

My copy of Practical Computing for Biologists arrived last week, and I’ve been very impressed. It is a well-written, well-paced guide to basic computing skills for scientists and engineers of all stripes (not just biologists).  Many of the topics will be familiar:

  • editing text files (including how to use regular expressions in an editor)
  • the Unix shell
  • basic Python programming (including debugging strategies)
  • relational databases
  • SSH
  • installing and configuring software

There are also a few that we don’t cover, such as interacting with hardware, and some that are covered in more depth than we give them, like image manipulation.  The pace is gentler than Software Carpentry, but the last couple of years have convinced me that’s a good thing: I think Haddock & Dunn have it right for this target audience. And it’s beautifully produced: full-color printing and great graphical design make this book a joy to read.  If I ever do turn Software Carpentry into a book, I might skip the topics PCB covers and just tell people to go and buy it.

Recommended.

Categories: Noticed, Opinion Tags:

Programming for Scientists at Newcastle University: June 20, 2011

June 4th, 2011 No comments

From the announcement:

Programming is becoming an increasingly important part of scientific research, yet many scientists are self-taught programmers with little formal training. This means that we are often unfamiliar with simple tools that can make programming and dealing with data faster, more reliable and more reproducible. This event is a day-long workshop to develop awareness of the skills and tools that help make computing more efficient and provide results that are less prone to error. If you’ve ever thought “Surely there must be a better way to do this”, then this is the event for you!

There is also a fuller description—check it out.

Categories: Noticed Tags:

Five on Systems Programming

June 2nd, 2011 No comments

Thanks to the Software Sustainability Institute’s Mike Jackson, we now have five episodes on how to inspect and manipulate files and directories from inside a Python program. If you would like to contribute to this project as well, please get in touch.

Categories: Content, Version 4 Tags: