Titus Brown just posted "Big Data Biology — why efficiency matters", in which he explains the academic, practical, and algorithmic reasons why efficient computation is good for science. His opinions are all grounded in his extensive personal experience working in the overlap between biology and computing, and, combined with a couple of things Michelle Levesque showed off at our Oakland workshop last week, have me thinking that we ought to include half an hour on performance analysis and tuning in the Software Carpentry core. Damn you, Titus—I thought I had this curriculum figured out :-)
Titus's post has reminded me of something I've realized about big data. (Caveat: I've never done "big data" myself, just watched other people wrestle with it.) Titus and others don't think about individual bytes and records any more often than chemical engineers think about individual atoms. Instead, they think in terms like "percentage yield for such and such parameters" and "cost-yield tradeoff for such and such a process". Yes, they can relate their rules back to their respective atoms, but that's like saying (to switch analogies for a moment) that a physicist studying fluid mechanics can relate the Navier-Stokes equations back to quantum mechanics: true in principle, but not the level at which the work actually gets done.