
Empirical Results

February 1st, 2011
slide 001 Hello, and welcome to the second episode of the Software Carpentry lecture on software engineering. In this episode, we’ll have a look at the science behind some of the claims we make in this course.
slide 002 Our story starts with the Seven Years’ War…
slide 003 …which was actually nine years long (proving that it’s not just programmers who have trouble counting).
slide 004 During that war, the British lost about 1,500 sailors to enemy action…
slide 005 …and almost 100,000 (the population of a small city) to scurvy.
slide 006 The irony is, they didn’t need to lose any.
slide 007 Before the war even started, a Scottish surgeon named James Lind had conducted what is often called the first controlled clinical trial in history. He was intrigued by the fact that vegetables don’t go bad if they’re pickled, and he wondered: could the same thing somehow be done for people?
slide 008 So he took twelve sailors, divided them into six pairs, and gave each pair something different: cider, sea water, vitriol (a weak solution of sulfuric acid; bad day to be those guys), oranges and lemons, vinegar, and barley water.
slide 009 Lo and behold, the sailors who were given oranges were back on their feet in just a few days, while the others continued to sicken.
slide 010 It was a long time before the British Admiralty paid attention to his results, but when they finally did, it allowed British ships to stay at sea for months, which may have turned the tide of history during the Napoleonic Wars.
slide 011 It took even longer for the medical profession to start paying attention, but they finally did too. One of the turning points was Hill and Doll’s landmark study in 1950 that compared cancer rates in smokers and non-smokers. Their study proved two things:
slide 012 First, smoking causes lung cancer.
slide 013 Second, many people would rather fail than change. Even when confronted with overwhelming evidence, many people will cling to creationism, refuse to acknowledge that human beings are at least partly responsible for global climate change, or insist that vaccines cause autism.
slide 014 Unfortunately, this is still largely true in software engineering, where many people act as if a few pints and a quotation from some self-appointed guru constitute “proof” of claims that X is better than Y.
slide 015 The good news is, things are finally changing.
slide 016 Empirical studies of real programmers and real software were a rarity in software engineering before the mid-1990s.
slide 017 Today, though, papers describing new tools or working practices routinely include results from some kind of empirical study to back up their claims.
slide 018 This is particularly true of papers written by younger researchers, which bodes well for the future.
slide 019 Many of these studies are still flawed or incomplete, but the standards of major journals and conferences are constantly improving.
slide 021 Here’s an example of the kind of question researchers are tackling. Does it matter if your developers are sitting together…
slide 022 …or can they be spread out all over the globe?
slide 023 Two scientists at Microsoft Research tried to find out by looking at data collected during the construction of Windows Vista.
slide 024 It turns out that geographical separation didn’t have much of an impact on software quality.
slide 025 What did was how far apart team members were in the org chart: basically, the higher up you had to go to find a common boss, the more bugs there would be in the software they built.
slide 026 In retrospect, this result isn’t actually surprising: if programmers have different bosses, the odds are that they’ll also have conflicting orders.
slide 027 The beauty of this result is that it’s actionable: all other things being equal, you can improve the quality of a piece of software by restructuring the team. (I would have said “by simply restructuring the team”, but of course, that kind of thing is never simple…)
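One way to make “how far apart in the org chart” concrete is to treat the org chart as a tree and find the lowest boss two people share. The Python sketch below is a simplified illustration, not the metric the Microsoft researchers actually used; the `BOSS_OF` mapping, the people in it, and the function names are all hypothetical, and the chart is assumed to be a proper tree with one person at the top.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical org chart: each person maps to their direct boss (None = top of the chart).
BOSS_OF: Dict[str, Optional[str]] = {
    "ceo": None,
    "vp_windows": "ceo",
    "vp_office": "ceo",
    "mgr_kernel": "vp_windows",
    "mgr_shell": "vp_windows",
    "alice": "mgr_kernel",
    "bob": "mgr_kernel",
    "carol": "mgr_shell",
    "dave": "vp_office",
}


def chain_of_command(person: str, boss_of: Dict[str, Optional[str]]) -> List[str]:
    """Return [person, their boss, their boss's boss, ..., the top of the chart]."""
    chain = [person]
    boss = boss_of[person]
    while boss is not None:
        chain.append(boss)
        boss = boss_of[boss]
    return chain


def lowest_common_boss(a: str, b: str, boss_of: Dict[str, Optional[str]]) -> Tuple[str, int]:
    """Find the lowest manager shared by a and b, plus how many levels up
    the farther of the two has to climb to reach that manager."""
    chain_a = chain_of_command(a, boss_of)
    level_in_b = {manager: level for level, manager in enumerate(chain_of_command(b, boss_of))}
    # Walking up from a, the first manager who also sits above b is the lowest common one.
    for level_in_a, manager in enumerate(chain_a):
        if manager in level_in_b:
            return manager, max(level_in_a, level_in_b[manager])
    raise ValueError(f"{a} and {b} have no common boss")


print(lowest_common_boss("alice", "bob", BOSS_OF))    # ('mgr_kernel', 1)
print(lowest_common_boss("alice", "carol", BOSS_OF))  # ('vp_windows', 2)
print(lowest_common_boss("alice", "dave", BOSS_OF))   # ('ceo', 3)
```

Read through the lens of the study, the larger the second number for a pair of people working on the same code, the more bugs you would expect in what they build.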
slide 028 Here’s another neat result, also from Microsoft: what goes wrong for developers in their first job?
slide 029 A detailed qualitative study of eight new hires, none of whom had previous industry experience, found that technology was never the biggest problem.
slide 030 Where everyone actually stumbled was group dynamics: when to ask for help, how to ask, how to contribute to meetings, and so on. These skills are usually not part of a technical education, but in every case, this was what hurt new hires’ productivity the most.
slide 031 Again, in retrospect this finding isn’t surprising, but it’s also actionable: by investing a little in team skills early on, companies (and presumably research labs as well) can reduce both their hidden costs and their new hires’ frustrations.
slide 032 This second study highlights something important about empirical studies in software engineering: a lot of the best ones are not statistical in nature.
slide 033 Instead, a lot of first-rate work draws on techniques from anthropology…
slide 034 …and business studies.
slide 035 This is partly because controlled experiments large enough to produce statistically significant results are very expensive to run.
slide 036 The real reason, though, is that qualitative techniques are often the right ones to use, because controlled laboratory studies would all too often eliminate the real-world effects that we actually want to study.
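To see why statistically significant experiments get expensive, consider a standard back-of-the-envelope estimate (our addition, not part of the lecture): using the normal approximation for comparing two group means, the number of participants needed in each group is roughly 2 × (z(1 − α/2) + z(power))² / d², where d is the standardized effect size. The Python sketch below assumes a two-sided test at α = 0.05 with 80% power; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def participants_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants needed in EACH of two groups to detect a
    standardized difference `effect_size` (Cohen's d) with a two-sided test.
    Uses the normal approximation, which slightly underestimates the exact t-test answer."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (0.8, 0.5, 0.2):  # conventionally "large", "medium", and "small" effects
    print(d, participants_per_group(d))
```

Halving the effect you want to detect roughly quadruples the sample: under these assumptions a small effect needs about 393 participants per group, and recruiting that many professional programmers for a controlled study is costly.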
slide 037 In fact, one of the biggest obstacles to wider adoption of evidence-based software engineering is the resistance of scientists and programmers, many of whom dismiss qualitative methods as “soft” without actually knowing anything about them.
slide 038 Another reason for resistance is that people don’t like finding out that their cherished beliefs might be wrong. One example is test-driven development: the practice of writing tests before writing code.
slide 039 Many programmers believe quite strongly that this is the “right” way to program, and that it leads to better code in less time.
slide 040 However, a meta-analysis of over thirty studies found no consistent effect.
slide 041 Some of the studies reported benefits…
slide 042 …some found that it made things worse…
slide 043 …and some were inconclusive.
slide 044 One clear finding, though, was that the better the study, the weaker the signal. This result may be disappointing to some people (it certainly was to me), but progress sometimes is. And even if these studies are wrong, figuring out why, and doing better studies, will advance our understanding.
slide 045 Here’s another useful result, one that dates all the way back to the 1970s…
slide 046 …and has been replicated many times since.
slide 047 First, most errors in software are introduced during requirements analysis and design, not during coding.
slide 048 Second, the later a bug is removed, the more expensive the fix is. What’s more, the cost curve is actually exponential: as we move from analysis to design to coding to testing to deployment, fixing a bug becomes five to ten times more expensive at each successive stage, and those increases compound.
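The arithmetic behind a five- to ten-fold increase at each stage is worth spelling out. The Python sketch below takes the lecture’s five stages and a per-stage multiplier of 5 or 10 (the two ends of the quoted range) and computes the cost of a fix at each stage relative to fixing it during analysis; `relative_fix_cost` is a hypothetical name.

```python
STAGES = ["analysis", "design", "coding", "testing", "deployment"]


def relative_fix_cost(factor: int) -> dict:
    """Cost of fixing a bug at each stage, relative to fixing it during analysis.
    Each stage multiplies the previous stage's cost by `factor`."""
    return {stage: factor ** step for step, stage in enumerate(STAGES)}


for factor in (5, 10):
    print(factor, relative_fix_cost(factor))
# 5 {'analysis': 1, 'design': 5, 'coding': 25, 'testing': 125, 'deployment': 625}
# 10 {'analysis': 1, 'design': 10, 'coding': 100, 'testing': 1000, 'deployment': 10000}
```

Even at the low end of the range, a bug caught after deployment costs 625 times as much to fix as one caught during analysis.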
slide 049 The beauty of this result is that it explains why programmers disagree about how to run projects.
slide 050 Pessimists look at these curves and say, let’s tackle the hump in the bug creation curve by doing more analysis and design up front.
slide 051 Meanwhile, optimists say, if we do many short iterations instead of a few long ones, the total cost of fixing bugs will go down…
slide 052 …because the total area under the sawtooth curve is less than the area under the original curve. Both sides are right: they’re just looking at different aspects of the problem.
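The sawtooth argument can be checked with a deliberately simplified model, which is our assumption rather than the lecture’s: suppose the cost of fixing outstanding bugs grows as e^(rate·t), where t is the time since the start of the current cycle, and each iteration starts the clock again. The area under one long curve of length T is then (e^(rate·T) − 1)/rate, while splitting T into n equal iterations gives n copies of a much smaller tooth. The function names and parameter values below are hypothetical.

```python
import math


def area_single_cycle(rate: float, duration: float) -> float:
    """Area under exp(rate * t) for t in [0, duration]: one long project."""
    return (math.exp(rate * duration) - 1) / rate


def area_iterated(rate: float, duration: float, iterations: int) -> float:
    """Same total duration split into equal iterations; each iteration restarts
    the cost curve at t = 0, so the total is a sawtooth of identical teeth."""
    tooth_length = duration / iterations
    return iterations * area_single_cycle(rate, tooth_length)


for n in (1, 2, 5, 10):
    print(n, round(area_iterated(rate=1.0, duration=5.0, iterations=n), 2))
```

With these made-up parameters the total area falls from about 147 for one long cycle to under 9 with five iterations. For any positive rate, more iterations never increase the total, because e^x − 1 is convex and zero at x = 0; the model does ignore any fixed overhead per iteration.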
slide 053 Here’s another classic result, also from the mid-1970s.
slide 054 It turns out that reading code carefully is the most effective way to find bugs, and the most cost-effective as well. In fact, careful reading can find 60–90% of all the bugs in a piece of code before it’s run for the first time.
slide 055 Thirty years on, Cohen and others refined this result by looking at data collected at Cisco. They found that almost all of the value of code review came from the first reviewer and the first hour of reviewing. Basically, having more than one person review a piece of code doesn’t find enough additional bugs to be worthwhile, and if someone spends more than an hour reading code, they become fatigued and stop finding anything except trivial formatting errors.
slide 056 In light of this, it’s not surprising that code review has become a common practice in most open source projects: given the freedom to work any way they want, most top-notch developers have discovered for themselves that having someone else look over a piece of code before it’s committed to version control makes development faster, not slower.
slide 057 Books like Robert Glass’s Facts and Fallacies of Software Engineering, and a recent collection from O’Reilly called Making Software, present these results and many more in a digestible way.
slide 058 Does your choice of programming language affect your productivity?
slide 059 Does using design patterns make your code better?
slide 060 Can data mining techniques help us predict how many bugs are in a piece of software, and where they’re likely to occur?
slide 061 Is up-front design cost-effective, or should software evolve week by week in response to immediate needs?
slide 062 Why do so many people find it so hard to learn how to program?
slide 063 Is open source software actually higher quality than closed source alternatives?
slide 064 And are some programmers ten times more productive than others? (Or 28 times, or a hundred times: you’ll see all these numbers, and more, quoted on the web.)
slide 065 We actually have answers to some of these questions now, and if you’re going to spend any significant time programming, or arguing about programming, it’s easier than ever to find out what we know and why we believe it’s true.
