Introduction

slide 01 Hello, and welcome to the first episode of the Software Carpentry lecture on testing. In this episode, we’ll look at what testing can actually do, and start exploring how you can go about it systematically.
slide 02 Nobody actually enjoys testing software.
slide 03 So if you’d like to skip this lecture, you can, provided that:
slide 04 Your programs always work correctly the first time you run them, or
slide 05 You don’t actually care whether they’re doing the right thing or not, as long as their output looks plausible.
slide 06 You can also skip this lecture if you enjoy wasting time and taking longer to get things to work.
slide 07 You might not think that has much to do with testing, but study after study has shown that the more you invest up front in quality, the sooner your program will be ready to use. Just as in manufacturing and medicine, slowing down a little is the best way to speed things up a lot.
slide 08 Testing actually serves two purposes.
slide 09 It tells you whether your program is doing what it’s supposed to do.
slide 10 But if it’s done right, it will also tell you what your program actually is supposed to be doing.
slide 11 Tests are runnable specifications of a program’s behavior.
slide 12 Unlike design documents or comments in the code, you can actually run your tests, so it’s harder for them to fall out of sync with the program’s actual behavior. In well-run projects, tests also act as examples to show newcomers how the code should be used, and how it’s supposed to behave under different circumstances. We’ll explore this idea in detail in a later episode.
slide 13 Before we go on, though, it’s important to understand that there’s a lot more to software quality than testing. Testing doesn’t create quality, it measures it.
slide 14 As Steve McConnell said, trying to improve the quality of software by doing more testing is like trying to lose weight by weighing yourself more often.
slide 15 But a good set of tests will help you track down bugs more quickly, which in turn speeds up development.
slide 16 It’s also important to understand that testing can only do so much. For example, suppose you’re testing a function that compares two 7-digit phone numbers.
slide 17 There are 10^7 such numbers…
slide 18 …which means that there are 10^14 possible test cases for your function.
slide 19 At a million tests per second, it would take you about 1,157 days, or a little over three years, to run them all.
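These figures are easy to recompute. The sketch below assumes exactly the numbers used in the narration (every 7-digit string counts as a phone number, every ordered pair is one test case, and we run one million tests per second); the variable names are ours, not the lecture's.

```python
# Assumed figures from the narration: 7-digit numbers, a million tests per second.
phone_numbers = 10 ** 7                  # every 7-digit string, leading zeros included
test_cases = phone_numbers ** 2          # every ordered pair of numbers
tests_per_second = 10 ** 6

seconds = test_cases / tests_per_second  # total running time in seconds
days = seconds / (60 * 60 * 24)          # 86,400 seconds in a day

print('%d test cases would take about %.0f days' % (test_cases, days))
```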
slide 20 And that’s only one simple function: exhaustively testing a real program with hundreds or thousands of functions, each taking half a dozen arguments, would take many times longer than the expected lifetime of the universe.
slide 21 And how would you actually write 10^14 tests? More importantly, how would you check that the tests themselves were all correct?
slide 22 In reality, “all” that testing can do is show that there might be a problem in a piece of code. If testing doesn’t find a failure, there could still be bugs lurking there that we just didn’t find. And if testing says there is a problem, it could well be a problem with the test rather than the program.
slide 23 So why test? Because it’s one of those cases where something that shouldn’t work in theory is surprisingly effective in practice. It’s just like mathematics: any proof of a theorem might contain a flaw that just hasn’t been noticed yet, but somehow we manage to make progress.
slide 24 The obstacle to testing isn’t actually whether or not it’s useful, but whether or not it’s easy to do. If it isn’t, people will always find excuses to do something else.
slide 25 It’s therefore important to make things as painless as possible. In particular, it has to be easy for people to:
slide 26 add or change tests
slide 27 understand the tests that have already been written
slide 28 run those tests
slide 29 and understand those tests’ results.
slide 30 And test results must be reliable.
slide 31 If a testing tool says that code is working when it’s not, or reports problems when there actually aren’t any, people will lose faith in it and stop using it.
slide 32 Let’s start with the simplest kind of testing. A unit test is a test that exercises one component, or unit, in a program.
slide 33 Every unit test has five parts. The first is the fixture
slide 34 …which is the thing the test is run on, such as the inputs to a function, or some data files to be processed.
slide 35 The second part is the action
slide 36 …which is what we do to the fixture. Ideally, this just involves calling a function, but some tests may involve more.
slide 37 The third part of every unit test is its expected result
slide 38 …which is what we expect the piece of code we’re testing to do or return. If we don’t know the expected result, we can’t tell whether the test passed or failed. As we’ll see in a couple of episodes, defining fixtures and expected results can be a good way to design software.
slide 39 The first three parts of the unit test are used over and over again. The fourth part is the actual result
slide 40 …which is what happens when we run the test on a particular day, with a particular version of our software.
slide 41 The fifth and final part of our test is a report
slide 42 …which tells us whether the test passed, or whether there’s a failure of some kind that needs human attention. As with the actual result, this could be different each time we run the test.
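To make the five parts concrete, here is a minimal sketch that labels each one. It uses Python’s built-in sum as a stand-in for the code under test, which is our choice for illustration and not an example from the lecture.

```python
# 1. Fixture: the thing the test is run on.
fixture = [1, 2, 3]

# 3. Expected result: what we decided the answer should be before running.
expected = 6

# 2. Action: what we do to the fixture (here, a single function call).
# 4. Actual result: whatever that call produces on this particular run.
actual = sum(fixture)

# 5. Report: did the test pass, or does something need human attention?
report = 'pass' if actual == expected else 'fail'
print(report)
```

The fixture, action, and expected result stay the same from run to run; only the actual result and the report can change.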
slide 43 So much for terminology: what does this all look like in practice? Suppose we’re testing a function called dna_starts_with.
slide 44 It returns True if its second argument is a prefix of the first, i.e., if one sequence starts with another.
slide 45 And it returns False otherwise.
slide 46 For example, 'actggt' does start with 'act'
slide 47 …but not with 'ctg'.
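The transcript doesn’t show how dna_starts_with is written. One plausible sketch, assuming sequences are ordinary Python strings, simply wraps the built-in str.startswith method:

```python
def dna_starts_with(seq, prefix):
    """Return True if prefix is a prefix of seq, and False otherwise."""
    # Assumption: both arguments are plain strings such as 'actggt'.
    return seq.startswith(prefix)

print(dna_starts_with('actggt', 'act'))   # prints True
print(dna_starts_with('actggt', 'ctg'))   # prints False
```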
slide 48 We’ll build a simple set of tests for this function from scratch to introduce some key ideas.
slide 49 Later, we’ll introduce a Python library that can handle the things that are done the same way each time.
slide 50 Let’s start by testing our code directly using assert. Here, we call the function four times with different arguments, checking that the right value is returned each time.
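The slide’s code isn’t reproduced in this transcript, so the following is a sketch of what four assert-based tests might look like. The particular sequences and the string-based definition of dna_starts_with are our assumptions.

```python
def dna_starts_with(seq, prefix):
    # Assumed implementation: sequences are plain strings.
    return seq.startswith(prefix)

# Four direct checks; execution stops at the first one that fails.
assert dna_starts_with('a', 'a')
assert dna_starts_with('at', 'a')
assert dna_starts_with('at', 'at')
assert not dna_starts_with('at', 't')   # this one should return False
```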
slide 51 This is much better than nothing, but it has several shortcomings.
slide 52 First, there’s a lot of repeated code: only a fraction of what’s on each line is unique and interesting.
slide 53 That repetition makes it easy to overlook things, like the not used to check that the last test returns False instead of True.
slide 54 This code also only tests up to the first failure. If any of the tests doesn’t produce the expected result, the assert statement will halt the program. It would be more helpful if we could get data from all of our tests every time they’re run, since the more information we have, the faster we’re likely to be able to track down bugs.
slide 55 Here’s a different approach. It requires a bit more typing to set up, but after that, makes testing a lot easier. First, let’s put the inputs and output of each test in a table. The first two entries in each row are the arguments to our function, and the third is what the function should return for those arguments.
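A table like that might look as follows. The name Tests comes from the narration; the rows themselves are illustrative values we’ve assumed.

```python
# Each row: sequence, prefix, and the value dna_starts_with should return.
Tests = [
    ['a',  'a',  True],
    ['at', 'a',  True],
    ['at', 'at', True],
    ['at', 't',  False],
]
```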
slide 56 Right away, this is easier to read than line after line of assert and function calls.
slide 57 It’s also easy to add new tests: just insert a line with the right values.
slide 58 Of course, those tests won’t run themselves, so here are five lines of Python (plus a comment) to do that. This code simply loops over the entries in the table, calling the function with the arguments provided, and counting how many times the function returned the right result. When the loop finishes, this code prints out a summary to tell us how many of our tests passed.
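Here is a sketch of such a loop. The function definition and the table rows are assumed stand-ins so that the example runs on its own; the last five lines plus the comment above them are the part the narration describes.

```python
def dna_starts_with(seq, prefix):
    # Assumed implementation: sequences are plain strings.
    return seq.startswith(prefix)

Tests = [
    ['a',  'a',  True],
    ['at', 'a',  True],
    ['at', 'at', True],
    ['at', 't',  False],
]

# Run every test in the table and count how many pass.
passes = 0
for (seq, prefix, expected) in Tests:
    if dna_starts_with(seq, prefix) == expected:
        passes += 1
print('%d/%d tests passed' % (passes, len(Tests)))
```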
slide 59 This is better than the previous pile of assert statements and function calls because no runnable code has to be copied to add a new test. That makes the pattern in our tests clearer, and reduces the chances of us introducing a bug by copying and pasting incorrectly. This code also runs all of our tests every time, so we always get a complete picture of how well we did.
slide 60 However, if any of our tests fail, this code won’t tell us which ones. If we had a hundred tests, and two were failing, figuring out which two would take some time.
slide 61 This slightly modified version of our code solves that problem.
slide 62 The built-in function enumerate takes a list (or any other sequence) as an argument, and produces one pair for each entry in that list. The first half of each pair is an element index, and the second is the element itself.
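As a quick illustration of enumerate on its own, using a small list we made up for the purpose:

```python
bases = ['a', 'c', 'g', 't']

# Each pass through the loop gets an (index, element) pair.
for (i, base) in enumerate(bases):
    print(i, base)
```

This prints 0 a, 1 c, 2 g, and 3 t, one pair per line.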
slide 63 In our case, the elements of Tests have three parts. We can extract the index and those three parts in a single step as shown here. The first time through the loop, i will be assigned 0, while seq, prefix, and expected will be assigned the three parts of our test. The next time through the loop, i will be assigned 1, and so on.
slide 64 The two lines that call dna_starts_with, check its result, and increment the counter of successful tests are exactly the same as before.
slide 65 So is the line after the loop that summarizes how many tests passed.
slide 66 But these two lines are new. If a test fails, we immediately print out its index to make it easy to find in the Tests table.
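Putting those pieces together, the modified loop might look like the sketch below. As before, the function body and the table rows are our assumptions; the else branch and its print call are the two new lines.

```python
def dna_starts_with(seq, prefix):
    # Assumed implementation: sequences are plain strings.
    return seq.startswith(prefix)

Tests = [
    ['a',  'a',  True],
    ['at', 'a',  True],
    ['at', 'at', True],
    ['at', 't',  False],
]

# Run every test, reporting the index of each one that fails.
passes = 0
for (i, (seq, prefix, expected)) in enumerate(Tests):
    if dna_starts_with(seq, prefix) == expected:
        passes += 1
    else:
        print('test %d failed' % i)
print('%d/%d tests passed' % (passes, len(Tests)))
```

Changing an expected value in the table to the wrong answer is an easy way to see the failure message in action.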
slide 67 This pattern—creating fixtures, acting on them, and collecting and reporting results—is the heart of almost all testing tools.
slide 68 Many good libraries have been written in many languages to help programmers write tests that follow this pattern.
slide 69 We’ll look at one such library for Python in a couple of episodes.
slide 70 First though, we’ll have a look at how you should go about handling errors in your programs.

  1. July 27th, 2010 at 05:21 | #1

    Quite a nice intro, especially calling out the fixture to avoid boilerplate.

  2. October 6th, 2010 at 01:09 | #2

    This is a great introduction. The “Testing with a Table” part is particularly insightful. This might be jumping ahead a couple of lectures, but one can get similar behavior to the testing table in nose with test generators (http://somethingaboutorange.com/mrl/projects/nose/0.11.2/writing_tests.html#test-generators). This makes a whole series of tests using only a couple of functions and yield.

  3. Greg Wilson
    October 6th, 2010 at 01:17 | #3

    @Anthony Scopatz
    Generators do make table-like tests easier, but only if your language provides generators, and you understand how they work. We’re trying to stick to ideas that will work in most modern languages, and even in Python, generators are a second-tier concept. We hope to include a lecture later on about advanced ideas (closures, generators, Python’s with statement, etc.) — if you’d like to help write it, we’d welcome assistance.

  4. tissit
    October 10th, 2011 at 12:03 | #4

    “10^6 million tests / sec” vs “At a million tests per second”