
Testing

slide 01 Hello, and welcome to the next episode of the Software Carpentry lecture on program design using invasion percolation as an example. In this episode, we’ll take a closer look at how we test our invasion percolation program.
slide 02 If you recall, in an earlier episode we found one bug…
slide 03 …which makes us wonder, how many others are still lurking in our code?
slide 04 More generally, how do we validate and verify a program like this?
slide 05 Verification is the question, “Is our program free of bugs?” I.e., have we built it the right way?
slide 06 Validation is the question, “Have we built the right program?” I.e., are we using a good model?
slide 07 The second is a question for the scientists…
slide 08 …so we’ll concentrate on the first.
slide 09 This is the first test case that we want to try. The grid shown on the left has 2's everywhere…
slide 10 …except for three 1's that run in a straight line from the middle directly to the edge.
slide 11 It should fill in as shown here.
slide 12 And if it doesn’t, it should be pretty easy for us to figure out what’s gone wrong.
slide 13 We restructured our program as shown here in order to make it easier to construct and run test cases.
slide 14 If we take a closer look at the main body of the program, the first command-line argument specifies the scenario. If the name of the scenario is “random”, then we get parameters, create a grid, fill it with random values, fill that grid to the edge, and report. Otherwise, we just say that we don’t know what the scenario is.
slide 15 Let’s add another clause, so that if the scenario is “5×5 line”, we will create a 5×5 grid, fill a line of cells from the center to the edge with lower values, and then check that fill_grid does the right thing.
slide 16 Let’s expand that English description into a few lines of code. We want to create a 5×5 grid, initialize it with the values shown earlier (i.e., 2's everywhere except for 1's from the center to the edge), call the fill_grid function that we’re testing, and then check that we get the right result.
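The dispatch described above might look roughly like the following sketch. The helper names `do_random` and `do_5x5_line` and the scenario spelling `"5x5_line"` are assumptions made for illustration; the stand-in bodies only return a message so that the structure of the clauses is visible.

```python
import sys

def do_random(arguments):
    """Hypothetical stand-in for: get parameters, create a grid, fill it
    with random values, fill the grid to the edge, and report."""
    return "ran random scenario"

def do_5x5_line():
    """Hypothetical stand-in for: create a 5x5 grid, initialize the line,
    call fill_grid, and check the result."""
    return "ran 5x5 line test"

def main(arguments):
    """Choose what to do based on the first command-line argument."""
    scenario = arguments[0]
    if scenario == "random":
        return do_random(arguments[1:])
    elif scenario == "5x5_line":
        return do_5x5_line()
    else:
        return "unknown scenario: " + scenario

if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(main(sys.argv[1:]))
```

Each new test scenario adds one more clause, which is part of why this structure gets unwieldy as the number of tests grows.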
slide 17 The grid creation and fill_grid functions already exist—they’re part of our regular program.
slide 18 So we need to write functions to initialize the 5×5 grid with the values that we need to test, and then check after filling that it has been filled correctly.
slide 19 We’re going to have to write a similar pair of functions for each of our tests.
slide 20 We’ll write the first pair, and then use that experience to guide us when we refactor to make it easier to add more tests later.
slide 21 Here’s the function that initializes a grid of N×N cells with a line running from the center to the edge.
slide 22 It’s just as easy to write this function for the N×N case as for the 5×5 case, so we generalize early.
slide 23 This part of the function is easy to understand. We find the value of N by looking at the grid, and then fill all of the cells with the integer 2.
slide 24 This part, which fills the cells from the center to the edge in a straight line with the lower value 1, isn’t as easy to understand. It’s not immediately obvious that i should go in the range from 0 to N/2+1, and it’s not immediately obvious either that the X coordinate should be N/2 and the Y coordinate should be i for the cells that we want to fill.
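A minimal sketch of such an initialization function is shown below. It assumes the grid is a list of lists indexed as `grid[x][y]` and uses Python 3's integer division `//` where the narration says N/2; the function name `init_grid_NxN_line` is chosen for illustration.

```python
def init_grid_NxN_line(grid):
    """Fill an NxN grid with 2s, then put 1s in a straight line
    from the center cell to the edge."""
    N = len(grid)
    # The easy part: every cell gets the value 2.
    for x in range(N):
        for y in range(N):
            grid[x][y] = 2
    # The less obvious part: X is fixed at N//2, Y runs from 0 up to
    # and including N//2, so the loop needs N//2 + 1 iterations.
    for i in range(N // 2 + 1):
        grid[N // 2][i] = 1

grid = [[0] * 5 for _ in range(5)]
init_grid_NxN_line(grid)
```

For N = 5, `N // 2` is 2, so the loop sets `grid[2][0]`, `grid[2][1]`, and `grid[2][2]`: the edge cell, the one next to it, and the center.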
slide 25 When we say “it’s not obvious,” what we mean is, “There’s the possibility that it will contain bugs.” If there are bugs in our test cases, then we’re just making more work for ourselves.
slide 26 We’ll refactor this code later so that it’s easier for us to see that it’s doing the right thing.
slide 27 Here’s the code that checks that an N×N grid with a line of cells from the center to the edge has been filled correctly.
slide 28 Again, it’s as easy to check for the N×N case as the 5×5 case, so we’ve generalized the function.
slide 29 But take a look at this condition. Are we sure that the only cells that should be filled are the ones with X coordinate equal to N/2 and Y coordinate from 0 to N/2? Shouldn’t that be N/2+1? Or maybe it’s 1 to N/2.
slide 30 Or maybe the X coordinate should be N/2+1.
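The checking function being discussed could be sketched as follows. The `FILLED` marker value of -1, the `grid[x][y]` indexing, and the assumption that `num_filled` counts every filled cell including the center are all illustrative choices, not details confirmed by the lecture.

```python
FILLED = -1  # assumed marker for a filled cell

def check_grid_NxN_line(grid, num_filled):
    """Check that only the line from the center to the edge was filled."""
    N = len(grid)
    assert num_filled == N // 2 + 1, "Wrong number of cells filled"
    for x in range(N):
        for y in range(N):
            # The condition in question: X equal to N//2, Y from 0 to N//2.
            if x == N // 2 and y <= N // 2:
                assert grid[x][y] == FILLED, "Cell should be filled"
            else:
                assert grid[x][y] != FILLED, "Cell should not be filled"

# A hand-built 5x5 grid in the state we expect after filling.
good = [[2] * 5 for _ in range(5)]
for y in range(3):
    good[2][y] = FILLED
check_grid_NxN_line(good, 3)
```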
slide 31 In fact, these two functions are correct…
slide 32 …and when they’re run, they report that fill_grid behaves properly.
slide 33 But writing and checking two functions like this for each test won’t actually increase our confidence in our program…
slide 34 …because the tests themselves might contain bugs.
slide 35 We need a simpler way to create and check tests, so that our testing is actually helping us create a correct program rather than giving us more things to worry about. How do we do that?
slide 36 Well, let’s go back to our example. The grid on the left should fill in as shown on the right.
slide 37 Why don’t we just draw our test cases exactly as shown? The reason is that modern programming languages, including Python, don’t actually let you draw things. But we can get close with a little bit of work.
slide 38 Here are the values that we want to put in our test grid: 2's everywhere, except for 1's from the center to the edge. We’ve represented it as a multiline string, which is easy to read…
slide 39 …and also easy to write…
slide 40 …which means it’ll be easy for us to create lots of other test cases. We won’t have to write code: we can just write strings.
slide 41 The word “fixture” is the technical term for “the thing that the test is run on”. It’s the thing you set up in order to check whether a piece of code is working. We’ll see this term a lot more in future lectures.
slide 42 Here’s the result that we expect when we fill in this grid. Again, it’s a multiline string, so it’s easy to write, and easy to read.
slide 43 The ‘*’ character means “this cell should be filled”.
slide 44 The ‘.’ character means “this cell should hold whatever value it had at the start”, i.e., it shouldn’t have changed.
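Written out in Python, the two strings might look like this. The variable names and the choice of which edge the line runs to are assumptions; the final lines are just a sanity check that both strings describe grids of the same shape.

```python
# The fixture: 2s everywhere, 1s from the center to the edge.
fixture = """2 2 2 2 2
2 2 2 2 2
1 1 1 2 2
2 2 2 2 2
2 2 2 2 2"""

# The expected result: '*' means filled, '.' means unchanged.
result = """. . . . .
. . . . .
* * * . .
. . . . .
. . . . ."""

# Both strings should have the same number of rows and columns.
fixture_rows = [line.split() for line in fixture.splitlines()]
result_rows = [line.split() for line in result.splitlines()]
assert [len(r) for r in fixture_rows] == [len(r) for r in result_rows]
```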
slide 45 Here’s how we would actually use these two strings in test code.
slide 46 First, we’re going to put the strings holding our fixtures and the expected results in a list of pairs. We can then loop over this list to check each fixture and result in turn. Again, this makes it very easy to add more tests: we just define two multiline strings, and then add one more pair to this list called TESTS.
slide 47 We write a function called run_tests, and as the doc string says, it runs all of our tests at once.
slide 48 Inside the loop, we get each fixture and its expected result…
slide 49 …we use the values in fixture to initialize a grid by breaking that multiline string into pieces and converting those pieces into integers.
slide 50 We then call the fill_grid function that we actually want to test…
slide 51 …and then we take the actual result, which is in grid, the number of cells that were filled, the initial fixture, and the expected result, and we pass them into a function that checks to make sure everything is right. We only have to write create_fixture_grid and check_result_grid once.
slide 52 Doing that is left as an exercise for the viewer.
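For readers who want to compare notes after trying the exercise, here is one possible sketch of the whole harness. Everything in it beyond the names `TESTS`, `run_tests`, `create_fixture_grid`, `check_result_grid`, and `fill_grid` is an assumption: the `FILLED` marker, the row-by-row grid layout, and especially the body of `fill_grid`, which is a simplified stand-in (fill the center, then repeatedly fill the lowest-valued neighbor of the filled region until an edge cell is filled) rather than the program's real implementation.

```python
FILLED = -1  # assumed marker for a filled cell

FIXTURE = """2 2 2 2 2
2 2 2 2 2
1 1 1 2 2
2 2 2 2 2
2 2 2 2 2"""

RESULT = """. . . . .
. . . . .
* * * . .
. . . . .
. . . . ."""

TESTS = [(FIXTURE, RESULT)]

def create_fixture_grid(fixture):
    """Turn a multiline string of integers into a list of rows."""
    return [[int(v) for v in line.split()]
            for line in fixture.strip().splitlines()]

def fill_grid(grid):
    """Simplified stand-in for the function under test.
    Returns the number of cells filled, counting the center."""
    n = len(grid)
    grid[n // 2][n // 2] = FILLED
    num_filled = 1
    if n < 3:  # the center is already on the edge
        return num_filled
    while True:
        candidates = []
        for r in range(n):
            for c in range(n):
                if grid[r][c] == FILLED:
                    continue
                near = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                if any(0 <= a < n and 0 <= b < n and grid[a][b] == FILLED
                       for a, b in near):
                    candidates.append((grid[r][c], r, c))
        # Ties are broken by position here; a real version might
        # choose randomly among equal-valued candidates.
        _, r, c = min(candidates)
        grid[r][c] = FILLED
        num_filled += 1
        if r in (0, n - 1) or c in (0, n - 1):
            return num_filled

def check_result_grid(grid, num_filled, fixture, result):
    """Compare the filled grid against the expected result string."""
    original = create_fixture_grid(fixture)
    expected = [line.split() for line in result.strip().splitlines()]
    expected_filled = sum(row.count("*") for row in expected)
    assert num_filled == expected_filled, "Wrong number of cells filled"
    for r, row in enumerate(expected):
        for c, mark in enumerate(row):
            if mark == "*":
                assert grid[r][c] == FILLED, "Cell should be filled"
            else:
                assert grid[r][c] == original[r][c], "Cell should be unchanged"

def run_tests():
    """Run all tests."""
    for fixture, result in TESTS:
        grid = create_fixture_grid(fixture)
        num_filled = fill_grid(grid)
        check_result_grid(grid, num_filled, fixture, result)

run_tests()
```

Adding another test then means writing two more strings and appending one more pair to `TESTS`; none of the functions need to change.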
slide 53 Describing the fixtures and the results as strings is easy, but writing those two new functions might seem like a lot of work. The question is, when you say it’s a lot of work…
slide 54 …what are you comparing it to?
slide 55 Are you comparing it to the time it would take to inspect printouts of real grids, or step through the program over and over again in the debugger?
slide 56 And did you think to include the time it would take to re-do this after every change to your program?
slide 57 Or are you comparing it to the time it would take to retract a published paper after you find a bug in your code? Because that’s what we’re trying to prevent.
slide 58 In real applications, it’s not unusual for test code to be anywhere from 20% to 200% of the size of the actual application code.
slide 59 And yes, 200% does mean more test code than application code.
slide 60 But that’s no different from physical experiments. If you look at the size and cost of the machines used to create a space probe, they are many times greater than the size and cost of the space probe itself.
slide 61 The good news is that there are frameworks to help you do this…
slide 62 …and we will look at those in future lectures.
slide 63 The other good news is, once your tests have been written, changing the program itself becomes much easier. In particular, we’re now in a position to replace our fill_grid function with one that is harder to get right, but which will run many times faster. If our tests have been designed well, they shouldn’t have to be rewritten because they’ll all continue to work the same way. This is a common pattern in scientific programming. You create a simple version first, check it, and then replace the parts one by one with more sophisticated parts that are harder to check, but give you better performance.
slide 64 We’ll take a look at how to do this in the next episode.
