
Unit Testing

slide 01 Hello, and welcome to the third episode of the Software Carpentry lecture on testing. In this episode, we’ll look at how to do unit testing a bit more systematically.
slide 02 Our research group is studying the impact of climate change on agriculture.
slide 03 We have several thousand aerial photographs of farms taken in the early 1980s.
slide 04 We want to compare those with photographs of the same fields taken since 2007 to see what has changed.
slide 05 The first step is to find regions where fields overlap.
slide 06 Luckily, the area we’re looking at is in Saskatchewan…
slide 07 …where fields actually are rectangular.
slide 08 A student intern has written a function that finds the overlap between two rectangles.
slide 09 Having used student code before, we want to test it before putting it into production.
slide 10 We also think we might have to make the function faster, to handle larger data sets…
slide 11 …so we want to have tests in place so that our optimizations don’t break anything.
slide 12 We’re going to use a Python library called Nose to organize our tests.
slide 13 In Nose, each test is a function…
slide 14 …whose name begins with the letters test_.
slide 15 We can group tests together in files…
slide 16 …whose names also begin with the letters test_.
slide 17 To execute our tests, we run the command nosetests.
slide 18 This automatically searches the current directory and its sub-directories for test files, and runs the tests they contain.
slide 19 To see how this works, let’s look at how we’d use it to test the dna_starts_with function from the previous episode.
slide 20 Here are our first three tests.
slide 21 To help us understand our own tests, we give each function a meaningful name, something better than “test 1” or “test 2”.
slide 22 Each test function uses assert to check the result of a single call to dna_starts_with.
slide 23 Test functions can create and use local variables, just like other functions. It’s particularly helpful to put temporary values, or values that are used in several places, in variables to avoid typing mistakes.
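The slide's code is not reproduced in this transcript, so the following is only a sketch of what three such tests might look like. The body of dna_starts_with is a stand-in (assumed here to behave like Python's str.startswith), and the first two test names are hypothetical. An assert statement does nothing if its condition is true and raises AssertionError if it is false, which is how the test runner knows that a test has failed.

```python
# Stand-in for the function from the previous episode. This body is an
# assumption; the real implementation is not shown in this transcript.
def dna_starts_with(sequence, prefix):
    return sequence.startswith(prefix)


def test_starts_with_itself():
    # A local variable avoids typing the same sequence twice.
    dna = 'actgt'
    assert dna_starts_with(dna, dna)


def test_starts_with_single_base_pair():
    assert dna_starts_with('actg', 'a')


def test_does_not_start_with_single_base_pair():
    assert not dna_starts_with('ttct', 'a')
```

Saved in a file whose name begins with test_, these functions would be picked up when the test runner is run in that directory.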
slide 24 Of course, the Nose library can’t think of test cases for us. We still have to decide what to test, and how many tests to run.
slide 25 This brings up an important point. We know that we should test lots of different cases…
slide 26 …but how many is “lots”?
slide 27 It turns out that’s not actually the right question to ask.
slide 28 A better question is, how can we choose tests that are worth writing and running?
slide 29 For example, if dna_starts_with('atc', 'a') works, there’s probably not much point testing dna_starts_with('ttc', 't'): it’s hard to think of a bug that would show up in one case, but not in the other.
slide 30 We should therefore try to choose tests that are as different from each other as possible, so that we force the code we’re testing to execute in all the different ways it can.
slide 31 Another way of thinking about this is that we should try to find boundary cases. After all, if a function works for zero, one, and a million values, it will probably work for eighteen values.
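As an illustration of boundary cases, here is a sketch of tests for dna_starts_with at the edges of its input: an empty prefix, a prefix as long as the sequence, a prefix longer than the sequence, and an empty sequence. The function body is again a stand-in that assumes str.startswith behavior, and the test names are hypothetical. Whether an empty prefix should count as a match is a decision the real function's author has to make; the stand-in says yes.

```python
# Stand-in implementation (an assumption, not the code from the lecture).
def dna_starts_with(sequence, prefix):
    return sequence.startswith(prefix)


def test_empty_prefix():
    # The stand-in treats every sequence as starting with the empty prefix.
    assert dna_starts_with('atc', '')


def test_prefix_is_whole_sequence():
    assert dna_starts_with('atc', 'atc')


def test_prefix_longer_than_sequence():
    assert not dna_starts_with('atc', 'atcg')


def test_empty_sequence():
    assert not dna_starts_with('', 'a')
```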
slide 32 Let’s apply this idea to our overlapping rectangles problem.
slide 33 Here’s a “normal” case: two rectangles that overlap by half in each direction.
slide 34 What other tests would be useful?
slide 35 Take a moment and see what other tests you can think of.
slide 36 Welcome back. Here’s our first test case: two rectangles that overlap by half in each direction.
slide 37 Here’s our second: the rectangle on the left extends above and below the one on the right, so none of the corners of the left rectangle are involved.
slide 38 Here’s a third case: the two rectangles are exactly the same width, but have different vertical extents. This will tell us whether the overlap function behaves correctly when rectangles intersect along entire lines, rather than just crossing at points.
slide 39 And here’s a fourth case: the second rectangle is contained entirely within the first, so their edges don’t actually cross at all.
slide 40 But what do we expect in this case? How should the function behave if the two rectangles share an edge, but their areas don’t overlap?
slide 41 And what if they only share a corner, like this? Should the function we’re testing tell us that these rectangles don’t overlap? Should it return a point, rather than a rectangle? Or should it return a rectangle with zero area?
slide 42 Thinking about tests in terms of boundary cases helps us find examples like this, where it isn’t immediately obvious what the “right” answer is. Writing those tests forces us to define how the function we’re testing is supposed to behave—i.e., what correct behavior actually is.
slide 43 Let’s turn all of this into working code.
slide 44 Here’s a test for the case where rectangles only touch at a corner.
slide 45 As you can see, we’ve decided that this doesn’t count as overlap. Our test is an unambiguous, runnable answer to our question about how the function is supposed to behave.
slide 46 Here’s our second test: two rectangles that have exactly the same extent, so their overlap is that same rectangle.
slide 47 This wasn’t actually in the set of test cases we came up with earlier, but it’s still a good test.
slide 48 And here’s a third test, where one rectangle is skinnier than the other.
slide 49 This test case actually turned up a bug in the first version of the overlap function that we wrote.
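The three tests just described might be sketched as follows. The slide's code is not included in this transcript, so several things here are assumptions: each rectangle is written as ((low_x, low_y), (high_x, high_y)), the function is called overlap and returns None when there is no overlap, and the test names are hypothetical. The coordinates in the third test are the ones quoted in a reader comment below. A corrected overlap function is included only so that the tests can run.

```python
def overlap(red, blue):
    """Return the overlap of two rectangles, or None if there is none.

    Each rectangle is ((low_x, low_y), (high_x, high_y)); this layout
    is an assumption made for this sketch.
    """
    (red_lo_x, red_lo_y), (red_hi_x, red_hi_y) = red
    (blue_lo_x, blue_lo_y), (blue_hi_x, blue_hi_y) = blue

    # Rectangles that only touch along an edge or at a corner
    # are treated as not overlapping.
    if (red_lo_x >= blue_hi_x or red_hi_x <= blue_lo_x or
            red_lo_y >= blue_hi_y or red_hi_y <= blue_lo_y):
        return None

    lo_x = max(red_lo_x, blue_lo_x)
    lo_y = max(red_lo_y, blue_lo_y)
    hi_x = min(red_hi_x, blue_hi_x)
    hi_y = min(red_hi_y, blue_hi_y)
    return ((lo_x, lo_y), (hi_x, hi_y))


def test_touch_at_corner():
    assert overlap(((0, 0), (2, 2)), ((2, 2), (4, 4))) is None


def test_unit_with_itself():
    unit = ((0, 0), (1, 1))
    assert overlap(unit, unit) == unit


def test_skinny_overlap():
    red = ((0, 3), (2, 5))
    blue = ((1, 0), (2, 4))
    assert overlap(red, blue) == ((1, 3), (2, 4))
```

Writing the corner case as a test records the decision that touching does not count as overlap in a form that can be re-run after every change.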
slide 50 Here’s the function. It takes the coordinates of each rectangle as input, unpacks them to get the high and low X and Y coordinates of each rectangle, checks to make sure that the rectangles actually overlap, then calculates the coordinates of the overlap and returns the result as a new rectangle.
slide 51 Take a few moments and see if you can spot the bug.
slide 52 It’s here—we’re comparing the low Y coordinate of one rectangle with the high X coordinate of the other. This bug is probably the result of copying and pasting.
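To make the bug concrete, here is a sketch that isolates just the "do these rectangles miss each other?" check, once with a Y-versus-X mix-up and once corrected. Which comparison was actually wrong on the slide is an assumption, as are the function names and the ((low_x, low_y), (high_x, high_y)) layout. The rectangles are the ones quoted in a reader comment below.

```python
def misses_buggy(red, blue):
    (red_lo_x, red_lo_y), (red_hi_x, red_hi_y) = red
    (blue_lo_x, blue_lo_y), (blue_hi_x, blue_hi_y) = blue
    return (red_lo_x >= blue_hi_x or
            red_hi_x <= blue_lo_x or
            red_lo_y >= blue_hi_x or   # bug: low Y compared with high X
            red_hi_y <= blue_lo_y)


def misses_fixed(red, blue):
    (red_lo_x, red_lo_y), (red_hi_x, red_hi_y) = red
    (blue_lo_x, blue_lo_y), (blue_hi_x, blue_hi_y) = blue
    return (red_lo_x >= blue_hi_x or
            red_hi_x <= blue_lo_x or
            red_lo_y >= blue_hi_y or   # fixed: low Y compared with high Y
            red_hi_y <= blue_lo_y)


red = ((0, 3), (2, 5))
blue = ((1, 0), (2, 4))
print(misses_buggy(red, blue))   # True: wrongly reports no overlap
print(misses_fixed(red, blue))   # False: the rectangles do overlap
```

With these inputs the buggy check happens to give the right answer for two identical unit squares and for rectangles touching at a corner, so only a test with less symmetric coordinates exposes it.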
slide 53 Stepping back, the most important lesson in this episode isn’t the details of the Nose library. It’s that your time is more valuable than the computer’s, so you should spend it doing the things the computer can’t, like thinking of interesting test cases and what your code is actually supposed to do.
slide 54 Nose and other libraries like it are there to handle all the things that you shouldn’t have to re-think each time.
slide 55 They will also help guide you toward good practices, to make your testing and programming more productive.
slide 56 In the next episode, we’ll look at some of the other ways Nose can help you.

  1. Jochen
    August 20th, 2010 at 09:31 | #1

    Hi!

    Nice lecture. Looking at it just with the screenshots given, it is hard to spot the bug as one cannot read the code. If it doesn’t make too much work it would be nice to have the images as links to bigger versions.

    Thanks!
    Jochen

  2. November 25th, 2010 at 20:49 | #2

    I guess on slide 20, in the “Simple example: testing dna_starts_with”, the name of the 3rd function should also start with test_ (i.e., test_does_not_start_with_single_base_pair).

    @Jochen, in case you have not seen, the pdf and ppt are also available. You can read the code fine on those I guess.
    http://software-carpentry.org/4_0/test

  3. Terri Yu
    February 6th, 2011 at 22:13 | #3

    The picture of the overlapping rectangles doesn’t match the coordinates given: red = ((0,3),(2,5)) and blue = ((1,0),(2,4)) Also, the colors of the rectangles are blue and green, not blue and red. I found this very confusing.

  4. Terri Yu
    February 6th, 2011 at 22:17 | #4

    The slides don’t explain what the assert function does. I kind of inferred from usage that assert checks for true and false, but that was only a guess. I think it would be helpful to stop and talk about what values the assert function takes and what it returns.

  5. klahnb
    February 10th, 2011 at 03:56 | #5

    Jochen :
    . . . If it doesn’t make too much work it would be nice to have the images as links to bigger versions . . .

    For those who may not know, you can enlarge text and some images in a browser (at least Chrome, Firefox, and IE) with ‘Ctrl +’. There seems to be enough resolution in these images to see them better (without pixelation) by stepping up the size a bit. Step the size back down with ‘Ctrl -’, and return straight to the normal size with ‘Ctrl 0’. The PDF versions only show the presentation slides. Some of the above have comments not present in the slides, and it is nice not to have to coordinate viewing both this page and the PDF.
