Up: Program Design

Assembly

slide 01 Hello, and welcome to the seventh episode of the Software Carpentry lecture on program design using invasion percolation as an example. In this episode, we’ll see how to assemble the bits and pieces we’ve already developed to create a working program.
slide 02 As you recall, we now know how to:
slide 03 create a grid…
slide 04 fill it with random numbers…
slide 05 mark cells that have been filled…
slide 06 find cells that might be filled next…
slide 07 …and choose one of them at random.
slide 08 It’s time to put everything together.
slide 09 We’re going to show you this code in exactly the order we would write it.
slide 10 We start at the top and work down…
slide 11 …introducing functions and variables as we need them…
slide 12 …and tidy up a little bit along the way. In the next-but-one episode of this lecture, we’ll do a bit more tidying up as well.
slide 13 Here’s the first code that we write. We start with a documentation string to remind ourselves of what this program does, we import the libraries we need, and then we create the main driver.
slide 14 Note that we’re importing the entire random number generation library instead of just the functions that we need. This will make things easier to understand in future when we come back to revise the program.
slide 15 The first thing we need to do in our main driver is get parameters from the command line. We get all but the first element of sys.argv and store it in the variable arguments. If you recall, sys.argv[0] is the name of the program, and we don’t need that. We then convert the first three values in arguments to integers, and assign them to grid_size, value_range, and random_seed. If we get an IndexError, it means that one of the indices 0, 1, or 2 wasn’t valid, so we don’t have enough arguments. If we get a ValueError, it means that one of our attempts to convert a string to an integer failed, so again we print an error message.
slide 16 Notice that we’ve used a function called fail to report errors. We haven’t written this yet, so now’s a good time to go back and do that.
slide 17 Here it is. We give it a docstring, because every function should have one; we print the error message to standard error, so that it will appear on the user’s console, and then we exit from the program.
slide 18 The box in the lower right shows the structure of the program so far. We’ve got a documentation string, our fail function, and then the main driver of our program.
slide 19 The next step is to run the simulation. We do that by seeding the random number generator, creating a random grid, marking the center cell as filled, and then filling the rest of the grid.
slide 20 In this code, we’ve used three functions that don’t exist yet, so we’re going to have to go back and write them. Before we do that, though, let’s finish off the main body of the program.
slide 21 The last task we have is to report results.
slide 22 But we haven’t actually decided what to do about this. Nothing in our specification told us whether we were supposed to draw the fractal, calculate some statistics, or do something else entirely.
slide 23 For now, we’ll just count the number of cells we’ve filled in, and print that.
slide 24 Here’s the code. We will change fill_grid so that it returns the number of cells it filled in, and then we’ll print that number out…
slide 25 …except we’ll add one to the value returned by fill_grid because we marked the center cell as being filled manually. This is a little bit clumsy: someone who hasn’t read our code carefully might think—reasonably—that fill_grid returns the total number of cells that are filled, not one less than that. We should go back and tidy that up later.
slide 26 Here’s the code to create a random grid. It checks that the parameters it’s been passed make sense, and then it builds a list of lists of random values.
slide 27 A little documentation would help here—let’s go back and add a docstring.
slide 28 create_random_grid returns an N×N grid of random values in 1 to Z. It assumes that the random number generator has already been seeded, i.e., it is not going to seed the random number generator itself.
slide 29 The box in the lower right shows where we put this function in our program file.
slide 30 The next step is the function mark_filled, which marks a grid cell as being filed. The docstring says that, and then we use assertions to test that the X and Y coordintaes we’ve been given are actually in bounds. You might think we don’t need this code, because if the X or Y coordinate is out of bounds, Python will fail and print its own error message, but there are three reasons to put these assertions in. The first is documentation: this says (better than a comment) what we expect of X and Y. The second is, these error messages are more meaningful that Python’s generic “IndexError: index out of range” message. The third reason to put these assertions in is that some values of X and Y that we think of as illegal will actually work as indices in the grid. If X or Y is negative, then they will count down from the end of the grid, rather than up from the beginning. Python is quite happy to do this, but it almost certainly indicates an error somewhere in our program. The last line in this function assigns -1 to grid[x][y].
slide 31 We’re using -1 to indicate filled cells, but we don’t know if people are going to remember that when they’re reading our code. If you say “grid at X, Y assigned -1″, it’s not immediately clear what you’re doing. So let’s make a small change right now.
slide 32 Near the top of our program we’ll create a variable called FILLED, and give it the value -1, so that in our function, we can say “grid at X, Y is assigned FILLED”. FILLED is written in capital letters because we think of it as a constant, and by convention, constants are normally written in all caps.
slide 33 Once again, the box in the lower right shows us the structure of our file. The constant goes near the top of the program, because that’s where people will expect to find it, and our mark_filled function goes in the middle.
slide 34 The next function in our to-do list is fill_grid. The docstring says that it fills an N×N grid until the filled region hits the boundary. It assumes that the center cell has been filled before the function is called.
slide 35 We begin by setting up the variables N and num_filled. N is the grid size, and num_filled is the number of cells that this function has filled so far. We’ve squeezed these two variables onto one line so that our function will fit on a slide. In a real program, we’d put each one on a separate line.
slide 36 We then go into what looks like an infinite loop. The loop guard is “while TRUE”, and since True is always true, this will never exit. However, in Python and most other languages, anything that looks like a deliberate infinite loop almost always signals a loop that exits in the middle.
slide 37 And in fact that’s the case. Down near the bottom of this loop we test a condition and use break to break out of that loop if this condition is true. This is a very common idiom.
slide 38 Inside the loop, we call another function, find_candidates, to find the set of candidates for filling. This function hasn’t been written yet, so we add it to our to-do list.
slide 39 We then check that the set of candidates is non-empty, because if we haven’t found any candidates for filling, something has probably gone wrong with our program.
slide 40 And then, as discussed in a previous episode, we convert the set of candidates to a list, make a random choice to find the (x,y) coordinates of the cell we’re going to fill, and then we mark that cell as filled, and increment our count of the number of cells we’ve filled so far.
slide 41 We then check to see whether either the X or the Y coordinate is on the boundaries of the grid. If it is, we break out of our loop…
slide 42 …and return the number of cells that we’ve filled.
slide 43 Again, the box in the lower right shows where this function fits in the file.
slide 44 One more function. find_candidates finds low-valued neighbor cells and returns them as a set. N is assigned the size of the grid, min_val and min_set are initialized as discussed in an earlier episode, and then we loop over the X and Y coordinates, checking to see if the neighbors of the cell we’re looking at are already filled.
slide 45 We’re going to stop right there.
slide 46 This code is already hard to read.
slide 47 In fact, it contains a bug.
slide 48 That y+1 should be y-1, but you probably didn’t notice that because there was too much code to read at once.
slide 49 A good rule of thumb is, “Listen to your code as you write it.” If the code is difficult to understand when read aloud, then it’s probably going to be difficult to understand when you’re debugging, so you should try to simplify it.
slide 50 This version of find_candidates introduces a helper function called is_candidate.
slide 51 This is much clearer when read aloud.
slide 52 If the cell at (x,y) is a candidate, then we check to see whether it ties with the current lowest value or is a new lowest value. If it ties with the lowest value, we add its coordinates to the set. If it’s a new lowest value, then we re-initialize min_val and min_set as discussed in an earlier episode.
slide 53 As you can see, the find_candidates function fits right above fill_grid in our file.
slide 54 We can then insert the is_candidate function, which has the tests we discussed in the previous episode, right above find_candidates.
slide 55 It’s finally time to run our program. But actually, it isn’t.
slide 56 It’s finally time to test our program.
slide 57 Because there’s a bug lurking in the code that we just put together.
slide 58 Take a moment now, read over the code, and try to find the bug before moving on to the next episode.
slide 59

  1. No comments yet.
  1. No trackbacks yet.