 |
Welcome to the second episode of the Software Carpentry lecture on program design. In this episode, we’ll take a closer look at how we implement a 2D grid in Python. |
 |
If you recall from the previous episode, we need a grid of two dimensions, filled with random values… |
 |
…and we need a way to mark the cells in that grid as being filled with pollutant. |
 |
Once a cell has been filled, we don’t care about its value any longer… |
 |
…so we can use any integer that isn’t going to appear in the grid otherwise as a marker to show which cells have been filled. In this case, we’re going to use -1. |
 |
Note that this means we’re using integers in two ways. |
 |
The first is as actual data values. |
 |
The second is as flags to represent the state of a cell. |
 |
This is simple to do… |
 |
…but if we ever get data whose values happen to include the number that we’re using to mark filled cells, our program will misinterpret them. |
 |
Bugs like this can be very hard to track down. |
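To make the risk concrete, here is a minimal sketch. The names `FILLED` and `count_filled` and the grid values are hypothetical, chosen only for illustration; they are not taken from the lecture's own code.

```python
FILLED = -1  # marker for a filled cell; assumed never to occur as real data


def count_filled(grid):
    """Count the cells whose value equals the FILLED marker."""
    return sum(row.count(FILLED) for row in grid)


# A grid of ordinary data values; nothing has been filled yet.
grid = [[5, 3, 7],
        [2, 9, 4],
        [6, 1, 8]]
grid[1][1] = FILLED                 # mark the center cell as filled
filled_ok = count_filled(grid)      # exactly one cell is reported

# The same logic on data that happens to contain -1 as a real value.
bad_grid = [[5, -1, 7],
            [2, 9, 4],
            [6, 1, 8]]
bad_grid[1][1] = FILLED
filled_bad = count_filled(bad_grid)  # one cell was filled, but two are reported
```

Nothing fails or warns in the second case. The count is simply wrong, which is why this kind of bug is hard to spot.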
 |
Before we go any further, we also have to make some decisions about the shapes of our grids. First, do grids always have to be square, i.e., N×N, or can we have rectangular grids like the one shown? |
 |
Second, do grids always have to be odd-sized, so that there’s a unique center square, or can we have a grid that is even in size along one or both axes? |
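The point about a unique center square can be sketched in a few lines. The helper name `center` is hypothetical, and returning `None` for even-sized grids is just one possible convention, not something the lecture prescribes.

```python
def center(rows, cols):
    """Return (row, col) of the unique center cell, or None if there isn't one."""
    if rows % 2 == 0 or cols % 2 == 0:
        return None  # an even dimension has two middle rows or columns
    return rows // 2, cols // 2


square = center(5, 5)       # odd-sized square grid
rectangle = center(3, 5)    # odd-sized rectangular grid
even = center(4, 5)         # even along one axis: no unique center
```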
 |
The real question is, how general should we make the first version of this program—indeed, of any program? |
 |
Some people believe, “Don’t build it until you need it,” i.e., get something simple working today, and then worry about tomorrow when it comes. |
 |
Other people believe that, “A week of hard work can sometimes save you an hour of thought,” i.e., a little bit of forward planning—particularly planning for growth and change—can save a lot of re-work and un-work later. |
 |
Like many generalizations, these rules are: |
 |
True, but: |
 |
Not particularly useful. |
 |
As in any other intellectually demanding field, knowing which rules to apply, and when, comes with experience… |
 |
…and the only way to get experience is to work through many examples. |
 |
Now, Python doesn’t actually have a built-in type for representing two-dimensional arrays. |
 |
But it does have one-dimensional lists… |
 |
…that can refer to other lists. |
 |
So we can build a 2D grid as a list of lists. This gives us double subscripts to refer to elements… |
 |
…which is really what we mean by “two dimensional”. |
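As a small illustration of double subscripts on a list of lists (the values here are arbitrary):

```python
# The outer list holds one inner list per row.
grid = [[1, 2, 3],
        [4, 5, 6]]

first_row = grid[0]        # a single subscript selects a whole row
value = grid[1][2]         # row 1, column 2
grid[0][1] = 20            # a double subscript also works for assignment

num_rows = len(grid)       # length of the outer list
num_cols = len(grid[0])    # length of one inner list
```

The first subscript picks an inner list and the second picks an element within it, so `grid[1][2]` is simply `(grid[1])[2]`.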
 |
Here’s a piece of code that builds a grid of 1’s. (We’ll come back later and show how to fill those cells with random values instead.) |
 |
The first thing we do is check that N, the grid size, is a sensible value. We use assertion statements to do this, so that if anybody ever gives us a strange value for N, our program will fail and print an error message. This can save us a lot of debugging later. |
 |
We then assign an empty list to the variable grid. |
 |
The first time through the outer loop, we insert an empty list into the outer list. |
 |
Each pass through the inner loop, we append the value 1… |
 |
…to that inner list. We’ll then go back through the outer loop, append another sub-list, and so on until we get the grid that we wanted. |
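The slide's code is not reproduced in this transcript, so the following is only a sketch of what the narration describes. The value chosen for `N`, the exact assertions (positive and odd), and the loop variable names are assumptions.

```python
N = 3  # grid size; the value is chosen only for illustration

# Fail early, with a message, if N is not a sensible size.
assert N > 0, "Grid size must be positive"
assert N % 2 == 1, "Grid size must be odd"  # assumes we want a unique center cell

grid = []                        # start with an empty outer list
for x in range(N):
    grid.append([])              # add a new, empty inner list
    for y in range(N):
        grid[-1].append(1)       # append 1 to the most recently added inner list
```

Because each pass through the outer loop appends a fresh inner list, the rows are separate objects, so changing `grid[0][0]` does not affect any other row.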
 |
|
Greg, I hate to say it, but this is lengthy, slow, and teaches the terrible habit of looping over data. Do this instead:
import numpy as np
shape = (45, 35)
grid = np.ones(shape, dtype=int)
or if you want a random grid,
import numpy as np
import numpy.random as npr
shape = (45, 35)
lo = 0
hi = 10
grid = np.int64(npr.uniform(lo, hi, shape))
Any scientist using Python is going to use numpy anyway, so don’t avoid it, teach it. It’s way more important for the average FORTRAN-hugging scientist to learn to do implicit array math than it is for them to use dictionaries — which they should of course also learn to use. Also, operations on numpy arrays execute hundreds of times faster than operations that loop over lists of lists.
–jh–
Joe,
I hate to say it, but you have missed the fact that this lesson is part of a greater whole which attempts to teach a deep truth, which is far greater and more important than numpy.
Pointing out the merits of some particular package (even if it *does* happen to be the most relevant one for scientists using the language which Greg has chosen to demonstrate the point) is completely irrelevant (and distracting) to the task of teaching a deep point about program development. The argument that Greg develops (throughout the *whole* series of episodes) presents a deep truth about program design, which will remain true long after numpy and Python have been forgotten (which I don’t expect to be any time soon).
It is way more important for the average FORTRAN-hugging scientist to learn deep truths about programming than it is for them to learn of the existence of any single package.
As for slow, my experience of teaching scientists to program suggests that, if anything, Greg is going a bit too quickly.
-jg-
In addition to jg’s comments, I have just “learned” Python by going through the previous lectures on Python in this series (I have a background in C++ and I do a fair amount of Matlab, but first time with Python). I completely understand Greg’s example as it is obviously readable.
It is also how I, a new Python programmer, would start this problem.
That seems like a more natural way to discuss program design that, “[teach the] prerequisite skills needed to build, maintain, share, and use software efficiently.” (http://software-carpentry.org/about/three-minute-pitch/)
Great lecture.
How did you do a graphical representation of grid in slides?
I used a table with cell borders turned on.
@Greg Wilson
Oh… now I feel stupid.
Anyway, thank you for great lectures, I hope you and your colleagues will continue to make this wonderful contribution.