Up: Sets and Dictionaries

Nanotech Example

slide 01 Hello, and welcome to the final episode of the Software Carpentry lecture on sets and dictionaries. In this episode, we’ll build a solution to our original nanotechnology problem.
slide 02 If you recall, our goal is to find out how many molecules of various kinds we could make using the atoms in our warehouse.
slide 03 The formulas for the molecules we know how to make are stored in a file that’s formatted like this.
slide 04 And our inventory is stored in a file that’s formatted like this.
slide 05 Using what we’ve learned in the previous few episodes, we can now write a simple, efficient program to solve our problem.
slide 06 Let’s start with our inventory. Our input consists of pairs of strings and numbers, which naturally suggests using a dictionary for storage.
slide 07 The keys will be atomic symbols…
slide 08 …and the values will be the number of atoms of that kind we currently have in stock.
slide 09 The result will look like this.
slide 10 We can use this same structure to represent individual molecules…
slide 11 …such as water.
slide 12 So let’s store our formulas as a dictionary of dictionaries.
slide 13 The keys in the formulas dictionary will be the name of molecules…
slide 14 …and the values will be dictionaries storing the formulas for particular molecules.
slide 15 Here, for example, we have an outer dictionary that maps the word ‘water’ to a dictionary storing the number of oxygen and hydrogen atoms in a single water molecule, and the word ‘ammonia’ to a dictionary storing the number of nitrogen and hydrogen atoms in a single ammonia molecule.
slide 16 The number of molecules of any particular type we can make is limited by the scarcest atom that molecule requires. In mathematical terms, we want the minimum over the atoms used in the molecule of the ratio of how many of that atom we have to how many of that atom we need.
slide 17 As a special case, if the atom isn’t explicitly listed in our inventory, its count is implicitly zero.
slide 18 We’ll store the results of our calculation in yet another dictionary, this one mapping…
slide 19 … the names of molecules…
slide 20 …to how many of that molecule we can make.
slide 21 The main body of the program is simple: read in the input files, do our calculation, and print the result.
slide 22 Reading the inventory file is simple: take each interesting line in the file, split it to get an atomic symbol and a count, and store them together in a dictionary.
slide 23 For clarity’s sake, we’ll use this helper function to read a file and strip out blank lines and comments.
slide 24 Using that same function, reading in a file of molecular formulas is only slightly more complex.
slide 25 We create the dictionary we’re going to store the results in.
slide 26 And then for each line in the file that has some data…
slide 27 …we split on the colon ‘:’ to separate the molecule’s name (which may contain spaces) from its formula.
slide 28 We then split the formulas into a list of strings, which alternate between atomic symbols and counts…
slide 29 …and loop over those values, moving forward two elements at a time…
slide 30 …storing the atomic symbol and count in a dictionary.
slide 31 Once we’re done, we store that dictionary as the value for the molecule name in the main dictionary.
slide 32 Now it’s time to figure out how many molecules of each kind we can make.
slide 33 inventory maps atomic symbols to counts, and so does formulas[name]. Let’s loop over all the molecules in our formulas and “divide” those two dictionaries.
slide 34 It might seem overkill to write yet another helper function in this case, but experience shows that if you have a choice between big functions in which nothing is obviously wrong…
slide 35 …or little functions in which obviously nothing is wrong, you should choose the latter.
slide 36 Here’s the function that “divides” one dictionary by another. We loop over all the atoms in the molecule we’re trying to build, see what limits the available inventory puts on us, and return the minimum of all those results. This function uses a few patterns that come up frequently in many kinds of programs, so let’s have a closer look.
slide 37 The input arguments have identical formats: each one maps atomic symbols to counts.
slide 38 The first pattern is to initialize the value we’re going to return to None, then test for that value inside the loop to make sure we re-set it to a legal value the first time we have real data. In this case, we could just as easily use -1 or some other “impossible” value as an “uninitialized” flag for number.
slide 39 Since we’re looping over the keys of molecule, we know that we can get the value stored in molecule[atom]. However, that atom might not be a key in inventory, so we use inventory.get(atom, 0) to get either the stored value, or a sensible default—in this case zero, because if the atom’s symbol isn’t in the dictionary, we don’t have any of it. This is our second pattern.
slide 40 The third is using calculate, test, and store to find a single value—in this case, the minimum—from a set of calculated values. We could calculate the list of available over required values, then find the minimum of the list, but doing the minimum test as we go along saves us having to store the list of intermediate values. It’s probably not a noticeable time saving in this case, but it would be with larger data sets.
slide 41 Finally, we need to show how many molecules of each kind we can make. We could just loop over our result dictionary, printing each molecule’s name and the number of times we could make it, but to keep things interesting, let’s print the results by descending count, with the molecules we can make most of at the top. First, we invert the dictionary using a helper function (which we’ll write in a second). Then, for each molecule name associated with each count, we print out the count and name.
slide 42 Here, we get the keys from the inverted dictionary—i.e., the counts of how many molecules we can make—and sort them. Notice that we pass in reverse=True to sort in descending order, i.e., with the greatest values first. We’ve also sorted the molecule names, just to be tidy.
slide 43 And here’s our function to invert a dictionary. For each key/value pair in the source dictionary, we use the value as the key in the new dictionary, and the original key as the value.
slide 44 Here, we’re relying on another common pattern: when the values in a dictionary are some kind of collection, such as a set or list, we store an empty collection of that with the key if the key isn’t already present, so that we can add our new value whether the key was previously present or not.
slide 45 And here’s one afterthought: let’s go back to our show_counts function and only print out molecules that we can actually make, i.e., ones whose counts are greater than zero. If we have a lot of possible molecules in our database, but not much inventory, this will save us a lot of less-than-useful output.
slide 46 Time to test. Let’s try an empty inventory against a single formula. There’s no output, which is what we expect.
slide 47 Add one atom of helium to our inventory: that seems right.
slide 48 Add some hydrogen, but don’t give the program any formulas that use hydrogen: still correct.
slide 49 Add the formula for water, which does use hydrogen, but don’t provide any oxygen: no water in the output, but helium is still appearing as it should.
slide 50 Add the formula for molecular hydrogen, and sure enough, we can make a couple of those.
slide 51 Put some oxygen in the warehouse, and sure enough, we can make a couple of water molecules.
slide 52 There are quite a few other interesting tests still to run, but things are looking good so far.
slide 53 And our code is a lot simpler than it would be if we used lists of pairs of atom names and counts, or some other data structure.
slide 54 This is the end of our lecture on sets and dictionaries. We hope you’ll take a few minutes to try a few of the exercises.

  1. No comments yet.