Up: Sets and Dictionaries

Examples

slide 01 Hello, and welcome to the fourth episode of the Software Carpentry lecture on sets and dictionaries. In this episode, we’ll show you a few examples of how you can use dictionaries in your programs.
slide 02 Let’s start with the question, “How early in the day did we see each kind of bird?”
slide 03 Our data looks like this: the date and time of the observation, the bird’s name, and an optional comment.
slide 04 We want the minimum of all the times associated with each bird name.
slide 05 We’ll use a dictionary with the bird name as the key…
slide 06 …and the earliest observation time as the value.
slide 07 First, though, let’s read our data file and create a list of tuples, where each tuple has the date and time as strings, and the bird name.
slide 08 Our function follows the pattern we’ve seen many times before. We set up by opening the input file and creating an empty list that we’ll append records to.
slide 09 We then process each line of the file in turn. Splitting the line on the ‘#’ character and taking the first part of the result gets rid of any comment that might be present; stripping off whitespace and then splitting breaks the remainder into fields.
slide 10 To prevent trouble later on, we check that there actually are three fields before going on. An industrial-strength version of this function would also check that the date and time were properly formatted, but we’ll skip that for now.
slide 11 Once we’ve done our check, we append the triple containing the date, time, and bird name to the list we’re going to return.
slide 12 Here’s the function that processes that list of triples.
slide 13 This time, we set up by creating an empty dictionary.
slide 14 Our loop handles one triple at a time. Notice here that we’re splitting the triple into its component fields in the loop header: the variable date is automatically assigned the first field of the current tuple, the variable time the second, and the variable name the third.
slide 15 If this bird’s name is not already a key in our dictionary, this must be the first time we’ve seen it. The value of time in this record is therefore the earliest we’ve seen the bird, so we record it in a new entry in the dictionary.
slide 16 Otherwise, if there’s already an entry for this bird in the dictionary, we record the minimum of the stored time and the new time.
slide 17 All right, that wasn’t too bad. What if we want to find out what birds were seen on each day that we were observing?
slide 18 This is similar to the problem we just solved, so the function we’ll write has a similar structure.
slide 19 However, since we probably saw more than one kind of bird each day, the values in our dictionary need to be some sort of collection. Since we’re only interested in which birds we saw, we can use a set.
slide 20 Here’s our function.
slide 21 Again, we set up by creating an empty dictionary…
slide 22 …and process each record in turn, unpacking it automatically in the loop header.
slide 23 Since we’re recording birds by date, the keys in our dictionary are dates rather than bird names. If the current date isn’t already a key in the dictionary, we add it with an empty set as the associated value.
slide 24 It is then safe to add the current bird to the set associated with the current date, since the previous two lines guaranteed that there would be such a set, even if this is the first observation for the date in question.
slide 25 Let’s watch this function in action. We start with an empty set.
slide 26 After reading the first observation, we add an entry to our dictionary with the date ’2010-07-03′ as its key, and an empty set as its value. We then immediately add the name ‘loon’ to that set, leaving the structure shown here.
slide 27 Our next observation is a goose on the same day, so we put ‘goose’ in the set we just created.
slide 28 Our third observation is another loon. Since adding a value to a set that’s already present has no effect, our data structure doesn’t change.
slide 29 Next, though, we have the first observation for July 4th. Since it’s the first time we’ve seen this date, our function adds a new entry to the main dictionary with '2010-07-04' as the key and a set as the value, then adds 'ostrich' to that set.
slide 30 Finally, we have another damn loon, which goes into the second of our sets.
slide 31 For our last example, we’ll find which bird we saw least frequently.
slide 32 Actually, we’ll find which birds, since two or more may be tied for the low score.
slide 33 One way to solve this problem is to do two passes over our data.
slide 34 In the first pass, we find the minimum value in the dictionary.
slide 35 In the second, we find all the keys that have that value.
slide 36 It’s relatively easy to combine these operations, though, so we’ll do the whole thing in one pass.
slide 37 We’ll assume that we already have a dictionary of bird names and total observation counts.
slide 38 This is the skeleton for our function: as usual, it contains some setup code and a loop. The setup code creates a set to hold the names of the birds we saw least frequently. It also creates a helper variable called number, which holds the current low score. These values are just placeholders; we’ll see in a moment that they are overwritten the very first time we process any real data.
slide 39 Inside the loop, we have to handle three cases.
slide 40 If our result set is empty, then this must be the very first bird we’ve processed. In that case, we record its name, and set number to the number of times we saw this bird. This case should only execute once, since after it runs, the set called result should never be empty, and number should always be greater than zero.
slide 41 In our second case, the number associated with the bird we’re processing is a new minimum. When this happens, we throw away the set of birds we’ve seen so far and replace it with a set containing only the name of this bird, and assign the new lowest score to number.
slide 42 Finally, if the count for this bird is the same as the current lowest score, we add this bird’s name to the set of birds with that score. There isn’t a branch of the if to handle the case where the count for the current bird is greater than the current minimum, since there’s nothing for us to do in that case.
slide 43 Let’s watch this function run. Our input dictionary has three entries; we initialize number to 0 and result to an empty set.
slide 44 Processing the first entry in the dictionary takes us into the first of our three cases: number is assigned 3, and 'loon' is put in our set of birds.
slide 45 The second entry has a new minimum count, so we replace both the value of number and our set. This is case #2 from above.
slide 46 Finally, the third bird’s count is the same as the current minimum, so we just add it to the set.
slide 47 We hope these examples give you an idea of what you can do with dictionaries, and how useful they can be. In the next episode, we’ll take a look at a slightly larger example. Before watching it, please give a few of the exercises a try.

  1. No comments yet.