Up: Sets and Dictionaries

Dictionaries

slide 01 Hello, and welcome to the third episode of the Software Carpentry lecture on sets and dictionaries. In this episode we’ll have a look at dictionaries, which are almost as widely used as lists, and often more useful.
slide 02 Let’s go back to the summer we spent counting birds in northern Ontario.
slide 03 Our supervisor would like to know how many birds we saw of each type.
slide 04 Our input is a list of several thousand observations.
slide 05 The output we want is a list of names and counts.
slide 06 Using what we’ve learned so far, we could do this using a list of pairs, where each pair is a sublist storing a bird name and the number of times we’ve seen it so far.
slide 07 Here’s a function to add another observation to such a list.
slide 08 Its first argument is the list of pairs.
slide 09 Its second is the bird name being added.
slide 10 Inside the function, we loop over the pairs that are already in the list.
slide 11 If the first element of the pair is the same as the bird name we’ve been given…
slide 12 …we add one to the associated count and return from the function immediately—there’s no need to look at the other pairs in the list.
slide 13 Otherwise, if we reach the end of the list without finding a matching name, we add a new pair to the list consisting of the bird’s name and ’1′ (since we have now seen the bird one time).
slide 14 This is an instance of a common programming pattern: either we find pre-existing data in a loop and return right away, or take some default action if we fall off the end of the loop without finding a match.
slide 15 Let’s take a look at this function in action. We start with an empty list.
slide 16 If our first bird name is ‘loon’, we find no match (since the list is empty), so we add a pair to the list.
slide 17 The next name is ‘goose’. This doesn’t match ‘loon’, so we add another pair.
slide 18 Our third record is another loon. This matches the first entry in the list, so we add one to its count and return immediately, leaving the list as shown here.
slide 19 This “list of pairs” approach works, but there’s a much solution to this problem, one that is simpler and many times efficient.
slide 20 That solution uses a dictionary instead of a list.
slide 21 A dictionary is a unordered collection of key/value pairs.
slide 22 Like the elements in a set, keys are:
slide 23 immutable
slide 24 unique
slide 25 and not stored in any particular order.
slide 26 There are no restrictions on the values stored with those keys.
slide 27 In particular, they don’t have to be immutable or unique.
slide 28 You can create a new dictionary by putting key, colon, value pairs inside curly braces.
slide 29 For example, here we’re creating a new dictionary with two entries and assigning it to the variable birthdays. The dictionary has two keys, which are the strings 'Newton' and 'Darwin'. The value associated with 'Newton' is 1642, while the value associated with 'Darwin' is 1809.
slide 30 We can get the value associated with a key by putting the key in square brackets.
slide 31 This looks just like subscripting a string or list, except dictionary keys don’t have to be integers—they can be strings, tuples, and so on.
slide 32 For example, the expression birthdays['Newton'] returns the value 1642, since that’s what we associated with the key 'Newton'.
slide 33 It’s just like using a phonebook or a real dictionary: instead of looking things up by integer index, we can look things up by name.
slide 34 If we want to add another key/value pair to a dictionary, all we have to do is assign something to it.
slide 35 Here, for example, the expression birthdays['Turing'] = 1612 adds a third pair to our dictionary with the key 'Turing' and the value 1612.
slide 36 Assignment also overwrites values.
slide 37 Since Alan Turing was actually born in 1912, we can fix the mistake in the previous line simply by assigning that value to the key 'Turing'.
slide 38 Keep in mind that just like the elements in a set, the keys in a dictionary are not stored in any particular order.
slide 39 That’s why we draw dictionaries like this: a cloud of pairs, each of which has a key and a value.
slide 40 If a key isn’t actually in a dictionary, trying to read its value is an error, just like trying to read off the end of a list.
slide 41 For example, if we try to find Florence Nightingale’s birthday with birthdays['Nightingale'], Python reports a KeyError.
slide 42 If we’re not sure whether a key is in a dictionary or not, we can test for it using the in operator.
slide 43 Given the current state of our birthdays dictionary, 'Nightingale' in birthdays is False, but 'Darwin' in birthdays is True.
slide 44 Finally, we can loop over the keys in a dictionary using for.
slide 45 This is slightly different from looping over a list: when we loop over a dictionary, we get the keys, which we can use to look up the values. When we loop over a list, we get its values right away, since its “keys” are just the integers 0, 1, and so on, which aren’t usually very interesting.
slide 46 Here, for example, the for loop assigns each key of the birthdays dictionary to the variable name in turn. Inside the loop, we use name to look up the value associated with that key, and print both out.
slide 47 Now, let’s go back and count those birds.
slide 48 Here’s the main body of our program.
slide 49 The first three lines read the input file into a list of strings.
slide 50 We then call a function count_names to turn that list into a dictionary of bird names and counts…
slide 51 …then use a loop like the one we just saw to print out the dictionary’s contents.
slide 52 Here’s the function that does the counting.
slide 53 As always, we start with a docstring to explain the function’s purpose to whoever has to read it next.
slide 54 We get set up by creating an empty dictionary to fill with data…
slide 55 …then loop over the strings in the input list one by one.
slide 56 After stripping off any leading or trailing whitespace as usual…
slide 57 …we check to see if we’ve seen this bird name before.
slide 58 If we have, we add one to the associated count.
slide 59 If it’s the first time we’ve seen this name, though, we add a new entry to the dictionary, using the bird name as the key and ’1′ as the value.
slide 60 Finally, we return the dictionary we’ve created.
slide 61 Let’s watch our counting function in action.
slide 62 Before we read any data, our dictionary is empty.
slide 63 After we see ‘loon’ for the first time, our dictionary has one entry. The key is 'loon', and the value is 1.
slide 64 When we see ‘goose’, we add another entry to the dictionary.
slide 65 And when we see ‘loon’ for the second time, we add one to its count.
slide 66 So why use a dictionary rather than a list of pairs? Because it’s much more efficient. Just like sets, dictionaries are stored using hash tables, which guarantee that finding or modifying values takes roughly constant time. This is a lot better than the list-based method, where the time grows in proportion to the number of pairs in the list.
slide 67 In our next episode, we’ll look at some other examples of dictionaries in action.

  1. Terri Yu
    February 19th, 2011 at 19:37 | #1

    In the count_names() function that uses dictionaries, there is a line of code: “result[name] = result[name] + 1″. Why not use “result[name] += 1″?