Python Sets and Dictionaries

April 24, 2010: We are pleased to announce that Version 4 of this course is now under development. For updates and an early peek at the content, please check out the Software Carpentry blog at http://www.software-carpentry.org/blog/.

1) Introduction

2) You Can Skip This Lecture If...

3) Sets

vowels = set()
for char in 'aieoeiaoaaeieou':
    vowels.add(char)
print vowels
set(['a', 'i', 'e', 'u', 'o'])

4) Set Operations
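
As a minimal sketch of the usual set operations, the example below combines two small sets with Python's built-in operators. The bird names are hypothetical data chosen for illustration, and print(...) is called with a single argument so that the code should run under both Python 2 and Python 3.

```python
# Two small sets of bird names (hypothetical data for illustration).
arctic = set(['goose', 'tern', 'eider'])
coastal = set(['tern', 'gull', 'eider'])

both = arctic & coastal         # intersection: in both sets
either = arctic | coastal       # union: in at least one set
only_arctic = arctic - coastal  # difference: in arctic but not in coastal
just_one = arctic ^ coastal     # symmetric difference: in exactly one set

# Sets are unordered, so sorted() is used to get a repeatable display order.
print(sorted(both))
print(sorted(either))
print(sorted(only_arctic))
print(sorted(just_one))
```

Each operator also has a method form (intersection, union, difference, symmetric_difference) that accepts any iterable rather than only another set.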

5) Set Example

lines = [
    'canada goose',  'canada goose',  'long-tailed jaeger',  'canada goose',
    'snow goose',    'canada goose',  'canada goose',        'northern fulmar'
]

seen = set()
for line in lines:
    seen.add(line.strip())

for bird in seen:
    print bird
northern fulmar
snow goose
long-tailed jaeger
canada goose
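
The output above comes back in arbitrary order because sets do not remember insertion order. If first-seen order matters, one possible variation (a sketch, not part of the original example) keeps a list alongside the set; the name ordered is introduced here for illustration.

```python
lines = [
    'canada goose', 'canada goose', 'long-tailed jaeger', 'canada goose',
    'snow goose', 'canada goose', 'canada goose', 'northern fulmar'
]

seen = set()   # used only for fast membership tests
ordered = []   # remembers the order in which birds were first seen
for line in lines:
    bird = line.strip()
    if bird not in seen:
        seen.add(bird)
        ordered.append(bird)

for bird in ordered:
    print(bird)
```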

6) How Set Values Are Stored

7) Immutability

values = set()
values.add('birds')
print values
values.add(('Canada', 'goose'))
print values
values.add(['snow', 'goose'])
print values

  
Traceback (most recent call last):
  File "mutable_in_set.py", line 8, in ?
    values.add(['snow', 'goose'])
  File "/usr/lib/python2.3/sets.py", line 521, in add
    self._data[element] = True
TypeError: list objects are unhashable
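
One common way around this error is to convert the list to a tuple before adding it, since a tuple of strings is immutable and therefore hashable. The sketch below assumes that losing the ability to modify the stored value is acceptable; the variable name pair is introduced for illustration.

```python
values = set()
pair = ['snow', 'goose']

try:
    values.add(pair)         # lists are mutable, so they cannot be hashed
except TypeError:
    values.add(tuple(pair))  # a tuple with the same contents is hashable

print(values)
```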

8) Frozen Sets

>>> birds = set()
>>> arctic = frozenset(['goose', 'tern'])
>>> birds.add(arctic)
>>> print birds
set([frozenset(['goose', 'tern'])])
>>> arctic.add('eider')
AttributeError: 'frozenset' object has no attribute 'add'
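
Because a frozenset can be stored inside another set, it can be used to collect distinct groups while ignoring the order of items within each group. The following is a small sketch with hypothetical data; the names outings and distinct are not from the original example.

```python
# Hypothetical data: which species were seen together on each outing.
outings = [['goose', 'tern'], ['tern', 'goose'], ['eider']]

distinct = set()
for outing in outings:
    # frozenset ignores order, so the first two outings compare equal.
    distinct.add(frozenset(outing))

print(len(distinct))
```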

9) A Note on Language Design

10) Efficiency
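
One rough way to see why sets are faster than lists for membership tests is to count how many equality comparisons each one performs. The sketch below does this with a hypothetical helper class, Counted, that records every call to ==. It assumes CPython's behavior, where a list is scanned item by item while a set uses the hash to jump close to the right entry; exact counts may differ on other implementations.

```python
class Counted(object):
    """Hypothetical wrapper that counts how often == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


items = [Counted(i) for i in range(1000)]
as_list = list(items)
as_set = set(items)

# Equal to the last item, but a different object, so == must be used.
target = Counted(999)

Counted.comparisons = 0
found_in_list = target in as_list
list_comparisons = Counted.comparisons

Counted.comparisons = 0
found_in_set = target in as_set
set_comparisons = Counted.comparisons

print('list comparisons: %d' % list_comparisons)
print('set comparisons: %d' % set_comparisons)
```

The list has to compare against every item before it reaches the last one, while the set needs only a handful of comparisons regardless of how many items it holds.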

11) Complexity Curves

12) Algorithmic Complexity

13) Motivating Dictionaries

14) Creating and Indexing

birthday = {
    'Newton' : 1642,
    'Darwin' : 1809
}
print "Darwin's birthday:", birthday['Darwin']
print "Newton's birthday:", birthday['Newton']
Darwin's birthday: 1809
Newton's birthday: 1642

birthday = {
    'Newton' : 1642,
    'Darwin' : 1809
}
print birthday['Turing']
Traceback (most recent call last):
  File "key_error.py", line 5, in ?
    print birthday['Turing']
KeyError: 'Turing'
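
If a missing key is an expected situation rather than a bug, the KeyError can be caught instead of letting the program stop. This is a minimal sketch; the helper function lookup is hypothetical and not part of the original example.

```python
birthday = {
    'Newton': 1642,
    'Darwin': 1809
}

def lookup(name):
    """Return a display string for name, whether or not it is a key."""
    try:
        return '%s: %d' % (name, birthday[name])
    except KeyError:
        return '%s: unknown' % name

print(lookup('Darwin'))
print(lookup('Turing'))
```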

15) Updating Dictionaries

birthday = {}
birthday['Darwin'] = 1809
birthday['Newton'] = 1942  # oops
birthday['Newton'] = 1642
print birthday
{'Darwin': 1809, 'Newton': 1642}

birthday = {
    'Newton' : 1642,
    'Darwin' : 1809,
    'Turing' : 1912
}

print 'Before deleting Turing:', birthday
del birthday['Turing']
print 'After deleting Turing:', birthday
del birthday['Faraday']
print 'After deleting Faraday:', birthday
Before deleting Turing: {'Turing': 1912, 'Newton': 1642, 'Darwin': 1809}
After deleting Turing: {'Newton': 1642, 'Darwin': 1809}
Traceback (most recent call last):
  File "dict_del.py", line 10, in ?
    del birthday['Faraday']
KeyError: 'Faraday'
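
When it is not known in advance whether a key is present, the dictionary's pop method with a default value removes the key if it exists and otherwise does nothing. A short sketch, using None as the default (an arbitrary choice for illustration):

```python
birthday = {
    'Newton': 1642,
    'Darwin': 1809,
    'Turing': 1912
}

removed = birthday.pop('Turing', None)   # key present: its value is returned
missing = birthday.pop('Faraday', None)  # key absent: default returned, no KeyError

print(removed)
print(missing)
print(sorted(birthday.keys()))
```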

16) Membership and Loops

birthday = {
    'Newton' : 1642,
    'Darwin' : 1809
}

for name in ['Newton', 'Turing']:
    if name in birthday:
        print name, birthday[name]
    else:
        print 'Who is', name, '?'
Newton 1642
Who is Turing ?

birthday = {
    'Newton' : 1642,
    'Darwin' : 1809,
    'Turing' : 1912
}
for name in birthday:
    print name, birthday[name]
Turing 1912
Newton 1642
Darwin 1809

17) Dictionary Methods

Method | Purpose | Example | Result
clear | Empty the dictionary. | d.clear() | Returns None, but d is now empty.
get | Return the value associated with a key, or a default value if the key is not present. | d.get('x', 99) | Returns d['x'] if "x" is in d, or 99 if it is not.
keys | Return the dictionary's keys as a list. Entries are guaranteed to be unique. | birthday.keys() | ['Turing', 'Newton', 'Darwin']
items | Return a list of (key, value) pairs. | birthday.items() | [('Turing', 1912), ('Newton', 1642), ('Darwin', 1809)]
values | Return the dictionary's values as a list. Entries may or may not be unique. | birthday.values() | [1912, 1642, 1809]
update | Copy keys and values from one dictionary into another. | See the example below. |

Table 7.2: Dictionary Methods in Python

birthday = {
    'Newton' : 1642,
    'Darwin' : 1809,
    'Turing' : 1912
}

print 'keys:', birthday.keys()
print 'values:', birthday.values()
print 'items:', birthday.items()
print 'get:', birthday.get('Curie', 1867)

temp = {
    'Curie'    : 1867,
    'Hopper'   : 1906,
    'Franklin' : 1920
}
birthday.update(temp)
print 'after update:', birthday

birthday.clear()
print 'after clear:', birthday
keys: ['Turing', 'Newton', 'Darwin']
values: [1912, 1642, 1809]
items: [('Turing', 1912), ('Newton', 1642), ('Darwin', 1809)]
get: 1867
after update: {'Curie': 1867, 'Darwin': 1809, 'Franklin': 1920, 'Turing': 1912, 'Newton': 1642, 'Hopper': 1906}
after clear: {}

18) Counting Frequency

# Data to count.
names = ['tern','goose','goose','hawk','tern','goose', 'tern']

# Build a dictionary of frequencies.
freq = {}
for name in names:

    # Already seen, so increment count by one.
    if name in freq:
        freq[name] = freq[name] + 1

    # Never seen before, so add to dictionary.
    else:
        freq[name] = 1

# Display.
print freq
{'goose': 3, 'tern': 3, 'hawk': 1}

19) A Slight Simplification

freq = {}
for name in names:
    freq[name] = freq.get(name, 0) + 1
print freq
{'goose': 3, 'tern': 3, 'hawk': 1}
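
In Python 2.7 and later, the standard library's collections.Counter packages up the same counting pattern. The sketch below is an alternative to the loop above, not a replacement for understanding it.

```python
from collections import Counter

names = ['tern', 'goose', 'goose', 'hawk', 'tern', 'goose', 'tern']

# Counter is a dictionary subclass that counts the items it is given.
freq = Counter(names)

print(freq['goose'])
print(freq['owl'])   # a missing key counts as zero instead of raising KeyError
```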

20) Imposing Order

keys = freq.keys()
keys.sort()
for k in keys:
    print k, freq[k]
goose 3
hawk 1
tern 3
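
The built-in sorted function offers another way to impose order without modifying a list in place. The sketch below also shows one possible way to order entries by count; the name by_count and the tie-breaking rule are choices made here for illustration.

```python
freq = {'goose': 3, 'tern': 3, 'hawk': 1}

# sorted() accepts any iterable, so this works whether keys() returns
# a list (Python 2) or a view object (Python 3).
for k in sorted(freq):
    print('%s %d' % (k, freq[k]))

# Order by count, highest first, breaking ties alphabetically by name.
by_count = sorted(freq.items(), key=lambda pair: (-pair[1], pair[0]))
print(by_count)
```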

21) Inverting a Dictionary

inverse = {}
for (key, value) in freq.items():
    seen = inverse.get(value, [])
    seen.append(key)
    inverse[value] = seen

keys = inverse.keys()
keys.sort()
for k in keys:
    print k, inverse[k]
1 ['hawk']
3 ['goose', 'tern']

Figure 7.7: Inverting a Dictionary

22) Another Way to Do It

inverse = {}
for (key, value) in freq.items():
    if value not in inverse:
        inverse[value] = []
    inverse[value].append(key)
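
A third variant, sketched here under the assumption that freq holds the counts from the earlier example, uses the dictionary's setdefault method to fold the membership test and the assignment into one call.

```python
freq = {'goose': 3, 'tern': 3, 'hawk': 1}

inverse = {}
for (key, value) in freq.items():
    # setdefault returns the list already stored under value,
    # or stores a new empty list there and returns that.
    inverse.setdefault(value, []).append(key)

for k in sorted(inverse):
    print('%d %s' % (k, sorted(inverse[k])))
```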

23) Formatting Strings with Dictionaries

birthday = {
    'Newton' : 1642,
    'Darwin' : 1809,
    'Turing' : 1912
}
entry = '%(name)s: %(year)s'
for (name, year) in birthday.items():
    temp = {'name' : name, 'year' : year}
    print entry % temp
Turing: 1912
Newton: 1642
Darwin: 1809
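
A format string can also refer to a dictionary's own keys directly, without building a temporary dictionary. The sketch below shows this with the % operator and, for comparison, with the str.format method available in Python 2.6 and later; the names summary and same are introduced for illustration.

```python
birthday = {
    'Newton': 1642,
    'Darwin': 1809,
    'Turing': 1912
}

# Keys that the format string does not mention (here 'Turing') are ignored.
summary = 'Newton: %(Newton)s, Darwin: %(Darwin)s' % birthday
print(summary)

# The same result using str.format with the dictionary unpacked by **.
same = 'Newton: {Newton}, Darwin: {Darwin}'.format(**birthday)
print(same)
```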

24) Extra Keyword Arguments

def settings(title, **kwargs):
    print 'title:', title
    for key in kwargs:
        print '    %s: %s' % (key, kwargs[key])

settings('nothing extra')
settings('colors', red=0.0, green=0.5, blue=1.0)
title: nothing extra
title: colors
    blue: 1.0
    green: 0.5
    red: 0.0
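
The same ** notation works in the other direction: at the call site it unpacks an existing dictionary into keyword arguments. In this sketch, settings is a variant of the function above that returns its lines in sorted key order instead of printing them, which is a change made here so the result is repeatable.

```python
def settings(title, **kwargs):
    """Return the title line followed by one line per extra argument."""
    lines = ['title: %s' % title]
    for key in sorted(kwargs):
        lines.append('    %s: %s' % (key, kwargs[key]))
    return lines

colors = {'red': 0.0, 'green': 0.5, 'blue': 1.0}

# Equivalent to settings('colors', red=0.0, green=0.5, blue=1.0).
for line in settings('colors', **colors):
    print(line)
```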

25) Extra Positional Arguments

def sum(*values):
    result = 0.0
    for v in values:
        result += v
    return result

print "no values:", sum()
print "single value:", sum(3)
print "five values:", sum(3, 4, 5, 6, 7)
no values: 0.0
single value: 3.0
five values: 25.0
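
Likewise, a single * at the call site unpacks a list or tuple into positional arguments. In this sketch the function is named total rather than sum, an assumption made here to avoid hiding Python's built-in sum; readings is hypothetical data.

```python
def total(*values):
    """Add up any number of positional arguments."""
    result = 0.0
    for v in values:
        result += v
    return result

readings = [3, 4, 5, 6, 7]

# Equivalent to total(3, 4, 5, 6, 7).
print(total(*readings))
```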

26) Summary