Home > Lectures, Version 4 > The Violas of Programming

The Violas of Programming

Orchestral musicians make jokes about violas and viola players. “What’s a string quartet? A great violin player, a mediocre violin player, a bad violin player, and a cellist.”  Or, “What’s the difference between a viola and a trampoline? You take off your shoes to jump on a trampoline.” But violas are as essential as they are unglamorous: hardly anyone plays them as a lead or solo instrument, but string quartets just don’t sound right without that third voice.

I realized today that Python’s sets [1] are sort of like the violas of programming. They come up quite naturally all over the place—just flip through any text on algorithms and count how often they’re used. But they’re rarely used alone, which makes it hard to come up with well-motivated examples when teaching them. Consider:

  • “What vowels are present in this string?” Sure, but show me an application where that comes up: every one I can think of wants the frequency of the vowels, not just their presence or absence.
  • “Find out whether these photos have some tags in common.” Sure, but (a) intersection and union are built in, so it’s a one-liner, and (b) you’d almost certainly use a dictionary with photo IDs for keys, and sets of tags as values.
  • Anything with graphs: again, the nodes reachable from X are naturally stored as a set, but the graph as a whole will be a dictionary of nodes to reachable sets.

I didn’t worry about this too much in the Version 3 lecture on sets and dictionaries: I used a couple of completely abstract examples (including “which vowels”), and moved quickly into a discussion of how sets are stored and why their values have to be immutable. I’d like to do better in Version 4—I’d like every new tool or technique to be well motivated at the time of its introduction—but I’m damned if I can figure out how.

[1] Disclaimer: As the author of the original Python Enhancement Proposal (PEP) on sets, I have certain biases.

Categories: Lectures, Version 4 Tags:
  1. June 29th, 2010 at 21:21 | #1

    I wish I had an immediate and shining example to provide, but I will have to watch my own coding over the next week or two and see where sets crop up next. But I can offer this thought: for me, sets do NOT tend to come into play in the contexts where I would use a dictionary. Instead, they tend to replace lists when I realize that the data I am dealing with has no inherent order, and that saving it in a set instead of a Python list avoids creating a kind of information in my program — specifically, the list’s order — that in fact is not a legitimate kind of information being created by the computation. I like avoiding information that’s not essential to correctness; I suppose, because the time I saw Dijkstra lecture he showed an algorithm that surprisingly can be done in linear time (comparing two linked-list loops for equivalence, if I recall) only if one avoids making commitments ahead of time.

    Anyway, that’s my first thought: sets correctly replace lists when there’s really no order involved.

  2. Titus Brown
    June 30th, 2010 at 15:54 | #2

    Words in common between two texts.

    Merging/comparing two (x, y) data sets with potentially different collections of x (I use dictionaries for this, but sets should be usable).

    In fact, more generally — identifying missing keys, or generating a complete key set, for pairs (or more) of data sets that have non-identical sets of keys.

    Any good?

    –titus

  3. July 6th, 2010 at 00:33 | #3

    This may fail the same simplicity test as your #2, but one option from ecology would be the calculation of similarity indices. We are often interested in how similar two locations are with respect to their species composition. There are a number of different ways of calculating this, but one of the classics is the Jaccard index which involves a union and an intersection of two sets of species. Since this is calculated on the simple presence/absence of a species you could accumulate the species list in exactly the way you do in #5 of the version 3 lecture. You could even go so far as to gets lists of species from one of the online bird databases like Breeding Bird Survey.

  1. No trackbacks yet.