
Lists

slide 001 Hello, and welcome to the fourth episode of the Software Carpentry lecture on Python. In this episode, we’ll have a look at lists.
slide 002 While loops let us do things many times…
slide 003 …collections let us store many values together, so that we don’t have to define new variables for each piece of data we want to work with.
slide 004 The most popular kind of collection in Python is the list, which takes the place of arrays in languages like C and Fortran.
slide 005 To create a list, just put some values in square brackets with commas in between.
slide 006 To fetch the element at a location, put the index of that location in square brackets.
slide 007 For example, we can create a list of the atomic symbols of the first four noble gases…
slide 008 …and then print out the list element at location 1.
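Here is a minimal sketch of those two steps. It assumes the list is named gases, as on the later slides.

```python
# Create a list of the atomic symbols of the first four noble gases.
gases = ['He', 'Ne', 'Ar', 'Kr']

# Fetch the element at location 1. Indexing starts at 0,
# so this is the second element of the list.
print(gases[1])  # prints Ne
```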
slide 009 And yes, Python indexes lists starting at 0, not at 1.
slide 010 There actually was a reason for this back in the early 1970s, when the language C was invented; today, we just have to put up with it.
slide 011 And just as it’s an error to try to get the value of a variable that hasn’t been defined, it’s an error to try to access a list element that doesn’t exist.
slide 012 For example, if our list of noble gases has four elements, legal indices for the list are 0, 1, 2, and 3, so trying to access element 4 produces an error.
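The sketch below makes that error visible. It assumes the same four-element gases list and catches Python's IndexError so that the program keeps running and reports the problem.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

# Legal indices are 0, 1, 2, and 3, so index 4 is out of range.
message = None
try:
    print(gases[4])
except IndexError as error:
    message = str(error)
print('Error:', message)  # prints Error: list index out of range
```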
slide 013 If we don’t know how long a list is, we can use the built-in function len to find out.
slide 014 As you’d expect, it returns 4 for our list of gases.
slide 015 And it returns 0 for the empty list, which is written as a pair of square brackets with nothing in between.
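A short sketch of len on both lists, using the same gases list as before:

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

print(len(gases))  # prints 4
print(len([]))     # prints 0: the empty list has no elements
```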
slide 016 We said earlier that list indices start at 0, but in fact, some negative indices work as well.
slide 017 In Python, values[-1] is the last element of the list, values[-2] is the next-to-last, and so on, counting backward from the end of the list.
slide 018 For example, here’s our list of gases again.
slide 019 As you can see, element -1 is krypton (the last in the list), and element -4 is helium.
slide 020 This notation is easier to read than the long-winded alternative, values[len(values)-1] instead of values[-1]…
slide 021 …which means programmers are less likely to make mistakes with it.
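This sketch compares negative indexing with the long-winded form. It assumes the usual four-element gases list.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

# Negative indices count backward from the end of the list.
print(gases[-1])  # prints Kr
print(gases[-4])  # prints He

# The long-winded equivalent of gases[-1].
print(gases[len(gases) - 1])  # prints Kr
```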
slide 022 Lists have two important characteristics. First, they are mutable, i.e., they can be changed after they are created.
slide 023 For example, suppose we misspell the last entry in our list of gases.
slide 024 We can correct our mistake by assigning to that element of the list as if it were any other variable.
slide 025 Sure enough, our list has been updated in place.
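Here is a minimal sketch of the fix. The misspelling 'Kz' is a hypothetical typo chosen for illustration, because the transcript does not say what the original mistake was.

```python
# The last entry is misspelled ('Kz' is a made-up typo for this example).
gases = ['He', 'Ne', 'Ar', 'Kz']

# Assign to the element just as we would to any other variable.
gases[3] = 'Kr'
print(gases)  # prints ['He', 'Ne', 'Ar', 'Kr']
```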
slide 026 As you probably expect by now, the location must exist before a value can be assigned to it.
slide 027 If our list has four elements…
slide 028 …then assigning to index 4 produces an error, because the legal indices are 0 to 3 (or -1 to -4 if we’re counting from the end).
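The sketch below shows that assignment to a missing index fails and leaves the list unchanged. The value 'Xe' is simply an illustrative fifth gas.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

message = None
try:
    gases[4] = 'Xe'  # index 4 does not exist, so this fails
except IndexError as error:
    message = str(error)
print('Error:', message)  # prints Error: list assignment index out of range
print(gases)              # the list still has its original four elements
```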
slide 029 The second important characteristic of lists is that they are heterogeneous, i.e., they can store values of many different types. This makes them different from arrays in C and Fortran, whose entries all have to be the same type.
slide 030 Here for example, we have created two lists…
slide 031 …each of which contains both a string and an integer.
slide 032 This picture shows what’s in memory after the second list is created: each list stores a reference to a string, and a reference to an integer.
slide 033 Lists can even store references to other lists. We can, for example, create a list gases whose two entries are references to the lists helium and neon.
slide 034 There’s nothing magical about this: if we update our picture of what’s in memory, we simply have another two-element list that stores references to other things we’ve already created.
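Here is a sketch of the lists described on the last few slides. The transcript says only that each list holds a string and an integer, so using atomic numbers for the integers is an assumption. The final check uses is to confirm that gases stores a reference to helium, not a copy.

```python
# Each list mixes a string (atomic symbol) and an integer (atomic number).
helium = ['He', 2]
neon = ['Ne', 10]

# A list whose two entries are references to the lists above.
gases = [helium, neon]
print(gases)        # prints [['He', 2], ['Ne', 10]]
print(gases[1][0])  # prints Ne

# gases[0] refers to the very same list object as helium.
print(gases[0] is helium)  # prints True
```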
slide 035 Nesting data structures like this allows us to do some very powerful things. It can also be a rich source of bugs, so we will delay discussion of the details to a later episode.
slide 036 Lists and loops naturally go together: we almost always use a loop of some kind to operate on all the list’s elements.
slide 037 For example, we can use a while loop to step through the indices of a list to get each of its elements in turn.
slide 038 Here’s a short program that prints the noble gases one by one.
slide 039 We start the loop variable i at 0, which is the first legal list index.
slide 040 Each time through the loop, we add 1 to it, so that we move through the set of legal list indices in order.
slide 041 We keep going as long as i is less than the length of the list, i.e., as long as it’s a legal index.
slide 042 And sure enough, this loop prints out each list element in order.
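A minimal sketch of the while loop just described, assuming the four-element gases list:

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

i = 0                  # start at the first legal index
while i < len(gases):  # keep going while i is a legal index
    print(gases[i])
    i += 1             # add 1 to move on to the next index
```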
slide 043 This works, but it’s tedious to type it all in time after time.
slide 044 And it’s all too easy to forget to increment the loop index, or to get the loop control condition wrong.
slide 045 To make things simpler, Python provides a second kind of loop called a for loop that gives the program each list element in turn.
slide 046 Here, for example, we do in one line (for gas in gases) what took three lines in the previous program.
slide 047 As you can see, the for loop variable is assigned each element of the list in turn…
slide 048 …not each index.
slide 049 Python does this because it’s the most common case: most of the time that a program wants to do something with each list element, it doesn’t care what that element’s location is.
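Here is the same job done with a for loop, again assuming the usual gases list. The single loop line replaces the initialization, test, and increment of the while version.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

# gas is bound to each element in turn, not to each index.
for gas in gases:
    print(gas)
```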
slide 050 As we said a few slides ago, lists are mutable: their elements can be changed in place. We can also delete elements entirely, which shortens the list.
slide 051 Let’s set up our noble gas list again…
slide 052 …and then tell Python to delete element 0 using the del statement.
slide 053 If we print gases out afterward, it only has three elements.
slide 054 If we delete element 2 of this list (which is now the last element, since the list’s length is 3)…
slide 055 …we’re left with a two-element list.
slide 056 And yes, deleting an index that doesn’t exist is an error.
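A sketch of the deletions just described, assuming the list starts as the four noble gases. The final del targets a deliberately out-of-range index to trigger the error.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

del gases[0]   # remove 'He'
print(gases)   # prints ['Ne', 'Ar', 'Kr']

del gases[2]   # remove 'Kr', which is now the last element
print(gases)   # prints ['Ne', 'Ar']

# Deleting an index that does not exist raises IndexError.
message = None
try:
    del gases[5]
except IndexError as error:
    message = str(error)
print('Error:', message)
```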
slide 057 We can lengthen lists, too, by appending new elements.
slide 058 Let’s assign an empty list to gases…
slide 059 …then append the string 'He'
slide 060 …and the string 'Ne'
slide 061 …and finally the string 'Ar'.
slide 062 Our list now has three elements.
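The same sequence of steps as a short sketch:

```python
gases = []           # start with an empty list
gases.append('He')
gases.append('Ne')
gases.append('Ar')
print(gases)         # prints ['He', 'Ne', 'Ar']
print(len(gases))    # prints 3
```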
slide 063 dot-append is an example of a method, and most operations on lists (and other things) are expressed this way.
slide 064 A method is a function that “belongs to” (and usually operates on) a specific chunk of data.
slide 065 If the data is stored in thing, then we call the method using the notation “thing dot methodname”, passing in any arguments it takes inside parentheses.
slide 066 To show you how this works, here are a few useful list methods.
slide 067 Let’s create the gases list again, but with 'He' duplicated at the front.
slide 068 gases.count('He') tells us that 'He' occurs twice in the list.
slide 069 gases.index('Ar') tells us that the index of the first occurrence of 'Ar' is 2. (Remember indexing starts at zero, so element 2 is the third element of the list.)
slide 070 gases.insert takes two arguments: the index where we want to insert something, and the something we want to insert. It doesn’t return any value…
slide 071 …but if we print out the list after calling it, we can see that 'Ne' has been put at location 1, and everything above that has been bumped up to make room, leaving us with a list of five elements.
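Here is a sketch of these three methods. The starting list is an assumed reconstruction: 'He' is duplicated at the front and 'Ne' is missing. That choice is consistent with index('Ar') being 2 and with the list growing to five elements after the insert.

```python
# Assumed reconstruction of the slide's list.
gases = ['He', 'He', 'Ar', 'Kr']

print(gases.count('He'))  # prints 2
print(gases.index('Ar'))  # prints 2

result = gases.insert(1, 'Ne')  # put 'Ne' at location 1
print(result)  # prints None: insert changes the list and returns nothing
print(gases)   # prints ['He', 'Ne', 'He', 'Ar', 'Kr']
```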
slide 072 Here are two methods that are often used incorrectly.
slide 073 Let’s re-set the gases list…
slide 074 …and then print the result of gases.sort(). As you can see, the sort method returns None, which is the special value Python uses for “nothing here”.
slide 075 However, if we now print gases, it has been sorted alphabetically.
slide 076 Similarly, gases.reverse() returns nothing…
slide 077 …but reverses the list in place.
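A sketch of both methods, assuming the reset list is the usual four noble gases:

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

print(gases.sort())     # prints None
print(gases)            # prints ['Ar', 'He', 'Kr', 'Ne']

print(gases.reverse())  # prints None
print(gases)            # prints ['Ne', 'Kr', 'He', 'Ar']
```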
slide 078 People often expect sort and reverse to return the sorted or reversed list, which leads to a common bug:
slide 079 gases = gases.sort() does sort the list that gases refers to, but then assigns None to the variable gases, effectively throwing away the data that has just been sorted.
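This sketch sets the buggy pattern next to the correct one, using the same four-element list:

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

# Buggy: sort() sorts the list in place and then returns None,
# so the name gases ends up referring to None.
gases = gases.sort()
print(gases)  # prints None

# Correct: call the method for its effect and keep using the same name.
gases = ['He', 'Ne', 'Ar', 'Kr']
gases.sort()
print(gases)  # prints ['Ar', 'He', 'Kr', 'Ne']
```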
slide 080 The list’s index method tells us where something is in a list, but if we just want to know whether something is there or not, we can use the in operator.
slide 081 Here’s our list of gases again.
slide 082 As expected, the expression 'He' in gases is true.
slide 083 in is most often used in if statements, as in this example.
slide 084 Since 'Pu' is not in the list gases, this tells us that the universe is well ordered.
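Here is a sketch of both uses of in. The messages printed by the if statement are made up for illustration, because the transcript does not give the slide's exact text.

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

print('He' in gases)  # prints True

# The printed messages below are illustrative, not from the original slide.
if 'Pu' in gases:
    print('Plutonium is not a noble gas!')
else:
    print('The universe is well ordered.')
```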
slide 085 The last thing we will introduce in this episode is the range function, which constructs sequences of integers.
slide 086 The expression range(5) produces the numbers from 0 to 4 (in Python 3, range returns a range object rather than a list; wrapping it in list() gives an actual list)…
slide 087 …while range(2, 6) produces 2, 3, 4, 5…
slide 088 …and range(0, 10, 3) produces 0, 3, 6, 9, i.e., starts at the first argument, and goes up to but not including the second argument, using the third argument as the step size.
slide 089 range(10, 0) does not produce a sequence in reverse order: instead, it starts at 10, and tries to go “up to” 0. Since nothing fits that description, it produces an empty sequence.
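A sketch of each call above. list() is used so that the values are printed explicitly, because in Python 3 a range prints as range(...) rather than as its elements.

```python
print(list(range(5)))         # prints [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # prints [2, 3, 4, 5]
print(list(range(0, 10, 3)))  # prints [0, 3, 6, 9]
print(list(range(10, 0)))     # prints []: nothing goes "up" from 10 to 0
```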
slide 090 Why is range useful? If len(values) is the length of a list values, and range(N) is the integers from 0 to N-1, then range(len(values)) is the integers from 0 to one less than the length of the list, i.e., all the legal indices of the list.
slide 091 An example will make this clearer. Here’s our list of gases.
slide 092 Its length is 4.
slide 093 So range(len(gases)), or range(4), is 0, 1, 2, and 3.
slide 094 If we use range(len(gases)) in a for loop, it assigns each index of the list to the loop variable in turn…
slide 095 …so we can print out (index, element) pairs one by one.
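A sketch of the idiom, assuming the four-element gases list:

```python
gases = ['He', 'Ne', 'Ar', 'Kr']

print(len(gases))               # prints 4
print(list(range(len(gases))))  # prints [0, 1, 2, 3]

# i is bound to each legal index in turn, so we can print
# each index together with its element.
for i in range(len(gases)):
    print(i, gases[i])
```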
slide 096 This is a very common idiom in Python for those cases where we really do want to know each element’s location as well as its value.
slide 097 We’ll see an even better way to do it later.
