Indexing

August 10th, 2011
slide 001 Hello, and welcome to another episode of the Software Carpentry lecture on matrix programming. In this episode, we’ll have a look at some of the ways you can index arrays. It may not seem important at first, but as we’ll see, clever indexing allows you to avoid writing loops, which both reduces the size of your code, and makes it more efficient.
slide 002 Arrays are subscripted by integers, just like lists and other sequences, so they can be sliced like other sequences as well.
For example, if ‘block’ is the array shown here…
…then ‘block[0:3, 0:2]’ selects its first three rows and first two columns.
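As a minimal sketch of that slice: the slide's array isn't reproduced in this transcript, so the 4×4 values below are a made-up stand-in for ‘block’.

```python
import numpy as np

# Hypothetical stand-in for the 'block' array shown on the slide.
block = np.array([[10, 20, 30, 40],
                  [110, 120, 130, 140],
                  [210, 220, 230, 240],
                  [310, 320, 330, 340]])

# Rows 0, 1, 2 and columns 0, 1 (the upper bound of each slice is excluded).
corner = block[0:3, 0:2]
print(corner)
```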
slide 003 Please take a moment and ask yourself whether a slice like this is an alias for the original data, or a copy of it. If you have a Python prompt handy, please type in a couple of lines of code to test your guess.
slide 004 As with other sliceable types, it’s possible to assign to slices.
For example, we can assign zero to columns 1 and 2 in row 1 of ‘block’ in a single statement.
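Here is roughly what that single statement could look like, again using invented values for ‘block’ since the slide's data isn't in the transcript.

```python
import numpy as np

# Hypothetical stand-in for the 'block' array shown on the slide.
block = np.array([[10, 20, 30, 40],
                  [110, 120, 130, 140],
                  [210, 220, 230, 240],
                  [310, 320, 330, 340]])

# Row 1, columns 1 and 2: the scalar 0 is assigned to every selected slot.
block[1, 1:3] = 0
print(block)
```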
slide 005 Slicing on both sides gives us a way to shift data along the axes.
If ‘vector’ is a one-dimensional array as shown here, then ‘vector[0:3]’ selects slots 0, 1, and 2…
…while ‘vector[1:4]’ selects the values in slots 1, 2, and 3.
Assigning ‘vector[1:4]’ to ‘vector[0:3]’ overwrites the lower three values (but note, it leaves the uppermost untouched).
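A small sketch of the shift, assuming a four-element vector with the illustrative values 10, 20, 30, 40 (the slide's actual values aren't given in the transcript):

```python
import numpy as np

# Illustrative values; the slide's vector is not reproduced here.
vector = np.array([10, 20, 30, 40])

# Copy slots 1..3 down into slots 0..2; slot 3 is not on the left-hand side,
# so it keeps its old value.
vector[0:3] = vector[1:4]
print(vector)
```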
slide 006 As an exercise, write a loop that does the same thing, and then write a loop that shifts values up by one space instead. Which form do you find easier to understand?
slide 007 If we want to do more sophisticated things, we can use a list or an array as a subscript.
For example, here’s our four-element vector again…
…and a list with three legal subscripts: 3, 1, and 2.
The expression ‘vector[subscript]’ creates a new array, whose elements are selected from ‘vector’ as you’d expect.
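For instance, with an assumed vector of 10, 20, 30, 40 (values chosen only for illustration), the subscript list picks elements in the order it names them:

```python
import numpy as np

# Illustrative values; only the subscript list [3, 1, 2] comes from the lecture.
vector = np.array([10, 20, 30, 40])
subscript = [3, 1, 2]

# Elements are taken in the order given by the subscript list.
selected = vector[subscript]
print(selected)  # elements 3, 1, 2 of vector: 40, 20, 30
```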
slide 008 This works in multiple dimensions as well…
…although the syntax and rules are not immediately obvious.
For example, if we have a 2×2 matrix…
…and subscript it with the list containing only the index ‘1’…
…the result is its second row.
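A quick sketch with a made-up 2×2 matrix shows one of the less obvious details: subscripting with a one-element list keeps the result two-dimensional, so the second row comes back as a 1×2 array rather than a flat vector.

```python
import numpy as np

# Hypothetical 2x2 matrix; the slide's values are not in the transcript.
matrix = np.array([[1, 2],
                   [3, 4]])

# A list subscript selects whole rows along the first axis.
second_row = matrix[[1]]
print(second_row)
print(second_row.shape)
```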
slide 009 The details are explained in the NumPy documentation. If you’re going to spend any time programming with arrays, in NumPy or anything else, it’s worth learning these rules so that you can get the most out of the tools your language or library provides.
slide 010 Remember, if you’re looping over the elements in an array, you’re probably doing the wrong thing.
slide 011 Let’s have a look at another way to subscript.
If we compare our vector’s elements to the value 25, we get back a vector with True where the element passed the test, and False where it didn’t.
As we saw in the previous episode, ‘dtype=bool’ is NumPy’s way of telling us what the array elements’ data type is.
slide 012 We can use a Boolean array like this as a mask to select certain elements from our original array.
Here, for example, the expression ‘vector[vector<25]’ gives us a vector containing only the elements that passed the test.
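Both steps, the comparison and the masking, can be sketched as follows, assuming illustrative vector values of 10, 20, 30, 40:

```python
import numpy as np

# Illustrative values; only the threshold 25 comes from the lecture.
vector = np.array([10, 20, 30, 40])

# Element-wise comparison yields a Boolean array of the same shape.
mask = vector < 25
print(mask, mask.dtype)

# Using the Boolean array as a subscript keeps only the True positions.
small = vector[vector < 25]
print(small)
```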
slide 013 Again, take a moment and see if you can guess whether ‘result’ is a copy of the original data, or an alias, and why, and then test your guess.
slide 014 We can use Boolean masking on the left side of assignment as well, though we have to be careful about its meaning.
If we use a mask directly, elements are taken in order from the source on the right and assigned to elements corresponding to True values in the mask.
slide 015 The ‘putmask’ function works slightly differently: it pulls values corresponding to True’s in the mask from the source, and pushes them into corresponding slots in the destination.
In both cases, only locations corresponding to True values in the mask are affected; what differs is which elements of the source are used.
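The difference is easier to see side by side. This is a sketch with invented values. Note one assumption about current NumPy behavior: in direct masked assignment the right-hand side must supply exactly as many values as there are True entries in the mask (or be broadcastable to that count), so a two-element source is used there.

```python
import numpy as np

mask = np.array([True, False, True, False])

# Direct masking: source values are consumed in order, one per True slot.
direct = np.zeros(4, dtype=int)
direct[mask] = np.array([100, 200])

# putmask: for each True slot n, the value comes from position n of the source.
pushed = np.zeros(4, dtype=int)
np.putmask(pushed, mask, np.array([100, 200, 300, 400]))

print(direct)
print(pushed)
```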
slide 016 Operators like ‘<’ and ‘==’ work the way you would expect with arrays, but there is one trick.
Python does not allow objects to re-define the meaning of ‘and’, ‘or’, and ‘not’, since they are keywords.
The expression ‘(vector <= 20) and (vector >= 20)’ therefore doesn’t actually select elements with the value 20.
Instead, it produces an error message.
slide 017 One way around this is to use functions like ‘logical_and’, which combine Boolean arrays in the way you’d expect.
Another is to use bitwise operators like ‘|’ and ‘&’, which operate on the bit-level representations of the Boolean values in the arrays.
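A short sketch of the problem and both workarounds, assuming an illustrative vector of 10, 20, 30, 40. The parentheses around each comparison matter for ‘&’, which binds more tightly than ‘<=’ and ‘>=’.

```python
import numpy as np

# Illustrative values; the slide's vector is not reproduced here.
vector = np.array([10, 20, 30, 40])

# 'and' asks for the truth value of a whole array, which NumPy refuses to guess.
try:
    (vector <= 20) and (vector >= 20)
    raised = False
except ValueError:
    raised = True

# Element-wise alternatives.
with_function = np.logical_and(vector <= 20, vector >= 20)
with_operator = (vector <= 20) & (vector >= 20)

print(raised)
print(vector[with_function], vector[with_operator])
```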
slide 018 These can produce some surprising results.
For example, the bitwise or of anything with -1 is -1, since -1’s bit representation is 11111….
In contrast, logical_and and related functions treat any nonzero value as True.
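To make the contrast concrete, here is a sketch on small integer arrays (values chosen for illustration). The second pair shows the same effect with ‘&’: 1 and 2 share no bits, so their bitwise and is 0 even though both count as True logically.

```python
import numpy as np

values = np.array([0, 1, 2])

# Bitwise or with -1 sets every bit, so every result is -1.
bitwise_or = values | -1
# logical_or only asks whether either operand is nonzero.
logical_or = np.logical_or(values, -1)

# 1 is binary 01 and 2 is binary 10: no bits in common.
bitwise_and = np.array([1]) & np.array([2])
logical_and = np.logical_and(np.array([1]), np.array([2]))

print(bitwise_or, logical_or)
print(bitwise_and, logical_and)
```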
slide 019 NumPy provides a whole-array alternative to ‘if’ and ‘else’ called ‘where’.
Its first argument is a Boolean mask. Where that is true, it takes the value from its second argument; where it is false, it takes its third.
For example, ‘where(vector < 25, vector, 0)’ produces an array with the values from the original that are less than 25, or 0. Similarly, ‘where(vector > 25, vector/10, vector)’ scales large values or leaves values alone.
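Both calls can be tried on an assumed vector of 10, 20, 30, 40. One thing to watch under Python 3: ‘vector/10’ is true division and yields floats, so the second result is a floating-point array.

```python
import numpy as np

# Illustrative values; only the two where() expressions come from the lecture.
vector = np.array([10, 20, 30, 40])

# Keep values below 25, replace the rest with 0.
kept = np.where(vector < 25, vector, 0)

# Scale values above 25 by 1/10, leave the rest alone.
scaled = np.where(vector > 25, vector / 10, vector)

print(kept)
print(scaled)
```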
slide 020 As an exercise, have a look at what the ‘choose’ and ‘select’ functions do, and try to think of cases where you would use them.
slide 021 To review…
Arrays can be sliced like lists…
…or subscripted with vectors of indices…
…or masked with conditionals.
slide 022 No matter what you do, if you are writing loops over array elements, you have probably missed something, or are doing something wrong.
slide 023 In our next episode, we’ll use the tools we have looked at so far to explore some linear algebra.
