
Strings

slide 001 Hello, and welcome to the sixth episode of the Software Carpentry lecture on Python. This episode will show you more about how to work with strings.
slide 002 We’ve been using strings since our very first episode, and as you’ve probably guessed by now, a string is just a sequence of characters.
slide 003 There is actually no separate data type for individual characters: a character is just a string of length 1.
slide 004 Strings are indexed exactly like lists.
slide 005 If name holds the string 'Darwin', then name[0] is the first character, ‘D’, and name[-1] is the last one, ‘n’.
slide 006 for loops work the same way they do for lists too.
slide 007 If the program contains for c in name, Python assigns each character to the variable c in turn.
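As a small illustration of the indexing and looping just described, here is a sketch that uses the same string, 'Darwin'. The variable names first, last, and letters are made up for this example and do not come from the slides.

```python
name = 'Darwin'

first = name[0]    # index 0 is the first character
last = name[-1]    # index -1 counts from the end

# A single character is just a string of length 1.
print(type(first), len(first))

# Looping over a string assigns each character to c in turn.
letters = []
for c in name:
    letters.append(c)

print(first, last, letters)
```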
slide 008 Strings can be wrapped in either single quotes or double quotes, as long as the quotes match.
slide 009 Here, the first string is in single quotes, and the second in double quotes.
slide 010 It doesn’t matter which form is used: the string’s value is the same.
slide 011 We can use == to test this.
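A quick way to check this for yourself is sketched below. The variable names are illustrative only.

```python
single = 'Darwin'    # written with single quotes
double = "Darwin"    # written with double quotes

# The quote style is not part of the value, so the two compare equal.
same = (single == double)
print(same)
```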
slide 012 And speaking of comparison, we can use “less than” or “greater than or equal to” to compare strings. When Python compares strings, it works character by character from left to right.
slide 013 As you’d expect, ‘a’ is less than ‘b’.
slide 014 And ‘ab’ is less than ‘abc’ (since ‘ab’ runs out of characters first).
slide 015 The digit characters are ordered in the natural way too.
slide 016 But if you put these rules together, the string ‘100’ is less than the string ‘9’, because ‘1’ is less than ‘9’.
slide 017 It may also surprise you to discover that an upper-case ‘A’ is less than a lower-case ‘a’. In fact, every upper-case letter comes before every lower-case letter, because characters are ordered by their numeric character codes.
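The comparison rules above can be tried out directly. The sketch below collects each comparison from the narration in a dictionary called checks (a name invented for this example), and adds one extra line showing that converting to int first gives the numeric ordering instead.

```python
# Each entry pairs a description with the result of the comparison.
checks = {
    "'a' < 'b'": 'a' < 'b',
    "'ab' < 'abc'": 'ab' < 'abc',     # 'ab' runs out of characters first
    "'1' < '9'": '1' < '9',
    "'100' < '9'": '100' < '9',       # decided by '1' < '9'
    "'A' < 'a'": 'A' < 'a',
    "'Z' < 'a'": 'Z' < 'a',           # upper case sorts before lower case
}
for expression, result in checks.items():
    print(expression, '->', result)

# Comparing numbers rather than strings gives the usual numeric answer.
numeric = int('100') < int('9')
print(numeric)
```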
slide 018 One more surprise is that strings are immutable, i.e., they cannot be changed in place.
slide 019 For example, if we try to overwrite the ‘D’ in ‘Darwin’ with a ‘C’, Python gives us an error. This is different from languages such as C, which allow strings to be changed in place.
slide 020 Python strings are immutable partly for performance: immutability lets Python do some internal optimization that wouldn’t be possible if strings could be changed arbitrarily.
slide 021 It also helps make programmers more productive by making some kinds of errors impossible—we’ll explore this in more detail in the next episode.
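Here is a minimal sketch of what that failed assignment looks like, and of the usual workaround of building a new string from pieces of the old one. The names error and changed are illustrative.

```python
name = 'Darwin'

error = None
try:
    name[0] = 'C'            # strings do not support item assignment
except TypeError as exc:
    error = exc

print(type(error).__name__)

# Build a new string instead; the original is left untouched.
changed = 'C' + name[1:]
print(name, changed)
```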
slide 022 But hang on a second: we’ve already seen that we can use + to concatenate strings.
slide 023 For example, we can “add” the strings ‘Charles’, space, and ‘Darwin’ to produce ‘Charles Darwin’.
slide 024 What happens is that concatenation always produces a new string.
slide 025 If the variable original refers to the string 'Charles'
slide 026 …and the variable name refers to the same string…
slide 027 …then when we add a space and 'Darwin' to name, Python actually creates a new string and assigns that to name, leaving original pointing at the original string. It does not modify the string 'Charles' in place.
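The three steps just described can be written out as follows. This sketch uses the variable names from the narration; same_before is an extra name added here to record that both variables start out referring to the same string object.

```python
original = 'Charles'
name = original                  # both names refer to the same string
same_before = name is original

# Concatenation builds a new string and rebinds name to it.
name = name + ' ' + 'Darwin'

print(original)
print(name)
print(name is original)
```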
slide 028 Novices often use string concatenation to format output.
slide 029 Here’s an example: we concatenate three constant strings and the string representations of two numbers to produce a single string of output.
slide 030 It works, but there’s a much better way.
slide 031 In Python, we can use the % operator to format output. On the left, we have a format string with placeholders where we want to insert values. On the right, we have the values we want to insert.
slide 032 Here’s a simple example: the format string is 'reagent: %d', and the value we’re inserting is 123. The format specifier '%d' in the format string means “decimal integer”, so Python creates a new string with the value 123 in place of the ‘%d’.
slide 033 We can control the width and precision of values too: in this example, '%6.2f' means “floating point number, six characters wide, two digits after the decimal point”.
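The two single-value cases above might look like this in practice. The number 3.14159 is an arbitrary value chosen for this sketch; it is not taken from the slides.

```python
# '%d' inserts a decimal integer.
label = 'reagent: %d' % 123
print(label)

# '%6.2f' pads to six characters with two digits after the decimal point.
padded = '%6.2f' % 3.14159
print(repr(padded))      # repr makes the leading spaces visible
```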
slide 034 If we want to format multiple values at once, we have to put them in a tuple, i.e., in parentheses separated by commas, after the %.
slide 035 Here’s our earlier example re-done with string formatting: we have used '%d' to format an integer, '%f' to format a floating-point number…
slide 036 …and '%%' to format an actual percent sign. We have to do this because when Python applies % to a string, it expects a format specifier after every actual percent sign in that string. We’ll come back to this idea in a few moments.
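The slide's actual output line is not reproduced in this transcript, so the sketch below uses a made-up sentence and made-up values (count and purity) to compare the two approaches. It also uses '%.1f' rather than a bare '%f', which would print six digits after the decimal point.

```python
count = 3          # hypothetical integer value
purity = 42.5      # hypothetical floating-point value

# Concatenation: three constant strings plus two converted numbers.
concatenated = 'reagent ' + str(count) + ' is ' + str(purity) + '% pure'

# Formatting: one format string, values supplied as a tuple.
# '%%' produces a single literal percent sign.
formatted = 'reagent %d is %.1f%% pure' % (count, purity)

print(concatenated)
print(formatted)
```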
slide 037 Even without the % operator, we sometimes use two characters in a program to put one character into a string. The most common example is probably \n, which means “a newline character”.
slide 038 We can also use \' to insert a literal single quote, or \" to insert a literal double quote.
slide 039 Here, for example, we have a single-quoted string that contains both a newline and a single quote.
slide 040 And here, we have a double-quoted string that contains a newline and a double quote.
slide 041 So if \ is used to start special two-character sequences, how do we represent an actual backslash? The answer is, with two backslashes.
slide 042 Here, for example, is a string that includes a single literal backslash character. It is written with two backslashes, but when Python reads the program, it only puts one in the string.
slide 043 This doubling up is a common pattern with so-called escape sequences.
slide 044 We use some character to mean, “What follows is special.”
slide 045 And then double up that character to mean, “The character itself.”
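To make the escape sequences above concrete, here is a short sketch. The string contents are invented for illustration; the point is that each two-character sequence in the source becomes one character in the string.

```python
# A single-quoted string containing a single quote and a newline.
single_quoted = 'Darwin\'s\nfinches'

# A double-quoted string containing double quotes and a newline.
double_quoted = "He said, \"Evolve!\"\n"

# Two backslashes in the source produce one backslash in the string.
backslash = 'C:\\data'

print(single_quoted)
print(double_quoted)
print(backslash, len(backslash))
```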
slide 046 There’s another way to get newline characters in strings. If we use three quotes of either kind to start and end a string, it can span multiple lines.
slide 047 Here, for example, we have a four-line string.
slide 048 There’s nothing magical about this: Python just puts the newline characters at the end of the first three lines into the string data it stores in memory.
slide 049 We could just as well write this as two “normal” strings, with embedded newline characters, and then concatenate them.
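The equivalence described above can be checked with a sketch like this one. The text of the lines is a placeholder, not what appears on the slide.

```python
# A triple-quoted string spanning four lines.
multi = '''first line
second line
third line
fourth line'''

# The same value built from two ordinary strings with embedded newlines.
joined = 'first line\nsecond line\n' + 'third line\nfourth line'

print(multi == joined)
print(multi.count('\n'))     # newlines end the first three lines only
```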
slide 050 Like lists, strings have methods.
slide 051 For example, the capitalize, upper, and lower methods return new strings that translate some or all of the characters of the original. These methods don’t modify the original string, though, because strings are immutable.
slide 052 Another method, count, returns the number of times a character occurs in the string.
slide 053 And find returns the index of the first occurrence of a character, or -1 if the character can’t be found.
slide 054 Another useful method is replace, which creates a new string with every occurrence of one character replaced with another.
slide 055 In fact, these will find or replace entire strings, not just single characters.
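Here is a short tour of the methods just mentioned, applied to an example string chosen for this sketch. Note that name itself is unchanged at the end, since every method returns a new value.

```python
name = 'charles darwin'

capitalized = name.capitalize()      # only the first character is raised
shouted = name.upper()
quiet = 'DARWIN'.lower()

a_count = name.count('a')            # occurrences of a single character
first_a = name.find('a')             # index of the first 'a'
missing = name.find('z')             # -1 because 'z' is not present

# count, find, and replace accept whole substrings, not just characters.
ar_count = name.count('ar')
dar_index = name.find('dar')
swapped = name.replace('darwin', 'lyell')

print(capitalized, shouted, quiet)
print(a_count, first_a, missing)
print(ar_count, dar_index, swapped)
```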
slide 056 One common idiom in Python and other languages is to chain method calls together.
slide 057 Here’s a rather contrived example.
slide 058 The first method call—the one that is invoked directly on the variable element—returns a string that is the upper-case version of the string 'cesium'.
slide 059 We then call center on this string to create yet another one that has the upper-case copy of ‘cesium’ centered in a field 10 characters wide.
slide 060 The result is shown here; the technique is no different from a mathematician writing f(g(x)).
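The chained call described above can be sketched as follows, alongside an equivalent step-by-step version. The variable element comes from the narration; upper_case, chained, and stepwise are names added for this example.

```python
element = 'cesium'

# Chained: center is called on the string that upper returns.
chained = element.upper().center(10)

# The same thing in two steps, using a temporary variable.
upper_case = element.upper()
stepwise = upper_case.center(10)

print(repr(chained))     # repr makes the padding spaces visible
```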