Libraries

slide 001 Hello, and welcome to the tenth episode of the Software Carpentry lecture on Python. This episode will explain how Python libraries work, and introduce you to a couple that you may find useful.
slide 002 As we saw in the previous episode, a function is a way to turn a bunch of related statements into a single “chunk” that can be re-used.
slide 003 Modularizing code this way eliminates duplication…
slide 004 …and makes code easier to read.
slide 005 A library does for functions what functions do for statements: group them together to create more usable chunks.
slide 006 This hierarchical organization is similar in spirit to that used in biology:
slide 007 instead of family, genus, and species, we have library, function, and statement.
slide 008 Every Python file can be used as a library by other programs.
slide 009 To load it into memory, use the import statement.
slide 010 For example, suppose we have created a Python file called halman.py that defines a single function called threshold.
slide 011 If we want to call this function in another file, say program.py, we write import halman to load the contents of halman.py, and then call the function as halman.threshold.
slide 012 When we run program.py, it does the right thing.
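The code on these slides is not part of this transcript, so here is a minimal sketch of what the two files might look like. The body of threshold is a guess (it keeps the values at or above a cutoff), and halman.py is written to a temporary directory only so that the whole example can be run as one script.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical contents of halman.py; the real threshold may do something else.
HALMAN_SOURCE = '''
def threshold(values, cutoff):
    """Return the values that are at or above cutoff."""
    return [v for v in values if v >= cutoff]
'''

# Write the library to a scratch directory and make that directory importable.
library_dir = tempfile.mkdtemp()
with open(os.path.join(library_dir, "halman.py"), "w") as writer:
    writer.write(HALMAN_SOURCE)
sys.path.insert(0, library_dir)
importlib.invalidate_caches()

# Everything from here down is what would live in program.py.
import halman

readings = [0.1, 0.4, 0.2, 0.5]
print(halman.threshold(readings, 0.3))   # prints [0.4, 0.5]
```

In everyday use the two files would simply sit side by side in the same directory, and program.py would contain only the last few lines.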
slide 013 A file that has been imported into another program is called a library or module. When a module is imported, Python:
slide 014 executes the statements it contains (which are usually, but not always, function definitions), and
slide 015 creates an object to store references to all the items defined in that module, and assigns it to a variable with the same name as the module.
slide 016 For example, let’s create a file called noisy.py that prints out a message and defines NOISE_LEVEL to be 1/3.
slide 017 When we import noisy, the first statement—the print—is executed, displaying a message on the screen…
slide 018 …and the variable NOISE_LEVEL is assigned a value, which we can access as noisy.NOISE_LEVEL.
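Here is a rough sketch of noisy.py and a program that imports it. The message text is made up, NOISE_LEVEL is written with floats so the result is the same in every Python version, and the file is created in a temporary directory only to keep the example self-contained.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical contents of noisy.py; the message is invented for illustration.
NOISY_SOURCE = '''
print("noisy.py is being loaded")
NOISE_LEVEL = 1.0 / 3.0
'''

# Put noisy.py somewhere Python can find it.
library_dir = tempfile.mkdtemp()
with open(os.path.join(library_dir, "noisy.py"), "w") as writer:
    writer.write(NOISY_SOURCE)
sys.path.insert(0, library_dir)
importlib.invalidate_caches()

import noisy               # the print statement inside noisy.py runs here
print(noisy.NOISE_LEVEL)   # the variable is reached through the module object
```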
slide 019 One important feature of modules is that each one is a separate namespace, i.e., variable names defined inside a module belong to that module, and if the same name is used in two different modules, each module gets its own.
slide 020 When Python sees a reference to a variable, it looks in the current function call stack frame to find its definition.
slide 021 If it can’t find it there, it looks in the module the function was defined in (assuming it was defined in a library).
slide 022 If it still can’t find it, it looks among Python’s built-in names, such as len and print. It does not look in the namespace of the program that imported the module.
slide 023 For example, let’s create a file called module.py that defines a variable called NAME and a function called func that prints it out.
slide 024 In our main program, we also define a variable called NAME…
slide 025 …then import our module.
slide 026 When we call module.func, it sees the NAME variable that was defined inside the module, not the one that was defined globally. This “module first” rule makes it safe to load libraries that were written independently, without worrying about whether their authors might have used the same names for things.
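A small sketch of this experiment follows. The string values are invented so that the output shows which NAME was used, and module.py is written to a temporary directory only so the example runs as a single script.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical contents of module.py; the string value is made up.
MODULE_SOURCE = '''
NAME = "defined in module.py"

def func():
    print(NAME)
'''

# Put module.py somewhere Python can find it.
library_dir = tempfile.mkdtemp()
with open(os.path.join(library_dir, "module.py"), "w") as writer:
    writer.write(MODULE_SOURCE)
sys.path.insert(0, library_dir)
importlib.invalidate_caches()

# The main program defines its own NAME, then imports the module.
NAME = "defined in the main program"
import module

module.func()   # prints: defined in module.py
print(NAME)     # prints: defined in the main program
```

Neither assignment disturbs the other, because each NAME lives in its own namespace.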
slide 027 Python comes with many standard libraries.
slide 028 One of the most useful is the math library…
slide 029 …which defines sqrt for square roots…
slide 030 …hypot for calculating √(x² + y²)…
slide 031 …and values for e and π that are as accurate as the machine can make them.
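As a quick illustration of these four names, here is a short sketch; the input values are arbitrary.

```python
import math

print(math.sqrt(25.0))        # square root: prints 5.0
print(math.hypot(3.0, 4.0))   # sqrt(3.0**2 + 4.0**2): prints 5.0
print(math.e)                 # base of natural logarithms
print(math.pi)                # ratio of a circle's circumference to its diameter
```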
slide 032 To help you find your way around libraries, Python provides a help function.
slide 033 If math has been imported, the call help(math) prints out the documentation embedded in the math library.
slide 034 Python also provides a few convenient alternatives for doing imports.
slide 035 For example, we can import specific functions from a library and then call them directly, rather than using the modulename.functionname syntax.
slide 036 We can also import a function under a different name, so that if two modules define functions with the same name, we can give one or the other a different name when we want to use them together.
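To make the renaming case concrete, here is a sketch using two standard libraries that both define sqrt: math and cmath. The local names real_sqrt and complex_sqrt are our own choice for this example.

```python
# Import one function directly, so it can be called without the module prefix.
from math import hypot

# Import two functions that share a name, giving each a distinct local name.
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(hypot(3.0, 4.0))       # prints 5.0
print(real_sqrt(4.0))        # prints 2.0
print(complex_sqrt(-4.0))    # prints 2j
```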
slide 037 We can also use import * to bring everything in the module into the current namespace at once, which has the same effect as using from module import a, from module import b, and so on for every public name in the module (names that start with an underscore are skipped by default).
slide 038 This is almost always a bad idea, though.
slide 039 If someone adds a new function or variable to the next version of the module, that import * could silently overwrite something that you’re importing from somewhere else, leading to a hard-to-find bug.
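The sketch below shows the kind of silent overwrite this warning is about, using math and cmath because both define sqrt. The star-imports are run with exec in a scratch dictionary (our own device, not something from the slides) so that the example does not clobber names in the rest of a program.

```python
import cmath
import math

# A scratch namespace that stands in for a program's global variables.
scratch = {}

exec("from math import *", scratch)
print(scratch["sqrt"] is math.sqrt)    # True: sqrt came from math

exec("from cmath import *", scratch)
print(scratch["sqrt"] is cmath.sqrt)   # True: cmath's sqrt replaced math's
print(scratch["sqrt"](4.0))            # prints (2+0j) rather than 2.0
```

No error or warning is raised at any point, which is what makes this kind of bug hard to find.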
slide 040 While the math library is useful, the sys library is even more so.
slide 041 Once it’s imported…
slide 042 …we can find out exactly what version of Python we’re using…
slide 043 …what operating system we’re running on…
slide 044 …and a few other things, like how large integers are in this version.
slide 045 What may be more interesting is sys.path, which defines the list of directories Python searches in to find modules. When a program executes import X, Python looks at each of these directories in turn to see if it contains a file called X.py, and loads the first one it finds. If your program isn’t finding the definitions you think it should, try printing out sys.path to see if the problem is a missing directory.
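Here is a sketch of these queries. The transcript does not say which attribute reports integer size; sys.maxsize is our assumption for Python 3, where it gives the largest native index value while ordinary integers are unbounded.

```python
import sys

print(sys.version)    # full version string of the running interpreter
print(sys.platform)   # short name for the platform, such as 'linux' or 'win32'
print(sys.maxsize)    # assumption: largest native index value in Python 3

# The directories Python searches, in order, when it executes an import.
for directory in sys.path:
    print(directory)
```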
slide 046 The most commonly-used element of sys is probably sys.argv, which holds the command-line arguments of the currently-executing program.
slide 047 In keeping with Unix conventions, the name of the script itself is put in sys.argv[0]; all the arguments given to the script when it was run are put in sys.argv[1], sys.argv[2], and so on.
slide 048 For example, here’s a program that does nothing except print out its command-line arguments.
slide 049 If it is run without any arguments, it just reports that sys.argv[0] is echo.py.
slide 050 When it is run with arguments, though, it displays those as well.
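The program itself is not reproduced in this transcript; a minimal version of echo.py might look like the following sketch, whose output format is our own choice.

```python
import sys

# Hypothetical contents of echo.py: show every entry of sys.argv with its index.
for index, argument in enumerate(sys.argv):
    print(index, argument)
```

Run as python echo.py first second, this sketch would print 0 echo.py, 1 first, and 2 second on separate lines.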
slide 051 sys also creates variables to connect programs to standard I/O channels. sys.stdin is standard input (which is usually connected to the keyboard).
slide 052 sys.stdout is standard output, which by default is connected to the screen.
slide 053 And sys.stderr is standard error, which is also usually connected to the screen.
slide 054 For more information on what these are for, and how to use them, please see the lecture on the Unix shell.
slide 055 Here’s a typical example of how these variables are used together. This little program looks at sys.argv to see if it was called with a filename as an argument or not.
slide 056 If there were no arguments, then sys.argv will only hold the name of the program, and its length will be 1. In that case, the program reads data from standard input.
slide 057 Otherwise, the program assumes its first command-line argument is the name of an input file, opens it, and reads from it instead.
slide 058 Sure enough, if we run the program with no command-line arguments, and send it the contents of the file a.txt using redirection, it tells us that its standard input has 48 lines.
slide 059 If we run it with a filename as an argument, on the other hand, it reads from that file and tells us it has 227 lines. Again, please see the lecture on the Unix shell for more information on standard input, standard output, and redirection.
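The logic described in the last few slides can be sketched as below. The function name count_input_lines and the script name count.py are hypothetical, and a StringIO object stands in for standard input so that the sketch runs without redirection.

```python
import io

def count_input_lines(argv, stdin):
    """Count lines in the file named by argv[1], or in stdin if no file is named."""
    if len(argv) == 1:
        # Only the program name is present, so read standard input.
        return len(stdin.readlines())
    # Otherwise treat the first argument as the name of the input file.
    with open(argv[1], "r") as reader:
        return len(reader.readlines())

# A real script would import sys and call count_input_lines(sys.argv, sys.stdin).
fake_stdin = io.StringIO("first\nsecond\nthird\n")
print(count_input_lines(["count.py"], fake_stdin))   # prints 3
```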
slide 060 Here’s a more polite way to write the program we just created. The two significant changes are:
slide 061 the strings at the start of the module, and the start of the function count_lines, and
slide 062 the funny-looking conditional if __name__ == '__main__'. Let’s look at them in that order.
slide 063 If the first thing in a module or function other than blank lines or comments is a string that isn’t assigned to anything, Python saves it as the documentation string, or docstring, for that module or function.
slide 064 These docstrings are what online (and offline) help displays.
slide 065 For example, let’s create a file adder.py with a single function add, and write docstrings for both the module and the function.
slide 066 If we then import adder, help(adder) will print out all of its docstrings, i.e., the documentation for the module itself and for all of its functions.
slide 067 We can also be more selective, and only display the help for a particular function instead.
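A sketch of the function half of this example follows; the wording of the docstring is our own. In adder.py itself, a module docstring would be a similar unassigned string placed on the first line of the file.

```python
def add(left, right):
    """Return the sum of left and right."""
    return left + right

# The docstring is stored on the function object...
print(add.__doc__)

# ...and help shows it underneath the function's signature.
help(add)
```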
slide 068 The second part of our “more polite” program was that funny if statement. The trick here is that when Python reads in a file, it assigns a value to a special top-level variable called __name__ (with two underscores before and after).
slide 069 If the file is being run as the main program, __name__ is assigned the string '__main__' (again with two underscores before and after).
slide 070 If the file is being loaded as a module by some other program, though, Python assigns the module’s name to the variable __name__ instead.
slide 071 So imagine the file contains some definitions, and then the conditional statement if __name__ == '__main__'.
slide 072 The definitions will always be executed…
slide 073 …but the code inside the conditional will only run if the file is the main program. Put another way, the statements inside the conditional will not be run if the file is being loaded as a library by some other program.
slide 074 Let’s see how this works. Here’s a file stats.py that defines a function average, and then runs three simple tests—but only if __name__ has the value '__main__'.
slide 075 And here’s another file, test-stats.py, that imports stats and runs two more tests.
slide 076 If we run stats.py directly, the three tests inside it are executed.
slide 077 If we run test-stats.py, though, those three tests aren’t executed—only the two in test-stats.py itself are run. This happens (or doesn’t happen) because the variable __name__ inside stats is assigned the string 'stats' instead of the string '__main__' when stats is loaded as a module.
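Since the file is not shown in this transcript, here is a sketch of what stats.py might contain. The three checks are invented for illustration; only the function name average and the conditional come from the slides.

```python
def average(values):
    """Return the mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))

if __name__ == "__main__":
    # These checks run only when this file is executed directly,
    # not when another program imports it.
    print("test 1:", average([1.0]) == 1.0)
    print("test 2:", average([1.0, 2.0]) == 1.5)
    print("test 3:", average([1.0, 2.0, 3.0]) == 2.0)
```

A program that imports stats gets the average function without triggering the three checks.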