Integrating Python with C and Fortran
April 24, 2010: We are pleased to announce that Version 4 of this course is now under development. For updates and an early peek at the content, please check out the Software Carpentry blog at http://www.software-carpentry.org/blog/.
1) Introduction
- Good programmers don't write programs: they assemble them
- Combine tools and libraries that others have written
- Thereby creating something that others can then recombine
- This lecture explores various ways to combine things
- Helps a lot to design with combination in mind

2) You Can Skip This Lecture If...
- You know how to run an external program from Python
- You know how to load a module dynamically
- You know how to inspect the contents of a module or class
- You know how to call a C function from Python...
- ...and a Python function from C

3) Running External Programs
- There are lots of old command-line programs in the world
- And lots of GUI programs that have command-line interfaces
- The older they are, the stronger the argument for leaving them alone
- The less you change, the less will break
- Instead, run the program as-is from Python (or some other high-level language)
- Talk to the web, create 3D graphics, etc., in Python
- Run the legacy program to do the calculation, and parse its output

4) The subprocess Module
- Python's subprocess module lets you run external programs
- Connect to their standard input, output, and error (just like pipes)
- Capture their return codes
- Its core is a single class called Popen
- Takes up to 14 (!) options
- Common cases only use three or four of these
- Does its best to behave the same on Unix and Windows
- Read the documentation before doing anything particularly tricky

5) Running In Place
- Simple usage is Popen("cmd"), where "cmd" is the program to be run
- New process created
- Inherits the parent's stdin, stdout, stderr, working directory, and environment variables
import subprocess
subprocess.Popen("date")
Mon Apr 3 09:05:39 EST 2006
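Popen starts the child and returns immediately; it does not wait for the program to finish. A minimal sketch, written for Python 3, of waiting for the child and collecting its return code. It assumes nothing about the machine's tools: the running interpreter (sys.executable) stands in for a legacy program, so it behaves the same on Unix and Windows.

```python
import subprocess
import sys

# Run a tiny child program that exits with status 3.
child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])

# wait() blocks until the child finishes and returns its exit status;
# afterwards the same value is also available as child.returncode.
status = child.wait()
print("child exited with", status)
```

A return code of 0 conventionally means success, so checking it is the cheapest way to notice that a legacy program failed.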

6) Running With Arguments
- Pass command-line arguments by giving Popen a list
import subprocess
subprocess.Popen(["date", "-u"])
Mon Apr 3 13:06:27 UTC 2006
- Can also:
- Specify a working directory for the child process (the cwd parameter)
- Provide the child's environment variables (the env parameter; it replaces the whole environment, so pass a modified copy of os.environ to override just a few)
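A minimal Python 3 sketch of both parameters. It runs the current interpreter (sys.executable) as the child so it is portable; DEMO_SETTING is a made-up variable name used only for illustration. It also captures the child's output with stdout=PIPE and communicate(), which the following sections describe.

```python
import os
import subprocess
import sys
import tempfile

# A child that reports its working directory and one environment variable.
script = "import os; print(os.getcwd()); print(os.environ.get('DEMO_SETTING'))"

workdir = tempfile.mkdtemp()
env = dict(os.environ)          # start from a copy of the parent's environment
env["DEMO_SETTING"] = "fast"    # hypothetical variable, for illustration only

child = subprocess.Popen([sys.executable, "-c", script],
                         cwd=workdir, env=env,
                         stdout=subprocess.PIPE, universal_newlines=True)
out, _ = child.communicate()
child_dir, setting = out.splitlines()
print(child_dir, setting)
```

Copying os.environ rather than building a dictionary from scratch matters: variables such as PATH (and SYSTEMROOT on Windows) are often needed for the child to start at all.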

7) Capturing Output
- Often useful to run a program and capture its output
- E.g., legacy program prints records from a database
- Do this by:
- Setting Popen's stdout parameter to subprocess.PIPE
- Reading from the object's stdout member
import subprocess
SQL = 'select * from Person'
child = subprocess.Popen(['sqlite3', 'experiment.db', SQL],
                         stdout=subprocess.PIPE)
lines = child.stdout.readlines()
for line in lines:
    line = line.strip().split('|')
    print '%s %s (%s)' % (line[1], line[2], line[0])
Kovalevskaya Sofia (skol)
Lomonosov Mikhail (mlom)
Mendeleev Dmitri (dmitri)
Pavlov Ivan (ivan)
- Note: the SQL is passed as a single argument
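To see why that works, a small Python 3 sketch in which the child (again the running interpreter, standing in for sqlite3) simply prints the arguments it received. Each list element arrives as exactly one argument, spaces and "*" included, because no shell is involved to split or expand it.

```python
import subprocess
import sys

# The child prints the list of arguments it received (excluding its own name).
show_args = "import sys; print(sys.argv[1:])"

# Each list element becomes exactly one argument, even if it contains spaces,
# so no shell quoting is needed (and no shell is involved).
child = subprocess.Popen([sys.executable, "-c", show_args,
                          "select * from Person", "second"],
                         stdout=subprocess.PIPE, universal_newlines=True)
out = child.stdout.read()
child.wait()
print(out.strip())   # ['select * from Person', 'second']
```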

8) Providing Input
- Can also pipe data to a child by setting stdin to PIPE
- Example: compress output on its way to a file
- In reality, it's better to use Python's zlib, gzip, or bz2 libraries
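import subprocess

def pipe_write(filename, lines):
    '''Compress lines with an external gzip process and save the result in filename.'''
    child = subprocess.Popen(['gzip', '-c'],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
    for line in lines:
        child.stdin.write(line)
    child.stdin.close()               # tell gzip there is no more input
    result = child.stdout.read()
    child.wait()
    outfile = open(filename, 'wb')
    outfile.write(result)
    outfile.close()
- Create a child process to run gzip -c
- The -c flag means "write result to standard output"
- Send data by writing to child's stdin
- Once all data has been sent, read compressed result from child's stdout
- Then write the compressed result to the file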
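Following the advice to prefer the libraries, a minimal Python 3 sketch that produces the same kind of compressed file with the standard gzip module instead of an external process. The function name gzip_write and the file name are made up for illustration.

```python
import gzip
import os
import tempfile

def gzip_write(filename, lines):
    """Compress lines (text) into filename using Python's own gzip module."""
    with gzip.open(filename, "wt") as outfile:   # "wt": write, text mode
        for line in lines:
            outfile.write(line)

path = os.path.join(tempfile.mkdtemp(), "records.gz")   # hypothetical file name
gzip_write(path, ["first line\n", "second line\n"])

# Read it back to check the round trip.
with gzip.open(path, "rt") as infile:
    print(infile.read())
```

No child process means no pipes to manage, so the deadlock problem described next cannot arise.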

9) Deadlock
- Example above doesn't scale to large data sets
- Operating system can only buffer a limited amount of data
- You can increase this limit, but can't make it infinite
- Program can deadlock
- Parent and child are each waiting for the other to read
- So neither does
- Solution is to use Popen.communicate
- Sends data to the child process's stdin
- Reads from stdout and stderr at the same time
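A minimal Python 3 sketch of communicate with an input much larger than an operating-system pipe buffer. The child (the running interpreter, standing in for a filter such as gzip) copies its input to its output in upper case; writing all of that to stdin before reading anything back is exactly the pattern that can deadlock.

```python
import subprocess
import sys

# Child program: copy standard input to standard output in upper case.
upper = "import sys; sys.stdout.write(sys.stdin.read().upper())"

data = "abc\n" * 250000   # about 1 MB: far larger than a typical pipe buffer

child = subprocess.Popen([sys.executable, "-c", upper],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
# communicate() writes the input, closes stdin, reads all output
# without deadlocking, and then waits for the child to exit.
out, err = child.communicate(data)
print(len(out), child.returncode)
```

err is None here because stderr was not redirected to a pipe; pass stderr=subprocess.PIPE to capture it as well.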

10) Pros and Cons
- Pro:
- Often the quickest thing to set up
- Involves fewest changes to legacy code (i.e., least risk)
- Con:
- Legacy application may not expose the needed functionality
- Managing parent/child interactions is tricky
- I.e., easy to break, and hard to debug

11) Plan B: Integrating with C
- Been saying since the first lecture that you should:
- Write the first version in Python (or some other very high-level language)
- Find out whether it's fast enough, and if not, what's slowing it down
- Optimize only those parts that you need to
- The most effective way to optimize programs written in high-level languages is to find more efficient algorithms
- The second most effective way is often to rewrite core modules in a low-level language like C
- Also a good way to handle inherited code: wrapping tried-and-trusted C or Fortran in Python is safer and easier than rewriting
- And faster
- If you don't speak C, [Kernighan & Ritchie 1998] is the standard introduction

12) How Python Represents Objects
- Python represents things using a C structure of type PyObject
- Include Python.h to get its definition
- Conceptually, each object holds a type code, a value, and a reference count (a simplified model: in CPython the common header holds just the reference count and a type pointer, and each type adds its own fields)
- The type code tells the interpreter how to interpret the rest of the structure
- The value is stored in a union large enough to hold any basic value
- In Python's case, the two 64-bit values needed to store a complex number
- The reference count keeps track of how many references point to this object
- When a thing is created, its reference count is initialized to 1
- When its count drops to 0, Python can garbage collect it
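CPython exposes the reference count through sys.getrefcount. A small sketch (CPython-specific; the absolute numbers vary between versions because the call itself holds a temporary reference, so only the differences are meaningful):

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)

alias = data                    # a second name now refers to the same list
after_alias = sys.getrefcount(data)

del alias                       # dropping the name decrements the count again
after_del = sys.getrefcount(data)

print(before, after_alias, after_del)
```

Each new name or container slot that refers to the list adds one to its count, and removing it subtracts one; C extension code has to do the same bookkeeping by hand.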

13) Calling Conventions
- Every C function that the interpreter calls must take two arguments:
- self is NULL for module-level functions, and the object for methods
- args is a tuple holding the arguments passed from Python
- Use the function PyArg_ParseTuple to extract the arguments' values
- If they don't match the format string, it sets a Python exception and returns false; the C function then returns NULL to signal the error
- Otherwise, use Py_BuildValue to build a Python object holding the result value
- Example: take an integer and return three times its value
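#include <Python.h>

/* Triple an integer value. */
static PyObject * triple(PyObject * self, PyObject * args)
{
    int val;
    /* "i" means "expect exactly one argument, convertible to a C int". */
    if (!PyArg_ParseTuple(args, "i", &val)) {
        return NULL;    /* exception already set by PyArg_ParseTuple */
    }
    val = val * 3;
    return Py_BuildValue("i", val);
}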

14) Boilerplate
- Need some boilerplate to bring this function to the Python interpreter's attention
/* Table of module contents (handed back to Python at initialization). */
static PyMethodDef contents[] = {
    {"triple", triple, METH_VARARGS, "Triple an integer value."},
    {NULL, NULL, 0, NULL}    /* sentinel marking the end of the table */
};

/* Initialization function (Python 2 API). */
PyMODINIT_FUNC inittriple(void)
{
    Py_InitModule("triple", contents);
}
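- The array contents has one entry for each function, followed by a NULL sentinel that marks the end
- The initialization function:
- Has a name of the form initXYZ for the module XYZ
- Calls Py_InitModule to pass the table of module contents to the interpreter
- (Python 3 replaced these with a PyInit_XYZ function that calls PyModule_Create)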

15) Loading and Calling
- Compile the C to create a shared library (.pyd on Windows, which is a DLL with a different extension; .so on Unix)
- Put the shared library in a directory that's on Python's search path
- Then import and use as if it were written in Python
import triple
print triple.triple(11)
33
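The exact file names the interpreter will accept for a compiled module vary by platform and Python version. A small sketch that asks the running interpreter, using importlib.machinery.EXTENSION_SUFFIXES (available in Python 3.3 and later, so newer than the Python 2 code above):

```python
import importlib.machinery

# File-name endings (after the module name) that this interpreter will accept
# for a compiled extension module, e.g. triple.so or triple.pyd.
for suffix in importlib.machinery.EXTENSION_SUFFIXES:
    print("triple" + suffix)
```

On a typical CPython build the list includes version-tagged names as well as the plain .so or .pyd ending.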

16) What About C++?
- Connecting Python and C++ is harder
- C++ has many features that don't have Python equivalents (e.g., templates)
- Many of the analogous features have different semantics (e.g., exceptions)
- Every compiler and platform has its own Application Binary Interface
- Much wider variation than there is for C libraries
- [Boost.Python] does the best it can (which is pretty good)

17) SWIG
- It's simpler with [SWIG] (the Simplified Wrapper and Interface Generator)
- Parses a description of the functions in a module, and generates the C bindings
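- Example: if you already have a plain old C function called triple, put the following in triple.i
/* triple.i */
%module triple
%{
extern int triple(int n);
%}
extern int triple(int n);
- The part between %{ and %} is copied into the generated C wrapper unchanged; the declaration after it tells SWIG what to wrap
- Running swig -python triple.i then generates triple_wrap.c and a small Python module triple.py; compile triple_wrap.c together with your C code into an extension module named _triple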
- [SWIG] can also generate wrappers for Perl, Java, and other high-level languages
- The more you plan for change, the less often you'll have to change your plan
- Similar tools exist to connect Python to Fortran (e.g., [F2PY] and [PyFort])

18) Integrating the Other Way
- Can also go the other way, and embed Python in C/C++
- Every large application eventually needs an interactive command interpreter
- To embed:
- Initialize a Python interpreter object
- Convert application values into Python objects
- Pass the interpreter a string containing the code to be executed, and the values to execute it on
- Unwrap the result
- Much less common than wrapping
- Multilanguage programming isn't simple
- And multilanguage debugging is downright hard
- But both are often simpler (as well as more efficient) than the alternatives

19) Loading Modules

20) Plugin Frameworks
- All modern languages let you load modules programmatically
- Load code (pre-compiled or not) from a file on disk
- Add it to your program
- Call it as if it had always been there
- Which is why most modern programs are built as frameworks
- The "program" knows how to load modules and pass data between them
- Modules provide different image processing operations, alternative ocean circulation models, etc.
- This modularization helps development as well as usability
- Better testability: well-defined interfaces between self-contained objects
- Easier maintenance: can replace things one at a time

21) Manual Loading
- Use the __import__ function to load a module by name
- The resulting module object stores its contents in a dictionary
- Note that the module must be on Python's search path
- Use vars to get that dictionary, i.e., to find out what a module object contains
- Can also be applied to classes and class instances
- Using code to examine other code is called reflection
- Example: list the contents of a Python file
import sys, os

def list_contents(module_path):
    '''Print the names defined by the module in module_path
    (a module name, or a path to a .py file).'''
    print module_path
    directory, filename = os.path.split(module_path)
    module_name = os.path.splitext(filename)[0]   # 'dir/mod.py' -> 'mod'
    if directory not in sys.path:
        sys.path.append(directory)
    try:
        module = __import__(module_name)
        for name in vars(module):
            print '\t' + name
    except ImportError:
        print >> sys.stderr, 'Unable to import %s' % module_path

if __name__ == '__main__':
    for module_path in sys.argv[1:]:
        list_contents(module_path)
$ python lister.py lister
lister
__builtins__
__name__
__file__
list_contents
__doc__
- Note: adding a module's directory to sys.path is not something you should do in general...
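One way to avoid editing sys.path is to load a file directly from its path. A minimal Python 3 sketch using importlib.util (spec_from_file_location, module_from_spec, and the loader's exec_module); load_from_file and the throwaway plugin.py are hypothetical names for illustration. Note that a module loaded this way is not added to sys.modules, so a later import statement will not find it.

```python
import importlib.util
import os
import tempfile

def load_from_file(path):
    """Load the Python source file at path as a module, without touching sys.path."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)       # run the file's code in the new module
    return module

# Demonstrate with a throwaway plugin file (hypothetical name and contents).
plugin_path = os.path.join(tempfile.mkdtemp(), "plugin.py")
with open(plugin_path, "w") as f:
    f.write("def triple(x):\n    return 3 * x\n")

plugin = load_from_file(plugin_path)
print(sorted(name for name in vars(plugin) if not name.startswith("__")))
print(plugin.triple(11))
```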

22) Using Manual Loading
- Have several different user interfaces, or finite difference grids, or...
- Load code based on specification in a configuration file
- Makes it easy to add new options after the fact
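class LoaderError(Exception):
    '''Raised when a configuration file is inconsistent.'''
    pass

def loader(config_file):
    '''Read "name module function" lines from config_file and return
    a dictionary mapping each name to the function it specifies.'''
    result = {}
    imported = {}
    infile = open(config_file, 'r')
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        name, module, func = line.split()
        if name in result:
            raise LoaderError('Trying to set name %s twice' % name)
        if module not in imported:
            imported[module] = __import__(module)
        if not hasattr(imported[module], func):
            raise LoaderError('Function %s not in module %s' % (func, module))
        result[name] = getattr(imported[module], func)
    infile.close()
    return result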
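A Python 3 variant of the same loader, as a sketch. importlib.import_module replaces __import__ (it returns the named submodule for dotted names such as os.path, where __import__ would return os, and it caches modules, so no imported dictionary is needed). The configuration arrives as any iterable of lines so it can be tried without a file, and errors use the built-in ValueError instead of a custom exception class. The entries in the example configuration are standard-library stand-ins for real plugins.

```python
import importlib

def load_plugins(lines):
    """Map names to functions from lines of the form 'name module function'."""
    result = {}
    for line in lines:
        if not line.strip():
            continue                          # ignore blank lines
        name, module_name, func_name = line.split()
        if name in result:
            raise ValueError("Trying to set name %s twice" % name)
        module = importlib.import_module(module_name)   # cached after first import
        try:
            result[name] = getattr(module, func_name)
        except AttributeError:
            raise ValueError("Function %s not in module %s" % (func_name, module_name))
    return result

config = ["root math sqrt", "", "join os.path join"]
funcs = load_plugins(config)
print(funcs["root"](16.0), funcs["join"]("data", "run1"))
```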

23) Manipulating Namespaces
- Can take this one step further and make dynamically-loaded objects look like "normal" variables
- Function globals returns a dictionary of global variables
- Adding items to this "creates" new global variables
$ python
Python 2.4.1 (#1, May 27 2005, 18:02:40)
[GCC 3.3.3 (cygwin special)] on cygwin
>>> G = globals()
>>> G
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, 'G': {...}}
>>> a = 1
>>> G
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 'a': 1, '__doc__': None, 'G': {...}}
>>> del G['a']
>>> G
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, 'G': {...}}
>>> G['b'] = 2
>>> G
{'b': 2, 'G': {...}, '__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None}
>>> def double(x):
...     return 2 * x
...
>>> G['d'] = double
>>> del G['double']
>>> d
<function double at 0x4d68b4>
>>> G
{'b': 2, 'd': <function double at 0x4d68b4>, 'G': {...}, '__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None}
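A framework can wrap this trick in a small helper instead of editing globals() by hand. A minimal Python 3 sketch; install and the name root are hypothetical, and the helper refuses to overwrite existing names, since silently replacing a global is an easy way to create hard-to-find bugs.

```python
import math

def install(namespace, functions):
    """Add each name/function pair to namespace (e.g. globals()), refusing to
    silently replace something that is already defined there."""
    for name, func in functions.items():
        if name in namespace:
            raise KeyError("%s is already defined" % name)
        namespace[name] = func

install(globals(), {"root": math.sqrt})   # "root" is a hypothetical name
print(root(25.0))                        # root now behaves like a normal global
```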
- You can do similar things in C, Fortran, Java, C#, etc.
- Mechanics depend on language and operating system

24) Summary
- Re-using is usually more productive than rewriting
- Use new code to run old
- Open up the old code so that it can be called from the new
- Remember that programs are just data
- Program source is just text in a file
- A running program is just a data structure in memory
- Take advantage of this to make your programs leaner, cleaner, and easier to maintain
