Our latest interview is with Hans Petter Langtangen, author of two books for scientists about Python (and a lot more).
Tell us a bit about your organization and its goals.
I am on 80% leave from the computer science department at the University of Oslo, to work at Simula Research Laboratory. Simula is primarily funded by the government, with a commitment to do long-term fundamental research in three areas: software engineering, networks and distributed systems, and scientific computing. My work is in the last of these areas. The funding is dependent on a successful scientific evaluation by an international panel every five years. About 120 people work at Simula today.
Education is the second focus point at Simula. Many researchers have appointments at the University of Oslo and do regular teaching there, like myself. A separate unit, The Simula School of Research and Innovation, organizes the education of master, PhD, and postdoc candidates at Simula. The third focus point of Simula is innovation, in particular start-up companies based on promising research results.
Tell us a bit about the software your group uses.
Our main daily task is to solve partial differential equations and write papers about the results. We use commercial, open source, and in-house software. Examples from the first category are MATLAB, Star-CD, and Fluent. Widely used open source packages are VTK for visualization, VMTK for creating geometries of blood vessels from images, version control systems, LaTeX for mathematical text and slides, and almost all types of Python modules. Our simulations are to a large extent done with in-house software. These can be smaller, specialized, stand-alone programs, or more general programs written in a larger framework, primarily FEniCS.
Most of us are very down to earth when it comes to computer tools: the main workhorse is a plain text editor, Emacs or Vim, in combination with a terminal window. Automation via scripting goes on all day long. We are now heavy users of Python, while a decade ago we depended heavily on Perl, Bash, Sed, Awk, Make, Autotools and that generation of software. For compiling and linking applications, SCons is a popular tool now, but Make or perhaps just a Python or Bash script may be sufficient. It goes without saying that we love command-based tools and hate GUIs. Integrated development environments, say Eclipse, are hardly used by anyone. We require people to use version control systems, mainly Mercurial, Subversion, or Bazaar, for software development as well as paper and book writing.
Tell us a bit about what software your group develops.
Our research group has a long history in developing frameworks for solving partial differential equations. In the 1980s, Fortran 77 and Bourne shell were the languages. In the 1990s, C++ and Perl dominated, especially in the Diffpack development. In the 2000s, the languages of choice in our group have been Python and C++. MATLAB is also popular, usually until new people discover that Python can do the same, and more.
At the moment we are heavily involved in the FEniCS project, which consists of a dozen software components, written in C++ and Python. Several institutions and an international user community participate in the development of FEniCS components, applications, and documentation.
Most FEniCS simulators are written in Python, but the Python program generates C++ code tailored to the problem at hand, and links this C++ code to general libraries for finite element computations, linear algebra packages, etc. Simula now has the primary responsibility for distributing FEniCS as an open source system. Building and testing FEniCS with all its dependencies, such as PETSc and Trilinos, can quickly become a nightmare. We have a dedicated scientific programmer working with a Buildbot system for FEniCS as well as packaging FEniCS for Debian and other binary distribution repositories.
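The idea of a Python program that generates C++ code tailored to a particular problem can be illustrated with a toy sketch. This is not FEniCS code and does not use any FEniCS API; the function name generate_cpp_polynomial and the polynomial example are hypothetical, chosen only to show the general pattern of writing problem-specific values into generated source text.

```python
def generate_cpp_polynomial(name, coefficients):
    """Return C++ source for a function evaluating one fixed polynomial.

    coefficients[i] multiplies x**i. The values are written into the
    generated code as literals, so the C++ is specialized to this problem.
    (Hypothetical helper for illustration; not part of FEniCS.)
    """
    if not coefficients:
        raise ValueError("need at least one coefficient")
    # Build a Horner-form expression, starting from the highest power.
    expr = repr(float(coefficients[-1]))
    for c in reversed(coefficients[:-1]):
        expr = f"({expr} * x + {float(c)!r})"
    return (
        f"double {name}(double x)\n"
        "{\n"
        f"    return {expr};\n"
        "}\n"
    )


# Generate C++ for p(x) = 1 + 3*x**2 and show the resulting source text.
source = generate_cpp_polynomial("p", [1, 0, 3])
print(source)
```

In a real system the generated source would then be compiled and linked against general-purpose libraries; that step is left out of this sketch.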
We also develop some other, smaller open source packages. I have recently been involved in three: scitools for my books, latexslides for Python-generated LaTeX slides, and ptex2tex for extending LaTeX. These are distributed through Google Code.
Tell us about your course and your books.
I guess you're interested in books related to scientific software? The Python Scripting for Computational Science book evolved from the need to teach my master and PhD students what scripting is. In the mid 1990s I used Perl a lot, but the only Perl book, the famous Camel book, didn't give my students the vaguest idea how Perl could be used to do our science in a more effective and reliable way.
Therefore, I started a course in 1999 at the University with the aim of teaching scripting and automation in science. Or, honestly speaking, the aim was to avoid reteaching this topic to every new master or PhD student that entered the group. The course notes initially explained how to do scripting in Perl. Over a couple of years, however, we experienced increasing use of Python in-house, as it was much easier to maintain Python code than Perl code. Students also learned Python more quickly than Perl, and all the entertaining side effects and "smart behavior" of Perl were actually found disturbing in teaching, at least when you see Perl and Python in action side by side.
The course notes evolved into a book in 2003, exclusively with Python-based material. At that time, there were very few Python books and little documentation of how to effectively do non-numeric administrative work in the context of science and high-performance computing. The interest in these topics exploded over the next years. The book quickly sold out, and a demand for new editions arose. Now the book is a best-selling one in its category. For the fun of it I like to mention that the publisher was not very fond of such a scripting book when I first suggested it in 2000; the market was simply considered too small. But I wrote the book, knowing that this material was important!
So, to my next book project, A Primer on Scientific Programming with Python. All over the world, you find computer science departments offering a first programming course with Java as the language. This is not optimal for science, and experience shows that most students need to relearn programming in the context of MATLAB, Fortran, C, C++ and scripting languages. At the University of Oslo we introduced a major reform in science education in 2003, with the aim of using programming and simulation as tools for exploring mathematical models throughout all courses, also at the Bachelor level. As part of this reform, we needed a programming course in the first semester which could target science problems, numerical methods, and programming styles for later science courses. The idea was to adopt Python as the first language and focus on both MATLAB-style programs and object orientation. Recall that OO was invented 300 meters from the building where this course is taught! We wanted an integrated approach so that programming could be learned via examples involving scientific applications, from physics, biology and finance, combined with numerical approaches to handle the mathematics.
No existing book offered an integrated approach, so it was again natural to develop a book along with the course. The "primer" book was published in 2009 and has been very well received. Although it targets newbies in programming, it seems that the easy-to-read style is also useful for experienced scientists and PhD students who want to do scientific computing with Python.
The "primer" book aims at numerical computing, while the "scripting" book essentially deals with all the non-numerical tasks you need when doing math on the computer. Since there are few days without an email with "thank you for your excellent book... I have a question..." from people all around the world, these books seemingly prove useful. The example-oriented writing style, with much code that can be directly copied to the reader's own problem area, is probably the main reason for the popularity and large sales. It's also quite amusing that the "scripting" book is selling so well despite being available at various pirate sites. It seems that people still want hardcopies of books, not PDF files only.
How do you tell what impact the course has had?
We have educated over 1000 people in Python, both scientists and administrative software developers. When we started in 1999, Python was hardly used at all in Norwegian industry. Our candidates with Python knowledge have introduced this language in a lot of companies, and now there is a significant need for Python competence out there. And by "Python" I mean much more than the language: it's the way of working, which means automating manual operations for reliability, being more effective, knowing a lot of useful modules, seeing new ways to do things, etc.
Most of these students applied their Python and scripting knowledge in their work with master and PhD theses, which we believe has led to more effective and reliable research. Also, the courses we offer equip our own students with the right tools for doing a thesis in our group. However, when we recruit PhD students from elsewhere, without this education, we see a demand for a quick to-the-point course on what you need to know about effective working habits in a "terminal window". This is where your Software Carpentry hopefully comes to the rescue!
What are your plans for future work?
The "scripting" course has been very popular for 11 years now. Unfortunately, nobody outside our own research group sees any interest in maintaining and developing this course. The technology is rapidly evolving, and many of the tools in the first edition of the "scripting" book quickly became outdated. Since we are scientists with little time for teaching, it is hard to keep up with the technological improvements and incorporate them into new editions of the book and the course. We end up doing minimal updates, which is not satisfactory. However, we also have a strong interest in and need for other more science-oriented courses, so I anticipate that future book projects and courses will be on other subjects.