Software Carpentry

Helping scientists make better software since 1997

Three More Sets of Slides

with one comment

As promised in yesterday’s post, the slides for three more episodes on the Unix shell are now online, covering how to find things, job control, and shell variables. Next up: public/private keys and SSH.

Written by Greg Wilson

September 1st, 2010 at 9:06 pm

Posted in Content

Five Episodes on the Shell (and Three to Come)

with one comment

I’ve just posted the slides for the first five episodes of the lecture on the Unix shell, covering:

  1. Introduction (or, why would anyone do this to themselves, really?)
  2. Files and directories
  3. Creating and deleting things
  4. Pipes and filters
  5. Permissions

Minor things I have yet to cover:

  • Drive letters on Windows, and how Cygwin hacks around them
  • The ‘echo’ command
  • (Briefly) Windows access control lists
  • Wildcards in filenames (‘*’ has been covered)

Major things (the remaining three episodes):

  • Environment variables, the export thereof, and how to configure things with ‘.bashrc’
  • Job control (Ctrl-Z, ‘jobs’, ‘fg’, and the ‘ps’ command)
  • Finding things with ‘grep’ and ‘find’

I’ll do the other three slide decks tomorrow, then record and post the screencasts on Thursday and Friday. Please let me know if I’ve left out something absolutely essential…

Written by Greg Wilson

August 31st, 2010 at 7:21 pm

Posted in Content

Four More Screencasts on Testing

without comments

After far too many delays, the next four screencasts on testing are finally available, covering systematic unit testing with Nose, why you should test interfaces rather than implementations, why testing floating point is hard, and how to create and manage fixtures systematically. I hope you enjoy them—have a good weekend.

Written by Greg Wilson

August 27th, 2010 at 8:47 pm

Posted in Content, Version 4

Another Update on What You Want

with one comment

Responses have slowed, so here are the final scores for the topics we’re including (or thinking about including) in this course. It looks like the Unix shell just might go back in…

2.51 Automating Repetitive Tasks
2.50 Reproducible Research
2.50 Data Visualization
2.47 Version Control
2.44 Performance Optimization
2.41 Data Structures
2.39 Testing and Quality Assurance
2.39 Coding Style
2.38 Basic Programming
2.35 Using the Unix Shell
2.35 Parallel Programming
2.35 Debugging with a Debugger
2.29 Computational Complexity
2.22 Object-Oriented Programming
2.20 Working in Teams/on Large Projects
2.19 Designing a Data Model
2.15 Refactoring
2.10 Matrix Algebra
2.09 Static and Dynamic Code Analysis Tools
2.07 Systems Programming
2.04 Integrating with C and Fortran
2.03 Design Patterns
2.01 Packaging Code for Release
1.95 Functional Languages
1.94 Handling Binary Data
1.82 Image Processing
1.74 Build a Desktop User Interface
1.73 XML
1.65 Create a Web Service
1.39 Geographic Information Systems

Written by Greg Wilson

August 26th, 2010 at 1:46 am

Posted in Content

What Don’t You Understand That You’d Like To?

with one comment

What don’t you understand about computers and computing that you’d like to? I’m not talking about big stuff, like “How does it all work, anyway?” or, “How do I write a game that’s as addictive as BubbleSpinner?” What don’t you understand about smaller-scale stuff, like handwriting recognition, or how Excel knows which cells to update when you change a number, or how your machine decides that it’s time to update Java and what it does next? Please let me know, or if you’re feeling shy, point your friends at this page and have them ask for you: we have to use something as motivating examples for this course, and if we can explain something interesting while teaching you fundamental computing concepts, so much the better.

Written by Greg Wilson

August 23rd, 2010 at 5:29 pm

Posted in Content

Slides and Scripts for the Next Two Episodes

without comments

I have posted slides and transcripts for the next two lectures on testing, which cover unit testing with Nose, and testing interfaces instead of implementations. I was hoping to record the screencasts today, but setting up a new 64-bit desktop machine took longer than expected—hopefully they’ll be up early next week, along with episodes on floating point and setting up fixtures. Comments, as always, would be greatly appreciated.

Written by Greg Wilson

August 19th, 2010 at 9:18 pm

Posted in Lectures, Version 4

43% Independent

without comments

Cecilia d’Oliveira and colleagues recently wrote an essay in Science about MIT’s OpenCourseWare initiative, ten years after its inception. Among the stats:

OCW currently receives upwards of 1.5 million visits each month from ~900,000 unique individuals. Students have grown to 42% of the audience, and educators and independent learners now constitute 9% and 43% of visitors, respectively. Twelve percent of educators responding to a March 2010 visitor survey indicated that they do incorporate OCW materials into their own content as anticipated, but educators more frequently use OCW for personal learning (37%), to adopt new teaching methods (18%), and as a reference for their students (16%). Students were largely expected to use the site as a supplement to materials they received in their own classes, a use identified by 40% of students. Just over 43%, however, indicated that they also use OCW for personal learning beyond the scope of their formal studies, and a further 12% use it as an aid in planning their course of study. Independent learners use OCW in a variety of personal (41%) and professional (50%) contexts, including home-schooling children and keeping up on developments in their professional field. 66% of visitors indicate they are mostly or completely successful at meeting their educational goals for visiting the site.

I wish there were more detailed analysis of what’s worked and what hasn’t (I’d obviously like to imitate their successes and avoid their mistakes), but even without that, it’s clear that bums-in-seats is not the future of higher education…

Written by Greg Wilson

August 16th, 2010 at 8:09 pm

Posted in Noticed

Interview with Cameron Neylon

without comments

Today’s interview is with Cameron Neylon, a noted advocate of open science.

Tell us a bit about your organization and its goals.

I work for the UK Science and Technology Facilities Council. We are a research funder, but although we provide some direct funding, our main role is to build and run, or subscribe to, large-scale research infrastructure on behalf of UK scientists. For instance, we run telescopes and pay the UK subscription to CERN, as well as supporting and running synchrotrons, neutron sources, high-powered lasers, microfabrication facilities, and large-scale computing infrastructure.

I work at the ISIS Neutron Scattering Facility which hosts several thousand scientists a year doing hundreds of experiments on around 20 different instruments. We help to select which experiments get done, support sample preparation, assist with the planning and running of experiments, as well as data analysis, sometimes all the way to publication. My group focusses on support and development of new techniques for biological scientists.

Tell us a bit about the software your group uses.

We use a big mix of things. Like most experimental scientists Word and Excel figure a lot in basic analysis and record keeping. We use a blog based laboratory notebook system (biolab.isis.rl.ac.uk) developed in collaboration with the University of Southampton. The instruments are highly specialised and are run with software developed in house and first stage analysis is moving to a new framework called Mantid (mantidproject.org).

After the first stage we move to all sorts of tools based on what we need and the scientific problem. Specialist analysis software, usually built by individuals or groups, often requiring some sort of proprietary framework (MatLab is common and Igor from Wavemetrics is quite often used), is put together in ad hoc pipelines to attack a problem from several different directions. This is often quite haphazard.

Some examples include RaSCAL (MatLab: http://sourceforge.net/projects/rscl/), the ATSAS suite (a closed-source, mostly command-line-driven suite for scattering analysis: http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html), and the NIST SANS analysis tools (Igor Pro: http://www.ncnr.nist.gov/programs/sans/data/red_anal.html).

Tell us a bit about what software your group develops.

The Mantid project has provided us with a Python scripting and GUI environment which has made it possible to provide some simple tools for ourselves and some users and to help us integrate this with our blog based record keeping system. Most of what I do is based in the immediate needs of our group but with an eye to making it more useful to a wider community. Often it involves trying to make those disparate data analysis pipelines easier to use, more consistent, and to enable easier and better record keeping of the analysis process. We use our experience of problems to try and build things that are useful for our wider community.

What’s the typical background of your scientists, developers, and/or users?

Most of the scientists we deal with have no specific experience of programming. In rare cases they have a little experience of scripting or command line work. They are focussed on outcomes and getting results rather than tools. This leads to ad hoc procedures and pipelines that are usually inefficient and badly recorded. Most could look at simple scripts and manipulate those for their needs. However the lack of experience in programming “properly” and a lack of knowledge of best practice leads to messy and incomprehensible, often unusable tools. An understanding of test driven software design and versioning for safe development is rare.

Those scientists who do build software and are comfortable with programming rarely have any skill or experience in user interface design, leading to difficult-to-use interfaces and GUIs that confuse users.

How do you hope Software Carpentry will help them?

Good practice, good testing, good documentation, and availability of code for checking. On top of this a good understanding of how to think about the design of a specific piece of software and some knowledge of common design patterns to aid in the more rapid development of good and re-usable software.

How will you tell what impact the course has had (if any)?

I’ll see some comments in people’s code and I’ll be able to get at it in an appropriate repository. When I get this code and read the comments I’ll be able to understand how I might re-use it for my own purposes. If the course can achieve that or steps towards that I’ll be very happy!

Written by Greg Wilson

August 12th, 2010 at 11:25 pm

Posted in Interviews

Software Carpentry for Audio and Music Researchers

without comments

I will be teaching a version of Software Carpentry tailored for audio and music researchers at Queen Mary University in London from November 1st to 5th, 2010. From the full announcement:

We are seeking nominations for a small number (up to 15) of UK-based PhD students or early career researchers to attend the Autumn School. The SoundSoftware.ac.uk project will pay for reasonable travel and accommodation costs for attendees.

Due to the level of interest we anticipate in this initial Autumn School, and to obtain a balance of attendees from several UK research groups, we are asking Research Groups to nominate potential attendees. If you are a PhD student or researcher in a UK research group, please contact your PhD supervisor, line manager, or Head of Group to ask them to nominate you.

I look forward to meeting everyone there!

Written by Greg Wilson

August 5th, 2010 at 6:04 pm

An Answer That Most Students Won’t Understand

with one comment

Two days ago, I asked how to generate tests from tables of fixtures using Nose:

…does Nose already have a tool for running through a table of fixtures and expected results? My hand-rolled version is:

Tests = (
    #  R1                 R2                  Expected
    ( ((0, 0), (0, 0)),   ((0, 0), (0, 0)),   None ),
    ( ((0, 0), (0, 0)),   ((0, 0), (1, 1)),   None ),
    ( ((0, 0), (1, 1)),   ((0, 0), (1, 1)),   ((0, 0), (1, 1)) ),
    ( ((0, 3), (2, 5)),   ((1, 0), (2, 4)),   ((1, 3), (2, 4)) )
)

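def test_table():
    # Nose treats this as a test generator: each yielded tuple is
    # (function, arg1, arg2, ...) and is run as a separate test case.
    for (R1, R2, expected) in Tests:
        yield run_it, R1, R2, expected

def run_it(R1, R2, expected):
    # 'overlap' is the function under test.
    assert overlap(R1, R2) == expected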

which is simple enough if students already understand generators and function application, but hell to explain if they don’t—and they won’t.
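For readers who want to run the table above without Nose, here is a minimal, self-contained sketch. The post does not show `overlap`, so the version below is an assumption inferred from the expected results in the table: rectangles are taken to be ((x_low, y_low), (x_high, y_high)), and rectangles that share no area give None. The `run_all` helper is also hypothetical; it only imitates what a test runner does with each yielded tuple.

```python
def overlap(r1, r2):
    """Hypothetical implementation, inferred from the fixture table.

    Each rectangle is ((x_low, y_low), (x_high, y_high)).
    Returns the overlapping rectangle, or None if no area is shared.
    """
    (ax0, ay0), (ax1, ay1) = r1
    (bx0, by0), (bx1, by1) = r2
    x0, y0 = max(ax0, bx0), max(ay0, by0)
    x1, y1 = min(ax1, bx1), min(ay1, by1)
    if x0 >= x1 or y0 >= y1:
        return None  # zero or negative extent: nothing overlaps
    return ((x0, y0), (x1, y1))


Tests = (
    #  R1                 R2                  Expected
    ( ((0, 0), (0, 0)),   ((0, 0), (0, 0)),   None ),
    ( ((0, 0), (0, 0)),   ((0, 0), (1, 1)),   None ),
    ( ((0, 0), (1, 1)),   ((0, 0), (1, 1)),   ((0, 0), (1, 1)) ),
    ( ((0, 3), (2, 5)),   ((1, 0), (2, 4)),   ((1, 3), (2, 4)) ),
)


def test_table():
    # Generator: hand back the checking function plus its arguments.
    for (R1, R2, expected) in Tests:
        yield run_it, R1, R2, expected


def run_it(R1, R2, expected):
    assert overlap(R1, R2) == expected


def run_all(generator_func):
    """Hypothetical stand-in for a runner: apply each yielded function."""
    count = 0
    for func, *args in generator_func():
        func(*args)  # raises AssertionError if a row fails
        count += 1
    return count


if __name__ == "__main__":
    print(run_all(test_table), "table rows passed")
```

The point of the sketch is the split of responsibilities: the generator only describes what to call and with which arguments, and the runner does the calling, so each row can be reported as a separate test.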

After some back and forth, Jacob Kaplan-Moss (of Django fame) came up with this:

def tabletest(table):
    def decorator(func):
        def _inner():
            for args in table:
                yield tuple([func] + list(args))
        _inner.__name__ = 'test_'+func.__name__
        return _inner
    return decorator

table = [(1, 2), (3, 4)]

@tabletest(table)
def check_pair(left, right):
    assert left > right

The outer function tabletest takes the table of fixtures as an argument, and produces a function of one argument. That argument is supposed to be the function that is being wrapped up by the decorator, so:

@tabletest(table)
def check_pair(...):
    ...

means:

decorator = tabletest(table)
check_pair = ...what the 'def' creates...
check_pair = decorator(check_pair)

With me so far? Now, what decorator does is take a function F as an argument, and create a new function F’ that produces each combination of the original F with the entries in the table: in jargon, it creates a generator that yields F and the arguments that F should be applied to.

But what’s that _inner.__name__ stuff? That’s to make sure that the wrapped function’s name starts with the letters “test_”, because that’s how Nose knows to run it.
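The expansion described above can be checked by hand. The sketch below reuses `tabletest` as posted, but applies the decorator in explicit steps instead of with `@` and then walks the resulting generator. The `pairs` table, the `<` comparison, and the `run_cases` helper are assumptions made for illustration (they are not from the post), and nothing here needs Nose to run.

```python
def tabletest(table):
    """Decorator factory from the post: takes the fixture table."""
    def decorator(func):
        def _inner():
            # Yield (func, arg1, arg2, ...) once per table row.
            for args in table:
                yield tuple([func] + list(args))
        # Rename so a runner that looks for "test_" will pick it up.
        _inner.__name__ = 'test_' + func.__name__
        return _inner
    return decorator


pairs = [(1, 2), (3, 4)]  # hypothetical fixture table


def check_pair(left, right):
    # Hypothetical check, chosen so that both rows pass.
    assert left < right


# What "@tabletest(pairs)" above the 'def' amounts to, step by step:
decorator = tabletest(pairs)      # outer call: remembers the table
wrapped = decorator(check_pair)   # inner call: wraps the function


def run_cases(generator_func):
    """Hypothetical stand-in for a runner: call each yielded case."""
    seen = []
    for case in generator_func():
        func, args = case[0], case[1:]
        func(*args)               # apply F to its arguments
        seen.append(args)
    return seen


if __name__ == "__main__":
    print(wrapped.__name__)       # test_check_pair
    print(run_cases(wrapped))     # [(1, 2), (3, 4)]
```

Seeing `decorator` and `wrapped` as ordinary values may make the nesting easier to follow: `tabletest` remembers the table, `decorator` remembers the function, and `_inner` pairs them up only when something iterates over it.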

This does exactly what I wanted, but sparks three comments:

  1. Thanks, Jacob: I can understand the solution once it’s in front of me, but it would have taken me a long time to figure this out myself.
  2. Treating programs as data, i.e., manipulating code just as you’d manipulate arrays or strings, is incredibly powerful.
  3. Only a tiny fraction of the students who complete this course will understand how this works. I’m sure they all could, if they wanted to invest the time, but given their usual starting point, they’d have to invest a lot of time.

#3 is what many advocates of new technology (functional languages! GPUs! functional languages on GPUs!) consistently overlook. What Jacob did here is really quite elegant, but in the same way that the classic proof of Euler’s theorem is elegant: you have to know quite a lot to understand it, and even more to understand its grace. People who have that understanding often forget what the world looks like to people who don’t; we’re trying hard not to, and would be grateful if readers and viewers could tell us when we slip up.

Written by Greg Wilson

August 5th, 2010 at 4:14 pm

Posted in Community,Opinion