Bhadra Basepair received a B.Sc. in biochemistry five years ago. She has worked since then for Genes'R'Us, a biotech firm with labs in four countries. She did a Java programming course as a freshman, and a bioinformatics course using R as a senior, but has no other training in programming.
Bhadra and her colleagues are developing fuzzy pattern-matching algorithms
for finding similarities between DNA records in standard databases.
To help other Genes'R'Us researchers,
and to test her group's heuristics,
Bhadra runs queries that people send her by email.
These queries arrive as the subject lines of messages,
in their bodies,
or as links to pages on the company's firewalled file-sharing site.
Bhadra saves them in files with names such as
search/b.in,
then edits them to add query parameters.
(She almost never accidentally overwrites one query with another…)
Once a week,
Bhadra runs all of the queries that have piled up
to create output files with matching names,
then emails those back to whoever sent the original query.
(She almost never sends someone the wrong file…)
Once every couple of months
she creates another directory
and copies the input and output files she has accumulated into it.
She would eventually like to do some statistical analysis on these,
but hasn't been able to find time.
Software Carpentry will teach Bhadra how to automate this task by writing simple scripts to retrieve, process, and reply to email queries. Those scripts will automatically record inputs and outputs and update a web page every day to show her and her colleagues how their algorithms are doing.
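One small piece of such a script can be sketched as follows: saving each incoming query under a name that is guaranteed to be unused, so that one query can never overwrite another. This is only a sketch under stated assumptions. The `query-0001.in` naming scheme, the `.out` suffix, and the function names `save_query` and `output_path_for` are hypothetical and are not taken from Bhadra's actual setup.

```python
from pathlib import Path


def save_query(directory, query_text):
    """Save a query in the next unused numbered file and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    number = 1
    while True:
        # Hypothetical naming scheme: query-0001.in, query-0002.in, ...
        path = directory / f"query-{number:04d}.in"
        try:
            # Mode "x" creates the file and raises FileExistsError if it
            # already exists, so an existing query is never overwritten.
            with open(path, "x") as handle:
                handle.write(query_text)
            return path
        except FileExistsError:
            number += 1


def output_path_for(input_path):
    """Return the matching output file name (assumed here to end in .out)."""
    return Path(input_path).with_suffix(".out")
```

Because the output name is derived from the input name rather than typed by hand, the pairing between a query and its results is kept by the script instead of by memory.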
Fan Fullerene is a graduate student in chemistry who is working as a lab technician to help cover his costs. His only programming experience is a general first-year introduction to computational science using Python.
Fan's supervisor is studying the production of fullerenes (also known as "buckyballs"). Each set of experiments involves testing a sample at 20 different temperatures and 15 different pressures. Using a machine borrowed from a collaborating lab, Fan can run all temperature and pressure combinations in one job, but must upload a parameter file to the machine to do this. The temperatures and pressures to be used vary from sample to sample, so Fan now has two dozen different parameter files, each containing 300 lines of control information that he fervently hopes is correct.
The machine sends the resulting data files to Fan once the experiment is completed. Fan analyzes them by opening Excel, copying and pasting the data into a spreadsheet, then creating a chart using the chart wizard. He then saves the chart as a PNG file on the group's web site, along with the original data file.
Software Carpentry will teach Fan how to write programs to generate parameter files and analyze experimental results, and how to track the provenance of the data he is working with so that scientists can trace backward from the final charts to the raw data they represent. It will also teach him how to create a blog that is automatically updated each time a new set of results appears, and how to make his experimental results easier to search through.
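A program that generates a parameter file might look something like the sketch below. It produces one control line for every combination of temperature and pressure, so 20 temperatures and 15 pressures give 300 lines. The comma-separated line layout, the function names, and the sample values are assumptions made for illustration; the real format depends on what the borrowed machine expects.

```python
import itertools


def parameter_lines(temperatures, pressures):
    """Return one control line per (temperature, pressure) combination."""
    # Hypothetical layout: "temperature,pressure" on each line.
    return [f"{t},{p}" for t, p in itertools.product(temperatures, pressures)]


def write_parameter_file(path, temperatures, pressures):
    """Write the control lines to a file and return how many were written."""
    lines = parameter_lines(temperatures, pressures)
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)


# Illustrative values only; the real ones vary from sample to sample.
temperatures = [300 + 10 * i for i in range(20)]
pressures = [1 + i for i in range(15)]
print(len(parameter_lines(temperatures, pressures)))  # 300
```

Generating the file from two short lists means that a mistake shows up in one place, rather than being hidden somewhere among 300 hand-edited lines.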
Helen Helmet, a Ph.D. student in mechanical engineering, is currently doing a six-month internship at an engineering firm that makes carbon-fiber helmets for firefighters and other emergency service personnel. Her undergraduate courses included an introduction to scientific computing using MATLAB and a robotics course that used C. She learned some Python during a co-op placement between her junior and senior years, and used it again in a graduate course on finite elements.
Helen's task is to model the non-combustive thermal degradation (otherwise known as "melting") of candidate materials. Her starting point is a 4,000-line Python program that her supervisor wrote six years ago. She is currently trying to replace the mesh deformation functions with new ones that can handle non-uniform meshes. She sometimes writes, runs, and deletes sections of code three or four times before she is satisfied.
Helen tests her program by writing
the total heat content of the mesh at each time step
to a file.
She then loads this data into a separate Python program
to graph the percentage differences between these values
and the ones produced by the original program for six sample problems.
The difference is less than 5% for five test cases,
but 30% for the sixth.
Helen has added hundreds of
Software Carpentry will teach Helen to develop and modify programs in a disciplined way, to debug programs systematically, to use tests to ensure that new code doesn't break old code, and to use version control to manage changes instead of copying, pasting, and commenting out.
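The comparison Helen currently does by eye could be turned into an automated check along the lines of the sketch below. It compares the total heat content at each time step from the original program with the values from the modified one, and reports whether every difference stays within a tolerance. The function names and the 5% default tolerance are assumptions chosen for illustration, and the sketch assumes the reference values are never zero.

```python
def percent_differences(reference, candidate):
    """Percentage difference of each candidate value from the reference."""
    if len(reference) != len(candidate):
        raise ValueError("runs have different numbers of time steps")
    # Assumes no reference value is zero (total heat content is positive).
    return [abs(c - r) * 100.0 / abs(r) for r, c in zip(reference, candidate)]


def within_tolerance(reference, candidate, tolerance_percent=5.0):
    """Return True if every time step differs by at most the tolerance."""
    differences = percent_differences(reference, candidate)
    return all(d <= tolerance_percent for d in differences)


# Made-up heat totals for three time steps, for illustration only.
original_run = [100.0, 200.0, 400.0]
modified_run = [102.0, 196.0, 400.0]
print(within_tolerance(original_run, modified_run))  # True
```

Run after every change, a check like this would flag a case such as the sixth sample problem as soon as it starts to drift.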
Mehrdad Mapping is a graduate student studying bark beetle infestations in the Canadian taiga. He has never taken a programming course, but used SAS in an undergraduate statistics course.
For the last three years, Mehrdad has spent six weeks every autumn counting beetle bores in pine trees in the Yukon and Alaska. He now has a spreadsheet with 5,000 entries, each recording the location and time of a measurement, the number of bores found, the moisture and acidity of the soil, and several other values. He also has two hundred text files containing 7,000 measurements that his supervisor made in the same regions in the 1970s and 1980s. His task now is to put both sets of measurements on a map so that he can start to correlate changes in bark beetle distribution with changes in climate.
Software Carpentry will teach Mehrdad how to clean up and manage these data sets, and how to use web services to generate and share maps with colleagues. In addition, it will show him how to produce the figures and tables he needs for his papers with just a few lines of code.