Teaching basic lab skills
for research computing

A Question and Answer Matrix for Software Carpentry

Following up on yesterday's post about applying educational principles to this course, here's a draft Q&A matrix for it. The section headings are questions people ask (or equivalently, tasks they want to perform). The headings underneath are the major topics Software Carpentry covers, and below each of those is my attempt to relate that topic to the question. "TBD" means "I haven't written it yet", while "N/A" means "I can't think of any relationship." This matrix will be the basis of our next big reorganization of material (which should start this fall), so we would be very grateful for your input:

  • What have we missed?
  • What's in the wrong place?
  • Most importantly, can we reframe our key questions to divide things up more usefully or more logically, and if so, how?

Thanks for your help!

  1. Q01: How can I manage this data?
  2. Q02: How can I process it?
  3. Q03: How can I tell if I've processed it correctly?
  4. Q04: How can I find and fix bugs when I haven't?
  5. Q05: How can I keep track of what I've done?
  6. Q06: How can I find and use other people's work?
  7. Q07: How can other people find and use mine?
  8. Q08: How can I do all these things faster?

Q01: How can I manage this data?

The Shell
  1. Use directories and sub-directories with meaningful names.
  2. Use filenames that can easily be matched with wildcards.
  3. Use filename extensions that indicate the type of data in the file.
  4. Use text unless there's a powerful reason to use something else.
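As a rough illustration of why wildcard-friendly names pay off, here is a minimal Python sketch. The naming scheme (experiment, date, run number) and the filenames are invented for this example; the point is only that a consistent scheme lets one pattern select exactly the files you want.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames following <experiment>-<date>-<run>.csv
filenames = [
    "cochlear-2010-06-01-run1.csv",
    "cochlear-2010-06-01-run2.csv",
    "cochlear-2010-06-02-run1.csv",
    "notes.txt",
]

def select(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]

# All runs from one day, then every CSV file regardless of day
june_first = select(filenames, "cochlear-2010-06-01-*.csv")
all_csv = select(filenames, "*.csv")
```

The same patterns would work unchanged in the shell (for example with `ls`), which is the main reason to choose names this way.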
Version Control
  1. If it's megabytes or less, put it under version control.
Basic Programming
  1. Create and use data formats that are easy for programs to parse.
Functions and Libraries
TBD
Databases
  1. Store it in a relational database.
  2. Store each atom of information in its own field.
  3. Make sure each record has a unique key.
  4. Make sure that information is never duplicated.
  5. Use foreign keys and joins to combine information from different tables.
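The database rules above can be sketched with Python's built-in sqlite3 module. The table names, column names, and values below are made up for illustration: each person is stored once under a unique key, readings refer to people by that key rather than repeating their names, and a join recombines the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys on request

# Hypothetical schema: one fact per field, one unique key per record
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    family    TEXT NOT NULL,
    personal  TEXT NOT NULL
);
CREATE TABLE reading (
    reading_id INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    quantity   TEXT NOT NULL,
    value      REAL NOT NULL
);
""")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Curie", "Marie"), (2, "Franklin", "Rosalind")],
)
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?, ?)",
    [(1, 1, "radiation", 9.5), (2, 1, "radiation", 7.25), (3, 2, "salinity", 0.25)],
)

# Names live only in person; the join attaches them to readings when needed
rows = conn.execute("""
    SELECT person.family, reading.quantity, reading.value
    FROM reading JOIN person ON reading.person_id = person.person_id
    ORDER BY reading.reading_id
""").fetchall()
```

With foreign keys switched on, an attempt to insert a reading for a person who does not exist is rejected instead of silently creating an orphan record.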
Number Crunching
  1. Represent it as a matrix, because that's easy to process.
Quality
N/A
Sets and Dictionaries
TBD
Development
N/A
Web Programming
  1. Format it as HTML (or XML, or some other widely-used format).
  2. Separate content from presentation (e.g., use CSS for styling).

Q02: How can I process it?

The Shell
  1. Use Unix commands that manipulate lines of text.
  2. Combine those commands using pipes and redirection.
  3. Use loops to perform the same operations on many files.
Version Control
N/A
Basic Programming
  1. Write programs that use loops, file I/O, and string splitting to read data.
  2. Use floating-point numbers unless you are sure all values (including calculated values) will always be integers.
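A minimal sketch of both points, assuming a made-up "name,value" text format with "#" comment lines. The function name and the sample data are illustrative only; an in-memory stream stands in for a real file so the example runs anywhere.

```python
import io

def read_readings(stream):
    """Read 'name,value' lines, skipping blank lines and '#' comments."""
    readings = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split(",")
        # Convert to float even when a value looks like an integer,
        # so later calculations such as means behave as expected.
        readings.append((name.strip(), float(value)))
    return readings

# io.StringIO stands in for open("readings.csv") in this sketch
sample = io.StringIO("# site,reading\nDR-1, 9\nDR-3, 7.5\n\nMSK-4, 2\n")
readings = read_readings(sample)
mean = sum(value for _, value in readings) / len(readings)
```

Because the function takes any open stream, the same code can be exercised on a small hand-written sample before it is pointed at real data.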
Functions and Libraries
TBD
Databases
  1. Write SQL queries to select, filter, aggregate, and sort data.
  2. Use a general-purpose programming language for everything else.
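One way to picture the division of labour: let SQL do the selecting, filtering, aggregating, and sorting, and hand the result to a general-purpose language for whatever comes next. The table and values in this sketch are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (site TEXT, quantity TEXT, value REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [
        ("DR-1", "rad", 9.5),
        ("DR-1", "rad", 7.5),
        ("DR-3", "rad", 2.0),
        ("DR-3", "sal", 0.25),
        ("MSK-4", "rad", 1.5),
    ],
)

query = """
    SELECT site, COUNT(*), AVG(value)   -- select and aggregate
    FROM reading
    WHERE quantity = 'rad'              -- filter
    GROUP BY site
    ORDER BY AVG(value) DESC            -- sort
"""
summary = conn.execute(query).fetchall()

# Everything else (formatting, plotting, further statistics) happens in Python
lines = [f"{site}: {count} readings, mean {mean:.2f}" for site, count, mean in summary]
```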
Number Crunching
  1. Use a linear algebra package like NumPy.
Quality
N/A
Sets and Dictionaries
TBD
Development
  1. Use the right data structures.
Web Programming
  1. Use an HTTP library to fetch it.
  2. Use an XML or JSON library to parse it.
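As a sketch of the fetch-then-parse pattern using only Python's standard library: the JSON layout (a list of objects with "site" and "value" fields) and the function names are assumptions made for this example, and the demonstration parses a literal string so that it runs without a network connection.

```python
import json
from urllib.request import urlopen

def fetch_text(url, timeout=10):
    """Fetch a URL and return its body decoded as UTF-8."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")

def parse_readings(text):
    """Parse a JSON array of {"site": ..., "value": ...} objects."""
    return [(item["site"], float(item["value"])) for item in json.loads(text)]

# Offline demonstration; with a real service this would be
# parse_readings(fetch_text(some_url)) for whatever URL the data lives at.
sample = '[{"site": "DR-1", "value": 9.5}, {"site": "DR-3", "value": 2}]'
readings = parse_readings(sample)
```

Keeping the fetching and the parsing in separate functions also means the parser can be tested on small hand-written documents.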

Q03: How can I tell if I've processed it correctly?

The Shell
N/A
Version Control
N/A
Basic Programming
  1. Test your programs with small data sets whose results can be checked by hand.
Functions and Libraries
TBD
Databases
  1. Build queries in small steps.
  2. Run queries against small data sets whose output can be checked manually.
Number Crunching
  1. Compare a program's output to analytic results, experimental results, simplified test cases, and previous programs.
  2. Use tolerances when comparing results.
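To make the point about tolerances concrete, here is a small sketch that compares a numerical integral with its analytic value. The trapezoid-rule function and the chosen tolerance are illustrative choices, not part of the course material.

```python
import math

def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

computed = trapezoid(math.sin, 0.0, math.pi, 1000)
analytic = 2.0  # the integral of sin(x) from 0 to pi

# Exact equality fails because of truncation and rounding error...
exact_match = computed == analytic
# ...so compare within a tolerance chosen to suit the method's accuracy.
close_enough = math.isclose(computed, analytic, rel_tol=1e-5)
```

Choosing the tolerance is a scientific decision: too loose and real errors slip through, too tight and correct code appears to fail.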
Quality
  1. Create simple data sets for which the right answer can be calculated by hand.
  2. Compare the results produced by the new program to results produced by older programs.
Sets and Dictionaries
TBD
Development
  1. Make code testable by dividing it into functions, and then replacing some functions with others for testing purposes.
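A minimal sketch of what "replacing some functions with others" might look like. All of the names here are hypothetical: `read_sensor` stands in for code that needs real hardware, and the function under test accepts a replacement for it as a parameter.

```python
def read_sensor():
    """Stand-in for a function that talks to a real instrument."""
    raise RuntimeError("no instrument attached")

def average_reading(count, read=read_sensor):
    """Average `count` readings obtained by calling `read`."""
    values = [read() for _ in range(count)]
    return sum(values) / count

def make_fake_sensor(values):
    """Build a replacement for read_sensor that replays known values."""
    iterator = iter(values)
    return lambda: next(iterator)

# In a test, swap in the fake so the expected answer can be worked out by hand
result = average_reading(3, read=make_fake_sensor([1.0, 2.0, 6.0]))
```

Production code calls `average_reading(count)` and gets the real instrument, while tests pass in a fake and never touch the hardware.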
Web Programming
N/A

Q04: How can I find and fix bugs when I haven't?

The Shell
N/A
Version Control
N/A
Basic Programming
N/A
Functions and Libraries
TBD
Databases
N/A
Number Crunching
N/A
Quality
  1. Write test cases that fail when the bug is present, but pass when the bug is fixed.
  2. Add assertions to programs to check their internal consistency.
  3. Use a debugger.
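The first two practices might look like this in Python. The function and the bug it guards against are hypothetical: imagine an earlier version that used integer division (`//`) and so returned all zeros for integer input.

```python
def normalize(values):
    """Scale values so that they sum to 1.0."""
    assert len(values) > 0, "need at least one value"
    total = sum(values)
    assert total != 0, "values must not sum to zero"
    result = [v / total for v in values]
    # Internal consistency check: the scaled values must sum to one
    assert abs(sum(result) - 1.0) < 1e-9, "normalized values do not sum to 1"
    return result

def test_normalize_integers():
    # Fails if the (hypothetical) integer-division bug is present,
    # passes once it has been fixed
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]

test_normalize_integers()
```

Once written, the test stays in the test suite so that the same bug cannot quietly return.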
Sets and Dictionaries
TBD
Development
  1. Write tests.
Web Programming
N/A

Q05: How can I keep track of what I've done?

The Shell
N/A
Version Control
  1. Keep your work under version control.
  2. Check in whenever you've completed a significant change.
  3. Write meaningful check-in comments.
Basic Programming
  1. Put version control IDs in programs (and data files), and copy them forward to results.
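One possible reading of this advice, sketched in Python. The ID strings, the "#" comment convention, and the function name are all assumptions made for illustration; how an ID gets into a file in the first place depends on the version control system in use.

```python
import io

# Hypothetical ID; in practice it might be filled in by hand or by the
# version control system's keyword expansion.
PROGRAM_ID = "double.py revision 1234"

def double_values(source, sink):
    """Double each value, copying provenance comments forward."""
    sink.write(f"# produced by: {PROGRAM_ID}\n")
    for line in source:
        if line.startswith("#"):
            sink.write(line)  # carry the input file's IDs into the result
        elif line.strip():
            sink.write(f"{float(line) * 2}\n")

source = io.StringIO("# data: readings.txt revision 987\n1.5\n2\n")
sink = io.StringIO()
double_values(source, sink)
output = sink.getvalue()
```

The result file then records both which version of the program and which version of the data produced it.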
Functions and Libraries
TBD
Databases
  1. Store queries in files (just like programs).
Number Crunching
N/A
Quality
  1. Turn bug fixes into assertions and test cases.
  2. Use a coverage analyzer to see what code is and isn't being tested.
Sets and Dictionaries
TBD
Development
N/A
Web Programming
  1. Use meta headers in your HTML/XML data files.

Q06: How can I find and use other people's work?

The Shell
N/A
Version Control
  1. Get it from their version control repositories.
Basic Programming
N/A
Functions and Libraries
TBD
Databases
N/A
Number Crunching
N/A
Quality
N/A
Sets and Dictionaries
TBD
Development
N/A
Web Programming
  1. Ask them to use well-formed URLs.
  2. Ask them to format it according to well-defined machine-readable standards (e.g., XML or JSON).

Q07: How can other people find and use mine?

The Shell
N/A
Version Control
  1. Put your work in a publicly-accessible version control repository.
Basic Programming
N/A
Functions and Libraries
TBD
Databases
  1. Raise exceptions to signal errors so that other people can handle them as they think best.
Number Crunching
N/A
Quality
N/A
Sets and Dictionaries
TBD
Development
N/A
Web Programming
  1. Put it on the web at a stable URL.
  2. Format it according to well-defined machine-readable standards (e.g., XML or JSON).
  3. Include meta-data.

Q08: How can I do all these things faster?

The Shell
  1. Put commands in shell scripts so that they can be re-used.
Version Control
N/A
Basic Programming
  1. Use appropriate variable names so that people will waste less time trying to read programs.
Functions and Libraries
TBD
Databases
N/A
Number Crunching
  1. Use a linear algebra package like NumPy.
Quality
  1. Design code for testing.
  2. Write test cases before writing new code.
Sets and Dictionaries
TBD
Development
  1. Use a profiler to figure out why code is slow before trying to optimize it.
  2. Build code so that parts can be replaced easily.
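As a sketch of profiling before optimizing, Python's standard cProfile and pstats modules can show where a program actually spends its time. The two functions being profiled are invented for this example.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Deliberately naive loop, standing in for a real hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def analysis():
    return [slow_sum_of_squares(20000) for _ in range(20)]

# Collect timing data while the analysis runs
profiler = cProfile.Profile()
profiler.enable()
analysis()
profiler.disable()

# Summarize the most expensive calls, sorted by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
report_text = report.getvalue()
```

Printing `report_text` lists each function with its call count and time, which is usually enough to tell whether the slow part is where you expected it to be.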
Web Programming
N/A
