Lessons (Version 4.0)

Boot Camp at the Scripps Institute, 2012

Material from Version 3 is also available.

Version Control

  • Version control is a reliable way to share files between machines.
  • It also allows people to work simultaneously by tracking and managing conflicts.
  • It also keeps a permanent history of the state of programs, data, and documents at various times.

The Shell

  • The shell is an interactive interpreter: it reads commands, finds the corresponding programs, runs them, and displays output.
  • Input and output can be redirected using < and >.
  • Commands can be combined using pipelines.
  • The history command can be used to view and repeat previous operations, while tab completion can be used to save re-typing.
  • Directories (or folders) are nested to organize information hierarchically.
  • Use grep to find things in files, and find to find files themselves.
  • Programs can be paused, run in the background, or run on remote machines.
  • The shell has variables like any other program, and these can be used to control how it behaves.

Python

  • Variables are labels that refer to data.
  • Use while to repeat something until something changes.
  • Use for to do something once for each part of a larger whole.
  • Use if and else to make choices.
  • Use lists to store many related values in order.
  • Use strings to store text.
  • Many variables may refer to the same piece of data.
  • Define functions to break programs down into manageable pieces.
  • Remember that a function is really just another kind of data.
  • Use libraries to group related functions and other definitions together.
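
A minimal sketch of two of the points above: several variables referring to the same data, and a function being passed around like any other value. The names `double` and `apply_to_all` are hypothetical, chosen only for illustration.

```python
# Two variables can refer to the same list: a change made through one
# label is visible through the other.
original = [1, 2, 3]
alias = original
alias.append(4)


# A function is data too: it can be passed to another function.
def double(x):
    return 2 * x


def apply_to_all(func, values):
    """Return a new list holding func(v) for each v in values."""
    result = []
    for v in values:
        result.append(func(v))
    return result


doubled = apply_to_all(double, original)
print(original)  # [1, 2, 3, 4]
print(doubled)   # [2, 4, 6, 8]
```

Copying the list first (for example with `list(original)`) would give `alias` its own data instead of a second label for the same data.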

Testing

  • Testing can't find all mistakes, any more than proof-reading can find all typos, but both are still useful.
  • Use exceptions to report and handle errors: throw low, catch high.
  • Use an xUnit library to manage unit tests in a uniform, predictable way.
  • Isolating components for testing also improves code quality.
  • Use approximate comparisons when dealing with floating point numbers.
  • Separate test setup and teardown from test execution.
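
As one possible illustration of the points above, the sketch below uses Python's standard `unittest` module as the xUnit library. The function `mean` and the class `TestMean` are hypothetical examples, not part of the lesson material.

```python
import unittest


def mean(values):
    """Return the arithmetic mean; raise ValueError on empty input."""
    if not values:
        # Throw low: report the error where it is detected.
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):

    def setUp(self):
        # Setup is kept apart from the tests themselves.
        self.values = [0.1, 0.2, 0.3]

    def test_mean_is_approximately_correct(self):
        # 0.1 + 0.2 + 0.3 is not exactly 0.6 in floating point,
        # so compare approximately rather than with ==.
        self.assertAlmostEqual(mean(self.values), 0.2)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean([])


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The library calls `setUp` before each test method, so every test starts from the same known state.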

Sets and Dictionaries

  • A set is an unordered collection of distinct values.
  • Sets are stored using hash tables, which makes them fast, but means their elements must be immutable values that can't be modified after being added.
  • A dictionary is an unordered collection of key-value pairs.
  • Use dictionaries to store data that is sparse, irregular, or not intrinsically ordered.
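
A short sketch of these ideas in Python, using invented sample values: a set discards repeats, it rejects elements that can be modified (such as lists), and a dictionary stores only the entries that actually occur.

```python
# Sets keep one copy of each value.
atoms = set(["He", "Ne", "Ar", "He"])

# Set elements must be hashable: a tuple works, a list does not.
pairs = set()
pairs.add(("H", 1))
try:
    pairs.add(["He", 2])
    list_was_accepted = True
except TypeError:
    list_was_accepted = False

# A dictionary suits sparse data: store only the counts that occur.
counts = {}
for symbol in ["He", "Ne", "He", "Ar", "He"]:
    counts[symbol] = counts.get(symbol, 0) + 1
```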

Regular Expressions

  • Regular expressions are written as character strings (which makes the notation somewhat clumsy).
  • Alphanumeric characters match themselves.
  • Use *, +, and ? for repetition.
  • Use character sets, character set shortcuts, and | to match alternatives.
  • Use parentheses to group things and to extract information from matches.
  • Use the regular expression library to find all matches, replace strings, and perform other operations.
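
The sketch below shows how these pieces might fit together using Python's standard `re` module. The sample text and the date layout are invented for illustration.

```python
import re

text = "Baker 2009-11-17 1223.0; Davison 1942-07-03 51.5"

# [0-9] is a character set, + means "one or more", and parentheses
# group parts of the pattern and capture the text they match.
date_pattern = re.compile(r"([0-9]+)-([0-9]+)-([0-9]+)")

# search finds the first match; groups() returns the captured pieces.
first = date_pattern.search(text)
year, month, day = first.groups()

# findall returns the captured groups of every match.
all_dates = date_pattern.findall(text)

# sub replaces every match; \1, \2, \3 refer back to the groups.
reordered = date_pattern.sub(r"\3/\2/\1", text)
```

Writing the pattern as a raw string (`r"..."`) avoids having to double every backslash, which eases some of the clumsiness of storing patterns in character strings.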

Databases

  • Use a database to store and manage regular data (and make your data regular so that it can be managed).
  • Use queries to express what you want, and let the computer figure out how to get it.
  • Remember to account for gaps in your data.
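
A minimal sketch of declarative queries and of gaps in data, assuming SQLite through Python's standard `sqlite3` module (the lesson summary does not name a database engine). The table `readings` and its contents are invented.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE readings (site TEXT, quantity TEXT, value REAL)")
cursor.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("A", "rad", 9.8), ("A", "sal", 0.1),
     ("B", "rad", None), ("B", "sal", 0.2)],  # None is stored as NULL
)

# The query says what is wanted, not how to find it.
cursor.execute(
    "SELECT site, value FROM readings WHERE quantity = 'rad' ORDER BY site")
rad_rows = cursor.fetchall()

# NULL marks a gap: comparing with = never matches it, so use IS NULL.
cursor.execute("SELECT COUNT(*) FROM readings WHERE value = NULL")
matched_with_equals = cursor.fetchone()[0]
cursor.execute("SELECT COUNT(*) FROM readings WHERE value IS NULL")
matched_with_is_null = cursor.fetchone()[0]

# Aggregates such as AVG skip NULL values.
cursor.execute("SELECT AVG(value) FROM readings WHERE quantity = 'rad'")
average_rad = cursor.fetchone()[0]

connection.close()
```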

Using Access

  • Use a database to store and manage regular data (and make your data regular so that it can be managed).
  • Use queries to express what you want, and let the computer figure out how to get it.
  • Remember to account for gaps in your data.

Data

  • Store data in a hierarchy of folders and files with regular names that can easily be pattern-matched.
  • Use README files to store metadata.
  • Have the computer track the programs used to process data.
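
As a small illustration of why regular names help, the sketch below selects files with shell-style patterns using Python's standard `fnmatch` module. The file names and the `site-<id>-<year>-<month>.csv` convention are hypothetical.

```python
import fnmatch

filenames = [
    "site-A-2012-01.csv",
    "site-A-2012-02.csv",
    "site-B-2012-01.csv",
    "README.txt",
    "notes-final-v2.doc",
]

# With regular names, one pattern selects everything from a single site...
site_a = fnmatch.filter(filenames, "site-A-*.csv")

# ...or every site's data for a single month (? matches one character).
january = fnmatch.filter(filenames, "site-?-2012-01.csv")
```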

Classes and Objects

  • Objects combine functions with data to make both easier to manage.
  • A class defines the behaviors of a new kind of thing, while an object is a particular thing.
  • Classes have constructors, which describe how to create a new object of a particular kind.
  • An interface describes what something can do; an implementation describes how a particular thing performs those operations.
  • One class can inherit from another, then override just those things that it wants to change.
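
The following sketch illustrates constructors, interfaces, and inheritance. The classes `Shape`, `Rectangle`, and `Square` are hypothetical examples rather than code from the lesson.

```python
class Shape(object):
    """Interface: every shape has a name and can report its area."""

    def __init__(self, name):
        # The constructor describes how to build a new object.
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        # Works for any shape, whatever its implementation of area().
        return "%s with area %.1f" % (self.name, self.area())


class Rectangle(Shape):
    """One implementation of the Shape interface."""

    def __init__(self, name, width, height):
        Shape.__init__(self, name)
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    def __init__(self, name, side):
        # Override only the constructor; area() and describe() are inherited.
        Rectangle.__init__(self, name, side, side)


shapes = [Rectangle("rectangle", 2.0, 3.0), Square("square", 2.0)]
descriptions = [s.describe() for s in shapes]
```

Each element of `shapes` is an object, a particular thing, while `Rectangle` and `Square` are the classes that define how such things behave.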

Program Design

  • Build programs top-down: write as if the mini-language you wanted already existed, then go back and fill in the missing pieces.
  • Modular programs are easier to test and refactor than ones with many dependencies between components.
  • Careful choice of algorithms and data structures often produces bigger performance improvements than parallel hardware possibly could.
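
To make the last point concrete, here is a sketch of two ways to check a list for repeated values. Both function names are hypothetical. The first compares every pair of values, while the second changes the data structure and needs only one pass.

```python
def has_duplicates_quadratic(values):
    """Compare every pair: roughly n*n/2 comparisons for n values."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_with_set(values):
    """Remember values already seen in a set: roughly n lookups."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False


unique_values = list(range(1000))
repeated_values = unique_values + [500]
```

For a million values the first version needs on the order of 10^11 comparisons and the second on the order of 10^6 lookups, a gap that adding processors is unlikely to close.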

Make

  • A Makefile describes how files depend on each other, and how to update out-of-date files.
  • Use patterns, rules, and variables to eliminate redundancy.
  • Use macros to control operation.

Systems Programming

  • Programs can open and read directories to find out what files those directories contain.
  • They can work recursively on the contents of nested directories by walking the directory tree.
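
A minimal sketch of walking a directory tree with `os.walk` from Python's standard library. The helper `list_files` and the throwaway directory layout are invented for the example.

```python
import os
import shutil
import tempfile


def list_files(root):
    """Return sorted paths, relative to root, of every file under root."""
    found = []
    # os.walk visits root and each nested directory in turn.
    for current_dir, subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(current_dir, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


# Build a small temporary tree to walk.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "data", "2012"))
for relative in ["README.txt", os.path.join("data", "2012", "survey.csv")]:
    with open(os.path.join(root, relative), "w") as handle:
        handle.write("placeholder\n")

# Reading a single directory lists only its immediate contents...
top_level = sorted(os.listdir(root))

# ...while walking the tree reaches files in nested directories too.
files = list_files(root)

# Remove the temporary tree again.
shutil.rmtree(root)
```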

Spreadsheets

  • Ensure the data is regular along both columns and rows.
  • Use aggregation functions to combine values, and ranges to define which values to combine.
  • Use built-in sorting and ranking functions to manage data.
  • Use conditional and lookup functions to achieve the same effect as if and for.
  • Use pivot tables to aggregate data along multiple dimensions.
  • Named ranges make updates safer and easier.

Matrix Programming

  • Array libraries like NumPy store data in rectangular blocks of uniform data type.
  • These libraries use a data-parallel programming model to apply one operation to many data elements.
  • You should (almost) never loop over the elements of a vector or matrix: let the library do it instead.
  • Use slicing to select and combine subsets of data.
  • And use library functions to do higher-level linear algebra operations rather than writing them yourself.

MATLAB

  • Array libraries like those in MATLAB store data in rectangular blocks of uniform data type.
  • These libraries use a data-parallel programming model to apply one operation to many data elements.
  • You should (almost) never loop over the elements of a vector or matrix: let the library do it instead.
  • Use slicing to select and combine subsets of data.
  • And use library functions to do higher-level linear algebra operations rather than writing them yourself.

Multimedia Programming

  • Images may be stored in many different formats, but are normally represented in memory as rectangular patches of pixels.
  • Media formats are either lossy or lossless.

Software Engineering

  • Empirical studies have given a firm foundation to some claims about which software engineering practices are effective (while casting doubt on others).
  • Agile methodologies emphasize short feedback cycles, and are most appropriate when doing exploratory development.
  • Sturdy methodologies emphasize analysis and design, and are most appropriate when building large programs in well-understood problem domains.
  • The most important lessons in this course can be summed up in seven principles:
    1. It's all just data.
    2. Data doesn't mean anything on its own—it has to be interpreted.
    3. Programming is about creating and composing abstractions.
    4. Models are for computers, and views are for people.
    5. Paranoia makes us productive.
    6. Better algorithms are better than better hardware.
    7. The tool shapes the hand.

Essays

  • Common tasks frequently cut across the subjects we have discussed.
  • There's almost always more than one way to do it.