Lessons (Version 4.0)
Material from Version 3 is also available.
- Version control is a reliable way to share files between machines.
- It also allows people to work simultaneously by tracking and managing conflicts.
- And keeps a permanent history of what state programs, data, and documents were in at various times.
- The shell is an interactive interpreter: it reads commands, finds the corresponding programs, runs them, and displays output.
- Input and output can be redirected using `<` and `>`.
- Commands can be combined using pipelines.
- The `history` command can be used to view and repeat previous operations, while tab completion can be used to save re-typing.
- Directories (or folders) are nested to organize information hierarchically.
- Use `grep` to find things in files, and `find` to find files themselves.
- Programs can be paused, run in the background, or run on remote machines.
- The shell has variables like any other program, and these can be used to control how it behaves.
- Variables are labels that refer to data.
- Use `while` to repeat something until something changes.
- Use `for` to do something once for each part of a larger whole.
- Use `if` and `else` to make choices.
- Use lists to store many related values in order.
- Use strings to store text.
- Many variables may refer to the same piece of data.
- Define functions to break programs down into manageable pieces.
- Remember that a function is really just another kind of data.
- Use libraries to group related functions and other definitions together.
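Two of the points above (many variables may refer to the same data, and a function is another kind of data) are easy to misread, so here is a minimal Python sketch of both. The names `original`, `alias`, `double`, and `apply_to_all` are made up for illustration and do not come from the lessons.

```python
# Two names can refer to the same list, so a change made through one
# name is visible through the other.
original = [1, 2, 3]
alias = original               # no copy is made here
alias.append(4)

independent = list(original)   # an explicit shallow copy
independent.append(5)          # does not affect 'original'


def double(x):
    """Return twice the argument."""
    return 2 * x


def apply_to_all(func, values):
    """Call func once for each value and collect the results in a list."""
    results = []
    for v in values:
        results.append(func(v))
    return results


# A function is a value: it can be passed as an argument like any other data.
doubled = apply_to_all(double, original)
```

After this runs, `original` and `alias` are the same four-element list, while `independent` has a fifth element that the other two never see.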
- Testing can't find all mistakes, any more than proof-reading can find all typos, but both are still useful.
- Use exceptions to report and handle errors: throw low, catch high.
- Use an xUnit library to manage unit tests in a uniform, predictable way.
- Isolating components for testing also improves code quality.
- Use approximate comparisons when dealing with floating point numbers.
- Separate test setup and teardown from test execution.
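The testing points above can be sketched with Python's standard `unittest` module, which follows the xUnit pattern. The function `mean`, the class `TestMean`, and the helper `run_tests` are illustrative assumptions, not code from the lessons.

```python
import unittest


def mean(values):
    """Return the arithmetic mean, raising ValueError for empty input."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):

    def setUp(self):
        # Setup runs before every test method, so each test gets fresh data
        # and the test bodies contain only the check itself.
        self.readings = [0.1, 0.2, 0.3]

    def test_mean_is_approximately_correct(self):
        # 0.1 + 0.2 + 0.3 is not exactly 0.6 in binary floating point,
        # so compare within a tolerance instead of using ==.
        self.assertAlmostEqual(mean(self.readings), 0.2, places=7)

    def test_empty_input_raises(self):
        # The low-level function reports the error by raising;
        # a higher-level caller decides how to handle it.
        with self.assertRaises(ValueError):
            mean([])


def run_tests():
    """Run TestMean and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs both tests and returns a result whose `wasSuccessful()` method reports whether they all passed.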
- A set is an unordered collection of distinct values.
- Sets are stored using hash tables, which makes them fast, but means their elements can't be modified after being added.
- A dictionary is an unordered collection of key-value pairs.
- Use dictionaries to store data that is sparse, irregular, or not intrinsically ordered.
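As a rough illustration of the set and dictionary points above, the following Python sketch shows duplicate removal, why mutable values cannot be set elements, and a dictionary used for sparse counts. The sample data and variable names are invented for the example.

```python
# Sets keep only distinct values; order is not part of their meaning.
atoms = {"H", "He", "Li", "H"}

# Mutable values such as lists cannot be set elements because they
# cannot be hashed; an immutable frozenset (or tuple) can.
try:
    bad = {["Na", "Cl"]}
    list_allowed = True
except TypeError:
    list_allowed = False

# These two frozensets hold the same elements, so the outer set keeps one.
compounds = {frozenset(["Na", "Cl"]), frozenset(["Cl", "Na"])}

# A dictionary suits sparse data: only the keys actually seen are stored.
sightings = ["loon", "goose", "loon", "loon"]
counts = {}
for bird in sightings:
    counts[bird] = counts.get(bird, 0) + 1
```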
- Regular expressions are written as character strings (which makes the notation somewhat clumsy).
- Alphanumeric characters match themselves.
- Use `*`, `+`, and `?` for repetition.
- Use character sets, character set shortcuts, and `|` to match alternatives.
- Use parentheses to group things and to extract information from matches.
- Use the regular expression library to find all matches, replace strings, and perform other operations.
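The regular expression points above can be tried out with Python's standard `re` module. This is only a sketch: the sample text and patterns are invented, and raw strings (`r"..."`) are used to reduce the backslash clutter that comes from writing patterns as ordinary strings.

```python
import re

text = "Baker 2009-11-17, Dyer 2010-01-10, Rabinowitz 2010/02/03"

# \d is a character set shortcut for digits, + means "one or more",
# [-/] is a character set matching either separator, and each pair of
# parentheses captures one part of the date.
date_pattern = r"(\d+)[-/](\d+)[-/](\d+)"

# With several groups, findall returns one tuple of captured strings per match.
dates = re.findall(date_pattern, text)

# sub replaces each match; \1, \2, \3 refer back to the captured groups.
normalized = re.sub(r"(\d+)/(\d+)/(\d+)", r"\1-\2-\3", text)

# | separates alternatives.
names = re.findall(r"Baker|Dyer", text)

# ? makes the preceding character optional.
spellings = re.findall(r"colou?r", "color and colour")
```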
- Use a database to store and manage regular data (and make your data regular so that it can be managed).
- Use queries to express what you want, and let the computer figure out how to get it.
- Remember to account for gaps in your data.
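To make the database points above concrete, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. The table `readings` and its contents are hypothetical. It shows a declarative query and how a gap in the data (stored as NULL) affects results.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE readings (site TEXT, quantity TEXT, value REAL)")
cursor.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("DR-1", "rad", 9.82),
        ("DR-1", "sal", 0.13),
        ("DR-3", "rad", None),   # a missing measurement is stored as NULL
        ("DR-3", "sal", 0.05),
    ],
)

# Declarative query: it states what is wanted, not how to find it.
# AVG skips NULLs, so a site whose only reading is NULL averages to NULL.
cursor.execute(
    "SELECT site, AVG(value) FROM readings "
    "WHERE quantity = 'rad' GROUP BY site ORDER BY site"
)
averages = cursor.fetchall()

# A comparison with NULL is neither true nor false, so the row with the
# missing value is silently left out of this count.
cursor.execute("SELECT COUNT(*) FROM readings WHERE value != 9.82")
not_equal_count = cursor.fetchone()[0]

# Gaps have to be asked for explicitly with IS NULL.
cursor.execute("SELECT COUNT(*) FROM readings WHERE value IS NULL")
missing_count = cursor.fetchone()[0]

connection.close()
```

Of the four rows, one equals 9.82 and one is NULL, so `not_equal_count` is 2 rather than 3. This is the kind of surprise the "account for gaps" point warns about.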
- Store data in a hierarchy of folders and files with regular names that can easily be pattern-matched.
- Use README files to store metadata.
- Have the computer track the programs used to process data.
- Objects combine functions with data to make both easier to manage.
- A class defines the behaviors of a new kind of thing, while an object is a particular thing.
- Classes have constructors, which describe how to create a new object of a particular kind.
- An interface describes what something can do; an implementation describes how a particular thing performs those operations.
- One class can inherit from another, then override just those things that it wants to change.
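The object-oriented points above can be sketched in a few lines of Python. The classes `Shape`, `Rectangle`, and `Square` are illustrative assumptions chosen for brevity, not classes from the lessons.

```python
class Shape:
    """Interface: every shape has a name and can report its area."""

    def __init__(self, name):
        # The constructor describes how to set up a new object.
        self.name = name

    def area(self):
        # Each implementation must say how its own area is computed.
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        # Shared behavior, written once in terms of the interface.
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):
    """One implementation of the Shape interface."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Inherits area() from Rectangle and overrides only the constructor."""

    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"


# Two particular objects, each an instance of a class.
shapes = [Rectangle(2, 3), Square(2)]
descriptions = [shape.describe() for shape in shapes]
```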
- Build programs top-down: write as if the mini-language you wanted already existed, then go back and fill in the missing pieces.
- Modular programs are easier to test and refactor than ones with many dependencies between components.
- Careful choice of algorithms and data structures often produces bigger performance improvements than parallel hardware possibly could.
- A Makefile describes how files depend on each other, and how to update out-of-date files.
- Use patterns, rules, and variables to eliminate redundancy.
- Use macros to control operation.
- Programs can open and read directories to find out what files those directories contain.
- They can work recursively on the contents of nested directories by walking the directory tree.
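One way to walk a directory tree in Python is the standard `os.walk` function, sketched below. The helper `find_files` and the throwaway directory layout are assumptions made so that the example is self-contained.

```python
import os
import tempfile


def find_files(root, suffix):
    """Return sorted paths (relative to root) of files ending in suffix."""
    found = []
    # os.walk visits root and every nested directory beneath it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)


# Build a small temporary tree, search it, and let it be cleaned up.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2010", "site-a"))
    sample_files = [
        "README.txt",
        os.path.join("2010", "data.csv"),
        os.path.join("2010", "site-a", "data.csv"),
    ]
    for relative in sample_files:
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("placeholder\n")
    csv_files = find_files(root, ".csv")
```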
- Ensure the data is regular along both columns and rows.
- Use aggregation functions to combine values, and ranges to define which values to combine.
- Use built-in sorting and ranking functions to manage data.
- Use conditional and lookup functions to achieve the same effect as `if` and `for`.
- Use pivot tables to aggregate data along multiple dimensions.
- Named ranges make updates safer and easier.
- Array libraries like NumPy store data in rectangular blocks of uniform data type.
- These libraries use a data-parallel programming model to apply one operation to many data elements.
- You should (almost) never loop over the elements of a vector or matrix: let the library do it instead.
- Use slicing to select and combine subsets of data.
- And use library functions to do higher-level linear algebra operations rather than writing them yourself.
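The array points above might look like this in NumPy. The temperature values and the small linear system are invented for illustration, and the sketch assumes NumPy is installed.

```python
import numpy as np

# A rectangular block of values that all share one data type (float64).
temperatures_c = np.array([[12.0, 15.5, 11.0],
                           [14.0, 18.0, 13.5]])

# One operation applied to every element; no explicit loop is written.
temperatures_f = temperatures_c * 9.0 / 5.0 + 32.0

# Slicing: every row, first two columns.
first_two = temperatures_c[:, :2]

# Aggregation along an axis: the mean of each row.
row_means = temperatures_c.mean(axis=1)

# Library linear algebra: solve A x = b rather than hand-writing elimination.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
```

Here `x` comes out close to `[0.8, 1.4]`, which can be checked by substituting back into the two equations.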
- Array libraries like those in MATLAB store data in rectangular blocks of uniform data type.
- These libraries use a data-parallel programming model to apply one operation to many data elements.
- You should (almost) never loop over the elements of a vector or matrix: let the library do it instead.
- Use slicing to select and combine subsets of data.
- And use library functions to do higher-level linear algebra operations rather than writing them yourself.
- Empirical studies have given a firm foundation to claims about what software engineering practices are effective (while casting doubt on others).
- Agile methodologies emphasize short feedback cycles, and are most appropriate when doing exploratory development.
- Sturdy methodologies emphasize analysis and design, and are most appropriate when building large programs in well-understood problem domains.
- The most important lessons in this course can be summed up in seven principles:
- It's all just data.
- Data doesn't mean anything on its own—it has to be interpreted.
- Programming is about creating and composing abstractions.
- Models are for computers, and views are for people.
- Paranoia makes us productive.
- Better algorithms are better than better hardware.
- The tool shapes the hand.
- Common tasks frequently cut across the subjects we have discussed.
- There's almost always more than one way to do it.