Data
Short discussions of all things data-related.
Requires: Basic Python.
Introduces: Folder structures for archiving data, and the Bein library for data management.
Motivating questions:
- I have many data files, and several versions of each. How do I keep track of them all?
- My programs generate many files. How do I keep track of them all?
- My programs call external programs. How do I keep track of how they were called, and how do I make those calls simpler?
Lectures:
- Data management
  - Archiving different versions of data files (without version control)
  - Using a chronological directory structure, e.g. /data/2010-01-02/samples.mat
  - Using README files to store metadata (under version control)
  - Using a pipeline directory structure, e.g. /parse/align/filter
- Bein
  - Tracking the external programs you run and the files to preserve
  - Simplifying calls to external programs
  - Execution blocks for isolating chunks of code
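
As a rough illustration of the chronological directory structure and README metadata listed under Data management, here is a minimal sketch using only the Python standard library. It does not use Bein. The function name archive_copy, the README line format, and the refusal to overwrite an existing copy are all assumptions made for this example, not something the lectures prescribe.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_copy(source, archive_root, day=None, note=""):
    """Copy `source` into archive_root/YYYY-MM-DD/ and log a note in README.

    Hypothetical helper for illustration only; the name and the README
    line format are assumptions, not part of any library.
    """
    day = day or date.today()
    # One directory per day, named in ISO format, e.g. 2010-01-02
    target_dir = Path(archive_root) / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / Path(source).name
    if target.exists():
        # Archived versions are never overwritten, so old data stays intact
        raise FileExistsError(f"{target} is already archived")

    # copy2 also preserves timestamps where the platform allows it
    shutil.copy2(source, target)

    # Append a one-line description of the file to the directory's README
    with open(target_dir / "README", "a") as readme:
        readme.write(f"{target.name}: {note}\n")
    return target
```

Calling archive_copy("samples.mat", "/data", day=date(2010, 1, 2), note="raw plate reader output") would place the copy at /data/2010-01-02/samples.mat and add a matching line to /data/2010-01-02/README. Because the README is plain text, it can be kept under version control even when the data files themselves are not.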
