
Data

September 19th, 2011

Short discussions of all things data-related.

Requires: Basic Python.

Introduces: Folder structures for archiving data, and the Bein library for data management.

Motivating questions:

  1. I have many data files, and several versions of each. How do I keep track of them all?
  2. My programs generate many files. How do I keep track of them all?
  3. My programs call external programs. How do I keep a record of how they were called, and how can I make those calls simpler to write?

Lectures:

  1. Data management
    • Archiving different versions of data files (without using version control).
    • Using a chronological directory structure, e.g. /data/2010-01-02/samples.mat
    • Using README files to store metadata (under version control).
    • Using a pipeline directory structure, e.g. /parse/align/filter
  2. Bein
    • Tracking the external programs you run and the files you want to preserve
    • Simplifying calls to external programs
    • Execution blocks for isolating chunks of code
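
The ideas in the outline above can be tried out with nothing but the Python standard library. The sketch below is not Bein and does not use Bein's API. It is a minimal illustration, assuming a dated archive folder such as data/2010-01-02/ with a plain-text README next to the files. The names archive_file and run_logged, the README line format, and the log format are all hypothetical choices made for this example.

```python
import datetime
import shutil
import subprocess
from pathlib import Path


def archive_file(src, archive_root, note, day=None):
    """Copy src into archive_root/YYYY-MM-DD/ and note it in that folder's README.

    Hypothetical helper: the layout mirrors the chronological structure
    from the outline (e.g. data/2010-01-02/samples.mat).
    """
    day = day or datetime.date.today()
    dest_dir = Path(archive_root) / day.isoformat()
    dest_dir.mkdir(parents=True, exist_ok=True)

    dest = dest_dir / Path(src).name
    if dest.exists():
        # An archive should never silently replace an earlier version.
        raise FileExistsError(f"{dest} is already archived")
    shutil.copy2(src, dest)

    # Append one line of metadata per archived file.
    with open(dest_dir / "README", "a") as readme:
        readme.write(f"{dest.name}: {note}\n")
    return dest


def run_logged(args, log_path):
    """Run an external program and append its command line and exit code to a log.

    Hypothetical helper: a stand-in for the kind of call tracking that
    Bein automates, written with subprocess only.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    stamp = datetime.datetime.now().isoformat()
    with open(log_path, "a") as log:
        log.write(f"{stamp}\t{args!r}\texit={result.returncode}\n")
    return result
```

A call such as archive_file("samples.mat", "data", "raw samples from run 3") places the copy under today's date, and run_logged(["sort", "input.txt"], "calls.log") leaves a one-line record of what was run. Refusing to overwrite an existing archived file is a deliberate choice here: a new version goes into a new dated folder instead.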
