Code and Data for the Social Sciences

Matthew Gentzkow and Jesse Shapiro have written an excellent guide, "Code and Data for the Social Sciences". It's short (only 38 pages), very readable, and full of practical advice for scientists of all stripes:

  • Automate everything that can be automated.
  • Write a single shell script that executes all code from beginning to end.
  • Store code and data under version control.
  • Run the whole directory before checking it back in.
  • Separate directories by function.
  • Separate files into inputs and outputs.
  • Make directories portable.
  • Store cleaned data in tables with unique, non-missing keys.
  • Keep data normalized as far into your code pipeline as you can.
  • Abstract to eliminate redundancy.
  • Abstract to improve clarity.
  • Otherwise, don't abstract.
  • Don't write documentation you will not maintain.
  • Code should be self-documenting.
  • Manage tasks with a task management system.
  • E-mail is not a task management system.
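
To make one of these rules concrete, here is a minimal Python sketch of a check that a cleaned table has unique, non-missing keys. It is not taken from the guide: the function name `check_keys`, the sample table, and its column names are all hypothetical, and a real project might do the same thing in Stata, R, or SQL.

```python
import csv
import io


def check_keys(rows, key_fields):
    """Return a list of problems found in the key columns of a table.

    rows: an iterable of dicts, for example from csv.DictReader.
    key_fields: the column names that together should identify each row.
    """
    problems = []
    seen = {}  # maps each key to the data row where it first appeared
    for row_no, row in enumerate(rows, start=1):
        # Treat absent and blank values alike: both count as missing.
        key = tuple((row.get(field) or "").strip() for field in key_fields)
        if any(part == "" for part in key):
            problems.append(f"row {row_no}: missing key value in {key}")
            continue
        if key in seen:
            problems.append(
                f"row {row_no}: duplicate key {key} "
                f"(first seen in row {seen[key]})"
            )
        else:
            seen[key] = row_no
    return problems


# Hypothetical cleaned table: intended to hold one row per (county, year).
# A blank non-key column (population) is allowed; blank keys are not.
SAMPLE = """county,year,population
A,2010,100
A,2011,110
B,2010,
A,2010,105
,2011,90
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE))
    for problem in check_keys(rows, ["county", "year"]):
        print(problem)
```

Row numbers count data rows, not the header line. A check like this could be called from the single script that runs the whole pipeline, so that a table with a bad key stops the run instead of silently corrupting a later merge.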