Teaching basic lab skills
for research computing

Solution to Data Merging with Dictionaries

This week's tutorial problem was to merge the data from a set of input files to show how often each species was observed on each date. The shell pipeline, the Python code, and two sample input files that accompanied the tutorial video are given below.

shell command

grep -h -v '#' *.txt | sort | uniq -c
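Here is what each stage of the pipeline does. grep -h -v '#' prints every line that does not contain '#', so the comment header lines are dropped, and -h stops it from prefixing each line with its file name. uniq -c only collapses identical lines that are adjacent, so sort has to bring duplicates together first.

The Python sketch below mimics that behaviour with itertools.groupby, which also groups only consecutive equal items. The helper names uniq_c and grep_sort_uniq are hypothetical, and the sketch returns (count, line) pairs rather than reproducing uniq's exact column padding.

```python
from itertools import groupby


def uniq_c(lines):
    """Return (count, line) pairs for runs of identical *adjacent* lines,
    roughly what `uniq -c` reports (hypothetical helper)."""
    return [(len(list(group)), line) for line, group in groupby(lines)]


def grep_sort_uniq(lines):
    """Rough Python analogue of: grep -v '#' | sort | uniq -c"""
    kept = [line for line in lines if '#' not in line]  # grep -v '#'
    return uniq_c(sorted(kept))                          # sort | uniq -c


if __name__ == '__main__':
    data = ['2012-03-29 tuna', '2012-03-27 marlin', '2012-03-29 tuna']
    # Unsorted: the two tuna lines are not adjacent, so they are counted separately.
    print(uniq_c(data))
    # Sorted first: the tuna lines form one run with a count of 2.
    print(grep_sort_uniq(data))
```

Without the sort stage, the pipeline would report the same date and species more than once whenever its lines are not next to each other in the input.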

merge.py

import sys

# Read and merge data: count how often each (date, species) pair appears.
results = {}
filenames = sys.argv[1:]
for f in filenames:
    with open(f, 'r') as reader:  # file is closed automatically
        for line in reader:
            # Skip comment lines and blank lines.
            if line.startswith('#') or not line.strip():
                continue
            date, species = line.split()
            key = (date, species)
            if key not in results:
                results[key] = 1
            else:
                results[key] += 1

# Format output, sorted by date and then by species.
for key in sorted(results):
    count = results[key]
    print(count, key[0], key[1])
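The counting loop in merge.py uses the common dictionary pattern: check whether a key exists, then either insert it or increment it. The standard library's collections.Counter handles this bookkeeping for you, because a missing key reads as 0. This is an alternative to the approach in the tutorial, not a replacement for it.

The sketch below assumes the same input format as merge.py, with one date and one species per line and '#' comment lines. The names merge_counts and format_counts are hypothetical. Both take plain lists of lines so the logic can be tried without any files.

```python
from collections import Counter


def merge_counts(lines):
    """Count (date, species) pairs, skipping comment and blank lines.

    Hypothetical helper: same counting as merge.py, but using Counter.
    """
    counts = Counter()
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue
        date, species = line.split()
        counts[(date, species)] += 1  # a missing key starts at 0
    return counts


def format_counts(counts):
    """Return 'count date species' strings sorted by date, then species."""
    return ['%d %s %s' % (n, date, species)
            for (date, species), n in sorted(counts.items())]


if __name__ == '__main__':
    # A subset of the sample files shown below.
    cousteau = ['# Jacques Cousteau', '2012-03-27 marlin',
                '2012-03-29 tuna', '2012-03-29 tuna', '2012-03-29 turtle']
    haddock = ['# Steve Haddock', '2012-03-29 turtle', '2012-03-29 turtle']
    for out in format_counts(merge_counts(cousteau + haddock)):
        print(out)
```

With this subset, the script prints 1 2012-03-27 marlin, 2 2012-03-29 tuna and 3 2012-03-29 turtle, one per line. The turtle count combines sightings from both observers.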

cousteau.txt

# Jacques Cousteau
2012-03-27 marlin
2012-03-29 tuna
2012-03-29 tuna
2012-03-29 turtle

haddock.txt

# Steve Haddock
2012-03-28 squid
2012-03-28 marlin
2012-03-28 marlin
2012-03-29 eel
2012-03-29 squid
2012-03-29 turtle
2012-03-29 turtle
2012-03-30 squid
2012-03-31 turtle
