Fooling the Internet

May 15th, 2012

A recent article in The Atlantic titled, “How the Professor Who Fooled Wikipedia Got Caught By Reddit” describes how GMU’s Prof. T. Mills Kelly has had students fake history online, and how their most recent effort unraveled. There’s lots to think about here regarding what scientists should know about using the web, trusting it, and making it their own…


Categories: Noticed Tags:

Two Boot Camps in Ontario in July

May 15th, 2012

We are pleased to announce that we will be running two boot camps in Ontario in July: one at the University of Waterloo on July 12-13, and another at the University of Toronto (Scarborough) on July 19-20. If you’d like to take part, please sign up, and please let friends and colleagues know about them as well.

Categories: Boot Camp, Version 5.0 Tags:

Solution to Indented List Problem

May 14th, 2012

Last week’s homework was to convert a two-level bullet-point list like this:

* A
* B
  * 1
  * 2
* C
  * 3

into an HTML list like this:

<ul>
  <li>A</li>
  <li>B
    <ul>
      <li>1</li>
      <li>2</li>
    </ul>
  </li>
  <li>C
    <ul>
      <li>3</li>
    </ul>
  </li>
</ul>

so it would display like this:

  • A
  • B
    • 1
    • 2
  • C
    • 3

My solution is shown in the video below; the code follows.

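import sys

OUTER = '* '    # marker for a top-level item
INNER = '  * '  # marker for a nested item (indented two spaces)

def do_inner(lines, current):
    '''Print the nested items starting at lines[current] (if any) as an
    inner <ul>, and return the index of the first line that isn't one.'''
    started = False
    while current < len(lines) and lines[current].startswith(INNER):
        if not started:
            print('    <ul>')
            started = True
        # Slice the marker off: lstrip('  * ') would remove *any* leading
        # spaces and asterisks, mangling items like '* *emphasis*'.
        text = lines[current][len(INNER):].rstrip()
        print('      <li>' + text + '</li>')
        current += 1
    if started:
        print('    </ul>')
    return current

def do_outer(lines):
    '''Print the whole list, handing nested items off to do_inner.'''
    print('<ul>')
    current = 0
    while current < len(lines):
        assert lines[current].startswith(OUTER), 'bad line: ' + lines[current]
        text = lines[current][len(OUTER):].rstrip()
        print('  <li>' + text)
        current = do_inner(lines, current + 1)
        print('  </li>')
    print('</ul>')

# Skip blank lines (such as a trailing empty line) so the assert doesn't fire.
lines = [line for line in sys.stdin.readlines() if line.strip()]
do_outer(lines)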
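The solution above handles exactly two levels. As a sketch that goes beyond the original homework, the same idea generalizes to any depth by recursing once per level. It assumes every level is indented exactly two spaces more than its parent, and the names parse, render, and convert are hypothetical, not part of the solution in the video.

```python
def parse(lines, start=0, depth=0):
    """Collect consecutive items at `depth` as (text, children) pairs.

    Returns (items, next_index). Assumes two spaces of indentation per
    level and lines of the form '<indent>* text'.
    """
    marker = '  ' * depth + '* '
    items = []
    i = start
    while i < len(lines) and lines[i].startswith(marker):
        text = lines[i][len(marker):].rstrip()
        # Everything indented one level deeper belongs to this item.
        children, i = parse(lines, i + 1, depth + 1)
        items.append((text, children))
    return items, i


def render(items, indent=0):
    """Turn (text, children) pairs into indented HTML lines."""
    pad = ' ' * indent
    out = [pad + '<ul>']
    for text, children in items:
        if children:
            out.append(pad + '  <li>' + text)
            out.extend(render(children, indent + 4))
            out.append(pad + '  </li>')
        else:
            out.append(pad + '  <li>' + text + '</li>')
    out.append(pad + '</ul>')
    return out


def convert(text):
    """Convert a bullet list (one item per line) to an HTML string."""
    lines = [line for line in text.splitlines() if line.strip()]
    items, end = parse(lines)
    # parse() stops early if a line skips a level or lacks a marker.
    if end != len(lines):
        raise ValueError('unexpected line: ' + lines[end])
    return '\n'.join(render(items))


if __name__ == '__main__':
    print(convert('* A\n* B\n  * 1\n  * 2\n* C\n  * 3\n'))
```

Separating parsing from rendering means the nested (text, children) structure can be checked or reused before any HTML is produced; for the homework's input, convert returns the same HTML shown at the top of this post.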

Categories: Tutorial, Version 5.0 Tags:

Feedback from Michigan State

May 12th, 2012

Our workshop at Michigan State University this week was three days long instead of two, and included two topics (Git and the IPython notebook) that we haven’t tried before.  Feedback was generally positive, but we’ve got lots to work on for next time as well.

Good:
  • Using history
  • Ending with general theory
  • Pen and paper database design
  • Version control was useful
  • Good practice in software
  • Concise and modular programming in Python
  • Console segment
  • Smooth flow between Bash and Python
  • Challenging and flowed nicely
  • Further reading material
  • Desktop setup
  • Instructor teaching style
  • Permission to spend less time coding
  • IPython notebook looks great
  • Paired programming model
  • Git script (tutorials used were available)
  • Legal issues (open source)
  • Good for beginners
  • Free course (and food!)
  • Variety
  • Practical (reality-based)
  • Overview of DB options
  • Testing
  • Better ways to do things
  • Somewhat static seating created helpful partners

Bad:
  • Typing speed is too fast
  • Class time chunks too long
  • Why IPython?
  • Need more ‘why’
  • Curriculum
  • Advanced Git bounced
  • Too much switching between screens
  • Some things failed
  • Beverages included only caffeine
  • Need snacks at breaks
  • Lacked connection between course material and applicability
  • Tuesday way too long
  • Wanted a cheat sheet
  • Not enough exercises
  • How to create DB
  • Anti-Windows bigotry
  • Next day install at end of day
  • Some concepts skipped
  • Don’t know where to start (registration etc.)
  • Inappropriate room size
  • Breadth
  • PPT for CS

Run My Code

May 11th, 2012

RunMyCode is a web site and service intended to support reproducible research (initially in computational economics). Authors create companion web sites for papers that include the software they used; other people can then re-run their models, and (crucially) play with parameters, using cloud-based instances of those environments. They only support MATLAB, R, and SAS right now, but are hoping to add more tools soon. It’s a cool idea, and we’d welcome your impressions.

Categories: Noticed Tags:

Fish and Bugs

May 10th, 2012

The May/June 2012 issue of Washington Monthly has an article by Alison Fairbrother titled “A Fish Story”. Near the top, it says, “In 2009, a routine methodological upgrade at NOAA—and the subsequent discovery of a few lines of faulty computer code—forced the start of a profound shift in the ASMFC’s estimates of menhaden stocks.” A few pages later, we get more details:

In 2009, the Menhaden Technical Committee updated its methodology for estimating the menhaden population—something it does every five years—and then ran the menhaden catch data through a new computer model. The results weren’t much different: although the numbers of menhaden were declining, the estimated number of eggs produced by spawning female menhaden was at the target level, so according to the reference point, menhaden weren’t being overfished.

Shortly thereafter, a colleague of Jim Uphoff’s, a biologist named Alexei Sharov, got hold of the computer model that had been updated by NOAA scientists. Going through the code line by line, Sharov, one of Maryland’s representatives on the Technical Committee, found a fundamental miscalculation buried inside the model. Uphoff, meanwhile, studied the methodology of the code and discovered that NOAA had both underestimated the amount of fish killed by the industry and overestimated the spawning potential. Sharov brought these two mistakes to his peers on the committee, and it was agreed that corrections needed to be made.

Several months later, after the model had finished running a second time, the science finally caught up with what Jim Price and the anglers had been saying for decades: even using the lax reference points developed by the ASMFC, menhaden had been subject to overfishing in thirty-two of the past fifty-four years. When the assessment was then peer reviewed by a group of international scientists, the reviewers deemed that the reference point currently in use for menhaden—8 percent of maximum spawning potential—was not sufficiently safe or precautionary.

Furthermore, the number of menhaden swimming in the Atlantic had declined by 88 percent since 1983—to a level so low that it caused George Lapointe, former commissioner of Maine’s Department of Marine Resources, to have what he called an “oh shit moment.”

If anyone knows more about the “fundamental miscalculation”, I’d be grateful for a summary.

Categories: Noticed Tags:

Boot Camp in Boston, July 9-10

May 9th, 2012

We are pleased to announce that we will be running a boot camp on July 9 and 10 in Boston—please see its page for details (some of which we’re still working out). We have room for 40 participants, so please register early. (And if you can, register with friends: we are finding that people get a lot more out of this training if they’re learning with their labmates and other collaborators.)

Categories: Boot Camp, Boston Tags:

The Architecture of Open Source Applications: Volume 2

May 8th, 2012

We are very pleased to announce that The Architecture of Open Source Applications: Volume 2 is now available from Lulu. A PDF version will go on sale in the next few days, and an e-book will become available as soon as we can produce it. Many thanks to everyone who contributed, and to the indefatigable Amy Brown for pulling it all together. As always, all royalties will go directly to Amnesty International, so if you buy a copy, you’ll be helping to make the world a better place.

Categories: Noticed Tags:

An Exercise With Functions and Plotting

May 6th, 2012

[Code and Data]

Let’s say you have a text file called workout.csv that contains information about your workouts for the month of March:

# date, kind of workout, distance (miles), time (min)
"2012, Mar-01", run, 2, 25
"2012, Mar-03", bike, 10, 55
"2012, Mar-06", bike, 5, 20
"2012, Mar-09", run, 3, 42
"2012, Mar-10", skateboarding, 2, 10

# Broke my leg :( 

"2012, Mar-11", Wii, 0, 60
"2012, Mar-12", Wii, 0, 60
"2012, Mar-13", Wii, 0, 60
"2012, Mar-14", Wii, 0, 60

It’s a comma-separated value (CSV) file, but it also contains comments and blank lines. The first line (a comment) describes the fields in this file, which are (from left to right) the date of your workout, the kind of workout, how many miles you traveled, and how many minutes you spent.

Our goal will be to read this data into Python and plot a graph with the day of the month on the x-axis and the time worked out on the y-axis. Let’s get started.
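One way to handle the reading step, offered as a sketch rather than this post’s own solution: Python’s csv module copes with the quoted dates (which contain commas of their own) as long as comment lines and blank lines are filtered out first. The function name read_workouts, the sample string, and the decision to keep only the day of the month from each date are all assumptions made for illustration.

```python
import csv
import io

def read_workouts(stream):
    """Return a list of (day_of_month, kind, miles, minutes) tuples."""
    # csv has no notion of '#' comments, so drop comment and blank
    # lines before handing the rest to the reader.
    data_lines = [line for line in stream
                  if line.strip() and not line.lstrip().startswith('#')]
    # skipinitialspace=True discards the space after each comma, so the
    # quoted date "2012, Mar-01" is parsed as a single field.
    reader = csv.reader(data_lines, skipinitialspace=True)
    workouts = []
    for date, kind, miles, minutes in reader:
        day = int(date.split('-')[1])  # "2012, Mar-01" -> 1
        workouts.append((day, kind, float(miles), float(minutes)))
    return workouts

SAMPLE = '''# date, kind of workout, distance (miles), time (min)
"2012, Mar-01", run, 2, 25
"2012, Mar-03", bike, 10, 55

# Broke my leg :(

"2012, Mar-11", Wii, 0, 60
'''

if __name__ == '__main__':
    workouts = read_workouts(io.StringIO(SAMPLE))
    days = [w[0] for w in workouts]
    minutes = [w[3] for w in workouts]
    print(days, minutes)  # prints: [1, 3, 11] [25.0, 55.0, 60.0]
```

With matplotlib installed, matplotlib.pyplot.plot(days, minutes) followed by matplotlib.pyplot.show() would draw the day-versus-time graph; to read the real file, pass open('workout.csv') instead of the StringIO sample.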


Categories: Uncategorized Tags:

UCL Bootcamp: Version Control Wrap-Up

May 4th, 2012

For the boot camp at UCL, we tried using Mercurial (with EasyMercurial) instead of Subversion in the version control segment.

You can see the plan for the segment on this EasyMercurial project page. Briefly, we opened with a few plain slides about the purpose of version control, followed by a hands-on example in three parts (working by yourself, working by yourself with an online remote repository, and working with others). We started at the beginning and got as far as “hg bisect”, but did not cover branching.
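The segment got as far as “hg bisect”, which finds the changeset that introduced a bug by binary search: mark one revision good and a later one bad, test the revision halfway between, and repeat on whichever half still contains the change. A minimal sketch of that idea in Python, assuming a linear history and a hypothetical is_bad test function (this illustrates the search, not how Mercurial implements it):

```python
def first_bad(revisions, is_bad):
    """Return the earliest revision for which is_bad(rev) is True.

    Assumes revisions are in order, revisions[0] is known good,
    revisions[-1] is known bad, and every revision after the first
    bad one is also bad.
    """
    good, bad = 0, len(revisions) - 1  # indices known to be good / bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return revisions[bad]


if __name__ == '__main__':
    tested = []

    def is_bad(rev):
        tested.append(rev)
        return rev >= 37  # pretend changeset 37 introduced the bug

    print(first_bad(list(range(100)), is_bad), len(tested))  # prints: 37 6
```

Six tests to pin down one changeset out of a hundred is why bisecting pays off on long histories: the number of checks grows with the logarithm of the number of revisions rather than linearly.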

I was presenting the segment, so I’m not well placed to judge how effective it was as a learning experience.  But I did make some notes.
