
Archive for December, 2010

Software Carpentry Boot Camp Jan 12-14 in Madison

December 31st, 2010 1 comment

Registration is now open for a three-day Software Carpentry boot camp Jan 12-14, 2011, at the University of Wisconsin – Madison, organized by the folks at The Hacker Within. I’ll be speaking on the first morning, and hanging out the rest of the time trying to learn as much as I can. Look forward to seeing you there!

More Detailed Outline for HPC Lecture

December 30th, 2010 No comments

I’ve added a more detailed outline for a lecture on high-performance computing to the site; feedback on the content and order would be very welcome, as would pointers to tools that are easy to teach with (as opposed to being powerful in the hands of experienced users).

Categories: Content, Version 4 Tags:

Open Research Computation

December 27th, 2010 No comments

By now, many of you have (hopefully) seen the announcement of Open Research Computation, a new journal devoted to “…peer reviewed articles that describe the development, capacities, and uses of software designed for use by researchers in any field.” The editorial board includes several friends of this course; as one of them, Titus Brown, observed in his blog:

…the problem with the online world for scientists [is] there’s no real systematized incentive to any of this online stuff. And that makes it really tough. I’m going through Reappointment right now… Nowhere on there is there a place for “influential blog posts” — how would you measure that, anyway? Same with software — I listed my various software releases on the “scientific products” page of the form, and have since been asked to describe and discuss the impact of my software. Since I don’t track downloads, and half or more of the software hasn’t been published yet and can’t easily be cited, and people don’t seem to reliably cite open source software anyway, I’m not sure how to document the impact.

…I’m extra-specially-pleased to be on the board of editors, not least because so far it seems like this journal is trying to break significant new ground. Our ed board discussions so far have included discussions on how to properly “snapshot” version control repositories upon publication of the associated paper…and considerations for “repeat” publishing of significant new software versions, as the software matures, in order to help encourage people to actually update and release their software.

This new journal isn’t a panacea, of course. It’s going to take 3-5 years, or even more, to make a real impact, if it ever does. But I’m enthusiastic about a venue that speaks to a major theme of my own scientific efforts — responsible computing — and that could help in the struggle to place responsible computing more squarely in the scientific focus.

I wish them the best of luck, and hope to see many contributions from alumni of this course in coming years.

Categories: Community, Noticed, Research Tags:

Elimination

December 27th, 2010 2 comments

I’m working up another essay on software design, and would like to ask readers of this blog how they handle something that comes up when simulating interacting agents. If your program models the behavior of a flock of birds, it probably looks something like this:

# create birds
birds = []
for i in range(num_birds):
  new_bird = Bird(...parameters...)
  birds.append(new_bird)

# simulate movement
for t in range(num_timesteps):
  for b in birds:
    b.move(birds) # need access to other birds to calculate forces

There’s a flaw in this—or at least, something questionable. By the time you are moving the last bird for time t, every other bird is effectively at time t+1. There are many solutions, the simplest of which is to calculate each bird’s new position in one loop, then update the bird in another:

# simulate movement
for t in range(num_timesteps):
  new_pos = []
  for b in birds:
    p = b.next_position(birds) # doesn't move the bird
    new_pos.append(p)
  for (b, p) in zip(birds, new_pos):
    b.set_position(p)

(If you haven’t run into it, the built-in function zip takes two or more lists, and produces tuples of corresponding elements. For example, zip('abc', [1, 2, 3]) produces ('a', 1), ('b', 2), and ('c', 3).)
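A minimal sketch of that behaviour, assuming Python 3, where zip returns a lazy iterator rather than a list (hence the list() call to show the pairs). The variable names are only for illustration:

```python
# zip pairs up corresponding elements of its arguments.
letters = 'abc'
numbers = [1, 2, 3]

# In Python 3, zip returns an iterator, so list() is used to show the pairs.
pairs = list(zip(letters, numbers))
print(pairs)

# zip stops at the end of its shortest argument.
short = list(zip('abcd', [1, 2]))
print(short)
```

The second call shows why the position list and the bird list must have the same length: any extra elements are silently dropped.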

So far so good—but what if the things we’re simulating can produce offspring, die, or eat one another? Offspring are relatively simple to handle: we just put them in a temporary list (or set), then append them to the main list after everything else has been moved.
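Here is a rough sketch of the temporary-list idea. The Bird class, its step method, and the breeding rule (one chick at age 2) are all assumptions invented for this example, not part of any real simulation:

```python
# Hypothetical Bird class: the name, fields, and breeding rule are
# assumptions made for illustration only.
class Bird:
    def __init__(self, age=0):
        self.age = age

    def step(self):
        """Advance one timestep and return a list of any offspring produced."""
        self.age += 1
        # Assumed rule: a bird produces one chick when it reaches age 2.
        if self.age == 2:
            return [Bird()]
        return []


def simulate(birds, num_timesteps):
    for t in range(num_timesteps):
        newborns = []  # temporary list, not visited by the loop below
        for b in birds:
            newborns.extend(b.step())
        # Only now, after every existing bird has moved, add the offspring.
        birds.extend(newborns)
    return birds


flock = simulate([Bird(), Bird()], num_timesteps=3)
print(len(flock))
```

Because newborns are only added once the inner loop has finished, a chick never takes a step in the timestep in which it was born.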

Removing creatures that have died is a bit trickier, because modifying a list as we’re looping over it may cause us to skip elements (we delete the element at location i, then advance our loop counter to i+1, and voila: the item that was at location i+1 but has been bumped down to location i is never seen in the loop). We can handle that either by “stuttering” the loop index:

i = 0
new_pos = []
while i < len(birds):
  state, p = birds[i].move(birds)
  if state == ALIVE:
    i += 1
    new_pos.append(p)
  else:
    del birds[i]

or by moving creatures that haven’t died into another list, and swapping at the end of the loop:

temp = []
for b in birds:
  state, p = b.move(birds)
  if state == ALIVE:
    temp.append((b, p))
birds = []
for (b, p) in temp:
  b.set_position(p)
  birds.append(b)

I think the second is less fragile—modifying structures as I’m looping over them always gives me the shivers—but either will do the job.
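To see the skipping problem in isolation, here is a small sketch that uses plain integers instead of birds (the lists are made up for the demonstration). Deleting inside the loop leaves one of the 2s behind, while building a list of survivors does not:

```python
# Deleting from a list while iterating over it skips elements.
values = [1, 2, 2, 3]
for i, v in enumerate(values):
    if v == 2:
        del values[i]  # later items shift down by one, so the next is skipped
print(values)

# Building a separate list of survivors avoids the problem.
originals = [1, 2, 2, 3]
survivors = [v for v in originals if v != 2]
print(survivors)
```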

But now comes the hard case. What happens if birds can eat each other? If bird i eats bird j, for i<j, it’s no different from bird j dying. But if bird j eats bird i, we have a problem, because bird i is already in the list of survivors. Do we search for it and delete it (in which case, the stuttering solution above is definitely not the one we want, because the indexing logic becomes even more fragile)? Or… or what? Set a “but actually dead” flag in the bird’s record in the temporary list, and not move it back into the bird list after all in the second loop? What would you do, and why?
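For what it’s worth, the “but actually dead” flag idea might be sketched as follows. This is only one of the options raised above, not a recommended answer. The Bird class, the size-based eating rule, and the extra victim value returned by move are all assumptions made up for this example:

```python
ALIVE, DEAD = 'alive', 'dead'


class Bird:
    # Hypothetical bird: size decides who eats whom (an assumed rule).
    def __init__(self, name, pos, size):
        self.name, self.pos, self.size = name, pos, size

    def move(self, birds, eaten):
        """Return (state, new position, victim or None) without moving."""
        victim = None
        for other in birds:
            # Assumed rule: eat the first smaller, uneaten bird at our position.
            if (other is not self and other not in eaten
                    and other.pos == self.pos and other.size < self.size):
                victim = other
                break
        return ALIVE, self.pos + 1, victim

    def set_position(self, p):
        self.pos = p


def step(birds):
    temp = []
    eaten = set()  # plays the role of the "but actually dead" flag
    for b in birds:
        if b in eaten:  # eaten by a bird earlier in the list
            continue
        state, p, victim = b.move(birds, eaten)
        if victim is not None:
            eaten.add(victim)
        if state == ALIVE:
            temp.append((b, p))
    survivors = []
    for (b, p) in temp:
        if b in eaten:  # eaten by a bird later in the list
            continue
        b.set_position(p)
        survivors.append(b)
    return survivors


flock = [Bird('small', 0, 1), Bird('big', 0, 2), Bird('far', 5, 1)]
flock = step(flock)
print([b.name for b in flock])
```

Here the second bird eats the first, which is already in the temporary list. The membership check in the second loop is what keeps it out of the new bird list.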

Categories: Content Tags:

Local Subversion Repositories

December 26th, 2010 3 comments

A colleague in the UK who is going to teach Software Carpentry asked about setting up repositories. In particular, he doesn’t have a server where he can create accounts and repos, so he was thinking of using Git or Mercurial, and having students host their repos on their own machines. That’s not actually necessary: if you’re going the locally-hosted route, and giving each student a separate repository, you can still use Subversion: just use the “file:” protocol for connecting instead of “http:” or “svn+ssh:”.  For example:

$ pwd
/users/gvw
$ mkdir demo
$ cd demo
$ svnadmin create jon
$ svn checkout file:///users/gvw/demo/jon mycopy
$ ls
jon   mycopy
$ cd mycopy
$ touch example.txt
$ svn add example.txt
A     example.txt
$ svn commit -m "Checking in an example file"
Adding         example.txt
Transmitting file data .
Committed revision 1.

The repository can be anywhere on the local file system—I just put it and the working copy in the same directory so that they’d be easy to delete afterward.  And a repository that you’re accessing via the “file:” protocol can also be accessed through other protocols—SVN does a good job of separating protocol from storage.  The only thing I trip over when I’m doing this is the triple slash: the protocol spec is “file://” (two slashes) and then there’s the absolute path to the repo (which starts with another slash) making for three in all.
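If you are generating repository URLs for a whole class, the triple slash can be made explicit in a few lines of Python. This is a sketch only: file_url is a hypothetical helper, and it assumes a POSIX-style absolute path with no spaces or other characters that would need escaping:

```python
def file_url(absolute_path):
    """Build a 'file:' URL from an absolute POSIX path (hypothetical helper)."""
    if not absolute_path.startswith('/'):
        raise ValueError('path must be absolute: ' + absolute_path)
    # 'file://' ends in two slashes; the absolute path supplies the third.
    return 'file://' + absolute_path


repo_url = file_url('/users/gvw/demo/jon')
print(repo_url)
```

The resulting string is what would be passed to svn checkout in the session shown earlier.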

Categories: Content, Version 4 Tags:

Extended Examples

December 23rd, 2010 No comments

We’d like to add more extended examples to this course, both because they’re fun and because they’re a good way to show how our topics relate to one another. Right now, we have:

We plan to add a simple N-body simulation (easy to do badly, motivates testing, visualization, and a bit of numerical analysis), but after that, we’d like to find some examples from psychology, linguistics, and other areas that are starting to do more computational work. Parsing text files full of patient records, putting the information in a database, doing some analysis using SQL, and visualizing the results is one possibility; what else would you recommend or like to see?
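To give a feel for the patient-records idea, here is a very rough sketch of the first three steps (parse, load, query) using Python’s built-in sqlite3 module. The record format, field names, and the records themselves are invented for illustration, and the visualization step is left out:

```python
import sqlite3

# Made-up records in an assumed "id,age,diagnosis" format, for illustration only.
raw_text = """\
p001,34,asthma
p002,61,diabetes
p003,47,asthma
"""


def parse_records(text):
    """Turn lines of comma-separated text into (id, age, diagnosis) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        patient_id, age, diagnosis = line.split(',')
        records.append((patient_id, int(age), diagnosis))
    return records


# Load the parsed records into an in-memory database.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE patients (id TEXT, age INTEGER, diagnosis TEXT)')
connection.executemany(
    'INSERT INTO patients VALUES (?, ?, ?)', parse_records(raw_text))

# Analyze with SQL: number of patients and mean age per diagnosis.
query = ('SELECT diagnosis, COUNT(*), AVG(age) FROM patients '
         'GROUP BY diagnosis ORDER BY diagnosis')
summary = connection.execute(query).fetchall()
for row in summary:
    print(row)
connection.close()
```

Real patient files would need far more careful parsing and error handling than this, which is part of what would make it a useful extended example.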

Categories: Content Tags:

Compute Canada’s “Strategic” Plan Isn’t

December 21st, 2010 2 comments

Last Friday—December 17—I received an email from Compute Canada. The emphasis is mine:

A Strategic Plan for Compute Canada was a key recommendation of the International Review Panel. This plan draws on suggestions from that panel as well as the information and discussions from the Town Hall Meetings held earlier this year. It has been many months in preparation and review by Compute Canada’s committees and must be submitted to CFI before the end of December 2010… Your comments are invited and should be sent…by end of day Monday, December 20th.

Call me cynical, but I have to wonder how much feedback they really want if they’re sending out more than 50 pages on the Friday of the weekend before Christmas, and insisting on replies by Monday… The plan itself is an even bigger disappointment. It is supposed to lay out the next decade’s goals for the entire Canadian HPC community, but of the six goals listed in the executive summary, only one talks about people (or in government terms, “highly qualified personnel”), and the “Implementation Strategy” given is as vague as it could possibly be: “Work with universities to develop HPC support expertise and train researchers to use HPC effectively and efficiently.” Sections 4.1.4 and 9.1.4 are equally vague—the latter acknowledges that “The key ‘product’ academia provides to businesses is Highly Qualified Personnel”, but the only concrete plan I see is decoupling funding for people from funding for hardware. Again, call me cynical, but I expect that will result in less money for the former, not more…

Nowhere do I see any mention of what matters most: giving scientists and engineers the foundational computational skills they need to use computers effectively (all kinds of computers, of all sizes). Big computers are vital pieces of experimental apparatus, and as the biggest line items in Compute Canada’s budget, the priorities for choosing them need to be stated (and argued for). Without skilled people, though, those fancy machines are just space heaters with blinking lights. The best way to see what Compute Canada’s Strategic Plan should focus on is to redraw their tired old pyramid to show what things really look like:

[Figure: two pyramids side by side. Left: “Compute Canada’s Pyramid”. Right: “Reality”.]

If Compute Canada really wants to help academia and industry use high-performance computers more effectively, the picture on the right is the one that matters. If changing that doesn’t become their #1 priority, the gap between what Canadian scientists and engineers can do and what they could do will continue to grow, to the detriment of all.

Categories: Opinion Tags:

Executable Papers

December 20th, 2010 No comments

Elsevier is sponsoring an “Executable Paper Grand Challenge”. If you have more than just ideas about the future of scholarly publication in computational science, it may be a good way to get them some press.

Categories: Noticed Tags:

Building a Recommendation Engine with NumPy

December 15th, 2010 No comments

Tommy Guy’s explanation of how to build a recommendation engine in NumPy, based on an example from Toby Segaran’s excellent book Programming Collective Intelligence, is now online.

Categories: Community, Version 4 Tags:

Presents for the Holidays

December 14th, 2010 3 comments

Some of the best presents I have ever received have been recommendations: “Oh, you’d like this author,” or, “You really should listen to this album.” So, in the holiday spirit, please take a moment and share (in the comments) some pointers to computational resources that you think deserve to be better known. To get the ball rolling, I’m having fun playing with Ruffus, a simple Python library for constructing pipelined workflows. I’d also recommend Jonathan Weiner’s Time, Love, Memory: A Great Biologist and His Quest for the Origins of Behavior. It’s ostensibly a biography of Seymour Benzer (who spent his entire career exploring how genes determine behavior), but it’s actually the best description I’ve ever read of how a successful long-lived research program actually works.

Categories: Community Tags: