Archive for July, 2010

Interview with Davor Cubranic

July 31st, 2010 No comments

Today’s interview is with Dr. Davor Cubranic, a statistician who lives and works in Vancouver, B.C. Davor recently ran a workshop for faculty and grad students in statistics that covered many of the same ideas as Software Carpentry.

Tell us a bit about your organization and its goals.

I work in the Department of Statistics of a large research university. Our goal is to produce research papers, often in collaboration with researchers in other departments, such as life sciences, engineering, or forestry.

Tell us a bit about the software your group uses.

Primarily R, with some C/C++, Matlab, and SAS. A few groups use version control (Subversion, with some thought given to migrating to a distributed version control system). I’m probably the only one using automated tests (with the RUnit testing framework for R). A number of researchers use ESS, an Emacs front-end to statistical packages.

Tell us a bit about what software your group develops.

In-house development for our own use, although it is typically made publicly available over the web. I suppose some statistical packages we develop might be used as components in other researchers’ software, but I’m not familiar with any specific cases of this.

What can you tell us about your course?

It was a two-hour workshop on lightweight software engineering practices that help improve the quality of the research software we create, and so indirectly improve the quality of the research itself. This is the first time I have given such a workshop, and there was considerable interest in it. I hope to grow it into a more extensive course that would borrow from Software Carpentry, but use tools, languages, and problems that are more familiar to members of our department.

How do you tell what impact the course has had?

I don’t know yet.

What are your plans for future work?

Grow the workshop in duration and number of topics covered, and make it a regular fixture of the orientation given to incoming students every September.

Categories: Interviews Tags:

Stats for July

July 30th, 2010 No comments

[Charts: Visitors; Page Views]

Categories: Community Tags:

A Little Bit of Javascript

July 30th, 2010 1 comment

As I’ve mentioned before, one actionable finding in educational research is that faded examples—ones in which progressively less of the solution is shown to students as they progress—are a very effective teaching tool. I’ve been thinking about how to add them to this course, and have an idea I’d like to try out. It requires more Javascript than I know, though, so I’m hoping someone who reads this blog will be willing to write it for us. (And in general, if anyone wants to help produce material for this course, please get in touch: we’re looking for scripts, slides, voiceovers, examples, artwork, and everything else that an open education project needs.)

My idea is to create something like a simple folding editor to progressively expand solutions in place in a controlled order. I want to put specially-formatted comments in code to mark folds:

import sys, re
'''Find all duplicated words in an input file.'''
# <4> Finally, define a pattern that will match duplicated words.
pattern = r'(\b\w+\b)\s+\1\b'
# </4>

# <2> Process lines of text with a regular expression using the looping pattern we've seen before.
def process(lines):
  result = set()
  for the_line in lines:
    for match in re.findall(pattern, the_line):
      # <3> Extract data from matches. This is specific to *this* problem, and has to sync with the pattern.
      word = match.split()[0]
      result.add(word)
      # </3>
  return result
# </2>

if __name__ == '__main__':
  # <1> Write the main body of the program first using the read/process/write pattern we've seen before.
  lines = open(sys.argv[1], 'r').readlines()
  results = process(lines)
  for r in results:
    print r
  # </1>

It will initially appear as:

import sys, re
'''Find all duplicated words in an input file.'''
...4...
...2...
if __name__ == '__main__':
  ...1: Write the main body of the program first using the read/process/write pattern we've seen before...

Clicking on the fold marked ‘1’ expands it, and draws attention to fold #2 by putting its comment text inline:

import sys, re
'''Find all duplicated words in an input file.'''
# ...4...
# ...2: Process lines of text with a regular expression using the looping pattern we've seen before...
if __name__ == '__main__':
  # Write the main body of the program first using the read/process/write pattern we've seen before.
  lines = open(sys.argv[1], 'r').readlines()
  results = process(lines)
  for r in results:
    print r

Clicking ‘2’ expands it to show (and draw attention to) #3, et cetera. And there would be markers of some kind to re-fold an item, which would automatically re-fold all higher-numbered items at the same time. This would let us show readers how we created a solution, not just the solution itself; the marked-up code would be a bit ugly, but pretty easy to create (at least for small examples).
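To make the folding rule concrete, here is a minimal sketch in Python (not the JavaScript widget itself) of how a marked-up file could be turned into the view for a given step. Everything named here is hypothetical: `render`, `OPEN`, `CLOSE`, and `SAMPLE` are illustrative, and the sketch assumes that each fold opens with a `# <N> text` comment, closes with `# </N>`, and that showing "level k" means folds 1 through k are expanded. It uses f-strings, so it needs a newer Python than the sample above.

```python
import re

# Assumed marker syntax: "# <N> comment text" opens fold N, "# </N>" closes it.
OPEN = re.compile(r'^(\s*)#\s*<(\d+)>\s*(.*)$')
CLOSE = re.compile(r'^\s*#\s*</(\d+)>\s*$')

def render(source, level):
    """Return the view of `source` with folds 1..level expanded."""
    out = []
    hidden_by = None  # number of the collapsed fold we are currently inside
    for line in source.splitlines():
        opened, closed = OPEN.match(line), CLOSE.match(line)
        if hidden_by is not None:
            # Skip everything up to the collapsed fold's closing marker.
            if closed and int(closed.group(1)) == hidden_by:
                hidden_by = None
            continue
        if opened:
            indent, number, text = opened.group(1), int(opened.group(2)), opened.group(3)
            if number <= level:
                # Expanded fold: its marker becomes an ordinary comment.
                out.append(f'{indent}# {text}')
            else:
                # Collapsed fold: only the next one to open shows its hint.
                hint = f': {text.rstrip(".")}' if number == level + 1 else ''
                out.append(f'{indent}...{number}{hint}...')
                hidden_by = number
        elif not closed:
            out.append(line)  # ordinary code (closing markers are dropped)
    return '\n'.join(out)

# A tiny, made-up marked-up program used only to exercise render().
SAMPLE = """\
# <2> Define the helper.
def double(x):
    return 2 * x
# </2>
if __name__ == '__main__':
    # <1> Write the main body first.
    print(double(21))
    # </1>
"""

if __name__ == '__main__':
    for step in range(3):
        print(f'--- level {step} ---')
        print(render(SAMPLE, step))
```

Because the view depends only on a single level number, re-folding item k is just rendering level k-1, which hides every higher-numbered fold at the same time. A browser version would presumably do the same thing by toggling the visibility of elements rather than rebuilding text.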

So, any volunteers?

Categories: Tooling Tags:

Survey Update

July 29th, 2010 No comments

Here’s an update on responses to the survey I posted a couple of weeks ago. 172 people have responded at this point; it’s encouraging that priorities are relatively stable as numbers increase.

Education
77.3% Graduate degree
22.1% Undergraduate degree
0.6% High school
Field
41.0% Computer Science
30.1% Earth Sciences
28.9% Physics
25.4% Mathematics and Statistics
11.0% Microbiology
9.2% Biomedical Engineering
6.9% Macrobiology
5.2% Medicine and Health Care
5.2% Electrical Engineering
5.2% Astronomy
4.6% Mechanical Engineering
4.6% Aerospace Engineering
4.0% Chemical Engineering
2.9% Psychology
2.3% Economics
2.3% Business/Finance
1.2% Linguistics
1.2% Civil Engineering
0.6% Social Sciences
0.6% Arts and Humanities
Role
44.8% Academic Researcher
32.8% Software Developer
16.7% Graduate Student
16.7% Government Research Scientist
10.3% Engineer
9.8% Manager/Supervisor
8.6% System Administrator
3.4% Teacher
2.9% Industrial Research Scientist
1.1% Undergraduate student
1.1% Laboratory Technician
Priorities
2.51 Automating Repetitive Tasks
2.50 Reproducible Research
2.49 Data Visualization
2.46 Version Control
2.43 Performance Optimization
2.41 Data Structures
2.41 Coding Style
2.38 Basic Programming
2.37 Testing and Quality Assurance
2.35 Parallel Programming
2.34 Debugging with a Debugger
2.33 Using the Unix Shell
2.29 Computational Complexity
2.21 Object-Oriented Programming
2.21 Designing a Data Model
2.19 Working in Teams/on Large Projects
2.14 Refactoring
2.10 Static and Dynamic Code Analysis Tools
2.09 Matrix Algebra
2.06 Systems Programming
2.06 Integrating with C and Fortran
2.03 Design Patterns
2.01 Packaging Code for Release
1.95 Functional Languages
1.93 Handling Binary Data
1.80 Image Processing
1.77 Introduction
1.75 Build a Desktop User Interface
1.73 XML
1.64 Create a Web Service
1.39 Geographic Information Systems

Categories: Community, Research Tags:

Two More Episodes on Version Control

July 29th, 2010 2 comments

The third and fourth episodes of our lecture on version control are now online. These explain how to handle conflicts from concurrent edits, and how to roll back changes. As with the second episode on basic workflow, they use a mix of slides, screen recording, and sound effects.

The next episode is supposed to explain how to create a repository, but I’m still trying to figure out what to show people. A repo on the same machine that’s being used for development is better than nothing, but that doesn’t help people share work with colleagues. On the other hand, creating a repo on a server somewhere requires at least basic knowledge of the shell: even if someone is willing to type in a password for each interaction (so that they don’t need to know about public/private keypairs), they’ll need to know enough to SSH in to the server and run “svnadmin create reponame”. There are web-based control panels for creating and managing repositories, and we could just require them to ask their friendly neighborhood sys admin to set that up, but it’s just enough of a stumbling block to, well, be a stumbling block. Suggestions would be welcome…

Categories: Lectures, Version 4 Tags:

Mark Guzdial on Software Carpentry

July 28th, 2010 No comments

Mark Guzdial, a leading researcher in computing education, blogged a few days ago about the Texas Advanced Computing Center’s training program for computational scientists, and asked, “Given the importance of computational science, what do all scientists and engineers need to know about high-performance computing?” As you might expect, I replied to say that the question was almost always premature: we should first ask what scientists and engineers need to know about computing in general before tackling HPC.

Mark has responded with a post on the CACM blog that quotes me, and puts Software Carpentry in a larger context:

…by 2012, there will be about 3 million professional software developers in the United States, but there will also be about 13 million end-user programmers—people who program as part of their work, but who do not primarily develop software… these end-user programmers don’t know a lot about computer science, and that lack of knowledge hurts them. He finds that they mostly learn to program through Google. In his most recent work, he is finding that not knowing much about computer science means that they’re inefficient at searching.

He then goes on to quote Alan Kay’s “Triple Whammy” of core concepts:

  1. Matter can be made to remember, discriminate, decide, and do.
  2. Matter can remember descriptions and interpret and act on them.
  3. Matter can hold and interpret and act on descriptions that describe anything that matter can do.

and asks, “How do we frame [this] in a way that fledgling scientists and engineers would find valuable and learnable?” I agree that these ideas are at the heart of computing, but trying to map them directly to “here’s what you do on Tuesday morning” is a really big step. I hope that our concept map is one of the intermediate steps, but there have to be many, many more.

Categories: Noticed, Opinion Tags:

Second Lecture on Version Control

July 26th, 2010 2 comments

Our second lecture on version control is now on the web. It combines screen recording with static slides. Please let us know whether the format works for you; in particular, can you follow what’s happening on the desktop?

Categories: Lectures, Version 4 Tags:

Introduction to Version Control

July 24th, 2010 No comments

It took a lot longer to put together than I expected, but I’m pleased with the result—the newest screencast explains what version control is, and why you’d want to use it, in four minutes and four seconds, including a wolf howl and a maniacal laugh. I hope you like it…

And in case you haven’t been reading comments, I’d be very happy to include a parallel version control lecture based on a distributed version control system like Mercurial or Git if someone would like to create it. Software Carpentry is an open source project—if that’s the itch you want to scratch, then please email me and we’ll figure out how to make it happen.

Categories: Lectures, Version 4 Tags:

Strictly Speaking, This Isn’t Part of Testing

July 23rd, 2010 No comments

The second episode of the lecture on testing is now up. It covers exceptions, which strictly speaking aren’t part of testing, but it seemed like a natural place to introduce them. As always, feedback is welcome…

Categories: Lectures, Version 4 Tags:

First Episode of Testing Lecture

July 22nd, 2010 No comments

The first episode of the lecture on testing is now online. We’re going to be showing people how to use Nose, so I’ve tried to motivate the idea of a unit testing framework—I’d appreciate feedback on whether it works or not.

And yes, I am working on the version control lecture, but I’ve been stumbling over minor technical glitches, and I’m afraid that if I try recording today, I’ll start using uncivilized language. Hopefully I’ll get to it tomorrow…

Categories: Lectures, Version 4 Tags: