
Three Bootcamps for Librarians

Over the past month and a half I have been fortunate to (co-)lead three bootcamps targeted at librarians, a relatively new audience for Software Carpentry. These bootcamps were held in Edmonton, Toronto, and New York. Just as the term "librarian" is very broad, encompassing many different disciplines and skill sets, the three bootcamps had their share of similarities and important differences. I'll start with the similarities and then discuss how each bootcamp differed.


The Edmonton and New York bootcamps were hosted by specific libraries, the Edmonton Public Library (EPL) and the New York Public Library (NYPL), but were open to librarians both within and outside those organizations. The Toronto bootcamp was hosted by the Mozilla Corporation and advertised to librarians in the Greater Toronto Area.

Two important similarities were that 1) attendance was voluntary and 2) the bootcamps filled very quickly. Edmonton was limited to 25 attendees; the invitation email went out in the early afternoon and all spots were filled by the next morning. Toronto had 40 spots, with about 30 people actually attending. NYPL was open to 80 attendees: 40 from within NYPL and 40 from the wider librarian community. The 40 NYPL spots filled within 5 hours of the invitation email. I'm unsure about the other 40, but I believe they filled just as quickly.

The EPL bootcamp was the first of the three, held in early July. It was advertised throughout the Edmonton library community via email. I was the only instructor, so I tried to get as much input as I could from previous librarian bootcamps. In particular, a librarian-oriented bootcamp was held at PyCon in Montreal in April 2014. Dhavide Aruliah wrote a post-mortem, and I followed all of its advice. The primary suggestion was to be prepared to go slowly. Dhavide wrote, "We asked for feedback at the end of the first day. There was an overwhelming consensus that we needed to slow down and to allocate more time for hands on stuff." The instructors in Montreal thought on their feet, and Jess Hamrick came up with a lesson based on processing a circulation card that formed the basis of their second day.

Based on the above, and on some direct feedback from other coding librarians (Andromeda), we modified the bootcamp schedule to focus on Python and library data and to forgo other SWC staples such as git and SQL. Version control and databases are essential for people who program, but for people who have never programmed, their importance is not yet clear.

In terms of material, we focused on three library-specific tasks: Jess's circulation card example, processing a spreadsheet of library hold data, and processing overdue fine payments. The latter two lessons used real data given to us by the EPL. These lessons were more than enough content for the two-day bootcamps.

In Edmonton, we started with shell and Python basics. We focused on answering the question "How many holds are there for the title Ender's Game?", first using shell programs to get the answer. This lesson introduced the concepts of files, working directories, the command line, and the Unix pipeline model. The hold data we had contained over 30,000 hold records, so the need to automate was clear to the attendees.

I used this lesson as the starting point in all three bootcamps. We followed it with a slightly harder question, "What is the most popular hold?" (spoiler: it was Frozen), and introduced the new tools necessary to answer it.

After the shell, we moved on to Python. We started from the very basics, literally "Hello World". To help reinforce the concepts of accessing and parsing data, we introduced Python by building solutions to the same questions as above: holds on a particular title, then hold counts for all titles. This lesson usually spilled over into the second day, but by the end of the first day students had seen variables, math operators, opening and looping over the lines of a file, and splitting column data in CSV files.
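As a rough idea of where that first Python exercise ends up, here is a minimal sketch that counts the holds on one title. It assumes the hold data is a plain CSV file with one hold per line, the title in the second column, and no commas inside fields; the function names, the `title_column` parameter, and the file name `holds.csv` are hypothetical, not the EPL data's actual layout.

```python
def count_holds_for_title(lines, title, title_column=1):
    """Count lines whose title column exactly matches `title`.

    `lines` can be any iterable of strings, including an open file,
    since looping over a file object yields it one line at a time.
    """
    count = 0
    for line in lines:
        # Strip the trailing newline, then split the row into columns
        columns = line.strip().split(",")
        # Skip blank or short lines that have no title column
        if len(columns) > title_column and columns[title_column] == title:
            count = count + 1
    return count


def count_holds_in_file(path, title, title_column=1):
    """Open a (hypothetical) hold file and count holds on one title."""
    with open(path) as hold_file:
        return count_holds_for_title(hold_file, title, title_column)
```

Calling `count_holds_in_file("holds.csv", "Ender's Game")` would open the file and loop over it line by line. The plain `split(",")` mirrors the beginner approach; a real export with quoted commas inside titles would call for Python's `csv` module instead.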

The first day was very similar in all three bootcamps. Our Day 1 feedback across all bootcamps was that it was generally well-paced. A handful said it was too fast and a few said too slow, which, with the group sizes we had, is the best we can hope for. From there, the topics diverged on Day 2.

Edmonton Day 2

The pre-bootcamp survey in Edmonton told us that 90% of the attendees had never programmed before, so we decided to stick with Python and librarian data on Day 2. We used our second pre-prepared lesson, merging catalogue data, which was originally created (and also taught) by Vicky Varga of the EPL. We concluded the second day by bringing Python and the shell together, running Python programs from the command line. We then left about an hour for attendees to play around and ask questions.
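The catalogue-merging lesson isn't reproduced here, but the core idea of joining two files on a shared key can be sketched in plain Python with a dictionary. This assumes two hypothetical comma-separated inputs: catalogue lines of the form `record_id,title` and hold lines of the form `hold_id,record_id`. Both layouts and the function names are illustrative assumptions, not the EPL's data.

```python
def load_catalogue(lines):
    """Build a dict from record ID to title, assuming 'record_id,title' lines."""
    catalogue = {}
    for line in lines:
        columns = line.strip().split(",")
        if len(columns) >= 2:
            catalogue[columns[0]] = columns[1]
    return catalogue


def merge_holds_with_catalogue(hold_lines, catalogue):
    """Attach a title to each 'hold_id,record_id' line.

    Record IDs missing from the catalogue get None, so gaps in the
    data show up instead of silently disappearing.
    """
    merged = []
    for line in hold_lines:
        columns = line.strip().split(",")
        if len(columns) < 2:
            continue  # skip blank or malformed lines
        hold_id, record_id = columns[0], columns[1]
        merged.append((hold_id, record_id, catalogue.get(record_id)))
    return merged
```

Wrapped in a small script that reads the two file names from `sys.argv`, this is the kind of program attendees could then run from the shell.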

Toronto Day 2

In Toronto, a larger portion of attendees were experienced and wanted to move beyond Python (more than in Edmonton, for sure). Sensing this, at the end of Day 1 we took a vote on what students wanted to do on the second day. The choices were 1) more Python with library data, 2) git, or 3) SQL. "More Python" and SQL won out. So on the morning of the second day, we finished our Python program to count the holds for each title in the hold data. This task requires dictionaries, so it took a good chunk of the morning. We then finished the morning with about an hour of regular expressions.
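The step that needed dictionaries can be sketched roughly as follows: a dict maps each title to a running count, and the most popular hold is the entry with the largest count. As in the simpler exercise, this assumes a comma-separated layout with the title in the second column; the function names and column position are illustrative assumptions, not the lesson's actual code.

```python
def count_holds_by_title(lines, title_column=1):
    """Return a dict mapping each title to its number of holds."""
    counts = {}
    for line in lines:
        columns = line.strip().split(",")
        if len(columns) <= title_column:
            continue  # skip blank or malformed lines
        title = columns[title_column]
        # get() returns 0 the first time a title is seen
        counts[title] = counts.get(title, 0) + 1
    return counts


def most_popular(counts):
    """Return the (title, count) pair with the highest count."""
    return max(counts.items(), key=lambda item: item[1])
```

`counts.get(title, 0) + 1` avoids a separate membership check. The standard library's `collections.Counter` and its `most_common()` method do the same job, though building the dict by hand is arguably the point for a first-time programmer.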

BUT, in the aforementioned vote on topics, we had a handful of students who, in addition to voting for git, wrote something to the tune of "Please, please teach git, this is why I came to this". So we actually created a choice for the Day 2 afternoon. The main group learned SQL in the afternoon using a library-based example created by Thomas Guignard and Abby Cabunoc, while one instructor branched off (inside joke) to another room to teach git to the 6 or so students who really wanted to learn it.
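The Toronto SQL lesson itself isn't reproduced here. As a rough illustration of the kind of query involved, here is the "most popular hold" question asked in SQL, run through Python's built-in `sqlite3` module against a throwaway in-memory database. The `holds` table, its columns, and the function name are hypothetical.

```python
import sqlite3


def most_held_titles(rows, limit=3):
    """Load (hold_id, title) rows into a temporary table and rank titles by hold count."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE holds (hold_id INTEGER, title TEXT)")
        connection.executemany("INSERT INTO holds VALUES (?, ?)", rows)
        # GROUP BY collapses rows per title; COUNT(*) tallies the holds in each group
        query = """
            SELECT title, COUNT(*) AS n_holds
            FROM holds
            GROUP BY title
            ORDER BY n_holds DESC, title
            LIMIT ?
        """
        return connection.execute(query, (limit,)).fetchall()
    finally:
        connection.close()
```

Using `?` placeholders rather than pasting values into the SQL string lets `sqlite3` handle quoting, which matters for titles like Ender's Game that contain an apostrophe.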

Finally, at the end of the second day, we left about half an hour of free time for attendees to work. This is where we learned that a few had hoped to learn about Python modules, something we didn't get to. I did, however, show a few of them an example of using the xlrd module to read Excel files directly.

New York Day 2

At NYPL, we had 80 attendees, so we split into two rooms due to capacity limits. We had initially planned to have a "beginner" room and an "advanced" room, but in the pre-bootcamp survey over 40 of 52 respondents answered "I have not programmed before". So we decided to start both rooms with the basics we had used in Edmonton and Toronto. Day 1 was similar in both rooms: the shell, followed by an introduction to Python from the very basics.

On the second day, since we had two rooms, we decided to offer a choice between the two topics used on the second day in Toronto. Before lunch on Day 2, attendees were asked to indicate which topic they wanted in the afternoon: "More Python" or SQL. We considered offering git as a third option, but no one at the bootcamp had indicated a strong interest in it, so we decided not to. Most students wanted SQL, but enough wanted more Python to allow us to have one room of Python and one of SQL (if everyone had wanted SQL, we would have taught it in both rooms).

One point that I'll return to later is the importance of using library-specific examples. At NYPL, the Day 2 "More Python" lesson was based on a problem one of our attendees discussed with us; we hacked up the lesson in the late morning of Day 2 after she sent us some data files. We also got into processing MARC records from Python, using a lesson that one of our helpers, Jared (a librarian coder himself), created that day.

A final point of clarification: while regular expressions and SQL were taught in both Toronto and at NYPL, we did not get to using them from within Python (i.e., with the re and sqlite3 modules). They were taught as separate topics, which I believe the audience still found valuable.
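For anyone wanting to take that next step, here is a minimal sketch of a regular expression used from within Python via the `re` module: pulling a four-digit year out of free-text publication date fields such as "c2014." or "[2013?]". The sample field values, the 1000 to 2999 year range, and the function name are assumptions chosen for illustration, not material from the bootcamps.

```python
import re

# A four-digit number starting with 1 or 2. The lookbehind (?<!\d) and
# lookahead (?!\d) stop it from matching inside longer digit runs such as ISBNs.
YEAR_PATTERN = re.compile(r"(?<!\d)([12]\d{3})(?!\d)")


def extract_year(date_field):
    """Return the first plausible four-digit year in a date field, or None."""
    match = YEAR_PATTERN.search(date_field)
    return int(match.group(1)) if match else None
```

The same pattern could be tried first in a regex tester or with `grep -E` in the shell (which lacks the lookarounds), which is roughly how regular expressions were taught on their own.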

Lessons Learned

It was great to teach librarians. They were an excellent audience: interested, adventurous, and at ease with each other because a good number already knew each other. These qualities led to an interactive environment, which is always more fun. So what lessons did we learn? Here is what I would recommend to anyone teaching Software Carpentry to librarians:

  1. Librarians like Software Carpentry. All three bootcamps filled quickly. Many comments at the end mentioned an interest in a "Part 2" bootcamp to learn more. Both EPL and NYPL (as well as librarians from other organizations who attended) have expressed interest in hosting another bootcamp in the future.
  2. Do a pre-bootcamp survey. This is probably true of any bootcamp, but librarian groups include a higher proportion of beginners, and knowing how many is essential.
  3. Use library-specific examples. Audience-specific data is important; "data is just data" does not work. Software Carpentry examples tend to be scientific in nature. Using the existing library examples mentioned above, or creating examples from real data, is important to capture the audience. There is no single example that all librarians will find relevant, but as long as they can recognize the concepts, they'll be more interested in following along. One caveat: very narrow library-specific topics, like processing MARC records, were not of much interest in any bootcamp.
  4. Librarians liked learning regular expressions and SQL. A good portion of librarians wanted to see them in Toronto and at NYPL (and probably in Edmonton too, if I'd asked). Most librarians probably saw SQL and regular expressions during their degree, but the level of coverage can vary greatly. When preparing a bootcamp, I would suggest making time for these topics on the second day, even with a mostly beginner group. As mentioned, making the examples library-specific helps, and it's OK if you do not get to integrating regexes and SQL with Python; they're still valuable.
  5. Librarians are interactive. Be ready to think on your feet. In both Montreal and NYPL, lessons were created during the bootcamp to fit the audience. So ask for examples from your audience and try to work them in where appropriate. Also, be aware if your librarian audience gets too quiet. It may mean you're losing them (true of any audience, really).
  6. We need more feedback on problems librarians would like to solve with programming. Following on the points above, Software Carpentry needs to build up a good set of library-based examples. Like anyone seeing a new tool for the first time, librarians find it tough to know where Software Carpentry lessons might apply in their day-to-day tasks. Ask for feedback during and after bootcamps, and please share it.
