Our First High School Workshop at Rockefeller University

Camille Avestruz, Ivan Gonzalez, Timothy Cerino, and Daniel Chen all had the great opportunity to teach Software Carpentry's first zero-entry workshop to high school students. We were able to teach at Rockefeller thanks to the scientific foresight of Jeanne Garbarino and the rest of the Rockefeller team along with Arliss Collins, Greg Wilson and the SWC team. Lastly, thanks to Gabriel Perez-Giz for volunteering his time to help during the workshop.

The main goal of this workshop was to expose tomorrow's scientists to scientific computing as early as possible. For example, as genomics data in biology continues to grow, we are beginning to see biologists shift from pipetter to data scientist. Our goal was not to teach everyone all the skills they would need to dive into retrieving, cleaning, and analyzing genomics or astronomy data the next day. Rather, we wanted to show them what is possible with computers, to introduce the idea that the GUI may not always be the best tool for the job, and to give them a foundation of knowledge and concepts for perpetual self-learning.

We followed the traditional SWC workshop materials, adapting the pace as needed. Bash, Python, and Git were covered.

What Went Well, and What Didn't

High School Students

Since there was a lot of material to cover in two days, the lessons were given in 90-minute blocks. In the typical NYC high school (I'm now speaking from personal experience), the day is split into 10 blocks of 45 minutes each. Some classes may take a 2-period block (90 minutes), but almost never is the entire day taught solely in 90-minute blocks. In higher education a 90-minute class is almost considered the baseline, and many classes last much longer. The students echoed this difference when we finished our workshop and held a 15-minute open discussion of what worked for them and what did not. Many suggested having 4 half-day workshops, each covering 1 topic with many small in-class practice problems and a 'homework'. On reflection this makes sense, and it should be tested the next time we teach high school students.

Workshops like these usually have a bimodal distribution of students in terms of proficiency, experience, and familiarity with the material being covered. In our particular case, we pre-screened the class, and only about 2 students had prior experience with the material.

Sending these students to a site with short challenge exercises so they don't get bored (e.g. Rosalind or Project Euler for Python; Learn Git Branching for Git) would have helped those few who thought the pace was too slow.

The World Cup probably didn't help either :D

Sticky notes, the Etherpad, and raising your hand

The sticky notes (red up for a problem, green up for no problem), Etherpad document, Etherpad chat, and the basic raising of the hand to ask for help were extremely effective. Students were using all means of getting help, although I would have to say most were doing what they were used to in the classroom, which is raising their hands. For those that were shy, the Etherpad chat was a way for the instructor to clarify an idea or answer. Once again, the sticky notes also provided a great means for honest feedback.

Bash

We began in Bash with the basic pwd, cd, and ls commands, eventually moving to mkdir and touch. The helpers realized very quickly that the students were struggling to keep up. Common questions were:

  • Did I do this right?
  • It doesn't look like what's up there
  • I don't understand what is going on

Those of us who live in Bash, or use the terminal regularly, customize the look of the prompt: removing extraneous information, adding colors, etc. The students, on the other hand, had never seen a blinking cursor in this context. They were trying to follow along by typing exactly what they saw on the presentation screen, but what they saw there was not what they saw on their own monitor. Confusion arose, even though many of them had created and moved into the correct directory. A possible workaround is to have all students copy/paste an alias for ls that colors directories (this is not a default in OSX), and to have everyone (instructors included) export the same PS1. The former can be done right after the ls command is brought up, as part of searching through the man pages for the color flag (--color for GNU ls on Linux, -G for the ls that ships with OSX). We would have to experiment to find the optimal time to export the PS1 variable, but a black box 'please copy/paste this so your terminal looks like mine' may be sufficient.

This brings me to my next point: when using the Etherpad, it is useful not to include the $ before the actual Bash command. Many of the students were still trying to get some idea of what was going on, so they would copy the entire line, and once they finally remembered the correct terminal copy/paste commands, the command would error out and they would not understand why. This could be a failure on our part in not properly explaining the components of a Bash command (e.g., the first 'token' is always a 'verb'), but at this level it may be too much information to navigate the file system and learn the nitty-gritty of Bash all at once.

The students were very literal, as they should be when exposed to a new concept. However, as literal as they were, spelling mistakes were common, and learning to tab complete was a lesson they were learning the hard way.

When using Bash to navigate and create files, zero-entry bootcampers do not have a mental image of the directory structure/hierarchy. Showing a GUI file system window alongside the basic navigation, touch, and mkdir commands solidifies and contextualizes what exactly is going on in the file system, and shows them that mkdir and touch actually create files and directories that appear in the GUI as well. This can be accomplished by running the following terminal commands for Windows, OSX, and Linux operating systems, respectively:

  • explorer .
  • open .
  • xdg-open . (or nautilus . in Ubuntu)

From the first round of feedback, we learned that repetition is extremely important. For example, when teaching what cd is, instead of saying 'cd into a new directory' it was important for the instructors to say 'cd, change directory, into your SWC folder'. This applied to the other topics covered as well. It is especially important since these are new terms for the students, and it naturally slows down the lesson (having tons of spelling mistakes works just as well as a natural delay).

After reading the first round of feedback, we put up a small exercise on the screen during lunch. This gave the students time to read and see what the task was when they returned. Additionally, with Gabe's help, we came up with a more practical exercise that showcases the necessity of learning Bash. We created a directory of 48,000 files of mixed names and file types and asked the students how they would move a certain pattern of files (e.g. I want my 2013 pictures that are some form of .jpg in a pics/2013 subfolder). This was a powerful example since the 'usual' way they would have done the task is through the GUI. However, with that many files in the directory, the GUI actually had problems drawing all the icons to display. By the time the GUI had loaded the icons, we were almost finished with the entire task, and that included explaining the problem in Bash, giving them time to come up with the wildcard expression to move the files, and watching the GUI crash a few times. This example gave them a real-world context for why the GUI can be the wrong tool, and we reiterated that a massive file dump like the one shown is not uncommon in science.

Python

We took the feedback regarding pacing in Bash and paced the Python lessons in a much more digestible fashion. We finished off the day covering conditionals, with some feedback that the section on dictionaries was too long. By the second day I overheard students being excited about their newfound knowledge, as they started thinking in terms of a logical set of commands to give a computer to accomplish a task. It was inspiring that they knew, at the same time, that 'it is just the tip of the iceberg and there's a lot more out there.' We began the second day by reviewing everything from the previous day: go to the SWC work directory, create a folder for today's work, open up an ipython notebook, and review some questions that were brought up in the previous day's feedback. It was fascinating to see that even after all the confusion regarding Bash the previous day, filesystem navigation was not an issue on the next day.

We finished the session off with loops, functions, basic file I/O, and we gave them a practical exercise to read in a dataset of animals and their brain mass and body mass. The task was to calculate each animal's brain:body mass ratio and save the information so it can be used later.
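A minimal sketch of what one solution to this exercise might look like, using only loops, functions, and basic file I/O. The file names, the comma-separated layout, the column order, and the two rows of data are all assumptions made up for illustration; the actual dataset used in the workshop is not reproduced here.

```python
import os
import tempfile


def read_masses(path):
    """Return {animal: (brain_mass, body_mass)} from a comma-separated file."""
    masses = {}
    with open(path) as infile:
        next(infile)  # skip the header line
        for line in infile:
            if not line.strip():
                continue  # ignore blank lines
            name, brain, body = line.strip().split(",")
            masses[name] = (float(brain), float(body))
    return masses


def save_ratios(masses, path):
    """Write one 'animal,ratio' line per animal and return the ratios."""
    ratios = {}
    with open(path, "w") as outfile:
        outfile.write("animal,brain_body_ratio\n")
        for name, (brain, body) in masses.items():
            ratios[name] = brain / body  # both masses assumed to share a unit
            outfile.write(f"{name},{ratios[name]}\n")
    return ratios


# Hypothetical input: placeholder names and made-up numbers, not real data.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "animals.csv")
out_path = os.path.join(workdir, "ratios.csv")
with open(in_path, "w") as f:
    f.write("animal,brain_mass,body_mass\nanimal_a,2.0,100.0\nanimal_b,0.5,10.0\n")

ratios = save_ratios(read_masses(in_path), out_path)
print(ratios)
```

Saving the ratios to their own file is what makes the result reusable later without rerunning the calculation.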

Some things we learned teaching Python:

  • Students had trouble understanding the value of toy exercises. When I was helping during the Python session, some seemed very worried about getting the example or exercise "right", as if that were the whole value of the task.
  • Exercises/examples should be tuned to connect to their previous experiences or otherwise have some sense of completion (e.g. ipythonblocks).
  • During the last capstone exercise, many of the students were simply stumped, overwhelmed, and had 'no idea where to start', even though the basic components were covered. This became a very powerful example of breaking down the problem (engineering), and the use of comments. Showing them comments and how they can be used to pseudo-write code and get some process out without being burdened by actual code
    1. forces them to actually break down a problem into manageable pieces and makes the problem less daunting.
    2. shows a real practical use case of comments and how they can be used before any code is written, not just for documenting what you have already done.
    I think we did a great job introducing good practices in programming. Hopefully the next time they are asked to implement their knowledge it becomes less daunting.
  • Windows users can preface the ipython notebook code block with %%bash to run Bash commands such as ls and head
  • Because of the guest wireless system, students were being knocked offline during an ipython notebook session, and this affected the actual notebook being able to run the code blocks. Running a kernel interrupt, kernel restart, and run all cells brought students back to a working state.
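
The comment-first approach from the capstone bullet above can be sketched as follows. This is a hypothetical reconstruction rather than what was shown in the workshop: the function name brain_body_ratios, the comma-separated layout, and the sample lines are assumptions. The numbered comments were written first as a plan; the code under each one was filled in afterwards.

```python
def brain_body_ratios(lines):
    # 1. Start with an empty place to keep the results.
    ratios = {}
    # 2. Skip the header, then look at each remaining line in turn.
    for line in lines[1:]:
        # 3. Split the line into the animal's name and its two masses.
        name, brain, body = line.strip().split(",")
        # 4. Turn the masses from text into numbers and divide.
        ratios[name] = float(brain) / float(body)
    # 5. Hand the results back so they can be saved or reused later.
    return ratios


# Made-up placeholder lines, not real measurements.
sample = ["animal,brain_mass,body_mass", "animal_a,2.0,100.0"]
print(brain_body_ratios(sample))
```

Each comment corresponds to one small step, so a student who has 'no idea where to start' can at least write the next comment before worrying about syntax.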

Git

Version control is probably one of the more difficult topics to cover, to preach, and to convert non-users into using. We began by mentioning 2 common problems, the first of which is nicely depicted in this PhD Comic on final documents.

The second was an example of backing up data while working on a thesis. The biggest barrier to entry is showing them why it is better than their current workflow: Dropbox, track changes, save as with a number or date, collaboration, etc. We began by diagramming Git on an oversize notepad. However, due to the nature of the room, it was difficult to see the drawn figures from the back. We went back and forth between the Git diagram and the actual commits and checkouts in Git to give a sense of how things are being tracked, using 2 files: one keeping track of guacamole ingredients and one with instructions on how to make guacamole. We mostly went over version control on a local and individual level. During the final section of the workshop we went over collaboration using the guacamole recipe on GitHub.

Here we also ran into many technical problems regarding the older versions of OSX, namely Snow Leopard. The problem arises when one tries to git init the directory to be tracked. Simply checking the Git installation by using which git does not accurately diagnose the problem. The fix was to use an older version of Git than the one posted on the main website. However, it seems that an even more universal and simpler solution is to install the GitHub application and within preferences, have the program install the command line tools.

Conclusion

From the student feedback and instructor observations, the workshop was a success. The students asked very good questions about practical use cases for each of the topics covered. The improvement in Bash proficiency between the 2 days was astounding. We also spent some time throughout the workshop pointing to a few good links where they could practice their newfound skills. We explained that the problems do not have to seem 100% applicable to your current work (although that may help): practicing various (unrelated) tasks trains the mind to see how a solution can be applied to other problems because of an underlying pattern. Thus, we directed them to a few Python and Git websites to give them practice.

Our goal was to give them enough instruction to get over the big initial learning hurdle into Bash, Python, and Git so that they have the foundation to explore and learn on their own.
