Our First Data Carpentry Workshop

Posted 2014-05-14 by Karen Cranston in Bootcamps, Data Carpentry.

Update: for more information on Data Carpentry, please see their web site.

On May 8 and 9, 2014, four instructors, four assistants, and 27 learners filed into the largest meeting space at the National Evolutionary Synthesis Center (NESCent) for the inaugural Data Carpentry bootcamp. Data Carpentry is modeled on Software Carpentry, but focuses on tools and practices for managing and manipulating data more productively. The learners at this first bootcamp were a diverse group. They included graduate students, postdocs, faculty, and staff from three of the largest local research universities (Duke University, University of North Carolina, and North Carolina State University). Over 55% of the attendees were women, and research areas ranged from evolutionary biology and ecology to microbial ecology, fungal phylogenomics, marine biology, and environmental engineering. One participant was even a library scientist from Duke Library.

Acquiring data has become easier and less costly, including in many fields of biology. Hence, we expected that many researchers would be interested in Data Carpentry to help manage and analyze their increasing amounts of data. To get a better idea of the breadth of perspectives that learners brought to the course, we started by asking learners why they were attending. The responses reflected a broad spectrum of the daily data wrangling challenges researchers face:

The instructors discussed many of these kinds of scenarios during the months of planning that preceded the event. We were therefore hopeful that the curriculum elements we chose from the many potentially useful subjects would address what many of the learners were hoping to get out of the course. Here is what we finally decided to teach, along with the lessons we learned from teaching it and from the feedback we received from the learners.

We taught four different sections:

This was the first-ever bootcamp of this kind, so after it was all done, we had a lot of ideas for future improvements:

There were also various things we wanted to teach that ended up on the chopping block due to lack of time and other reasons. One of these, and one that learners asked about repeatedly, was the subject of "getting data off the web". It will take more thought to pin down what that should actually mean as part of a Data Carpentry bootcamp that aims for zero barrier to entry. It might mean using APIs to access data from NCBI or GBIF, but it's far from clear whether that would meet learners' needs. In most general-purpose data repositories, such as Dryad, most of the data are too messy to use without extensive cleanup.

All of the helpers, including Darren Boss (iPlant), Matt Collins (iDigBio), Deb Paul (iDigBio), and Mike Smorul (SESYNC), did a great job of helping the students pick up new data skills. Finally, we'd like to thank our sponsors for their support, including NESCent for hosting the event and keeping us nourished, and the Data Observation Network for Earth (DataONE), without whom this event wouldn't have taken place.

For more on this workshop, please see this Storify.
