Teaching basic lab skills
for research computing

Blog Archive

The Software Carpentry blog is no longer updated regularly. You can find more information about what's happening in our community through The Carpentries blog, a great resource that collates posts from Data Carpentry, Library Carpentry, and Software Carpentry, and publishes updates of general interest to the community.

This page contains a list of posts from the legacy Software Carpentry blog for archival purposes.

Git lesson using worksheets
Pariksheet Nanda / 2018-05-26
I attended my first Software Carpentry workshop in 2015 as a helper and was mesmerized by how Ivan Gonzalez taught the Git lesson using an easel pad and different colored markers. He had separate boxes for the working directory, stage, and history, and went back and forth between the terminal and drawing on the pad to recap his last set of commands. Ever since, Git has been one of my favorite lessons to teach, because the material has depth and is challenging to explain, but it can be taught well with the help of Ivan's drawings and the well-maintained lesson materials. The Git drawings seemed to work well until last year, when I taught in a room with afternoon sunlight and a difficult-to-see blackboard. I was scheduled to teach in the same room at a workshop earlier this month, and to remedy the hard-to-see blackboard, I used a document camera and drew on a worksheet instead. Herein was a nice opportunity: why not have the learners draw along too? During my PhD studies, I've had two classes, biochemistry and phylogenetics, where the instructor had us draw along, and I found I enjoyed drawing things in the classroom. How often does one get to draw as an adult? I was instantly transported back to kindergarten. In addition to the Git drawing section at the top of the worksheet, I added a cheatsheet of the Git commands we would be covering that afternoon, with little checkboxes next to the commands. That way, learners get the visceral feeling of checking off a command when it is covered and understood (or have an opportunity to protest if it wasn't understood). Drawing along can have educational value in addition to feeling good: in Chapters 4 and 5 of "How to Teach Programming (and Other Things)", Greg Wilson discusses combining words with visuals (dual coding) and the need to present a diagram slowly, in pieces, to later help trigger recall of what was said by pointing at the diagram. Here is the cheatsheet, clean and filled in. The reason the arrows on the history items point back to their immediate predecessors is to make the "detached HEAD" error state clearer. When we create the detached HEAD state in the classroom, we don't see recent commits, because each history item is only aware of its previous commits - or at least that's a good enough mental model of Git's true behavior (see the sketch below). Learner sticky-note feedback from that section of the workshop is slightly clouded by some confusion during the collaboration section in the late afternoon, and by the fact that Git was taught without teaching the Shell first. I will use the worksheets again at a more traditional workshop in two months, and I hope you will also try using Git worksheets! Read More ›
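The detached HEAD demonstration described above can be reproduced at the terminal. The sketch below is illustrative rather than the lesson's exact command sequence; the repository, file name, and commit messages are made up:

```bash
# Build a tiny two-commit history (names here are invented).
git init demo && cd demo
echo "one" > file.txt
git add file.txt
git commit -m "first"
echo "two" >> file.txt
git commit -am "second"

# Checking out the earlier commit detaches HEAD. Because each commit
# only points back to its parent, `git log` now shows just "first":
# the newer commit seems to vanish, matching the worksheet drawing.
git checkout HEAD~1
git log --oneline

# Reattach HEAD by returning to the branch tip.
git checkout master   # or `main`, depending on your Git configuration
```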

Meet the Members of the Software Carpentry CAC
Erin Becker, Belinda Weaver / 2018-05-07
New Curriculum Advisory Committee for Software Carpentry Lessons Read More ›

Launching The Carpentries Website
Tracy Teal, Belinda Weaver / 2018-04-25
Website Launch We are excited to announce that The Carpentries website is now live! The new website celebrates our merged identity as The Carpentries and gives you easy access to all things 'Carpentries' - that is, to the information common across the merged organization. The sorts of things you will find there include our Code of Conduct, information about instructor training and assessment, a range of shared policies, including our privacy policy, details of staffing and project governance, and a whole lot more. The existing Data and Software Carpentry sites will remain in place alongside the new site. Since Data and Software Carpentry are ongoing lesson organizations, information related to lessons belongs on those individual sites. We will gradually take down material that now belongs more logically on The Carpentries website. You may notice that many of the links on The Carpentries site take you directly to The Carpentries Handbook that we launched last week. The Handbook has been enthusiastically received by our community. For those who haven't seen it yet, find it here. The aim of the Handbook is to provide a one-stop shop for people wanting all kinds of Carpentries-related information. Information is being added and updated all the time, so please let us know if there is something missing. The Handbook and the website will complement each other to cover all things Carpentries. Please let us know if there are errors or omissions on our new website. You can raise an issue about the website at this link, or about the Handbook at this link. The launch of the new website completes our transition to a new, merged, online identity as The Carpentries. Increasingly we will blog as The Carpentries, rather than as Software or Data Carpentry, so be sure to check out our new blog. We also have our new merged Twitter feed. Follow The Carpentries on Twitter. Read More ›

Running workshops on limited budgets
Belinda Weaver, Toby Hodges, Anelda van der Walt / 2018-04-13
Anelda van der Walt and co-host Toby Hodges ran the first themed discussion session in March to brainstorm ideas about running workshops with little to no funding. Attendees from Australia, the US, the UK, and Canada joined to hear and contribute ideas. Existing Funding Models The Brisbane model is to use Eventbrite to charge a low fee (around AUD $60) for attendance at workshops. This has a few benefits - waitlists can be managed, the booking widget appears in the workshop website, fees can be easily refunded up to a week before the workshop, and attendance is generally good because people have paid out of their own pockets or have been sponsored by their school to come. The fee covers the costs of room hire and the fairly lavish morning and afternoon teas that keep people from wandering off in search of sustenance during breaks. The Brisbane instructor pool is fairly large so there is no need to fly people in, which means we don't have to factor in those costs. One drawback is having to pay the catering and room hire costs up-front by personal credit card, as the Eventbrite payout comes only after the workshop. This is a break-even charge - we could charge more and return some money to The Carpentries, but our main interest is in keeping costs low. Another model we have used is to charge a small fee of, say, $20 to get people to commit to coming, but return the value to them as a $10 lunch voucher each day. This only works when we don't have to pay for room hire, and people do wander off in the breaks in search of coffee instead of sticking around as they do at the catered workshops. The South Africa model was to charge 500 Rand (around USD $42) per person, but this was not feasible outside metropolitan areas, where the fee dropped as low as 50 Rand (USD $4). Many workshops are now externally funded, which removes the need to charge a fee, and as local instructor pools grow, the costs of running workshops are also decreasing considerably. We have also found that it is often easier for research groups or funders to sponsor smaller amounts or specific items, such as flight tickets or catering. By pooling resources, we've been able to run many workshops at very low cost per sponsor rather than trying to source one lump sum from a single sponsor. Online services to manage the admin side can be problematic, as the payout only comes afterwards whereas costs may be up-front. Luckily, no-shows are rare. In Canada, there can be a problem with tax collection and reporting when running on-campus events. That means finding a university person willing to use their account in the system to 'sponsor' the workshop for the tax side of things. If fees are charged for catering, universities generally then charge for room hire, so this starts to make workshops more expensive. So far, room hire has been free thanks to personal lobbying and favours, but this may become more of a problem as workshop numbers ramp up. Another model that can work well is tying a workshop in with a conference or other big event happening locally at your institution. Generally you can get funding for the workshop from the conference, or charge a small fee that will cover any costs you have. Where can allies on campus be found? Libraries are good places to ask for support, and they may also have teaching rooms that can be used. Anyone involved in IT, especially research IT, HPC, student support or graduate schools, is also a potential ally, as are postgraduate student societies of all kinds.
Off-campus allies can also help. Meetup can help you find local coding groups. If you make connections and raise awareness through those, it helps spread the word. People from local tech companies or organizations may be willing to act as workshop helpers because they are keen to gain experience teaching people. How do people advertise workshops? Twitter seems to be the main avenue for getting the word out. Facebook, especially posting to a university-related student page, can be a good place to post an event. It is worth trying email lists within a university, posters on bulletin boards, and asking people to spread the word through their contacts. Consider using a workshop title like "Data Analysis with R" if awareness of Software and Data Carpentry is low to non-existent where you are based. Use the poster or email to say what people will learn rather than assuming they will know. Something like "Start using computation in your research" or "Start using R to analyse and visualise your data" tells people what to expect. Read More ›

Launching our New Handbook
Tracy Teal, Maneesha Sane, Belinda Weaver / 2018-04-11
Find new pathways to a range of Carpentries' materials Read More ›

Building Library Carpentry Community and Development
Tracy Teal, John Chodacki, Chris Erdmann / 2018-04-11
We are excited to announce that Chris Erdmann has been hired as the Library Carpentry Community and Development Director, starting May 4, 2018. Chris has worked in libraries for more than 21 years, integrating data management and workflows into database and library systems. Through training, consulting, and tool development, he has worked to empower people in research and library communities to work effectively with data. Chris received his MLIS at the University of Washington iSchool while working at the University's Technology Transfer Office, where he helped automate workflows and develop the unit's web presence and analytics. He spent roughly ten years working alongside astronomers at the European Southern Observatory (ESO) and the Harvard-Smithsonian Center for Astrophysics to advance library data-mining and linking services, e.g. the ESO Telescope Bibliography. During this time, he also led an experimental training series called Data Scientist Training for Librarians (DST4L), geared towards teaching librarians data-savvy skills to help transform their library services to meet the needs of their research communities. He recently joined the Library Carpentry governance group. He is a co-author, with Matt Burton, Liz Lyon, and Bonnie Tijerina, of the recent report Shifting to Data Savvy: The Future of Data Science in Libraries, in which Library Carpentry and The Carpentries are highlighted as a necessary next step for libraries to advance their research services. Chris will be working with the Library Carpentry community and The Carpentries to start mapping out the infrastructure for growing the community, formalizing lesson development processes, expanding the pool of instructors, and inspiring more instructor trainers to meet the demand for Library Carpentry workshops around the globe, and thus reach new regions and communities. This new position is funded by IMLS and hosted by the University of California Curation Center (UC3), the digital curation program of the California Digital Library (CDL). It is intended to support the work of the Library Carpentry governance committee on streamlining operations with The Carpentries, determining standard curriculum, growing instructor training for librarians, and planning for community events like the upcoming Mozilla Sprint to update Library Carpentry materials. Chris will be helping to manage the sprint work in the northern hemisphere. Chris is excited about advancing the profession and sees the Library Carpentry and The Carpentries communities as the perfect catalyst to do that. He is on Twitter as @libcce, on GitHub, and on LinkedIn, and we're very excited to welcome Chris to this role! For more information on Library Carpentry, see https://librarycarpentry.github.io. Follow @libcarpentry on Twitter. For more information on UC3 and the California Digital Library, see http://uc3.cdlib.org. Follow @caldiglib and @UC3CDL on Twitter. Read More ›

Developing GitHub labels for The Carpentries lessons
François Michonneau / 2018-04-05
The process of developing GitHub labels for our lessons. Read More ›

Mentoring Groups Open for Multiple Timezones
Kari L. Jordan / 2018-04-04
Did you miss the deadline to join a mentoring group? Do not fear - there are still openings for mentees to join groups in the following timezones: UTC+2, UTC+1, UTC+8, and UTC+11. To join, fill out this application. Mentoring groups are beneficial to participants because group members are able to focus on specific goals, including teaching their first workshop and contributing new lesson material. Being a part of a group that addresses something important to you is both powerful and enjoyable. You do not want to miss out on the Carpentries mentoring opportunities. Join a mentoring group today! Read More ›

What can I do during the Bug BBQ?
François Michonneau / 2018-04-03
A short guide on how to contribute and be involved during The Carpentries Bug BBQ. Read More ›

Who Belongs at CarpentryCon 2018? You Do!
Belinda Weaver / 2018-03-23
CarpentryCon 2018 will be the key community building event in the Carpentries’ annual calendar. To be held in Dublin from 30 May - 1 June, 2018, CarpentryCon will be three action-packed days of skill ups, breakout sessions, talks, social events (there will be a conference dinner), and workshops. Who do we want to see there? YOU! Grad student? Tick. Post-doc? Tick. Working in industry? Tick. Project or lab leader? Tick. Research Software Engineer? Tick. Librarian supporting researchers? Tick. The list could go on, but what it boils down to is there’ll be something for everyone. Maybe you are very new to the Carpentries community. Maybe you have attended a workshop, helped at a workshop, or just heard that the Carpentries do amazing things to help researchers. Come to CarpentryCon and find out more about what we do, how you can get involved, and how being part of this great community can change your skill levels, your career trajectory, and maybe even your life. Lots of people have changed careers after attending our workshops, after finding that the coding and data skills they thought were unattainable were actually well within their grasp. So don’t let being new to our community be a barrier - we have all kinds of plans to help even the newest members of our community feel that they belong with us. Maybe you are based somewhere that has no Carpentries community. If so, then joining our community - our global community - is one way to end that isolation. Find people working in the same discipline, or with the same tools, and make connections to help you feel less alone once you go back home again. Who knows? Maybe you will be inspired to kick start your own community. If so, we have lots of ways we can help. Perhaps you’re already a Carpentries Instructor, Trainer, or Lesson Maintainer? (Did you hear that rhyme?) CarpentryCon will provide lots of opportunities for those groups both to network and cross-network. Older hands - people who have been in the community for a while - we want you at CarpentryCon to show new people the ropes and help them develop the skills to drive their careers forward. From sessions on leading projects or research labs to breakouts on mentoring and diversity, our community has so much to offer new people just starting out. We plan to offer professional development sessions that will augment your Master’s or PhD qualifications so that your career can really go places. (Have a look at what skills volunteering as an Instructor, Trainer, or Lesson Maintainer in the Carpentries can add to your CV.) And did I mention that the contacts you will make at CarpentryCon will be invaluable? We also have some fantastic keynote speakers. Valerie Aurora, a diversity and inclusion consultant who co-founded the Ada Initiative, will not only speak, but will run her fantastic Ally Skills Workshop on the final day. Anelda van der Walt - and who better? - will talk about community building throughout Africa. Desmond Higgins, who developed Clustal, will keynote about that groundbreaking work. And Software Carpentry co-founder Greg Wilson will speak about how we all got here in the first place. We will have plenty of Library Carpentry content - not least an onboarding workshop for existing Instructors who want to teach the material. We will have Next Steps in R and a session on using Carpentry methods in university courses. We have also slotted in some Open Mic ‘unconference’-style sessions so good ideas on the day can have a chance to be voted on to the program. And it’s Dublin in May! 
Where else would you rather be? Check out the program, submit an abstract for a poster or lightning talk, but most of all … GET YOUR TICKET TODAY! Early Bird sales end in just a few hours! Read More ›

Software Carpentry: Considering the Future
Christina Koch / 2018-03-23
When Software Carpentry became part of the merged Carpentries this January, outgoing Software Carpentry steering committee members Rayna Harris and Christina Koch were tasked with continuing the work of the steering committee. For the past several weeks, we've been working together with a small group to determine what Software Carpentry needs as a sub-community to grow and thrive in the future, and we have identified two primary goals. Our first goal is a restatement of Software Carpentry's mission. The Carpentries are growing by leaps and bounds, so this seems like a timely moment to stop and think about what Software Carpentry is doing as a branch of the Carpentries, and why. This exploration of mission and identity isn't just so we can have a nice tagline, but to provide direction and a reference point for future decisions that impact the community. Our second main goal is to improve the support for maintaining and curating the lessons we already have. But we can't do these things alone! We're looking to the community to help us achieve our goals. This has already been happening for our second goal, as Erin Becker (even before the Carpentries merged) has been building up the community of lesson maintainers and recruiting new ones. Most recently she opened applications for a Software Carpentry curriculum advisory committee that will be a resource for individual lesson maintainers (see the blog post here for more information). We'd now like to take action on our first goal. Over the next 6-8 weeks, we'd like to invite all members of the Software Carpentry community to join us in thinking about our identity and what we are hoping to accomplish as Software Carpentry lesson developers, maintainers, instructors, workshop organizers, learners, and champions. Whether you're a newly trained instructor, have commented on the lessons many times, or are just organizing your first workshop – we want to hear from you; all are welcome. Ideas, suggestions, and concerns from the community will be gathered in two ways. The first is to talk with us “in person” by joining one of the hour-long calls that we'll be scheduling. If you don't have time for a call, we will also have a form with suggested questions to fill out. We know that this is a bit of an ask – everyone is busy (who wants another meeting or form to fill out?). However, based on my past experience in workshops, trainings, and the mentoring-hosted discussion sessions, I always leave a conversation with fellow “carpenters” with new ideas, new connections, and a lot of hope about the world. This is an invitation to give that to yourself by joining the conversation, with the added bonus of helping us craft a summary of this community's mission and values that truly reflects its members. Action Item Sign up on the agenda to attend a call or fill out the form. We look forward to hearing from you! We will be gathering community feedback from now until the end of April, and will review these community contributions and present them to the community sometime in May. Read More ›

Webinar with Rochelle Tractenberg: Debrief
Marianne Corvellec, Karen Word / 2018-03-20
On February 2, the Assessment Network held a well-attended webinar with Rochelle Tractenberg. Dr. Tractenberg holds two PhDs, one in Cognitive Sciences and the other in Statistics, and directs the Collaborative for Research on Outcomes and Metrics at Georgetown University, where she is a tenured professor in the Department of Neurology. It was a great privilege for our community to be able to engage in a conversation with her and to learn from her expertise. Our starting point was the controversy about short-format training which arose last year, following the publication of a PNAS paper titled “Null effects of boot camps and short-format training for PhD students in life sciences.” The Carpentries design and deliver short-format training for working with software and data; trainees are researchers from various fields. The Carpentries' initial response to that paper discussed many ways in which we have been successful with respect to our goals for Software and Data Carpentry workshops. However, given that short-format training is a known challenge for generating sustainable content learning, we hoped that Dr. Tractenberg's expertise might shed some light on areas with room for improvement. Dr. Tractenberg identified two of our strategies (i.e., “meet learners where they are” and “explicitly address motivation and self-efficacy”) as areas which could benefit from our leveraging of tools and concepts from educational psychology. So, Dr. Tractenberg introduced us to Bloom's taxonomy (1956) and Messick's criteria (1989). Bloom's taxonomy comes with six levels of learning objectives, corresponding to increasing and accumulating complexity in thinking: Remember/Reiterate, Understand/Summarize, Apply/Illustrate, Analyze/Predict, Evaluate/Compare/Judge, and Create/Synthesize. Messick's criteria ask three questions to be used for instructional design and evaluation: What are the Knowledge, Skills, and Abilities (KSAs) desired for learners? What actions/behaviors will reveal that those KSAs are present? What tasks will elicit those actions/behaviors? To “meet learners where they are,” we could identify which Bloom's level they are at and which higher Bloom's level we are planning on taking them to. We would then use Messick's KSAs to design, evaluate, and educate learners about where they are in the learning process. For example, when teaching programming with Python, we may start from the understanding of a for loop (B2) and get to applying it to solve a specific problem (B3). However, if learners don't know what a for loop is when they enter (B1), it might be better to constrain our short-term objectives to achieving understanding (B2). Setting our goals in terms of “growing a level” means our target outcomes might not match the level at which trainees need to operate once on their own. In their work, researchers typically operate at the highest levels of complexity, evaluating (B5) and creating (B6). The fact that researchers do habitually operate at these higher levels is helpful insofar as they know what it's like to think in these ways. To help them be successful in meeting these post-workshop (real-world) goals, we should educate our learners about what they have achieved, but also about what the next steps of advancing in the Bloom's hierarchy might look like. This not only gives them next steps to plan on, it also fosters metacognition – the process of thinking analytically about the process of learning, which is key to sustainability.
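As a concrete illustration of “growing a level” with the for-loop example above, here is a minimal Python sketch; the lists and the task are invented for illustration, not taken from the lesson:

```python
# B2 (Understand/Summarize): learners read a loop and explain what it does.
for number in [1, 2, 3]:
    print(number * 2)    # doubles each item: prints 2, 4, 6

# B3 (Apply/Illustrate): learners reuse the same construct on a new
# problem, e.g. totaling a list of counts (values are made up).
counts = [4, 3, 5]
total = 0
for count in counts:
    total = total + count
print(total)             # 12
```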
To “explicitly address motivation and self-efficacy,” metacognition is particularly key. Articulating expectations with regard to learning growth will also be helpful here, to help learners perceive their accomplishments within the workshop setting. However, to set the stage for sustainable learning, we should aim to educate learners about the road ahead of them and offer guidance about the kind of learning they should expect to be doing after they leave the workshop. Anticipating the steps that lie between them and the ability to actually apply their new skills to their research is key both to taking those steps and to appropriately evaluating success as that learning continues. While the Carpentries haven't explicitly taught Messick, Bloom's taxonomy was present in the instructor curriculum up to its most recent iteration, and remains in spirit as we guide instructors through interpretation of our learning objectives. It was recently removed because instructors are not typically the ones defining those learning objectives in the first place; this is something we may wish to revisit as we go about creating on-boarding procedures for curriculum designers and maintainers. While learning objectives are present throughout our curriculum and have been crafted with Bloom's in mind, they are not necessarily specific in targeting a goal of “growing a level.” Furthermore, most of the education that we offer with regard to future learning occurs peripherally, e.g. as instructors model and describe their own learning process. There is clearly room for these ideas to grow within our community, and we will be keeping these suggestions firmly in mind as we move ahead. We will also be keeping in touch with Dr. Tractenberg, as she will be joining us for our next meeting of the Virtual Assessment Network on March 23rd! Click here for your timezone. All are welcome to attend – please sign up via this Etherpad. Contact Kari Jordan (email) with questions or requests to join the Assessment Network list. If you missed the webinar or want a refresher, check out the Etherpad, the annotated slides, the video recording, or the audio-only recording. Read More ›

Welcome to New Trainers
Karen Word, Erin Becker / 2018-03-14
We are excited to welcome seventeen newly badged Trainers to our community. The group recently finished their program with Karen Word and are now certified to teach new Carpentry Instructors. Join us in welcoming Tania Allard, Anne C. Axel, Mik Black, Murray Cadzow, Mesfin Diro, Caroline Fadeke Ajilogba, Anne Fouilloux, Alejandra Gonzalez-Beltran, Claire Hardgrove, Toby Hodges, SherAaron Hurt, Senzo Mpungose, David Perez-Suarez, Juliane Schneider, Nicholas Tierney, Jessica Upani, Elizabeth Williams. This was a very widely distributed group, so congratulations to the group and to Karen for making the training work across a challenging number of time zones. Instructor training in New Zealand will get a boost with the addition of Mik and Murray. Adding Claire and Nicholas has doubled Trainer numbers in Australia. Expansion of Carpentries activity throughout Africa will accelerate with the addition of new Trainers Caroline, Senzo, Mesfin, and Jessica. Senzo and Caroline have already co-instructed with Erin Becker and Martin Dreyer at this workshop. The revival of the African Task Force should also spark an uptick in activity across the African continent. Alejandra, David, and Tania will also help build our Spanish language representation as we expand into Central and South America. We look forward to many upcoming opportunities to teach with our new Trainer cohort! Read More ›

Revival of the African Task Force
Caroline Ajilogba, Mesfin Diro, Erika Mias, Lactatia Motsuku, Kayleigh Lino, Juan Steyn, Katrin Tirok, Anelda van der Walt / 2018-03-13
Over the past five years, the Carpentries have gained considerable traction in Africa. Since the first online instructor training in 2015, more than 100 African-based instructors have been trained, of whom more than 40 have qualified. An estimated 50 workshops have taken place on the continent, in countries including South Africa, Namibia, Botswana, Mauritius, Kenya, Ghana, Gabon, and Ethiopia. More than 15 instructors from other continents have visited the continent to join local instructors in building community and teaching workshops. These include instructors from the UK, several states in the USA, Canada, the Netherlands, and New Zealand. After the first in-person instructor training in South Africa, an African Task Force was established to help mentor trained instructors and support them throughout the instructor checkout process. This task force consisted of volunteers from Australia (Belinda Weaver), the UK (David Perez-Suarez), and the USA (Matthew Collins, Deb Paul, and Henry Senyondo). Each task force member was assigned a group of trainees and worked with them in their own way - including online meetings, online demo sessions, and support via email. At the time, connectivity (and thus communication with the African instructors), isolation, and the lack of a local community were some of the challenges experienced by both mentors and mentees. To date, 11 out of 23 trainees from the first in-person instructor training in Africa have qualified. The low conversion rate from trainee to qualified instructor, and the lack of opportunities for new instructors to teach, have been concerns of the local instructor community for some time, and in December 2017 the African Task Force was brought back to life to address these issues. The new task force consists of eight instructors based in South Africa and Ethiopia, representing a variety of disciplines including libraries, digital humanities, bioinformatics, public health, ecology/engineering, life sciences, and computer science. Members include Caroline Ajilogba, Mesfin Diro, Erika Mias, Lactatia Motsuku, Kayleigh Lino, Juan Steyn, Katrin Tirok, and Anelda van der Walt. The task force will focus on assisting trained instructors to qualify, mentoring instructors and helpers before they teach a workshop, and generally nurturing a healthy African Carpentry community of instructors. We will also provide clearer communication about the process for running workshops in Africa, and for volunteering to teach at workshops. We will work closely with the African Workshop Administrators and the Mentoring Subcommittee. Some of the activities of the African Task Force (specifically the South African-focused activities) are funded through the Rural Campuses Connection Project II (RCCPII). Funding for activities elsewhere in Africa is currently mostly on an ad hoc basis, and the task force hopes to provide resources to those who would like to apply for larger grants to run Carpentry activities in other countries on the continent. The task force will meet in person twice per year and members will serve until March 2019. The activities of the task force will be re-evaluated throughout early 2019 to make recommendations about the way forward and to recruit new members should it remain viable. In our next post, we will share more information about where to start if you want to run a local workshop. Please join the African Carpentries Google Group if you would like to be informed of local activities, opportunities, and more.
We are looking forward to working with our local and international community over the next 12 months and thank everyone for their enthusiastic support. Read More ›

Carpentries para Latinoamérica
Paula Andrea Martinez, Alejandra Gonzalez-Beltran / 2018-03-12
La idea de traducir las lecciones de Carpentry al español y otros lenguajes no es nueva [1, 2, 3]. A finales del 2017, un grupo de voluntarios (ver Grupo inicial) nos embarcamos en la meta de hacer las traducciones al español una realidad. Estamos muy felices de que muchas más voluntarias y voluntarios se nos han unido en este esfuerzo y hoy podemos dar a conocer el fruto de estos meses de trabajo a toda la comunidad. ¡Lo logramos! La experiencia: La experiencia de traducir las lecciones de The Carpentries ha sido muy valiosa. “¡Para mi esta experiencia ha sido increíble! gracias por poner todo su esfuerzo en este proyecto. Y por otro lado, la respuesta de mucha gente con ganas de colaborar, que desbordaron entusiasmo en participar en las distintas facetas de este proyecto. Un placer trabajar con ustedes.” - H.S. “Estoy muy feliz de participar (y al mismo tiempo de aprender) en este proyecto” - I.L. “¡Gracias a todos! Todas sus contribuciones son un gran paso para realizar una meta de mucho tiempo, que es traducir las lecciones de @swcarpentry del Inglés al Español!” - R.H. “Fue una gran experiencia. Me encantó el ambiente de trabajo y entusiasmo del grupo. La retroalimentación y análisis, me enriqueció mucho.” - V.J.J. Los logros: Guía para los colaboradores (Glosario y lineamientos) Lección de la terminal de shell traducida Lección de control de versiones con Git traducida Traducción del template de estilos Construcción de una comunidad de habla hispana en las Carpentries, permitiendo la difusión de las nociones de programación y mejores prácticas para hacer la investigación reproducible El proceso de traducción: Las traducciones se organizaron eligiendo las lecciones a traducir y asignando un traductor encargado de cada episodio. Posteriormente, cada traductor se hizo cargo de revisar otro de los episodios ya traducidos (el proceso de revisión se hizo en dos rondas). Durante todo este proceso, los traductores intercambiaron sugerencias y produjeron una serie de lineamientos a seguir para futuras traducciones (ver logros). La mayor parte del trabajo se realizó en línea, de manera remota, pero también a través de algunas ‘hackathons’ o ‘do-a-thons’, en donde los grupos de voluntarios trabajaron en conjunto estando geográficamente en el mismo lugar. Planes para el futuro Siempre hay oportunidades para seguir mejorando las lecciones, y esperamos sus contribuciones. Empieza por leer y usar las lecciones y si tienes alguna sugerencia puedes abrir un nuevo “issue” en cualquiera de los repositorios para realizar tu colaboración. Las versiones en inglés y español comenzarán a diferir sustancialmente según las contribuciones de la comunidad; por lo tanto, antes del próximo lanzamiento (en ~ 6 meses) trabajaremos con un coordinador de traducción para incorporar cambios bidireccionales en las lecciones de inglés y español. Los voluntarios: Todo este trabajo ha sido posible gracias al esfuerzo y tiempo de muchos voluntarios de varios países incluyendo Argentina, Bolivia, Brasil, Cuba, España, Estados Unidos, Guatemala, México, Perú, Uruguay, Venezuela. Agradecemos especialmente: El grupo inicial: Heladia (Hely) Salgado y todo su grupo de trabajo en México (Shirley Alquicira, Leticia Vega, Verónica Jiménez Jacinto, Irma Martínez-Flores, Kevin Alquicira, Romualdo Zayas-Lagunas, Daniela Ledezma, Laura Gómez Romero y Juan M. Barrios). Francisco Palm que desde Venezuela nos apoya en temas de infraestructura. Paula Andrea Martinez, comunicandome con toda la gente que nos quiere ayudar. 
Selene Fernandez por compartir su versión traducida de la lección de Git. Voluntarios de Software Carpentry y Data Carpentry Sue McClatchy por ayudarnos en el inicio a hacer la convocatoria abierta para interesados en traducir las lecciones. Raniere Silva que nos ayudó mucho con el template de estilo Paula Andrea Martinez y Rayna Harris co-organizando las Hackathons Voluntarios que se unieron a través de la invitación abierta: Silvana Pereyra, Hugo Guillen, Otoniel Maya, Javier Forment, Matias Andina, Olemis Lang, Laura Angelone, Alejandra Gonzalez-Beltran, Ana Beatriz Villaseñor Altamirano, Kevin Martinez-Folgar, Nohemi Huanca Voluntarios de la Do-a-Thon de OpenCon 2017 Rayna Harris, Paula Andrea Martinez, Guillermina Actis, Julieta (Juli) Arancio, Eunice Mercado, Ivonne Lujano Organizadores del Hackathon y taller planificado en Buenos Aires Rayna Harris, Juli Arancio y Marceline Abadeer Empleados de The Carpentries: Erin Becker y François Michonneau, por mover el tren de la publicación. Registros de publicación Heladia Salgado (ed), 48 authors: “Software Carpentry: La Terminal de Unix”, The Carpentries, Version 2018.04.1, March 2018, 10.5281/zenodo.1198732 Rayna Harris (ed), 49 authors: “Software Carpentry: Control de Versiones con Git”, The Carpentries, Version v2018.04.3, March 2018, 10.5281/zenodo.1197332 Read More ›

Mentoring Groups are Back!
Kari L. Jordan / 2018-03-12
In February, we held our mentoring groups virtual showcase, where community members engaged virtually with mentors and mentees to hear about the accomplishments made over the course of their mentoring experience. Not only were mentees able to finish their instructor checkout tasks and contribute to the CarpentryCon taskforce, but mentors also started local communities and reconnected with community members globally. You've expressed to us and to each other the benefits these mentoring groups have had on your growth in and outside of the Carpentries. You are building powerful and meaningful connections by developing peer communities. Because of that, we want to continue to support these groups and give more community members the opportunity to join, either as a mentor or a mentee. The next round of mentoring groups will run from April 9th to August 13th. Get a head start on joining a mentoring group by attending one of two upcoming information sessions (to suit different time zones). Sessions will be held on March 15th at 11:30 UTC and 20:30 UTC. Sign up to attend either session on this etherpad. Applications for both mentors and mentees are now open, and are due by March 23rd. Mentor applications are open to instructors who have taught at least two workshops. Mentee applications are open to instructors who have taught fewer than two workshops. If you'd like to serve as a mentor, please complete the mentor application. If you'd like to be a mentee, please complete the mentee application. Many mentor/mentee relationships extend well beyond the program time. Don't miss out on the opportunity to connect, learn, and grow with community members. Join a mentoring group! Tweet us your thoughts (@datacarpentry, @swcarpentry, @thecarpentries, @drkariljordan) using the hashtag #carpentriesmentoring. Read More ›

Carpentries for Latin America
Paula Andrea Martinez, Alejandra Gonzalez-Beltran, Rayna Harris / 2018-03-12
The idea of translating Carpentries' lessons into Spanish and other languages is not new [1, 2, 3]. At the end of 2017, a group of volunteers (see Initial Group members below) embarked on the goal of making Spanish translations a reality. We are very happy that many more volunteers joined us in this effort. We did it, and today we can share the fruits of these months of work with the entire community! The experience: The experience of translating The Carpentries lessons has been very valuable. “For me this experience has been incredible! Thank you for putting all your effort into this project. Many people responded with overflowing enthusiasm to participate and collaborate in the different facets of this project. It was a pleasure to work with all of you.” - H.S. “I'm very happy to participate (and at the same time learn) in this project!” - I.L. “Thanks everyone! Your contribution is a giant step toward realizing a long-time goal of translating @swcarpentry lessons from English into Spanish!” - R.H. “It was a great experience. I loved the work environment and enthusiasm of the group. The feedback and analysis were very enriching for me.” - V.J.J. The achievements: a glossary and guidelines document for collaborators; a translation of the Unix Shell lesson; a translation of the Version Control with Git lesson; a styles template for translated lessons; and the building of a Spanish-speaking community in the Carpentries to further disseminate the best practices of programming and reproducible research. The translation process: The translations were organized by choosing the lessons to be translated and assigning a translator in charge of each episode. Subsequently, each translated lesson was reviewed by two other translators. Throughout this process, the translators exchanged suggestions and produced a series of guidelines to follow for future translations (see achievements). The majority of the work was done asynchronously, but some volunteer groups also worked together in the same physical location during ‘hackathons’ and ‘do-a-thons’. Future plans There are always opportunities to keep improving the lessons, and we look forward to your contributions. To contribute, start by reading and using the lessons, and open a new issue or submit a pull request to collaborate. The English and Spanish versions of the lessons will begin to differ as the community contributes; therefore, before the next release (in ~6 months) we will work with a translation coordinator to incorporate bidirectional changes to the English and Spanish versions. The volunteers: All this work has been possible thanks to the effort and time of many volunteers from several countries, including Argentina, Bolivia, Brazil, Cuba, Spain, the United States, Guatemala, Mexico, Peru, Uruguay, and Venezuela. We especially appreciate: The initial group: Heladia (Hely) Salgado and her entire working group in Mexico (Shirley Alquicira, Leticia Vega, Veronica Jimenez Jacinto, Irma Martinez-Flores, Kevin Alquicira, Romualdo Zayas-Lagunas, Daniela Ledezma, Laura Gomez Romero and Juan M. Barrios). Francisco Palm, who supported the infrastructure. Paula Andrea Martínez, who communicated with all the volunteers. Selene Fernández, who shared her translated version of the Git lesson. Volunteers of Software Carpentry and Data Carpentry: Sue McClatchy, for helping us in the beginning with the open call for participation in the translation of the lessons.
Raniere Silva, who helped us a lot with the style template. Paula Andrea Martínez and Rayna Harris, who co-organized the Hackathons. Volunteers who joined through the open invitation: Silvana Pereyra, Hugo Guillén, Otoniel Maya, Javier Forment, Matías Andina, Olemis Lang, Laura Angelone, Alejandra González-Beltrán, Ana Beatriz Villaseñor Altamirano, Kevin Martínez-Folgar, Nohemi Huanca. Volunteers of OpenCon 2017's Do-a-Thon: Rayna Harris, Paula Andrea Martinez, Guillermina Actis, Julieta (Juli) Arancio, Eunice Mercado, Ivonne Lujano. Organizers of a Hackathon and a planned workshop in Buenos Aires: Rayna Harris, Juli Arancio, and Marceline Abadeer. The Carpentries staff: Erin Becker and François Michonneau, for the push to publish. Publication Records Heladia Salgado (ed), 48 authors: “Software Carpentry: La Terminal de Unix”, The Carpentries, Version 2018.04.1, March 2018, 10.5281/zenodo.1198732. Rayna Harris (ed), 49 authors: “Software Carpentry: Control de Versiones con Git”, The Carpentries, Version v2018.04.3, March 2018, 10.5281/zenodo.1197332. Read More ›

Call for Code of Conduct Committee Volunteers
Kari L. Jordan, Tracy Teal, Erin Becker / 2018-03-08
The Carpentries are dedicated to developing and empowering a diverse community of enthusiasts around computational methods for research and data science. Whether you are an Instructor, learner, Maintainer, Mentor, Trainer, Executive Council member, Champion, or member of our Staff, you belong to this community. We are committed to creating avenues for you to contribute that are welcoming and inclusive, whether in-person or online. Our Code of Conduct (CoC) plays a vital role in this, as it outlines our commitment to provide a welcoming and supportive environment to all people, regardless of who they are or where they come from. Our CoC sets out detailed reporting guidelines and an enforcement policy, so that, should violations occur, members of our community can rest assured that their concerns will be handled appropriately and in a timely manner. The CoC was last updated in 2016. To read more about how these documents were developed, see this blog post. In a recent review, however, we realized that we needed to increase the transparency of how the Code of Conduct Committee handles incident reports, revise wording that might discourage the reporting of incidents, and update how we handle urgent situations. We're consulting with Sage Sharp of Otter Tech Diversity and Inclusion Consulting to address these issues and make appropriate updates. In revisiting the CoC, we have also opened up positions on the Code of Conduct Committee, and are looking for committee members who are passionate about equity and inclusion to serve. The Carpentries Code of Conduct Committee manages all responses to reports of conduct violations, and advises the Executive Council on the need to alter any of the policies under its purview. As this committee deals with complex issues including ethics, confidentiality, and conflict resolution, members of the committee will receive incident response training from Otter Tech upon appointment. Be a part of a committee that ensures that our community continues to thrive on diversity of thought and perspective. Complete this application to apply to serve. The deadline for applications is Monday, March 19 at 1100 UTC. Should you have questions about the Code of Conduct, or want clarification on the roles and responsibilities of this committee, please contact the CoC committee. Read More ›

Announcing the first joint Carpentries Bug BBQ
François Michonneau, Erin Becker / 2018-03-08
The Bug BBQ is a Carpentry-wide event to improve all our lessons – both existing Software and Data Carpentry lessons, and new releases. We welcome contributions from the Community on all of our lessons. If you want to contribute, but you are not sure where to start, head over to the Bug BBQ website where we will highlight how to get involved. The Carpentries are preparing to publish the Social Sciences and the Geospatial Data Carpentry curricula on Zenodo. The Social Science Lessons will be published in April, and the Geospatial Lessons in June. This will be the first publication for these lessons. We regularly publish our lessons (SWC, DC) to provide stable identifiers for polished versions of the lessons. This enables referenced discussions of the lesson materials and gives contributors a verifiable product to cite on their CVs or resumes. This release will include the following lessons: Geospatial Curriculum Introduction to Geospatial analysis Geospatial analysis in R Social Science Curriculum Spreadsheet organization SQL Python OpenRefine R Get involved! If you’ve made a contribution to one of these lessons, you’re already an author. Help make sure the final product is polished and complete by getting involved in the lesson release. We are organizing a Bug BBQ to prepare these lessons for release. The main goal for the Bug BBQ is to get the Geospatial and Social Science lessons ready for release. However, if you are a Maintainer for another lesson, and you are available and interested in getting some extra eyes on your lessons, let François know and we’ll include your repository on the Bug BBQ website. Bug BBQ details The Bug BBQ will start on April 12th, 2018 at 9am Eastern Time USA (1pm UTC) and end on April 13th, 2018 at 5pm Pacific Time USA (midnight on April 14th UTC). (Click on the links to see these times for your time zones) Join with the community in a hacky-day to submit Issues and PRs to identify and fix problems and get us ready to publish. We’ll provide communication channels for you to work with other community members and guidelines for how to get started. Keep an eye open for more information about the Bug BBQ! We’re excited to work with the community to release these lessons. Put these dates on your calendar, and we’ll send out reminders and updates too. These lessons belong to the community - help keep them great! What’s a Bug BBQ? During a Bug BBQ, the community gathers online to squash and grill as many bugs as possible to make our lessons polished and ready to be officially released. This is a distributed event, so you don’t have to go anywhere. We’ll use GitHub, Slack, and a website to get organized. If you are part of a local Carpentries community with several people interested in taking part in the Bug BBQ, feel free to organize a real BBQ to feed the crowd. If you plan on getting together, let us know by opening an issue! We’ll add you to the website so other people can join you. How long should I attend? The Bug BBQ lasts almost 36 hours to accommodate working hours across the globe. We are a global community and we want everyone to have a chance to participate. Feel free to participate for as little or as long as you want. However, note that contributions made when sleep deprived are rarely the best ones. If you are a Maintainer, please coordinate with the other Maintainers for your lesson to be ready to review, and provide feedback on the issues and pull requests that you will receive during the Bug BBQ. Who is the Bug BBQ for? 
Everyone is welcome to participate, even if you are not familiar with the content of the lessons. We need your help to find typos and formatting issues, help new contributors submit pull requests, answer general questions, review suggested changes, and more! If you have questions, please contact François. Read More ›
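If you have never submitted a pull request before, the typical workflow looks roughly like the sketch below; the repository, branch, and file names are placeholders, not a specific lesson:

```bash
# Fork the lesson repository on GitHub first, then work from your fork
# (all names below are placeholders).
git clone https://github.com/YOUR-USERNAME/lesson-repository
cd lesson-repository
git checkout -b fix-typo

# Edit the affected file, then record and publish the change.
git add _episodes/01-introduction.md
git commit -m "Fix typo in introduction"
git push origin fix-typo

# Finally, open a pull request against the upstream lesson on GitHub.
```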

First African Carpentries Instructor Training of 2018
Martin Dreyer / 2018-03-06
On 21-23 February 2018, the fourth South African Carpentries instructor training for African-based Instructors took place in Kleinmond, Western Cape. The workshop was funded through the Rural Campuses Connection Project Phase II (RCCPII). The lead Trainer, Erin Becker, is the Associate Director of The Carpentries, currently based near San Francisco, California. The other Trainers were Senzo Mpungose, Scientific Data Center and Infrastructure Manager for Mathematical Sciences at WITS; Caroline Ajilogba, a Postdoctoral Researcher in the Microbial Biotechnology group, NWU Mafikeng Campus; and Martin Dreyer, an eResearch support consultant at NWU IT. The workshop was the first of its kind in South Africa, in the sense that it was the very first instructor training event at which new instructors could check out and qualify as Carpentry Instructors by the time the workshop ended. Day 1 started with welcomes and introductions. It was also at this point that we learned that three of the four Trainers would be teaching Carpentry instructor training for the first time, which made everyone a little nervous for a short while. But once everything got going, and with the guidance of Erin and Anelda, the first day went well, apart from intermittent internet issues and some outside noise that occasionally made it difficult to hear. On day 2, the newly trained Instructors ticked off one of the three things they needed to do to qualify as Carpentry Instructors by participating in the African Carpentries Instructor monthly meetup and discussion session. On the evening of day 2 we had a networking function and dinner to get to know each other better in a less formal setting, and Erin had everyone move seats a few times so that we could mingle. This gave the workshop participants and Trainers a good platform to get to know each other, and provided some well-deserved entertainment. Day 3 started with a quick demonstration of setting up your own workshop website and using GitHub. The participants also had an opportunity to tick the second item off the check-out list by contributing to lessons. In the afternoon, everyone who wanted to teach their five-minute demo session was divided into smaller groups, each with a Trainer assigned to it, and the new Instructors taught their chosen lessons. By mid-afternoon the Trainers announced that everyone who did the demo session had passed, and we had 12 new South African-based Carpentry Instructors. Some of the instructor trainees decided to observe the teaching demos and opted to check out at a later date. Read More ›

Call for Contributions: Data Carpentry Ecology and Software Carpentry Curriculum Advisory Committees
Erin Becker, Christina Koch / 2018-03-06
In mid-2017, Data Carpentry piloted a Curriculum Advisory Committee (CAC) for the Genomics curriculum. The goals for this committee were to provide general oversight, vision, and leadership for the full Genomics lesson stack, to ensure that the lessons stay up-to-date with existing best practices in the field, and to continue to serve the needs of genomics practitioners attending our workshops. Genomics Curriculum Advisors met after the initial Genomics lesson publication in November to discuss proposed structural and major topical changes to the lessons and will be helping the Genomics Maintainer team to make decisions about these changes as we prepare for a second release this year. Since first piloting the idea of a CAC, we’ve learned from our Maintainer community that this type of overall guidance is strongly desired by Maintainers for other lessons! Maintainers often face challenges trying to decide whether proposed large-scale changes are appropriate for their lessons. As Maintainers are usually not deeply familiar with other lessons in their curricular stack (and aren’t expected to be!), they often wonder how changes to their lesson will affect other lessons taught in the same workshop. Curriculum Advisors help to provide this higher-level oversight and take some of this burden away from the lesson Maintainers. Due to overwhelming enthusiasm from the Maintainer community, we are now recruiting for Curriculum Advisors for the Data Carpentry Ecology lessons and the Software Carpentry full lesson stack. Applications are open to all Carpentry community members. We strongly encourage applications from community members with current classroom teaching experience, university or college faculty and staff, and Maintainers for these lessons. Read more about the role of Curriculum Advisors. Apply to join the Data Carpentry Ecology Curriculum Advisory Committee Apply to join the Software Carpentry Curriculum Advisory Committee. Applications will be open through March 16th, 2018. Please contact Erin Becker (ebecker@carpentries.org) with any questions. Read More ›

Lesson Infrastructure Subcommittee 2018 February meeting
Raniere Silva / 2018-02-28
On 14 February 2018 at 18:00 UTC, the Lesson Infrastructure Subcommittee held its February meeting. This post covers the topics discussed and their resolutions. Software Carpentry and Data Carpentry merge Software Carpentry and Data Carpentry have merged. Their Executive Council has been elected, they have a new fiscal sponsor and a new logo, and they have hired Dr. François Michonneau to lead curriculum development efforts. This is all very exciting news - and February isn't over yet. Windows Installer We will adopt Git for Windows v2.15.1, which includes nano, and drop our own Windows installer. This should simplify our installation instructions and provide a better experience for our learners during workshops. All the details of these changes are covered in this document, which summarises many pull requests and issues from the last 6 months. Workshop Installation Instructions In January, a blog post was published with a proposal to enhance the process of customizing the installation instructions on the workshop page. After an initial round of feedback, Raniere Silva, François Michonneau and Rémi Emonet will move the proof of concept included in the proposal forward to a feasibility prototype. The plan is to have the prototype ready in April and, if we decide to go forward, to make it available to instructors in July so that workshops in August can benefit from it. Lesson Release 2018/01 Christina Koch (for Software Carpentry) and Erin Becker (for Data Carpentry) will coordinate the next Lesson Release, planned for the middle of the year. Maintainer survey on template As announced on the Maintainers mailing list, François Michonneau is running a survey about our lesson template. We should hear the results soon. Labels and Lessons In case you missed it, GitHub rolled out some improvements to labels. These changes couldn't have come at a better time: Erin Becker and François Michonneau are working to revamp the labels used in the Git repositories that host the lessons. The new set of labels will make it easier for maintainers and contributors to navigate issues and pull requests. Next steps The subcommittee will meet again in April to provide an update on some of the topics covered by this post and to discuss new requests from the community. Acknowledgement Thanks to Erin Becker, Rémi Emonet, Christina Koch, Geoff LaFlair, François Michonneau, Tracy Teal and Naupaka Zimmerman for their valuable contributions during the meeting. Special thanks to Christina Koch for the great notes. Read More ›

CarpentryCon - Hotel Accommodation Options
Belinda Weaver, Fotis Psomopoulos / 2018-02-28
Hotel Accommodation for CarpentryCon 2018: We have two hotel options for you for CarpentryCon 2018. Both hotels are offering us a special CarpentryCon rate. The special rates will be valid for the period of the event and available until the cut-off date; anyone looking to book after that date will be offered the best available rate at the time. So please book before 18 April if you want to lock in the cheaper rate. Clayton Hotel, Ballsbridge: See it on the Dublin map. Book online or by calling the hotel on +353 1 668 1111. Reservation Code: CARP300518. Cut-off date for booking: April 18th. Talbot Hotel, Stillorgan: See it on the Dublin map. Availability: 29/05/2018 to 01/06/2018 (latest checkout on June 1st). Book online or by calling the hotel on +353 1 200 1800. Reservation Code: CARPCON18. Cut-off date for booking: April 18th. We also hope to offer on-campus lodging at UCD itself. Stay tuned for details. Read More ›

State of the State: Instructor Checkout
Erin Becker / 2018-02-27
This blog post is the second in a series examining the roles and contributions of the different parts of the Carpentry community. In case you missed it - read the first post in this series, about Maintainers. Carpentry Instructors are the core of our community. Without Instructors, there would be no workshops. Because of the vital role that Instructors play in advancing the Carpentry mission, we as a community take preparing Instructors very seriously. Before becoming certified Instructors, trainees must show familiarity with our curriculum, demonstrate their teaching skills (with a focus on the Carpentry pedagogical model), and interact with the broader Carpentry community. Software Carpentry Instructors also need to demonstrate familiarity with Git and GitHub. Since 2015, these goals have been served by a three-part checkout mechanism: submitting a lesson contribution, participating in an instructor discussion session, and presenting a short teaching demonstration. These steps are estimated to take a total of 8-10 hours and are overseen by the Maintainers group, the Mentoring Subcommittee, and the Trainers group, respectively. These groups frequently discuss how to ensure that our checkout process continues to meet the needs of new Instructors as our community grows and changes. Recently, staff facilitated a set of discussions with the Mentoring Subcommittee, Maintainers, and Trainers to understand whether there were reasons to remove one or more of the steps of the checkout process and, more broadly, to understand how members of these groups feel these steps are meeting Instructors’ needs. Getting input from each of these groups proved to be vital, as different parts of the community had different perspectives about these steps and how they affect Instructor preparation. Although the decision at this time was to maintain the current checkout process, there were many ideas raised about how we can change this process in the future to better align with the needs of new Instructors. The three topics raised for discussion were as follows. First, removing the requirement for trainees to submit a lesson contribution. This was brought to the Maintainers and Trainers groups for discussion. Many voiced concerns that, without this requirement, new Instructors would not be prepared to contribute to lessons in the future. Other options to require trainees to use GitHub without increasing Maintainer workload were discussed. The decision was to make no changes to this requirement at this time, but to clearly communicate to trainees that rather than creating new issues or putting in unsolicited PRs, they can help by contributing to existing issues, reviewing existing PRs, and putting in PRs for requested issues. The Trainers group will work to better communicate this with new trainees. On the Maintainers side, there is work ongoing to update issue labels to help guide contributions. Second, removing the requirement for trainees to participate in an instructor discussion before becoming certified. This was brought to the Mentoring Subcommittee and the Trainers group for discussion. In both groups, people expressed concern that these discussions were necessary to prepare new Instructors to teach. The decision was not to change this requirement at this time, but to continue exploring other opportunities to provide mentorship for new Instructors. Third, removing the requirement that trainees must complete their teaching demo with a Trainer who did not teach their instructor training.
This policy was intended to avoid conflicts of interest by requiring that new Instructors be approved by Trainers outside of their institutions; however, it inadvertently disadvantaged new Instructors in geographic areas with fewer Trainers. The Trainers group passed this change with a vote of 22:1, with 1 abstaining. Trainers are still encouraged to identify any potential conflicts of interest. To summarize, although all three steps of the checkout process will remain the same for the time being (with the minor change that trainees will now be able to schedule their teaching demonstration with any Trainer), there have been many good ideas generated during this discussion process that will help us as we plan future revisions to continue to meet the needs of our community. If you’re interested in learning more about these conversations, read: the minutes of the Trainer meeting; the minutes from the Mentoring Subcommittee meeting; the discussion on the Maintainers list; the discussion on the Trainers list; and the vote summary. Preparing new Instructors is an important job that is shared across our community. There are many ways you can be involved! Sign up to lead discussions (if you’re not sure how, see this handy checklist). Apply to become a Maintainer. If you’re not ready to commit to being a Maintainer, help out informally by reviewing PRs and commenting on issues for lessons that you teach. Your help is definitely appreciated! Read More ›
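For trainees wondering what a lesson contribution looks like in practice, the typical GitHub workflow is roughly the following (a sketch only; the repository, file and branch names are placeholders, not part of the checkout requirements):

```bash
# Clone your fork of a lesson repository (URL is a placeholder):
git clone https://github.com/YOUR-USERNAME/git-novice
cd git-novice

# Do the work on a topic branch:
git checkout -b fix-typo-episode-02

# ... edit the relevant episode file, then commit and push:
git add _episodes/02-setup.md
git commit -m "Fix typo in setup episode"
git push origin fix-typo-episode-02

# Finally, open a pull request against the upstream lesson on GitHub,
# ideally one that addresses an existing, requested issue.
```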

Library Carpentry Governance - An Update
Belinda Weaver / 2018-02-20
Eight of us from Australia, the UK, the US, Canada, and South Africa met last week via Zoom to discuss taking the Library Carpentry project forward. We discussed the creation of a new website to replace this one, talked about what constitutes a Library Carpentry workshop, discussed who can teach the material, and started thinking about what kind of governance structure we need to lead the work from now on. There was an update on the status of the IMLS-funded Library Carpentry Coordinator role to be based at CDL. The website: Richard Vankoningsveld has done a great draft of a new website. Can people please have a look at it and raise issues that need fixing on the repo? We would be very grateful for comments and feedback - please file as many issues as you need to address any problems or omissions. If you are interested in working on the repo or having rights to edit, please ask there by raising an issue. One-day or two-day workshops? While Software and Data Carpentry workshops run for two days, this can be problematic for librarians. Many librarians struggle to be granted time out for training, so two consecutive days off might be too big an ask. Accordingly, a one-day format might be the best way forward for now. If people want a two-day format, then the workshop could be split, e.g. over four mornings, or single days in separate weeks. What is a workshop? For a workshop to be badged Library Carpentry, the community needs to establish what the core curriculum is. There is a statement to that effect under What is a workshop? on the old website. This group will revisit that in the coming weeks. Belinda suggested that the core curriculum could be Intro to Data (which includes regular expressions), Shell and OpenRefine. That material can be taught within a (fairly intense) day. People can then add extras from the other modules under development if they want to have a two-day format, or they can stretch the core curriculum to be taught over two days, which would give more time for people to practise and embed what they have learned. Who is an instructor? The question of who is an instructor has been addressed on the old website, with general agreement that instructors should have completed the Carpentries instructor training program. Since we eventually plan to merge with the Carpentries, this makes sense for the long-term sustainability of Library Carpentry. However, we do not want to mandate that certification for now as we do not have enough trained instructors. Through the IMLS-funded project with the California Digital Library, The Carpentries have already committed to running two open instructor trainings for librarians this year. Though this was not raised at the meeting, we are now developing an onboarding process for certified Data and Software Carpentry instructors who would like to teach Library Carpentry. Have your say in what we should include. Governance: Belinda suggested that at the minimum we need a chair, a co-chair, and a secretary, and suggested she could be the liaison between the eventual governance group and the Carpentries. Tim Dennis and Juliane Schneider have put together this document for discussion about the role of a governance group. Greg Wilson proposed we review that document between now and May, with a view to electing an interim governance group when we meet at CarpentryCon 2018 in Dublin. Please add your thoughts by raising an issue on the repo. Workshop requests: The Carpentries are now taking responsibility for organising Library Carpentry workshops.
Use this form to request a Library Carpentry workshop. Library Carpentry position based at CDL: Interviews are ongoing for this position. It was readvertised when no appointment was made from the first round. We hope that there will be an appointment soon. The person in the role will work jointly with CDL and the Carpentries, and will look at finalising lessons and building a network to grow workshop and instructor numbers in the US and more globally. Summing up: All in all, this was a very productive meeting. This group plans to meet monthly to discuss business as we move forward to a merger with the Carpentries. Post-Dublin, we should have a governance group that, with the help of the CDL-based position, can take forward the work of finalising lessons, defining core curriculum, extending workshops, and boosting instructor numbers. Having that group will also put us in a position to seek additional funding from library associations and related groups who might be interested in supporting our work. It is an exciting time to be involved with Library Carpentry! Come and say hi in the chatroom and start your Library Carpentry journey. You can watch our Twitter feed for news too. Greg Wilson suggested we all get hold of Building Powerful Community Organizations by Michael Jacoby Brown and discuss it over the coming months. Read More ›

Mentoring Groups Showcase their Accomplishments
Kari L. Jordan / 2018-02-19
We just finished our second round of mentoring groups and had an amazing showcase of their work and ideas. In this round, we were more specific and focused on multiple topic areas. There were groups on community building, lesson maintenance, and preparing for instructor checkout. The feedback and outcomes were great! Participants were able to focus on specific goals, including teaching their first workshop and developing new lesson contribution material. Being part of a group that addressed something important to them (e.g., developing new communities in Japan and Singapore) made the mentoring groups powerful and enjoyable. Read about what community members have accomplished in these mentoring groups, find out how to get involved, or give feedback on how mentoring would be useful to you! The second round of the Carpentries mentoring groups began on 25 October, 2017. Goals of the revised mentoring groups were to offer curriculum-specific mentoring, and to encourage groups to focus their efforts on lesson maintenance, teaching, organizing workshops, or building local communities. If you missed the wrap-up of the first round of mentoring, check out this blog post. Over a period of four months, 20 mentors and 39 mentees (a total of 14 groups) representing eight time zones met either in person or virtually to accomplish specific goals. Kari Jordan hosted a training session on 9 November, 2017 to help mentors prepare for their first meeting, and to discuss goal setting. On 28 November, 2017, mentors participated in a “power check-in” to discuss issues and any concerns they were having with their groups. These were mostly scheduling-related as we were nearing the holiday season. Results from the mid-program survey showed that several groups were working on projects to build local communities, and several group members were preparing to teach specific Carpentries lessons. Participants identified several resources that would improve their experience, such as a dedicated Slack channel and more time to work with their groups. A mentoring Slack channel was created, and the program was extended from 10 January, 2018 to 6 February, 2018. The culmination of this mentoring period was the mentoring groups’ virtual showcase, which took place on 6 February, 2018. Two showcases, scheduled to accommodate multiple time zones, hosted a total of 25 attendees. During this time, mentoring group representatives presented slides showcasing either what they learned, or something cool they developed during their mentoring period. A lively discussion took place on the etherpad, and several resources were added to the mentoring-groups repo on GitHub. Here are a few highlights from the showcase: Kayleigh taught her first workshop as a qualified instructor at the first ever Library Carpentry workshop in Ethiopia. Katrin completed the check-out process and onboarded as an r-novice-inflammation lesson Maintainer. One of the African mentoring groups emphasized the value of community in helping to get workshops organized. Saran was able to connect with some regional Carpentries advocates in North Texas, and he will be attending his first workshop as a learner in the next few months! This allowed Saran’s mentor, Jamie, to reconnect with the North Texas instructors she had previously co-instructed a workshop with. They opened a discussion about their burgeoning Carpentries efforts at their respective institutions. A local data science community was started at the University of Konstanz, Germany. Robin got to live-demo an R lesson.
Blake’s workshop ran in January and got snowed in! One group developed a program plan for the mentees in the group based on their goals and interests. Check out this sample plan. Chris was able to sign up for instructor training and reconnect with a community of Carpentries folk at Vanderbilt. Simon found out who to get in touch with, and will be an instructor at a Stanford workshop in March. Toby contributed improvements to mentoring material on GitHub. One group drafted step-by-step instructions for beginners to contribute to lessons using the terminal or a web browser. Malvika contributed to the CarpentryCon taskforce and shared ideas with Kari for the next mentoring round. One group used the evolving community ‘cookbook’ to plan activities in Japan and Singapore. Did you miss the showcase? Check out the recording from the second showcase! Why should you participate in mentoring? Both mentors and mentees received certificates for participating in their groups, and several group members plan to continue working together beyond this round of mentoring. Mentoring group participants were asked to tell the community why they should participate in mentoring. Here is what they said: It gives you a direct and personal channel for questions and support. You get to know other Carpentries colleagues from across the world at differing levels of experience. You accomplish goals you probably wouldn’t have accomplished otherwise. You learn new things and gain new perspectives. You meet more community members. It speeds up instructor checkout, and brings forward the first teaching experience at a workshop. You become a more confident contributor and practice PRs on lessons before submitting them to the main repo. It’s very rewarding to help people with SWC material in an in-depth, one-on-one setting. You never know what connections you will make! You gain community connections and support to grow our collective abilities. You get advice on organising workshops. You get help when starting a community from scratch. It’s a great opportunity to learn more about Carpentries programs and to make connections with current instructors. You receive positive feedback for running a workshop. Mentoring group meetup in Germany. Photo credit: G Zeller (EMBL Bio-IT) Where do we go from here? The post-mentoring survey results showed that the major concern during this period of mentoring was finding a schedule that suited everyone. Additionally, several participants suggested that a longer duration would be useful. Lastly, there were recommendations for open selection of mentoring groups. As a result of the feedback from this round of mentoring, and discussions among the mentoring sub-committee, we are in the process of developing the instructor discussion sessions so that they include ongoing mentoring for new instructors and experienced community members. Look for the next round of mentoring to begin this April! In the meantime, get involved with mentoring by requesting to join the mentoring Slack channel and/or attending the next Mentoring Sub-committee meeting. Are these things that would help you, or keep you engaged with the Carpentries? Tweet us your thoughts (@datacarpentry, @swcarpentry, @thecarpentries, @drkariljordan) using the hashtag #carpentriesmentoring. Read More ›

My Favorite Tool - Docker
Mark Woodbridge / 2018-02-15
What kind of tool is it? Docker is a virtualization tool. Why I like it: I use Docker every day - it’s the one piece of software that has changed the way I work in recent years. There are many other approaches to virtualisation but its versatility and ubiquity are compelling. I use it for many purposes - for rapidly obtaining and executing other open source software, for providing a predictable environment in which to test code that I’ve written, and for building and deploying my own and third-party applications. We use it in the RSE team at Imperial College to ensure that others can rapidly install and evaluate our work. How does the tool help you in your work? It makes the implicit explicit. Software provided with a Dockerfile is self-describing in the sense that you know the environment in which it expects to be run, i.e. the operating system and dependencies. Alongside versioned code and data this provides reproducibility - which is vital for research software. The ephemeral nature of containers also encourages immutability, simplifying deployment and maintenance. For further information, I recommend the article “An introduction to Docker for reproducible research” (https://doi.org/10.1145/2723872.2723882). What do you wish someone had told you when you first started learning how to use this tool? Containers, as provided by Docker (or Singularity), shouldn’t be confused with virtual machines. Both can help with portability, but containers should have a single purpose and package a discrete tool or application. This ensures that they can be independently versioned and re-used. – Mark Woodbridge, Research Software Engineering Team Lead, Imperial College London. Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here. Read More ›
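As a concrete sketch of the kinds of usage Mark describes (the image tag and test command here are illustrative assumptions, not his actual setup):

```bash
# Rapidly obtain and execute other open source software without installing it:
docker run --rm -it python:3.6 python --version

# Provide a predictable environment in which to test your own code, by
# mounting the current project directory into a container:
docker run --rm -v "$PWD":/work -w /work python:3.6 python -m unittest discover

# Build and run your own application from a Dockerfile in the project root
# ("myapp" is a placeholder image tag):
docker build -t myapp .
docker run --rm myapp
```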

Valerie Aurora to Keynote at CarpentryCon 2018
Belinda Weaver / 2018-02-14
The Carpentries are excited to announce that Valerie Aurora will be one of four keynote speakers at this May’s CarpentryCon in Dublin. Valerie is a software engineer turned diversity and inclusion consultant. We want CarpentryCon 2018 to be a truly global, diverse and inclusive event, which is why we are so happy that Valerie has accepted our offer to speak there. Valerie founded Frame Shift Consulting, which helps technology organizations build in-house expertise and leadership in diversity and inclusion. Members of our community may know her as the creator and facilitator of Ally Skills Workshops, which teach simple, everyday ways for people who have more power and influence to support people with less. Valerie has taught thousands of people these skills. In addition to keynoting, Valerie has offered to teach an Ally Skills Workshop at CarpentryCon. I am sure many of our community will scramble to attend that. She was a co-founder of the Ada Initiative, which, between 2011 and 2015, supported women in open technology and culture by producing codes of conduct and anti-harassment policies, advocating for gender diversity and teaching ally skills. Valerie also helped establish Double Union, a non-profit which supports women and non-binary people in technology and the arts. She previously worked for more than 10 years as a Linux kernel and file systems developer at Red Hat, IBM, Intel, and other technology companies. Valerie is on Twitter. You can register here for CarpentryCon 2018. Read More ›

Making the Case for CarpentryCon
Belinda Weaver / 2018-02-08
Getting a PhD is, was or will be a major career milestone for many in our community. Yet getting a PhD, momentous as that achievement is, is really just the first step on the career ladder. Employers increasingly want other skills as well. In all kinds of surveys, the same key skills keep popping up: Communication skills. Teamwork. Problem solving. Ethical and intercultural understanding. Leadership. Critical thinking. The wording in job ads may vary but what it boils down to is this: employers want people who can lead, and they want people who can collaborate. Yet few academic conferences focus on building those kinds of skills, concentrating instead on advances within a discipline. Attending conferences may help people develop skills in, say, speaking and presenting, but that is generally a by-product. What CarpentryCon will offer: CarpentryCon will be different. Its main aim is to help people develop professional skills to drive their careers forward. We want to teach the next generation of research leaders not just technical skills – though these are becoming increasingly important – but skills such as promoting diversity and inclusion, project and lab leadership, working ‘open’, and speaking effectively, as our program demonstrates. And don’t forget teaching! CarpentryCon will have a strong focus on teaching. Whether you eventually work in academia or industry, teaching experience is a very desirable attribute. CarpentryCon will provide opportunities for Instructors and Trainers, both current and aspiring, to discuss and learn more about teaching practices, and to connect with people interested in improving the way computational and data wrangling skills are taught. Many universities claim to turn out graduates with a desirable set of attributes, yet few offer concrete opportunities for students to acquire those skills. CarpentryCon will. Find out if your university is really serious about equipping you for your future career. Ask them to send you to CarpentryCon. Registrations are open now. Can we help you get to CarpentryCon? For those whose pleas fall on unsympathetic ears, we hope to be able to offer some level of travel subsidy. However, we have not yet finalised the budget for that. If you buy a ticket and then find you are unable to afford to come, you can get a refund. If you buy a ticket now, that will not disqualify you from later applying for travel assistance once we open applications for subsidies. We will post more information on all this once we finalise our arrangements. Read More ›

Unveiling Our New Logo
Belinda Weaver / 2018-02-07
It must be the worst-kept secret in cyberspace - that the Carpentries have a new logo. Now that Software and Data Carpentry have merged, we wanted a new logo to celebrate our coming together as the Carpentries, and to give that project its own distinct identity. The new logo retains a ‘Carpentry’ feel - at the basic level, it represents a wrench around a hexagonal bolt. Yet it also conveys a sense of exhilaration and celebration - that magic moment when you ‘get’ something and your arms shoot up in celebration. There are many such ‘aha!’ moments in Carpentries’ workshops, so it is fitting that our logo represent not just the hard work of learning (the wrench) but the satisfaction of achievement and mastery that we gain (the ‘Yay!’). The same, but different: While we have a new logo, and one that we like very much, as far as our community goes, much of what we do will seem unchanged. As The Carpentries, we will continue to teach foundational computational and data skills to researchers. We will continue to observe and evolve our Code of Conduct. We will continue to grow our memberships, and we will continue to mint new instructors through our Instructor Training program. The individual ‘Carpentries’ will remain as distinct lesson organizations, and we plan to communicate more as the year goes on about how these projects are evolving. The Software and Data Carpentry logos will remain the same, with The Carpentries an umbrella under which they come together. Some things are different. Tracy Teal is now our Executive Director, our two staffs have merged with some reshuffling of roles, and we are working as The Carpentries with a new fiscal sponsor, Community Initiatives. Our governance has merged - from having two separate Steering Committees, we now have a brand new Executive Council. These changes should only enhance what we do by streamlining communications and making our working practices more efficient. We will still support the growth and spread of our community - that will never change. CarpentryCon 2018 in Dublin will be a celebration of just how far we have come as a community. We hope to see you there. Keep an eye out for our new website soon! Read More ›

Carpentry Champions Call
Jonah Duckles / 2018-02-07
At the 14 February Carpentry Champions call we’ll be talking about the development of a new resource called the Carpentries Cookbook. This is an open contribution, community-developed document to help members and supporters share strategies to build local Carpentries communities. We’ve begun this work here. The Champions call is scheduled for 8pm UTC, 14 February, 2018. The meeting agenda, connection details (via Zoom) and sign-up are on the etherpad, and you can check the local time and date for your time zone. In preparation, I’ve created a few issues to help us gather some feedback about what tools, events, and practices have helped you grow your local community. I’d appreciate contributions and feedback on the prompts I’ve created here in advance of the meeting. If you have other prompts or suggested chapters, please add them as new Issues on the repo. The repo will feed the website, which we hope will become the go-to source for learning about building communities of practice around digital skills. As always, we’re an open community and we’re always keen to have new participants and contributors. So please invite anyone you think would be interested along. This quarterly meeting will focus on building tools and resources for supporting new community members. I look forward to seeing you there! Read More ›

Carpentries Transition From Fiscally Sponsored Project to NumFOCUS Community Alliance Member
Tracy Teal, Gina Helfrich / 2018-01-30
Software Carpentry and Data Carpentry have combined their separate projects into a new project, now known as The Carpentries. As part of this transition, Software Carpentry and Data Carpentry are moving from Fiscally Sponsored Projects with NumFOCUS to The Carpentries with Community Initiatives, whose fiscal sponsorship administration services are better aligned with our emerging needs. The Carpentries looks forward to new opportunities with NumFOCUS and will continue to participate in the NumFOCUS Community as a new member of the NumFOCUS Community Alliance. As a Community Alliance member, we will be one of the organizations whose mission intersects with that of NumFOCUS and reflects support for open source scientific computing. NumFOCUS cross-promotes activities and events held by members of their Community Alliance in a reciprocal, supportive relationship. In particular, both organizations share a commitment to increasing diversity and inclusion in the data community. Software Carpentry joined NumFOCUS as a fiscally sponsored project in 2014, and Data Carpentry joined NumFOCUS as a fiscally sponsored project in 2015. Over the ensuing years both Carpentries worked closely together and in early 2017 began discussions about a merger. The merger was approved last summer, and this January marks the milestone of a fully merged project, The Carpentries, with a newly-elected Executive Council. We are very grateful for the support of NumFOCUS through the initiation, development and growth of Software and Data Carpentry, and for their continued support through this transition to The Carpentries with Community Initiatives. We look forward to continuing to work closely with NumFOCUS as a member of their Community Alliance, to promote the growth and development of the open source scientific computing community. Read More ›

Be our Advocate
Belinda Weaver / 2018-01-24
Dear members of our community, we need your help. Most of you probably know that our inaugural CarpentryCon will take place from 30 May - 1 June, 2018 at University College Dublin. We would now like to ask you to help us find sponsors to support this event. We have written blog posts that lay out the case for support from commercial enterprises as well as from foundations and other grant-making bodies. If you work for an organization, please consider advocating for CarpentryCon 2018 sponsorship within that organization. We have laid out the arguments for you in the two linked posts above. If you would prefer us to make the approach, that is fine. Please just email me the contact details of the person I should approach and I will do the rest. The more sponsorship money we have, the more we can open the event up: we can provide travel scholarships to facilitate attendance by people who live a long way away; we can rebate registrations for people suffering financial hardship; we can offer greater support for people travelling with their families; and we can offer a broader range of workshops. We want CarpentryCon 2018 to be a really diverse and inclusive event that fosters a sense of belonging to a wide global community. Help us make CarpentryCon 2018 a fantastic event! – Belinda Weaver, Community Development Lead Read More ›

Foundations and Funders: Why Should You Sponsor CarpentryCon 2018?
Belinda Weaver / 2018-01-23
CarpentryCon 2018 planning is now well underway. We have chosen a venue and a date, identified keynote speakers, and roughed out a program, but we are still missing one vital piece: SPONSORS! Why sponsor CarpentryCon? CarpentryCon 2018 will teach the practical skills people need to lead 21st century research within academia and industry. By supporting us, you will help develop that next generation of research leaders, people who will attend CarpentryCon to develop the skills needed to take research forward. Without the generosity of sponsors who share our vision, we will not be able to deliver the event our research community needs. Your support can help us keep registration costs low so people aren’t priced out of attending; maximise participation by people from diverse geographies and communities, including the global south; and provide travel scholarships to allow as wide an audience as possible to attend. These are the outcomes your support can help us deliver: Improving researchers’ knowledge and skills so as to enhance the quality and practice of scientific computing and data science. Helping develop the diverse and skilled workforce research and industry need to innovate. Fostering networking so researchers can collaboratively find solutions to the problems they need to solve. Providing networking opportunities for our Instructor pool to learn from each other and extend our reach. Extending the Carpentries’ community to new countries and disciplines. Fostering best practice around open, reproducible science, which benefits everyone. Diversifying the range of voices heard there by supporting attendance from underserved communities. What is CarpentryCon all about? CarpentryCon 2018 aims to skill up the next generation of research leaders. This might mean learning advanced R or Python, skilling up on High Performance Computing, or figuring out how to lead a research lab or a big project. Sessions will be hands-on, and attendees will leave with practical skills they can immediately use in their research or careers. By running this event, we hope to improve people’s knowledge and skills so as to enhance research, research outcomes and research productivity across the board. We also want to do more. We want to widen the number of voices that can be heard there. We want to provide opportunities for people to attend from diverse geographies and communities, including the global south. We plan to hold workshops and skill-ups not just on tools and skills but on fostering attitudes of openness, diversity, inclusivity and reproducibility, because those are just as essential to research success as technical skills. We hope your organisation will see the worth of this landmark event in the history of a rapidly growing project dedicated to skilling up a diverse and inclusive global research community. If this has convinced you to find out more about this first-time event, you can express interest or start a conversation here. Who or what are the Carpentries? CarpentryCon 2018 is a joint initiative of Software and Data Carpentry (The Carpentries), a non-profit project that teaches foundational computational and data science skills to researchers. Within the worldwide research community, the Carpentries have great ‘brand’ recognition and many supporters, including a large number of member organisations, both in academia and industry, who underpin our work financially. Read More ›

Why Should You Sponsor CarpentryCon 2018?
Belinda Weaver / 2018-01-18
The inaugural CarpentryCon 2018 will take place on 30 May-1 June at University College Dublin. Planning is now well underway. We have chosen a venue and a date, identified keynote speakers, and roughed out a program, but we are still missing one vital piece: SPONSORS! Why sponsor CarpentryCon? CarpentryCon 2018 will teach the practical skills people need to lead 21st century research within academia and industry. Without the generosity of sponsors who support and share our vision, we will not be able to deliver the event our community needs, one that can deliver real benefits to your industry. Help us keep registration costs low so people aren’t priced out of attending. Help us maximise participation by people from diverse geographies and communities, including the global South. Help us provide travel scholarships to allow as wide an audience as possible to attend. By supporting us, you will help skill up the next generation of research leaders, people who will attend CarpentryCon to develop the skills needed to lead 21st century research, whether that be in academia or industry. In return for your support, we can promise you a number of benefits. See the full listing of sponsorship opportunities, or express interest/start a conversation through this form. What benefits can we offer you? Marketing: Your name, your product, your service can all be part of our event marketing. Get your logo on our website, our T-shirt, our poster, our conference slides, and on all our promotional materials, including social media and email channels. Leading sponsors will get conference time to air their messages, and will be able to send emails to delegates. Expo: We will provide exhibition space to allow you to showcase your offerings and network with delegates during the three days of CarpentryCon. You can demo products or services, give away swag, and start some great conversations. Our audience: CarpentryCon’s diverse audience will include graduate students, early career researchers, senior academics, lab and project leaders, software engineers, people in tech and other industries, and more. We expect people to come from all round the world to attend this signature event. Finding people to join your company: Many graduate students and early career researchers make the jump from academia to industry. What better place to find your next staffer than at CarpentryCon, where people have come along to grow and sharpen a diverse range of skills? Meeting your needs: Not sure any of our packages work for you? We will tailor a sponsorship package that exactly fits your needs. All you have to do is ask! Contact Belinda Weaver or SherAaron Hurt to discuss your options. What is CarpentryCon all about? CarpentryCon 2018 will skill up the next generation of research leaders. This might mean learning advanced R or Python, skilling up on High Performance Computing, or figuring out how to lead a lab or a big research project. Sessions will be hands-on, and attendees will leave with practical skills they can immediately use in their research or careers. The Carpentries teach introductory material - enough to get people started with coding and data science tools. But to innovate, researchers need more. Our people want to hear about tools or services that can take their research to the next level. Who or what are the Carpentries? CarpentryCon 2018 is a joint initiative of The Carpentries, the joint project of Software Carpentry and Data Carpentry.
A non-profit project that teaches researchers foundational computational and data science skills, The Carpentries have taught more than 34,000 researchers worldwide. Within the worldwide research community, The Carpentries have great ‘brand’ recognition and many supporters, including a large number of member organisations, both in academia and industry, who underpin our work financially. We hope CarpentryCon 2018 will spread our message further and wider than ever before. Be part of that message. Read More ›

Workshop Template Enhancement Proposal
Raniere Silva / 2018-01-16
Over the last couple of years, many emails and issues on GitHub have requested moving the setup instructions in the Workshop Template to a separate page, for valid reasons. For some background, see swcarpentry/DEPRECATED-bc#415, swcarpentry/DEPRECATED-bc#729, swcarpentry/workshop-template#194 and swcarpentry/workshop-template#408. What did we discover during those years and trials? Finding the balance to accommodate long-time and advanced instructors, novice or intermediate instructors, and learners is as hard as carrying a watermelon with a teaspoon. So why are we having this discussion again? Last year, we changed the Workshop Template by adding some ‘if-clauses’ so that Software Carpentry, Data Carpentry and Library Carpentry instructors could share the same template; see workshop-template#393. In part because of minor differences in how Software Carpentry, Data Carpentry and Library Carpentry lessons and workshops are structured, the ‘if-clauses’ weren’t enough for Data Carpentry and Library Carpentry to use the Workshop Template. With plans to create more lessons in the near future under the umbrella called “The Carpentries”, the urge to move the setup instructions has reappeared, and this time we need to get it right. I put a prototype in place (see swcarpentry/workshop-template#459) that demonstrates how we can require the lead instructor to list, in the YAML header of the Workshop Template’s index.html, only the lessons that will be used during the workshop, and let some JavaScript code fetch the latest installation instructions for those lessons. The proposal has these advantages: a reduction in the number of lines that need to be edited in the workshop page, that is, 3-5 lines in the YAML header instead of many open/close comments; less confusion about where to find the files with instructions to install a given piece of software, as all instructions live in the lesson that will be taught; avoidance of the ugly and hard-to-handle <iframe> element, whose size can’t be responsive; and a reduction in the time needed to propagate important changes, e.g. that learners can’t use Firefox Quantum because the SQLite3 add-on isn’t compatible. There are some drawbacks: one-time customizations will require more work, basically cloning a repository to customize the installation instructions; instability due to client-side JavaScript code, which might not work five years from now; and an increase in page load/rendering time (much less than one second). The drawbacks are minimal compared with the advantages. By “forcing” instructors to clone the lessons they will teach, we will reduce the problem of instructors and learners being surprised by changes to the lesson made the night before the workshop. And since installation instructions will be out of date in less than five years anyway, the JavaScript eventually breaking could even be a positive, because the reader will then search for more up-to-date instructions. For this to work, all lessons will need to improve the Setup page they already have, mainly by copying and pasting part of the current content of the Workshop Template page; for example, see this change to the Git lesson and this change to the Python lesson. This change offers the opportunity to require learners to install any specific package or to download any set of files that will be used. In addition, some things can be improved in the JavaScript code to provide a better user experience. This post is only the start of the conversation.
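To make the proposal concrete, the workflow might look roughly like this (a sketch based on the description above; the YAML key and lesson identifiers are illustrative assumptions, so see swcarpentry/workshop-template#459 for the prototype’s actual syntax):

```bash
# Start a workshop page from the template (destination name is a placeholder):
git clone https://github.com/swcarpentry/workshop-template 2018-08-01-mysite
cd 2018-08-01-mysite

# In index.html's YAML header, list only the lessons to be taught, e.g.:
#   lessons:
#     - swc/shell-novice
#     - swc/git-novice
#     - swc/python-novice-inflammation
#
# When the page is rendered, client-side JavaScript fetches the current
# setup instructions from each listed lesson, so the workshop page stays
# up to date without further edits.
```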
All comments are welcome in swcarpentry/workshop-template#459. No change will be made without consensus on both the technical approach and the rollout plan. Read More ›

My Favorite Tool - Midnight Commander
Colin Sauze / 2018-01-14
What kind of tool is it? A text editor and file manager. Why I like it: My favourite tool is Midnight Commander and its text editor Mcedit (which also works as a standalone tool). These two utilities are powerful yet easy to use. They both run on the command line in Unix operating systems, but instead of typing commands, they work mostly by using menus and selecting things with arrow keys or a mouse. This makes them a kind of halfway house between a traditional command line and a GUI utility, and in my opinion they bring together the best of both worlds. Mcedit: I’ve always hated the Vi vs Emacs holy war that many Unix users like to wage; I find that both editors have serious shortcomings and definitely aren’t something I’d recommend a beginner use. Pico and Nano are certainly easier to use, but they always feel a bit lacking in features and clunky to me. Mcedit runs from the command line but has a colourful GUI-like interface; you can use the mouse if you want, but I generally don’t. If you’re old enough to have used DOS, then it’s very reminiscent of the “edit” text editor that was built into MS-DOS 5 and 6, except it’s full of powerful features that still make it a good choice in 2018. It has a nice intuitive interface based around the F keys on the keyboard and a pull-down menu which can be accessed by pressing F9. It’s really easy to use: you’re told about all the most important key combinations on screen, and the rest can all be discovered from the menus. I find this far nicer than Vi or Emacs, where I have to constantly look up key combinations or press a key by mistake and then have the dreaded “what did I just press and what did it do?” thought. Underneath, it’s got lots of powerful features like syntax highlighting, bracket matching, regular expression search and replace, and spell checking. I use Mcedit for most of my day-to-day text editing, although I do switch to heavier-weight GUI-based editors when I need to edit lots of files at once. I just wish more people knew about it - then it might be installed by default on more of the shared systems and HPCs that I have to use! Midnight Commander (MC): The Midnight Commander or MC file manager is usually bundled with Mcedit, and it makes navigating through filesystems very easy. It’s based on an old DOS utility called Norton Commander. Like Mcedit, it runs on the command line but presents a GUI-like interface. You get shown two panes with two directory listings, one on the left and one on the right side of the screen; you can set these to be any directory on your system. Anyone who uses the Filezilla or WinSCP utilities to copy files will find this layout very familiar. Using a set of F keys, you can copy files, move/rename files, make directories or delete files very easily. But it also lets you type in any Unix command you like and have it executed in the directory you’ve selected. In addition to showing local files, it can also connect to a remote file server, FTP server or SFTP server and show you the files on there, or open a .tar archive, zip file or compressed .tar file. Mcedit can be brought up to edit any file quickly by pressing the F4 key, or a simpler file viewer can be launched by pressing F3. External file viewers such as image viewers can be spawned just by pressing Enter when a file is selected.
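If you want to try them, both tools usually ship in a single package (a sketch; package and formula names vary by system):

```bash
# Install Midnight Commander, which includes Mcedit (package names vary):
sudo apt-get install mc           # Debian/Ubuntu
# brew install midnight-commander # macOS with Homebrew

mc                 # launch the two-pane file manager
mcedit notes.txt   # edit a file directly with the standalone editor
# Inside mc: F4 edits the selected file, F3 views it, F9 opens the menus.
```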
I find Midnight Commander makes looking around directories on my system much quicker and easier; it’s great for trying to clean up a disk when you’re running out of space, and it’s often simpler than using the bare Unix command line. One particular use case it shines at is copying a subset of files from one directory to another. This is because it lets you select a few files by a given pattern, then select a few more with another pattern and add them to the list of files to copy. I’ll often make five or more selections this way and then copy (or delete) all the files at once. The file copying screen is great too, as it shows an accurate progress bar, time estimate and data rate; this is especially useful when copying large data files or copying stuff over a network. How does the tool help you in your work? It lets me work faster: it gives me features often found only in GUI tools, but on the command line, and it combines the best of working on a command line and in a GUI. This is especially useful when logging into remote systems with SSH, where only a command line is readily available. What do you wish someone had told you when you first started learning how to use this tool? I’m not really sure there was anything; I found it really intuitive from the moment I started using it and figured out most of the features pretty quickly. – Colin Sauze, Research Software Engineer, Aberystwyth, Wales, UK. Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here. Read More ›

How do instructors get placed at workshops?
Maneesha Sane / 2018-01-14
If you are a badged Carpentries Instructor, you have most likely received emails about upcoming workshops that need Instructors. After the emails go out, we sometimes get questions about how Instructors are actually chosen and placed, so we want to share our methods with you. First, it is not first come, first served. That would not be fair, as that just shows who was lucky enough to be online and able to check their calendars at the moment my email went out. Rather, we give it a few days, or sometimes even a couple of weeks, depending on how quickly the workshop may be coming up. Once we have a number of Instructors expressing interest, we look at a few things: What other workshops have you signed up for in that time period? If you’ve volunteered to teach three workshops over the next few months, we likely won’t place you at all of them. However, we will take advantage of your flexibility to get you in on at least one. How much experience do you have with the Carpentries? All our workshops are taught by two Instructors. We look over the list of interested Instructors and match up an experienced Instructor with a newer Instructor. As our Instructor pool grows, we want to make sure everyone has the chance to teach with us. We have many Instructors whose workshop tally is well into the double digits. As excited as we are to see that level of dedication, we also want to make sure other Instructors have a chance to get real teaching experience. So we may ask our most experienced Instructors to step aside so others can reach that point. What Carpentries’ lessons are you comfortable teaching? In the call for Instructors, we always note the workshop content. While the two Instructors will divide up teaching responsibilities for the different lesson modules, we do want to make sure each Instructor is comfortable enough with all the content to be a helper while their co-Instructor is leading a lesson. Even better is when Instructors can fill in for each other in case of an emergency, like illness or a missed flight. What is your own academic background? The Carpentries’ model is one of practicing scientists and researchers teaching other practicing scientists and researchers. Regardless of your field of research, your passion for the tools we teach will always come through. When possible, however, we try to match up the research fields of Instructors and the sites hosting the workshop. That way, Instructors get opportunities to network in their fields, and learners get to hear from people working in those same fields. How close are you geographically to the host site? We know that many workshops offer a chance for Instructors to travel and make connections in new places. However, there are times we prioritize local Instructors. This is partly to be sensitive to host sites’ budgets but also because it helps build local communities as the hosts and Instructors get to know each other. We cannot guarantee that host sites can get local Instructors but we do take this into consideration. Read More ›

The Centrality of the Code of Conduct
Belinda Weaver / 2018-01-10
This is the first in a series of posts about Carpentries’ teaching practices. Subsequent posts will cover the other practices - live coding, sticky notes, helpers, challenges, etherpads - that make Carpentries’ workshops the success that they are. I gave a talk recently for the Australian National Data Service on ‘teaching the Carpentries way’. Originally I planned to cover six reasons why our workshops are effective, but ended up covering thirteen, with the thirteenth being the Carpentries’ Code of Conduct. I left the Code till last because it is probably the most important. Unless people observe the Code of Conduct at workshops, all our other positive teaching practices can count for nothing. Among other things, the Code of Conduct states: We are committed to creating a friendly and respectful place for learning, teaching and contributing. All participants in our events and communications are expected to show respect and courtesy to others. Instructors introduce the Code of Conduct at the start of workshops for a reason. As a community that values diversity and inclusivity, a community dedicated to providing a welcoming and supportive environment for all people regardless of background or identity, the Code sits at the very heart of everything we do. If someone breaches the Code in a workshop, the Instructor is empowered to warn that person and, if need be, to have that person removed from the workshop. We also encourage Instructors to report the behaviour to us. We have developed a manual on how to enforce the Code. Harassment is unacceptable, as the Code clearly states: Harassment is any form of behaviour intended to exclude, intimidate, or cause discomfort. Because we are a diverse community, we may have different ways of communicating and of understanding the intent behind actions. Therefore we have chosen to prohibit certain forms of behaviour in our community, regardless of intent. Read more. The Code helps people feel safe, which assists their learning. It also makes our workshops accessible to people who might otherwise be marginalised. While not as serious as religious, sexual or racial vilification, or the other behaviours we prohibit, there are still many off-putting things that people at workshops can do. If learners are worried about being mocked, talked over, treated with sarcasm, condescended to, or made to feel small or stupid for any reason, their enjoyment of the workshop will be diminished, if not extinguished altogether. In those situations, rather than take the offending person on, some people simply prefer to give up on the workshop, thus losing their opportunity to pick up vital skills. If they choose to stay, the offence will still take up valuable room in their minds, leaving much less space for learning. It is therefore up to the Instructors to set the workshop tone. If someone is endlessly parading their knowledge, or hogging workshop time to show off, then the Instructors must try to rein that person in. Your learners will be grateful, and they will also feel you are ‘walking the talk’, not just paying lip service to an ideal. An attendee at a workshop I taught last year wrote on a feedback sticky: “Nice that there are talking rules”. The sticky included a smiley face. A meaningful Code of Conduct makes the workshop better for everyone. However, it is not only in our workshops that the Code of Conduct applies. 
We want all interactions within our community to be underpinned by the Code, whether it be contributions to email lists such as Discuss (info on joining all our lists appears on this page), responses to tweets or Facebook postings, discussions about issues raised on GitHub repositories, or contributions to our Slack channel. As we move forward to the merged Carpentries, it is timely to remind people why we value our Code of Conduct. The Code is central to our efforts to build a welcoming, diverse, inclusive global community. Read More ›

Teaching Statistics in the 21st Century
Greg Wilson / 2018-01-09
The late 1980s saw a wave of new undergraduate programs launched in computational physics, as the advent of affordable workstations and PCs made the power to compute and simulate more accessible. A decade later, though, many of those programs had drastically scaled back their ambitions or quietly wound down. The problem wasn’t the programming: the problem was that whenever a curriculum is designed as “X plus some Y”, it’s the Y that gets cut when time runs short, budgets are squeezed, or tough hiring decisions need to be made. “Computational physics” became “the physics we’ve always taught, but with assignments on computers” and then just “the physics we’ve always taught”. That experience is part of why I’m so excited by things like Daniel Kaplan’s 2017 paper “Teaching stats for data science”, which is a great example of how some faculty are re-thinking pedagogical approaches from the ground up. Kaplan argues that much of what we currently teach in introductory stats courses is left over from a time when data was scarce and calculation was hard. In its place, he advocates a ten-step calculation-first approach: (1) data tables; (2) data graphics; (3) model functions; (4) model training; (5) effect size and covariates; (6) displays of distributions; (7) bootstrap replication; (8) prediction error; (9) comparing models; (10) generalization and causality. UBC’s Stat 545 course is another great example of how people are not just putting old wine in new bottles, but approaching their subject from an entirely new angle. If you have any favorite examples, please add them to the comments - I’m sure our community would enjoy hearing about them. Update: several people have pointed us at the following as other examples of curriculum being re-thought from the ground up: Ruth Anderson, Michael Ernst, Robert Ordóñez, Paul Pham, and Ben Tribelhorn: A Data Programming CS1 Course. George Cobb: The Introductory Statistics Course: a Ptolemaic Curriculum? William Wood: Innovations in Teaching Undergraduate Biology and Why We Need Them. Please keep them coming. Read More ›

My Favorite Tool - Twitter
Belinda Weaver / 2018-01-08
Why do I like Twitter? At the risk of sounding like a shallow person with a short attention span, Twitter really is a favourite of mine. I tweet as cloudaus. It’s not really Twitter’s fault that people use it for shameless self-promotion, for marketing, or for fighting culture wars. That side of it has probably turned many people off, which is a shame because Twitter has so many practical uses. Amplification and reach: Twitter is a messaging tool. Put your message out and Twitter can amplify it many, many times. You can put a poster up around your campus, but Twitter will extend your reach exponentially. I used it to advertise Software Carpentry and Library Carpentry workshops, ResBaz events, Library Carpentry sprints, and many other things. The workshops fill up within days, sometimes within hours. Other Tweeters help spread the word. Job done. Early career researchers generally struggle to be seen and known as they start to build their careers. Using Twitter to establish a social media profile is a step into the light. Once you have a following, you can spread the word about your research and start to get known within your field. Being ‘social’ is a key part of academic careers now, and Twitter makes it easy. Just be sure to avoid the pitfalls that have sunk many people on Twitter. Remember, careless tweets, or tweets that can be taken the wrong way, will be amplified too. Find your tribe: Twitter is a great place to find people interested in the same things as you. Follow people who share your interests, whatever they might be, and you will find more people through scanning the networks of the people you initially followed. Pretty soon, you will have a feed that helps you stay on top of everything new and exciting in your field. Think you don’t have time for Twitter? Think again. Twitter saves you time by filtering information that matches your interests. Finding work: People post job openings on Twitter all the time. If you are hunting for a job in your field, Twitter is where you’ll hear about it first, because your network will tell you. Ditto for grant opportunities, PhD scholarships and placements, and all kinds of interesting opportunities. Getting and giving help: People ask questions on Twitter all the time. Chances are someone, somewhere can answer that curly question you have. Some organisations provide help services through Twitter; for example, people can get nVivo help by tweeting to QSR. If you have a beef with a service provider, tweet about it - you will be surprised how quickly that issue will be fixed. I like to share what I know or have found useful, so I use Twitter to point to interesting reports, or to highlight issues that I think people in my network will care about. Providing a link to something useful is a great way to contribute to your network on Twitter. Direct messaging: I use Twitter direct messaging all the time. It is much more accessible to someone than email if that person is teaching a class or travelling or at a conference where they are tweeting from their phone. Response is usually immediate. DMs can be great for meeting up with people at conferences, especially if you don’t have their email address or phone number. If they follow you, DMs can put you in touch. Following thought leaders: By following people or organisations working in areas that interest me, I have learned a huge amount and met great people. Some I only know through Twitter, but I can still talk to them there and find out what is going on.
Using their Twitter handle, I can address a tweet directly to them. Are there people you admire? Follow them and be part of their conversation. Hashtags I can’t get to every conference or event I am interested in. But I can follow the event hashtag and keep an eye on what is happening. I can do that while the event is live or catch up on it later. Either way, I can plug into the event, and possibly find new people to follow from some of the interesting or informative tweets that have come out of it. I can even tweet to the event hashtag myself, and possibly get a question asked. It’s all part of extending your reach. When momentous events happen, following a hashtag is a fantastic way of keeping up with the latest. Twitter has been a great early warning system for natural disasters, for example, as well as a place to see footage of extraordinary events like floods or cyclones or snowstorms. It’s often the most efficient way to keep up with political events that are unfolding very quickly. Widen your world I first heard about Software Carpentry on Twitter. Now I work for them. A whole range of amazing initiatives and communities - rOpenSci, the Dat project, csv,conf - first caught my eye on Twitter. It’s the tool that connects me to my world. What’s not to love about that? How does the tool help me in my work? My work is all about this community. Twitter keeps me in touch with people in the community and the issues that matter to them, so it is invaluable. What do you wish someone had told you when you first started learning how to use this tool? That they would switch up to 280-character tweets and ruin it! Limiting tweets to 140 characters forced people to be concise. It takes a lot longer to scan the feed now and extract the nuggets. Bring back 140-character tweets! Belinda Weaver, Community Development Lead, Software and Data Carpentry Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here. Read More ›

Fourteen and Counting - People's Favorite Tools
Belinda Weaver / 2018-01-08
Thank you to all the people who have sent in short posts about their favourite tools. We are up to fourteen now, and the variety of tools is great to see. Finding out what other people use - and possibly more importantly - why they use it, and for what, is a great shortcut for researchers. There are a bewildering number of research tools around. Getting a tip from someone in your discipline, or from someone who is doing a similar task to you, can really help. So why not tell us about yours? All it takes is a few short sentences in a form. So far we have had Paula Martinez on R, with Bianca Peterson enthusiastically seconding, Jeff Oliver sharing his love of Git and GitHub, Kellie Ottoboni talking up IPython, and Thomas Arildsen on how the Jupyter Notebook facilitates his teaching. Juliane Schneider weighed in on the wonders of OpenRefine. Clifton Franklund likes RStudio, while Francesco Montanari is a fan of emacs. Rayna Harris nominated videoconferencing as her most useful research tool, while Greg Wilson talks up the benefits of asking for help. Robert Sare has posted on the benefits of using rasterio in earth sciences research. Since then, we have had Richard Vankoningsveld on why he uses a coding sandbox, and Auriel Fournier telling us how she uses Todoist to stay on track. QGIS just got a big tick from Simon Waldman. Expect more posts as people contribute further favourites. Even if your tool has already been mentioned, we would still welcome a post about it, as your use of the tool may be different. Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here. Read More ›

A Successful 2nd RSE Conference
Tania Sanchez / 2018-01-03
The second RSE conference took place on the 7th and 8th of September 2017 at the Museum of Science and Industry (MOSI). There were over 200 attendees, 40 talks, 15 workshops, 3 keynote talks, one of which was given by our very own head honcho Mike Croucher (slides here), and geeky chats galore. RSE team members Mozhgan and Tania were involved in the organising committee as talks co-chairs and diversity chair (disclosure: they had nothing to do with Mike's keynote). Also, all of the RSE Sheffield team members made it to the conference, which seems to be a first due to the diverse commitments and project involvement of all of us. Once again, the event was a huge success thanks to the efforts of the committee and volunteers as well as the amazing RSE community that made this an engaging and welcoming event. Conference highlights With so many parallel sessions, workshops, and chats happening all at the same time, it is quite complicated to keep track of every single thing going on. And it seems rather unlikely that this will change over time, as it was evident that the RSE community has outgrown the current conference size. So we decided to highlight our favourites of the event: The talk on 'Imposter syndrome' by Vijay Sharma: Who in the scientific community has never experienced this? Exactly! So when given the chance everyone jumped into this talk full of relatable stories and handy tips on how to get over it. Another talk that gathered loads of interest was that of Toby Hodges from EMBL on community building. This came as no surprise (at least to me), as RSEs often act as community builders or as a bridge between collaborating communities, as opposed to just focusing on developing software and pushing it into production. During the first day the RSEs had the chance to have a go at interacting with the Microsoft Hololens. There was a considerable queue for this, and unfortunately, we were not among the chosen ones to play with it. Maybe in the future. My hands-on workshop on 'Jupyter notebooks for reproducible research'. I was ecstatic that the community found this workshop interesting enough that we had to run it twice! Also, I'd like to casually throw in here that I have been elected as a committee member for the UK RSE association, so expect to read more about this in this blog. For obvious reasons I missed most of the workshops, but Kenji Takeda's workshop on 'Learn how to become an AI Super-RSE' was another favourite of the delegates, as it was run twice too! Our workshop on Jupyter notebooks for reproducible research Being an RSE means that I serve as an advocate of sustainable software development. Also, as I have discussed here before, I am greatly concerned about reproducibility and replicability in science, which, I might add, is no easy task to take on. Thankfully, there are loads of tools and practices that we can adopt as part of our workflows to ensure that the code we develop follows the best practices possible and, as a consequence, can properly support science. Naturally, as members of the community come up with more refined and powerful tools in the realm of scientific computing, we (the users and other developers) adopt some of them, which means we often end up modifying our workflows. Such is the case with Jupyter notebooks. They brought to life a whole new era of literate programming: one where scientists, students, data scientists, and aficionados can share their scripts in a human-readable format.
More importantly, they turn scripts into a compelling scientific narrative where functions and loops are followed by their graphical outputs, or allow the user to interact via widgets. This ability to openly share whole analysis pipelines is, for sure, a step in the right direction. However, the adoption of tools like this brings not only a number of advantages but also a number of challenges and integration issues with previously developed tools. For example, traditional version control tools (including diff and merge tools) do not play nicely with the notebooks. Also, the notebooks have to be tested like any other piece of code. During the workshop, I introduced two tools, nbdime and nbval, which were developed as part of the European-funded project OpenDreamKit. These tools bring much-needed version control and validation capabilities to Jupyter notebooks, addressing some of the issues mentioned before. To cover both tools, as well as how you would integrate them into your workflow, I divided the workshop into three parts: diffing and merging of notebooks, notebook validation, and a brief 101 on reproducibility practices. Notebooks diffing and merging During the first part of the workshop the attendees shared their experiences using traditional version control tools with Jupyter notebooks… unsurprisingly, everyone had had terrible experiences. Then all of them had some hands-on time with nbdime, diffing and merging notebooks from the command line as well as from its rich HTML-rendered view (completely offline). As we progressed with the tutorial I could see some happy faces around the room, and they all agreed that this was much needed. Need more convincing? Just this week this tweet showed up in my feed. Notebooks validation The second part of the workshop focused on the validation of the notebooks. And here I would like to ask this first: 'How many of you have found an amazing notebook somewhere on the web, only to clone it and find out that it just does not work: dependencies are broken, functions are deprecated, and you can't tell if the results are reproducible?' I can tell you, we have all been there. And in such cases nbval is your best friend. It is a py.test plugin that checks whether executing the stored inputs of an .ipynb file reproduces its stored outputs, while also ensuring that the notebook runs without errors. This led to an incredible discussion on its place within conventional testing approaches. Certainly, it does not replace unit testing or integration testing, but it could be seen as a form of regression testing for notebooks. Want to make sure that your awesome documentation formed by Jupyter notebooks is still working in a few months' time? Why not use CI and nbval? Wrapping up The workshop closed with a 101 on working towards reproducible scientific computing. We shared some of our approaches for reproducible workflows and encouraged the delegates to share theirs. We covered topics such as valuing your digital assets, licensing, automation, version control and continuous integration, among others. The perfect close to a great RSE conference! Just a few more things Let me highlight that all the materials for the workshop can be found at https://github.com/trallard/JNB_reproducible and that all of it is completely self-contained in the form of a Docker container.
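For readers who want a taste of the two tools without pulling the workshop's Docker container, here is a minimal sketch of the command-line workflow described above. The notebook filenames are hypothetical placeholders; the commands themselves are the standard entry points that nbdime and nbval install.

```
# Both tools are on PyPI.
pip install nbdime nbval

# Notebook-aware diffing: compare notebooks cell by cell instead of as
# raw JSON, in the terminal or in a rich (offline) web view.
nbdiff analysis_v1.ipynb analysis_v2.ipynb
nbdiff-web analysis_v1.ipynb analysis_v2.ipynb

# Register nbdime as the diff/merge driver git uses for .ipynb files.
nbdime config-git --enable --global

# Content-aware three-way merge of two divergent notebook versions.
nbmerge base.ipynb local.ipynb remote.ipynb > merged.ipynb

# Validation with nbval: re-execute the notebook and compare the fresh
# outputs against the outputs stored in the file...
py.test --nbval analysis_v2.ipynb
# ...or only check that every cell executes without raising an error.
py.test --nbval-lax analysis_v2.ipynb
```

Running that last command from a continuous integration service is one straightforward way to act on the "why not use CI and nbval?" suggestion: the notebooks get re-executed on every push, so broken dependencies or deprecated functions show up as a failing build rather than as a surprise for the next reader.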
If you missed out on the conference and would like to see the videos and slides of the various talks, do not forget to visit the RSE conference website. This post originally appeared here. Reposted with permission. Read More ›

My Favorite Tool - QGIS
Simon Waldman / 2018-01-02
QGIS (formerly Quantum GIS) is an open source Geographic Information System. QGIS is capable of advanced analysis and cartography, but I don’t use it for that. In my research in hydrodynamic modelling, I deal with a lot of spatial data - coastlines, bathymetry, and the like - and this will eventually be processed and plotted using R, MATLAB or Python. But if I’ve received a file and simply want to take a quick look at it, or if I want to quickly compare two files that use different coordinate systems and see if things line up, most of the time I can throw the file at QGIS and it will show it to me with a few clicks. This approach lacks the reproducibility of a coded solution, but it’s an awful lot quicker for a throwaway visualisation. – Simon Waldman, Postdoc, Aberdeen, Scotland Have you got a favorite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest to create a blog post about it. You can read the background to these posts here, or see what other tools people have written about. Read More ›

My Favorite tool - A Coding Sandbox
Richard Vankoningsveld / 2017-12-21
My favorite tool is repl.it, a 'sandbox' tool to help me test and refine Python code. Why do I like it? It is an online tool for testing/running code. It allows you to write and run code in a number of different languages, and see the results. No R support I'm afraid, but it does have Python (2 and 3), Ruby, Haskell, Go, JavaScript and many more. I use it both to test small bits of Python code, to ensure they work as expected, and as a virtual Python shell to run scripts that take input (text I paste in) and print results to stdout. I can then copy these results and paste them wherever I need to put them (Word, Excel, etc). For Python fans like me, it also boasts every PyPI package available, although for packages that need Internet access (e.g., requests) you need to sign up for a paid account. I like repl.it for many reasons. For one, it means I have Python (and other languages I might need) available at a workplace where I otherwise would not. A basic account is free, you can save your work, and it's pretty fast and responsive. It is also very actively developed, so new features get added all the time. For a quick playground to test scripts, I can't fault it. How does the tool help me in my work? I work somewhere that doesn't have Python available natively, so now I can use Python to help do my work without the hassle of trying to get it installed, which would be hard. What do you wish someone had told you when you first started learning how to use this tool? It would have been good to know that it doesn't include Internet access with the free account - it took me a while to figure out why I couldn't get requests to work (even though what I wanted was available as a Python package). – Richard Vankoningsveld, Technology & Support Librarian, Queensland Supreme Court Library, Brisbane Have you got a favorite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest to create a blog post about it. You can read the background to these posts here, or see what other tools people have written about. Read More ›

Green Stickies for 2017: Report from the Happy Holidays Green Sticky Party Calls
Belinda Weaver / 2017-12-14
When Carpentries’ staff score a win, we call it a #greensticky. So for our final community calls for 2017, we hosted two Happy Holidays Green Sticky Parties to celebrate wins we have had this year. The calls highlighted successes, both personal ones, and wins for the wider community. Among the pluses reported were: The upcoming Data and Software Carpentry merger Plans to bring Library Carpentry into the fold Planning for our 2018 CarpentryCon event Starting bi-lingual (English/Spanish) teaching demos Getting ideas for starting a community from the Community Champions calls Teaching a Software Carpentry workshop for the first time Successfully using Carpentry-style teaching in an undergraduate programming course Getting to meet community members in person Tracy Teal said that calls like these are the #greensticky, and no-one disagreed. Red stickies got less air time (this was a party, after all!) but both calls discussed their biggest (Carpentry-related) fear for These included: Leaving people behind Different world views in Data and Software Carpentry - can we sound like ‘one band’? Keeping our brand strong as we grow (and other scaling issues) Keeping our community engaged and informed Community renewal so the same people aren’t always volunteering For those who couldn’t make either call, sorry we missed you! You can see the conversations on the etherpad. Happy Holidays and see you in 2018. Read More ›

Our 2017 Community Service Award winner: Anelda van der Walt
Christina Koch, Kate Hertweck / 2017-12-13
The Carpentries are happy to honor Anelda van der Walt as our 2017 Community Service Award winner. We received seven independent nominations for Anelda this year, which is a testament to her commitment to both individual people and the broader community. Starting from scratch, Anelda planted the tiny seed that has now become the phenomenal growth of Software and Data Carpentry in South Africa, not to mention its spread to an ever-growing list of other African countries, such as Namibia, Botswana, Ghana, Gabon, Mauritius, and Ethiopia. With great determination and persistence, she secured funding to enable a range of workshops and Instructor trainings to be run, such as this first workshop in 2016 and this one in 2017. Funding meant many participants could travel to and attend training which would normally have been far beyond their reach. She also secured the first ever Software and Data Carpentry membership in South Africa. Through her passion for the Carpentries, she has inspired many people to acquire command-line, HPC, and other skills that many thought were beyond their capacity to learn. Since then, she has successfully grown a pool of qualified instructors - numbers are now above 22 - and has helped hundreds of researchers in South Africa and other African countries develop foundational computational and data skills to drive their research forward. Community and capacity building on this scale are much more challenging in southern Africa. Differing research sector priorities, cultural issues, and the availability (or otherwise) of reliable networked infrastructure mean that funding is not the only challenge workshop organizers face. Given this, it is commendable that Anelda has worked so hard to foster and support diversity, reaching out to researchers in rural areas and actively working to include groups hitherto under-represented in STEM. In addition to capacity building, she has taught at more than 10 workshops and has both organized and taught at three instructor training events within the past 18 months. Post-training, she has followed up with trainees to encourage them to complete their check-out, and has helped many begin planning and running their own workshops, oftentimes helping them source extra instructors and helpers. She encourages Instructors across Africa to interact with each other via African-centred calls like this, both to foster collaboration and to ensure new Instructors feel valued and welcomed into the community. She also contributes to the global Carpentries community by participating in regular Trainer discussions and meetings and by taking her turn at hosting instructor discussion sessions and teaching demos. Congratulations, Anelda, and thank you very much for everything you have done – we honor and value the work you do for the Carpentries. Read More ›

Announcing the 2018 Executive Council for the Carpentries
Kate Hertweck / 2017-12-12
Voting in the election for community governance of the Carpentries (Executive Council, formerly named Steering Committee or Board of Directors) closed last week. Out of the 501 members eligible for voting, 147 ballots were cast (29% turnout). We are pleased to announce the four newly elected members of the Executive Council: Raniere Silva, Lex Nederbragt, Amy Hodge, and Elizabeth Wickes. Raniere and Lex received the highest number of votes and will serve two-year terms; Amy and Elizabeth will serve one-year terms. These four elected members will join the five appointed Council members selected from the current leadership of Software Carpentry and Data Carpentry: Karen Cranston is a computational biologist at Agriculture and Agri-Food Canada working on digitisation and integration of biodiversity data. She was the lead PI of the Open Tree of Life phylogeny synthesis project, and serves on the board of the Open Bioinformatics Foundation (OBF). She has been involved with Software Carpentry since 2012, was a founding board member of Data Carpentry, and is a certified instructor trainer. Kate Hertweck is an Assistant Professor at the University of Texas at Tyler. Her research and teaching focus on bioinformatics and genomics. She completed Instructor Training in fall 2014, served on the Mentoring Subcommittee in 2015, and was elected to the Software Carpentry Steering Committee in 2016 and 2017, also serving as Chair in 2017. Mateusz Kuzak is Scientific Community Manager at the Dutch Tech Center for Life Sciences. He has a background in bioinformatics, live cell imaging, and research software engineering, and is passionate about Open Source, Open Science and Reproducible Research. He is currently working on training activities and coordinating life science data and technology projects in the Netherlands. Mateusz is an Instructor Trainer and was elected to the 2017 Software Carpentry Steering Committee. Sue McClatchy is a bioinformatician and research program manager at the Jackson Laboratory. She provides research training at all academic levels from high school to faculty. She mentors students and develops training materials for analysis of quantitative and high-throughput data. Her expertise in curriculum design and instruction stems from an eight-year science teaching career in schools in the U.S. and Latin America. Sue is an Instructor Trainer and was elected to the 2017 Software Carpentry Steering Committee. Ethan White is an Associate Professor at the University of Florida working on computational and data-intensive ecology. He is a Moore Foundation Investigator in Data Driven Discovery and serves on the board of directors of Impactstory. He has been involved in Software Carpentry since 2009, was a founding member of the Data Carpentry steering committee, wrote the first version of the Data Carpentry Ecology SQL material, and leads the development of the semester-long Data Carpentry course for biologists. Many thanks to all candidates who chose to stand for election. The voting was very close, which reflects the commitment you all show towards service to our community. We are fortunate to have such awesome leaders representing diverse education, careers, and geography. We look forward to continuing to work with you in the Carpentries community, and hope you will consider pursuing other opportunities for leadership.
Also thanks to the outgoing steering committee members - Software Carpentry: Rayna Harris, Christina Koch, and Karin Lagesen; Data Carpentry: Hilmar Lapp, Aleksandra Pawlik, and Karthik Ram. Finally, thanks to all of you across the Carpentries for your continued participation and engagement! Read More ›

When Do Workshops Work? A Response to the 'Null Effects' paper from Feldon et al.
Karen Word / 2017-12-11
Author: Karen R. Word Contributors: Kari Jordan, Erin Becker, Jason Williams, Pamela Reynolds, Amy Hodge, Maxim Belkin, Ben Marwick, and Tracy Teal. “Null effects of boot camps and short-format training for PhD students in life sciences” is the provocative title of a recent article in the Proceedings of the National Academy of Sciences. Those of us who enthusiastically design and deliver short-format training promptly took note, then scratched our heads a bit. We waited a little for a response, wondering if one or more of the programs that participated in the study might step up in their own defense. Nothing happened. We thought about letting it go - we’ve got our own programs, with distinct goals, and our own assessment data, so maybe this broad-brush study isn’t so important. But … it keeps being raised. Someone will bring it up here and there, asking what we think about it. Whenever this paper comes up in conversation, its title certainly throws some weight around. So, do workshops work? However certain we may be about the value of our own programs, it seems important to have a little sit-down with this paper and talk about what it means to us, what it doesn’t mean and, most importantly, what it does not address at all: the question of what you can do with a short course [1] when a short course is all you’ve got. The premise: Spacing instruction over time is better for learning When given a choice between teaching a two-day short course and stretching those same hours and content across several weeks of repeated meetings, you can expect to get a lot more learning out of the longer course. This point, described as a core premise of the PNAS study, is essentially irreproachable. There is abundant evidence that distributing instruction over time maximizes learning in comparison with the “massed practice” that occurs when teaching is concentrated into an intensive short-format course. The problem: Spacing instruction over time is often impractical Traditional courses match students and faculty on a spaced schedule over a quarter or semester time period. When this format is possible, it should be pursued and optimized, not replaced with short courses. But when isn’t it possible? When there aren’t enough instructors. If expertise in an area is scarce, the time demand for distributed training often exceeds the FTEs available to meet that need. Until that shortage can be remedied, a large number of people are left to self-teach or go without. Under these circumstances, short-format workshops are often the only practical way to deliver training to the many more who need it. This is currently the situation with regard to training in data management and analysis, and in many cases, with foundational computing skills as well. When learners don’t have time. A similar scenario emerges when those in need of training are fully committed to jobs or research or are otherwise unavailable for a time-distributed course. This is the case for most professional-development training. Even within academia, researchers may need training right away and can’t wait for the next semester-long course offering. When opportunity knocks. Even within graduate school, where long-format courses are the norm, some opportunities are concentrated in time. For example, a short course may be able to attract many faculty simultaneously, allowing students to observe them engaging with and learning from each other. Some research experiences or team-building activities may also be possible only on a concentrated schedule.
Also, where traditional course curricula can be slow to change, short courses permit rapid inclusion of new and needed skills before they can be added elsewhere. For those of us who work within the short course mandate, then, the question becomes: how can we optimize that format to best meet learners’ needs? When setting goals for impact, we tend to think in terms of how much and what type of impact we can have, and to focus our efforts accordingly. One reason why the paper by Feldon et al. raises concern within our community is that it frames the question as “whether”. And if the answer to “whether” we can have an impact with a short course is “no”, then we’ve clearly got a problem on our hands. However, in our experience, that simply is not the case. To the contrary, our evidence suggests that there is quite a lot you can accomplish with a workshop when you accept its constraints, focus on specific goals, and leverage the strengths of this format. In the next section, we’ll take a look at the study described in the paper, evaluate its claims, and examine its relevance to the kind of training we provide. Then we’ll circle back around to our goals, our strategies, and the kind of data that we collect to assess and inform the development of our workshops. The study There is a lot to love in this work! This was not a simple survey study. They graded papers – multiple times, with validation, for 294 students from 53 institutions. They also repeatedly administered tests and surveys over the course of two years. The dataset must be impressive; we assume there is a LOT of other interesting stuff there that relates to graduate student development and correlates of early success. However, it is hard to know since the data are not publicly available or displayed in the paper. We’re eager to see more publications and perhaps more extensively summarized data come out of this project in the future. That being said, in discussion with our community members, several persistent questions and concerns emerged. These are a few of the most pertinent questions: 1. How diverse are the program goals? This study lumps together an unknown number of programs administered at the outset of life-science PhD programs as a single treatment. We know only that 53 institutions were sampled and that, of the 294 students in the study, 48 were short-course “participants”. According to Feldon et al., the unifying goal of these programs is to “accelerate the development of doctoral students’ research skills and acculturation”, with emphasis on research design, statistics, writing, and socialization. However, specific emphasis seems likely to vary, and herein lies the concern most frequently voiced in our community: any given program might focus its efforts on any or all of the components identified (research, statistics, writing, or socialization). Indeed, the more astutely a program identifies and engages with short-format limitations, the more focused their program may be. By surveying students across 53 different institutions, it seems highly likely that the specific aims of different programs are heading in different directions. If some programs are particularly good at socializing students and preparing them to cope with the hurdles ahead, while others emphasize grant writing, otherwise ‘significant’ impacts within a sub-group of similar programs are likely to be lost when combined and assessed with the group overall.
This is particularly clear if we consider the sample size of 48 students as being further split (e.g. 10, 10, 15, 13) by distinct program emphases. Lumping together successful programs with different aims is likely to show that all are ineffective in each category. 2. How generalizable is this context? The public reading of these findings seems to be, “Too bad short courses don’t work”. However, pre-PhD short courses are a highly specific and unusual context. In most other cases, short courses arise out of necessity or unique opportunity, such that there is no subsequent distributed content that re-teaches or even remotely overlaps with the content taught in the short course. In pre-PhD programs, specifically, any effects are potentially in direct competition with gains made via traditional course content. The extent to which the same or overlapping content is otherwise available in each program is also unclear. The authors of this paper might not have intended their work to generalize to other contexts, but the tendency of readers to generalize makes this question a vital one. Benefits of a short course are easily lost in a sea of positive outcomes resulting from graduate training, but that has little bearing on the impact such courses may have when they stand alone. 3. Is this the right experiment to test graduate student outcomes? While we found the methods to be impressive and worthwhile in many respects, several people expressed concern about the two-year assessment regime. This included questions as to whether a graduate student is likely to have matured and, particularly, to have written substantively in their content area within the first two years of study, as well as whether a regime of continuous surveys might itself have a sizeable impact on student development. As with any study that takes volunteers, willingness to participate – both in the short course programs and in the study itself – may bias toward more motivated or engaged students overall, and this could have an impact on the interpretation of the results. These are the sorts of problems that plague any effort at assessing students at scale, and are worth noting only as a standard “grain of salt” with which any study should be (but is not always) considered when it stands alone. 4. How do we go about making short courses more successful? This paper provides no means of evaluating variation between programs, which is really where our interests lie. This is not a criticism: it is simply not the purpose of the paper. It is the next question, the natural response to such results: if these programs really aren’t making a difference, how might we capture the opportunity, with existing funded and institutionally invested programs, to change that? Is it that short-course workshops have no impact on anything, or that we need to better understand and plan for what they can accomplish? We have a few suggestions. What We Do Software and Data Carpentry offer short-course training for academics and professional researchers in software and data management skills. Many of our affiliates, who have also contributed to this response, offer other short courses in related subjects. We are all driven to the short-course format out of necessity. We recognize that this format places severe constraints on the quantity of information that can successfully be conveyed, but we design our curriculum and train our instructors specifically to maximize our effectiveness in this format. Here’s how we do it: Streamline content.
We aim to teach only the most immediately useful skills that can be taught and learned quickly. We teach our instructors to resist the urge to “get through everything” or pack extra details into their explanations. Teach strategically. We keep learners active by using live coding (in which learners work through lessons along with the instructor) and frequent formative assessment. We teach instructors to be mindful of the limitations of short-term memory and to focus instruction and assessments to minimize cognitive load. Meet learners where they are. Our workshops attract a diverse population of learners, from novices to experienced IT personnel. Our learners use colored sticky notes to indicate when they are stuck. We teach instructors how to use this to adjust their pacing. We also recruit workshop “helpers” who can directly coach learners who may be struggling. The absence of performance-based grades gives us added flexibility to meet diverse needs by generating diverse learning outcomes. Some may learn about the “big picture” of a new programming language by completing a lesson, while others may come away having added “tips and tricks” to their existing skills. This is one area in which workshops may have an advantage over traditional courses, particularly when it comes to confidence- and motivation-based outcomes. Normalize error and demonstrate recovery. We know and expect that our learners will acquire the bulk of their skill independently. Willingness to make mistakes and awareness of problem-solving strategies are far more crucial to their success than any particular content. We coach our instructors to embrace and even delight in their own errors as an opportunity to model healthy and effective responses. Explicitly address motivation and self-efficacy. One substantial advantage that we have is that our learners attend our workshops because they are motivated to learn precisely what we teach. However, preserving and nurturing that motivation is crucial. Perseverance results not only from embracing error as normal, but also from learners’ personal belief in their ability to succeed. Creating a workshop in which learners can be successful both in learning and in demonstrating to themselves that they have learned is one piece of this. We spend a good deal of time discussing motivation with our instructors. We explain why saying “it’s easy, anyone can do it” is often demotivating. We explore the differences between novice and expert perspectives and coach instructors to be mindful of and to respect the novice experience. We teach instructors to foster a growth mindset in their language and learner interactions. We emphasize that a relaxed, welcoming, and positive workshop experience is one of the most important things we can provide. Build community. The more people at all levels are able to share what they know, the more efficiently we can distribute knowledge. As a volunteer organization, we have a strong community of instructors, lesson maintainers, and others. As learners progress, they often become involved in this community. In the long run, we hope to create a community that can provide widespread support directly to learners. What we know about our impact We have conducted both short-term and long-term follow-up assessments of learners. Data Carpentry post-workshop survey results have always been positive, and 85% of learners agree that they would recommend our workshops to a colleague.
The Carpentries’ Long-Term Impact survey (n = 530) is designed to determine whether this positive experience and self-reported increase in confidence affect long-term outcomes. This survey (full report here) measured self-reported behaviors around good data management practices, change in confidence in open source tools, and other specific program goals. It also explored other ways the workshop may have impacted learners, such as improved research productivity. While Feldon et al. rightly critique self-assessment with regard to performance metrics, many of our target outcomes are more conducive to self-evaluation, e.g., confidence, motivation, and daily work habits. Researchers report increased daily programming usage after attending our two-day coding workshops, and sixty-five percent of respondents report higher confidence in working with data and open source tools as a result of completing the workshop. Our long-term assessment data show a decline in the percentage of respondents that ‘have not been using these tools’ (-11.1%), and an increase in the percentage of those who now use the tools on a daily basis (+14.5%). Additional highlights from our long-term survey report include: 77% of respondents reported being more confident in the tools that were covered during their workshop compared to before the workshop. 54% of respondents have made their analyses more reproducible as a result of completing a workshop. 65% of respondents have gained confidence in working with data as a result of completing a workshop. 74% of respondents have recommended our workshops to a friend or colleague. We see that short-format workshops can be effective at increasing researchers’ confidence, use of coding skills, and adoption of reproducible research perspectives. As a part of the Open Source community, we make all of our survey data and analysis code available in our assessment repository. We welcome people to work with our survey data and ask new questions. Understanding impact is important, and we will continue to keep our community informed with regular releases of survey data and reports. We also have a virtual assessment network, which newcomers are welcome to join. Please join here if you are interested in discussing assessment efforts in the area of training in research computing. In Closing … Our data suggest that we are having a positive impact, and we expect that other short-format programs can be similarly effective. However, this likely requires a focused effort on optimizing within the limitations of a short course, along with clear goals and targeted assessment to demonstrate such efficacy. It is not clear that this was the case for any of the programs surveyed by Feldon et al., and if it was, it is not clear to us that any such specific and variable successes would be discernible in their study. We agree, however, that under most circumstances, particularly where a large quantity of content needs to be taught, a short-format course should not be favored over any available time-distributed alternative. We applaud, encourage, and endeavor to support those who have the access and opportunity to conduct long-format training in the subjects we teach. Many members of our community are actively involved in traditional undergraduate and graduate instruction of this kind.
Traditional training opportunities will begin to catch up with demand for training in data science generally, but there will always be limitations - concepts or tools that don’t clearly fit into the curriculum, or new approaches that haven’t yet had a chance to be incorporated. We work on training in these gaps through short courses. It is necessary for us to be as effective as possible to achieve that mission. So far, we feel comfortable declaring that effort a success. [1] While the paper refers to programs as either “boot camps”, “bridge programs”, or “short-format training”, it has been brought to our attention that this usage of “boot camp” can cause some consternation for those with military training or under military regimes. We will therefore use the less-vivid but more-accurate “short course” label for this piece. Read More ›

My Favorite Tool - Todoist
Auriel Fournier / 2017-12-11
My favorite tool is Todoist, a task manager. Why do I like it? Todoist is a cross-platform ‘to-do’ tool. I went through many different ones before settling on Todoist a few years ago, and I now use it to organize everything from my household chores, grocery shopping and gift ideas for family members to my many work projects and non-work projects (things I do that are work-related but not a part of my actual job). Todoist can also work really well in a team. Right now at work I’m the only one who uses it, but my husband and I use it for household communication about shopping and other tasks that need to be completed. We can assign each other tasks with time-triggered reminders, and it really helps us stay on top of things. Everyone works differently, and I hope you have a system that works well so you can get work done - I just wanted to share what has become a crucial part of my system. How does the tool help me in my work? The ability to have recurring tasks that I can easily input, such as ‘second Thursday of every month at 2pm’, and the ability to add emails as tasks, with a link that opens up the email, have changed everything about the way I work. It allows me to clean out my inbox more effectively by assigning emails as tasks so I can answer them at the appropriate time, or have the email at hand for an important meeting in a few weeks, and it easily allows me to break down tasks into their component parts and keep track of them. What do you wish someone had told you when you first started learning how to use this tool? If you use Gmail, the integration between Todoist and Gmail is fantastic. Todoist also has integrations with many other platforms, few of which I personally find useful, but I have a few friends who think that is where the true power of Todoist lies. – Auriel Fournier, Postdoctoral Research Associate, Ocean Springs, Mississippi, USA. Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest to create a blog post about it. You can read the background to these posts here, or see what other tools people have written about. Read More ›

Poster Competition Now Open
Belinda Weaver / 2017-12-11
Design a Promotional Poster for CarpentryCon 2018 The Carpentries are excited to announce our poster design competition where some lucky member of our community will win a free registration for CarpentryCon 2018. Are there any artists out there who could design a fantastic poster? If so, we want to see your design! The winning designer will take home some fantastic Carpentries swag and gain free registration to CarpentryCon 2018. This includes registration for the conference only. It does not include travel to and from Dublin. Information for the Poster CarpentryCon 2018 will take place in Dublin from 30 May - 1 June, 2018 at University College Dublin with the overarching theme of Building Locally, Connecting Globally. The three-day event will focus on professional development, community building and networking, as we aim to help skill up the next generation of research leaders. Find out more from the CarpentryCon 2018 website. About the Competition What will the poster be used for? The winning poster will be used for promotional and advertising purposes and featured in all CarpentryCon 2018 advertising across a range of social media, including Twitter and Facebook. When can you submit a design? Entries can be submitted now. What is the closing date for entries? 5pm UTC on 31 January, 2018. We will announce the winner on 10 February, 2018. Who can enter? Anyone may enter, though entrants must be over the age of 18. Can I submit more than one design? Yes, competition entrants are welcome to submit as many designs as they like. What does the poster need to include? Posters should reflect the conference theme of Building Locally, Connecting Globally, and should also include the venue and dates (see above) and a link to the CarpentryCon website. How will the judging be done? Entries in the poster competition will be assessed by the members of the CarpentryCon task force who will choose the poster that best embodies the conference theme of Building Locally, Connecting Globally. The winner will be notified by email once the decision has been reached, and unsuccessful entrants will also be notified by email of the result. How to submit your design Submit your design as an A4-sized PDF email attachment by 31 January, 2018 to team AT carpentries.org, with the words CarpentryCon Poster Competition in the subject line. Please include your name in your entry. Submitting a poster indicates your acceptance of the Terms and Conditions stated below. Competition Terms and Conditions: Entry to this competition means that the entrant warrants that he or she meets the entry requirements and accepts these terms and conditions. Entrants are responsible for any and all expenses that they incur in creating their poster and submitting it to the competition. No costs incurred by competition entrants will be reimbursed. Entrants to the competition will retain the full copyright of their design. Entrants to the competition warrant that any image(s) used in their design is free for use and does not infringe any organization’s or person’s copyright. Entrants to the competition warrant that they have all necessary rights to provide the intellectual property within their poster to the Carpentries for promotional and advertising purposes. Entrants to the competition warrant that they license the Carpentries to use the winning design at no cost to promote CarpentryCon 2018. 
The Carpentries retain the right to disqualify any entrants or entries where we reasonably suspect any unlawful or improper conduct, such as infringing a third party’s intellectual property rights, or otherwise breaching the competition’s terms and conditions. All personal information collected during this competition will be handled in accordance with our privacy policy. Read More ›

Challenges Assessing Data Science
Marianne Corvellec, Kari L. Jordan / 2017-12-11
The Assessment Network was established as a space for those working on assessment within the open source/research computing community to collaborate and share resources. During our quarterly meeting in November, we engaged one another in a conversation revolving around data science education. This meeting was organized and hosted online by Kari Jordan, and six community members attended. First, we discussed the definitions of data scientist, data analyst, and data engineer; second, we worked in pairs on a set of questions about assessing data science education. The session was exciting and fruitful, as it combined two topical efforts: on one hand, our organization’s focus on assessment and, on the other hand, our contribution to the global effort in defining, understanding, and shaping the rising field of data science. Kari Jordan attended a meeting of collaborators from industry, academia, and the non-profit sector to brainstorm the challenges and vision for keeping data science broad. During that meeting, attendees were asked to come up with core competencies for data science. This was difficult, as each sector identified competencies important for their particular interest. Kari thought it would be a good idea to talk about it with the Assessment Network. What is Data Science? So, what is data science? What are the core competencies? For a positive definition, we turn to the seminal “Data Science Venn Diagram” by Drew Conway, as reproduced by Jake VanderPlas in the preface of his Python Data Science Handbook. Data science lies at the intersection of statistics, computer science, and domain expertise (in industry-friendly terms; traditional research, in academic terms). Data science is cross-disciplinary by definition. Hardly anyone gets formal training in all three areas. Most working data scientists are self-taught to a certain extent. Basically, it takes a growth mindset to be a data scientist! For a negative definition (in logician’s terms, i.e., what data science is not), we turn to industry job descriptions. It turns out that Marianne Corvellec served on a panel dedicated to the definition of these emerging occupations. This panel was held in 2016 with Québec’s Sectoral Committee for the ICT Workforce. It brought together industry professionals and HR specialists who would frame the discussion, and resulted in this report (in French; note that “architecte de(s) données” == data engineer and “scientifique de(s) données” == data scientist). This report is in line with academic sources (e.g., data science curricula at U.S. universities), insofar as a data scientist is not a data engineer. A data engineer takes care of data storage and warehousing; s/he builds, tests, and maintains a data pipeline, which integrates diverse data and transforms, cleans, and structures them. S/he masters big data technologies, such as Apache Hadoop, Apache Spark, and Amazon S3. Data engineers ensure the data are available (and in good shape) for data scientists to work with. What is a Data Scientist? More subtly, a data scientist is more than a data analyst. It takes an aptitude for collecting, organizing, and curating data, as well as for thinking analytically. A strong quantitative background is useful but not necessary. Principles and practices from the social sciences or digital humanities are valuable assets; data scientists should be good writers, good storytellers, and good communicators.
Perhaps surprisingly, attention to detail is not a key item to include in a data scientist’s skillset; the ability to grasp the big picture is much more important, as data scientists will find themselves working at the interface of very different departments or fields (in an industry context, these could be engineering, marketing, or business intelligence). A data scientist does not master any specific technology to perfection, since s/he dabbles in everything! Unlike the traditional data (or business intelligence) analyst, s/he resorts to several different frameworks and programming languages (as opposed to a given domain-specific platform) in order to leverage data. Plus, the data scientist typically works with datasets coming from multiple sources (as opposed to the traditional data analyst who usually works with a single data source already populated by an ETL solution). Data scientists are flexible with their tools and approaches. Challenges Assessing Data Science Education In the second part of the meeting, we split into breakout pairs to discuss the challenges of assessing data science education with respect to Carpentries’ workshops. Brainstorming in parallel lets us cover more ground (breadth), while interacting one-on-one lets us explore different avenues (depth). One pair focused on the industry perspective, another on the education system, and the third on assessment practices. Kari offered a list of questions to frame the discussion. Working groups identified challenges for assessing data science education at the object level (i.e., what should this assessment consist of?) and at the meta level (i.e., what favors or hinders the application of assessment?). At the meta level, the following prompts were discussed (pulled from South Big Data Hub’s Data Divide workshop): the vision for assessing data science education; the stakeholders for data science education; what specific skills or resources are most important or lacking to address this challenge; how our challenges fit into the national landscape; and the broader impact of addressing our challenges. Check out the notes from our working groups to see what we came up with! Now is your chance to tell us what you think. We opened several issues on the Carpentries assessment repo. We’d love to engage you in a rich discussion around this topic. Comment on an issue, and tweet us your thoughts using the hashtag #carpentriesassessment. Read More ›

CarpentryCon 2018 - Website is Live!
Belinda Weaver / 2017-12-07
We are excited to announce that our website for CarpentryCon 2018 is live. CarpentryCon 2018 will take place at University College Dublin from 30 May - 1 June, 2018, with the overarching theme of “Building Locally, Connecting Globally”. The three-day event will focus on professional development, community building and networking, as we aim to help skill up the next generation of research leaders through a combination of talks, workshops and breakout sessions. We are currently developing our program of speakers and workshops and hope to have a preliminary draft up on the website early in the new year. In order to be able to keep registration costs low and to provide travel scholarships, we are seeking sponsorship for CarpentryCon 2018. Please feel free to share this call for sponsorship widely. Filling in the form does not commit organizations to anything beyond a conversation with us at this point - but we want to get those conversations started as soon as we can. Feel free to ping us with suggestions of potential CarpentryCon 2018 sponsors by emailing team AT carpentries.org with the details. We will take it from there. We will shortly be announcing a competition to design a promotional poster for the event. The winner will get free registration for CarpentryCon 2018. Stay on top of CarpentryCon 2018 news and announcements by signing up for our email list. Read More ›

Celebrate the Wins of 2017: Join the year's final community call
Belinda Weaver / 2017-12-05
When Carpentries staff get a win, we call it a #greensticky. These are things to celebrate with a resounding ‘Yay!’ So we decided to turn our regular December community call (our last for the year) into a Happy Holidays GreenSticky Party. 2017 has been a BIG year for our community. We negotiated the Carpentries merger, got planning underway for 2018’s CarpentryCon, trained a whole bunch of new instructors and trainers, and restarted our mentoring groups - and of course a ton of workshops were taught all round the globe. As we wind down for the holidays, it is worth taking time out to think over the great moments of the year. Perhaps you got a paper accepted, or you attended a fantastic conference. Maybe you snared a new dream job or taught your first workshop. Perhaps you finally finished your instructor checkout or got your institution to join the Carpentries. Or maybe you finally submitted your PhD! Whatever your good news story of 2017, we want to hear about it at our December community call. Come along and tell us about YOUR #greensticky for 2017. There will be two call times on 14 December - 2pm UTC and 11pm UTC. See the date and time for the first call and the second call in your time zone. Come along and share your good news with other members of the Carpentries community. Party hats optional … Read More ›

Upcoming Membership Webinar
Belinda Weaver / 2017-12-04
Organizational memberships are a great way to build your local Carpentries instructor community. Member organizations can run Software and Data Carpentry workshops more often, and can organize workshops for specific communities within their institution. As part of our membership services, we provide members with instructor training and mentorship, and give you easy access to pre- and post-workshop assessment surveys. As your organization leverages its membership to grow your pool of experienced instructors, we look for opportunities to connect common needs across our membership. Learn how this works, and how you can help us bring impactful Carpentries workshops to your organization, at our upcoming membership webinar. Led by Executive Director Jonah Duckles, the webinar will take place at 7pm UTC on 5 December. See the local date and time in your zone. The format will be a short presentation on what membership is, and how your organization can benefit. There will then be plenty of time for Q&A. Connection details (via Zoom) and sign up are on this etherpad. Read More ›

Lesson Infrastructure Subcommittee 2017 November meeting
Raniere Silva / 2017-12-04
On 16 November 2017 at 15:00 UTC+0, the Lesson Infrastructure Subcommittee had its 2017 November meeting. This post covers the topics discussed and their resolutions. Software Carpentry and Data Carpentry merger: With the merger in 2018, some Git repositories will be owned by a new GitHub organization. The Instructor Training course material has already been moved; you can now find it at http://carpentries.github.io/instructor-training/. The date for migrating the remaining repositories will be announced in 2018. Instructions for migrating a repository can be found here. Syntax highlighting: Thanks to naught101, the next release of our lessons will offer syntax highlighting to our readers. Lesson maintainers might need help changing ~~~ print("Hello World") ~~~ {: .foobar} to ~~~ print("Hello World") ~~~ {: .language-foobar}, for example (see the sketch below). If you want to help, send us a pull request. Exercises throughout the episodes: After a short discussion, we reached a consensus that it is better to have exercises throughout each episode instead of gathering them all at the end. Lessons will migrate to the new format at a slow pace because this change requires a good amount of work. Non-English lessons: If you have been involved with us since 2014, you might remember this post about the attempt to translate the lessons into Spanish and this other post announcing the lessons in Korean. During the meeting, we had a conversation about the workflow for translating lessons into other languages, and there is now interest and work on a translation. Some of the conversation was archived as issues here. If you want to get involved with the translation, join the Latinoamerica email list or follow the updates. Windows installer: In March 2017, a discussion about our recommended text editor created a lot of buzz on the mailing list. The email thread started because nano sometimes wasn't installed on the learners' machines. The new version of Git Bash will include nano by default, and we have a pull request, thanks to Oliver Stueker, to adopt the new version in our workshop instructions. The pull request will be merged at the end of this year or the beginning of 2018. Next steps: Version 9.3.x of the lesson template and lesson documentation has been released. Maintainers are working to release the new version of the lessons before the end of the year. The subcommittee will meet again in February to provide an update on some of the topics covered in this post and to discuss new requests from the community. Acknowledgement: Thanks to Alejandra Gonzalez Beltran, Christina Koch, David Pérez-Suárez, Erin Becker, Naupaka Zimmerman and Tracy Teal. Read More ›
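To make the kramdown change above concrete, here is a minimal before-and-after sketch; the use of Python and the .language-python class is an illustrative assumption standing in for the foobar placeholder. Before the change, a lesson episode's code block was tagged with a bare class name:

~~~
print("Hello World")
~~~
{: .python}

After the change, the same block carries the language- prefix that the syntax highlighter recognises:

~~~
print("Hello World")
~~~
{: .language-python}

The change is a pure rename of the block's class attribute, so a maintainer can apply it across an episode with a simple search and replace.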

satRday Cape Town 2018
Jon Calder / 2017-12-03
satRdays are community-led, regional conferences to support collaboration, networking and innovation within the R community. satRdays are the brainchild of Stephanie Locke, and were inspired to a large extent by the success of SQLSaturdays, which focus on developing technical knowledge around SQL Server and related technologies. The first SQLSaturday was held in 2007, and there are now well over 100 events across the globe. The idea behind holding the conferences on a Saturday is that, rather than limiting attendance to those who can get buy-in from their employers to attend during the week, anyone who wishes can attend on their own time. Ticket prices are also kept as low as possible to keep these conferences accessible to students. A proposal was drafted and put before the R Consortium in early 2016, and with their approval and financial backing to help get the first few satRday events up and running around the world, some early momentum was quickly established. The first satRday, organized by Gergely Daróczi, was held in Budapest, Hungary on 3 September 2016, with a little under 200 people from 19 countries in attendance. There were presentations from 25 speakers, with keynotes from Gabor Csardi and Jeroen Ooms, and the event was a huge success. You can find out more about the event at the conference website or by reading some of the follow-up posts on the conference here and here. The second satRday, organized by Andrew Collier, was held in Cape Town on 18 February 2017. Just over 200 tickets were issued for the event and again the venue was packed. With 29 speakers in total, including keynote speakers Jenny Bryan, Julia Silge and Steph Locke, the schedule was also packed - with tutorials, standard and lightning talks, along with a visualization competition to round out the day’s proceedings. There were also three workshops held by the keynote speakers on the two days prior to the conference, which were very well attended. Again, there is plenty more about the event at the conference website, and in the follow-up posts here and here. A second satRday in Cape Town is now planned for 17 March 2018, with two exciting keynote speakers: Maëlle Salmon and Stephanie Kovalchik. They will be presenting workshops on R package development and sports analytics with R on the day prior to the conference. Next year’s conference will be held at a bigger venue on the campus of the University of Cape Town in order to accommodate more attendees. Tickets are on sale and the Call for Papers is open. More details for the event can be found at the conference website. Another exciting development for next year’s conference is that plans are being put in place to run a Data Carpentry workshop before the event. Software and Data Carpentry have been involved in enabling and developing R capacity in Africa since 2015. Two-day workshops teach researchers and postgraduate students fundamental skills to assist with better research software development and data analysis. The Carpentries are global non-profit volunteer organisations that not only teach tools like R, Python, SQL, and Git/GitHub, but also run a two-day instructor training workshop to help technical experts teach more effectively. More than 30 Carpentry workshops have been run in African countries over the past three years, including South Africa, Mauritius, Ghana, Gabon, Namibia, Botswana, Kenya, and Ethiopia.
These workshops have reached hundreds of researchers and students who mostly had limited or no prior exposure to programming. If you’d like to get involved in Carpentries in Africa, please join our Google Group or request a workshop by completing this form. Read More ›

Jessica Upani: Nomination for 2018 Steering Committee
Jessica Upani / 2017-12-01
2018 Election: Jessica Upani Who am I? My name is Jessica Upani. I am Namibian and I am a high school teacher. I am also a 4th year Mathematics and Computer Science undergraduate student at the University of Namibia. I am an executive of the Python Namibia society. We run Python workshops, meetups and conferences all over the country. Previous involvement. My involvement with the Carpentries started in April 2016 when I was one of the instructors trained by Aleksandra Pawlik in Potchefstroom, South Africa. I then continued with my training and became a certified Software Carpentry instructor on 7 July 2016, with my final checkout done by Greg Wilson. I did not do this alone: I was mentored by David Perez-Suarez. Since then, I have hosted and taught two workshops locally. For our first workshop we paired with Bertie Seyffert from South Africa. I also had a chance to meet the new instructors who were trained in Cape Town in May 2017. I am currently training to be an instructor of instructors and I will be maintaining the Python lesson in 2018. Very exciting! What can I offer? I am a teacher, in every sense of the word. I enjoy teaching, and the Carpentries gives me one more opportunity to give back to my community. Our community in Namibia is small and the Carpentries gives me an opportunity to see a change for the better in that regard. I enjoy making life better for others and I enjoy sharing what I know. Through the Carpentries, I get to do just that. I have helped several scientists conduct their research with ease, and I intend to help many more. The way forward. I have noticed a lot of growth in the Carpentries over the past two years (especially in Africa) and it is all very exciting. I think the merger is a great move and I would like to help get work done. I would like to help reach out to more communities both locally and remotely. One of the strongest points of the Carpentries is bringing together people from all disciplines; I have never seen any other platform that does such a thing. That is all the more reason why I want to be more involved in this community. Read More ›

OpenCon in Berlin - Impressions
Belinda Weaver / 2017-11-30
Along with fellow Software Carpenters Rayna Harris and Paula Martinez, I attended OpenCon 2017 held over the weekend of 11-13 November, 2017 in Berlin. The conference was held in the Harnack Haus in Dahlem, the home of the Max Planck Society, where the friendly ghosts of Einstein, Heisenberg and other stellar scientists smiled on our endeavours to promote open access, open education and open data. This was a conference with a difference. Most conference goers were very new to this area of work so there was a strong learning aspect to all that unfolded over the three days. Many of the speakers had eye-opening stories to tell about education’s role in transforming lives, whether it be Kholoud Ajarma’s experiences of growing up in a Palestinian refugee camp or Aliya Ibraimova’s work with remote grazing communities in the Kyrgyz mountains. While the largest cohort (50) were from the US, 47 different countries were represented at OpenCon. Of the 186 people listed in the attendance sheet, 132 had GitHub accounts and even more (160) used Twitter. Sessions were a mixture of plenary sessions and small group work. As an early icebreaker, we were put into groups called Story Circles, in which everyone had eight (uninterrupted) minutes to explain what had led them to apply for and attend OpenCon. The sheer diversity of backgrounds and experiences unearthed by this kind of session was astounding. Hearing Thomas Mboa describe teaching Nigerian students without having access to electricity certainly put some of my own workshop issues into perspective. Another eye-opener was the Diversity and Inclusion panel where uncomfortable questions about ‘whose knowledge?’, ‘who has access?’, and ‘who is missing from the discussion?’ put paid to the idea that ‘open’ is a universal, unquestionable good. Speakers from the global south stressed that making knowledge open can seem like a replay of having that knowledge stolen from them during the colonial period. And if ‘open’ does not welcome people of all genders, sexual orientation, color and other forms of diversity, then how ‘open’ is it really? The quality and clarity of OpenCon recordings mean that these sessions can easily be watched by anyone with an interest in what was said. Footage of the Diversity and Inclusion panel also includes the post-panel discussion. To help build more local action post-conference, people could opt to work with groups from their own region. Since I was the only Australian there, I chose to work with an Asian group, and helped people from Armenia and Taiwan create ‘empathy maps’ to try to understand the concerns of researchers in their region who might want to work ‘open’ but who face formidable barriers, not least the kinds of behaviours outlined by Laurent Gatto’s and Corina Logan’s ‘Bullied by Bad Science’ campaign. The final day of OpenCon was a Do-a-Thon - what I would call a sprint or hackathon. For this day, Rayna and Paula marshalled a team from Chile, Argentina and other Spanish-speaking countries to work on the Spanish translations of Carpentry lessons. This was certainly a one-of-a-kind conference and for those who missed it, session recordings are available online, courtesy of the Right to Research Coalition. The conference was phenomenally well-organised, with terrific food, and people could opt to join Dine-Arounds to ensure that no one had to eat dinner all alone in a strange city.
I was very interested in the organization of the conference as I was hoping to pick up tips I could use to make next year’s CarpentryCon in Dublin a similar success. The conference’s leading sponsor was the Max Planck Gesellschaft (Max Planck Society), and the conference was jointly organised by SPARC (the Scholarly Publishing and Academic Resources Coalition) and the Right to Research Coalition. A number of other organisations and foundations were supporting sponsors. A floor tile at Harnack Haus was inset with Einstein’s signature - you don’t see that every day. Read More ›

Community Building Catchup
Jonah Duckles, Belinda Weaver / 2017-11-30
The last Carpentries’ Champions call led by Jonah Duckles was well attended, with representatives from Australia, South Africa, Ethiopia, the US, the UK, New Zealand and Norway. After the introductions and a review of the last call, attendees were divided into four breakout rooms to discuss an event that had promoted a sense of excitement about new research tools, keeping three questions in mind: What made the event engaging and exciting? What could have made it even better? What didn’t you like? Room 1 pluses were the variety of tools taught, the great diversity both in cultures and careers of the people attending, and ample opportunities for networking. One caveat was the mixed positive and not-so-positive effects of following an ‘unconference’ structure. Room 2 looked at workshops and what makes them run well or not so well, and also talked about study groups, with this post from Sarah Stevens providing useful advice on issues around setting one up. Room 3 listed some ‘must haves’ for an event to work well: a feeling of ‘OK’ culture, covering everything from coffee to atmosphere and the attitude of host and participants, plus a visible result right away and/or some kind of immediate collective action. The question of whether domain-specific or more generalist groups work better was brought back for discussion in the wider group. Room 4 reported that some people don’t like to engage with smaller groups, and therefore we need ways to encourage them to open up. This might involve asking them a specific question to encourage their feedback, or giving them explicit ‘permission’ to contribute, since not everyone will just ‘jump in’. Reasons people do not or will not participate: the group is too large, so people feel intimidated; the group has no common background or interest, so it feels unwelcoming; people worry about seeming ‘stupid’. Fixes? Have attendees put questions in a hat, so they get the questions they want answered; mix experienced and new instructors as a way of mentoring; do something concrete and get an immediate outcome; put people in pairs to work together; make sure you have advanced tasks for people in mixed-level workshops; recruit people who teach at events like ResBaz and invite them to become Carpentries instructors; plan activities that allow some attendees to apply more advanced or more interesting skills. Cautionary tales: always have a backup plan in case wifi is not working or available; be mindful of different people’s learning styles, which can be overlooked in a group setting; have a clear statement of workshop goals and objectives, which is very important when levels of knowledge are mixed. Feedback on using smaller groups in breakout rooms for these calls: Loved it! It was really nice to talk in a small group. It can be overwhelming to be on a big call, nice idea. Less intimidating to share stories in a small group - +1 from me. Worked very well! A nice way to discuss and have a conversation without worrying about whether a large group gets a chance to talk. +1 to all the above! Further resources: Amanda Miotto’s HackyHour is an informal drop-in session at Griffith University in Brisbane. Held usually in a coffee shop or bar, HackyHours are gatherings where people can ask research and research-IT-related questions in a safe environment. Other similar outreach events are PhTea, Programming & Pizza, or study groups.
Use whatever name works best in your setting, and consider inviting 5-minute talks to get the meetup started if people don’t bring any problems that week. Amanda has now compiled a HackyHour Handbook. Bianca Peterson reported on study groups at North-West University in South Africa. These took off in 2016. People enrolled in modules on Coursera as a group, and met for two hours every week to work on things together. The groups functioned well as post-workshop support after Software and Data Carpentry workshops since they reinforced workshop learning. The Mozilla Study Group Handbook is a helpful tool for people who want to start a study group, and this guide helps you decide which type of event is best for building your local community. Next steps: If you’re interested in participating in the Carpentries champion community and our quarterly meetings, you can join our announcement mailing list. We’ll be holding our next meeting in February and will be focusing on a draft Carpentry Community Building playbook. This document will serve as a guide for community champions developing their own local communities, sharing tips, tricks and ideas about how to build local community. To get involved, get on the list! Favorite events: At the beginning of the call, as an icebreaker activity, we asked attendees to describe the most amazing event/conference/workshop/meetup they had ever attended. The following were singled out: Kiwi Foo, an ‘unconference’ bringing together creatives, government, policy wonks and technologists to think about making a better world. Bayesian Models in Ecology Workshop at SESYNC, a great combination of lecture/theory and working in small groups, with an emphasis on peer learning. FOSDEM, the Free and Open Source Developers’ European Meeting, run by volunteers, with a wide range of topics. While it had a very self-organised vibe, everything pretty much worked. useR 2017, a first experience of connecting with the R community in person. Open, welcoming, attendees of diverse backgrounds, amazing food - I learned so much! Midwest Data Librarian Symposium, a form of un-conference, but in an active, participatory learning environment. CODATA-RDA Research Data Science Summer School, Trieste, Italy - most of the instructors are from the Carpentries and the quality of lessons is great, with a variety of tools/skills taught over 2 weeks (from Unix to Machine Learning in R). Read More ›

People's Favorite Tools
Belinda Weaver / 2017-11-29
A big thank you to everyone who has responded so far to the call for short posts about their favourite tools. So far we have had Paula Martinez on R, with Bianca Peterson enthusiastically seconding, Jeff Oliver sharing his love of Git and GitHub, Kellie Ottoboni talking up IPython, and Thomas Arildsen on how the Jupyter Notebook facilitates his teaching. Juliane Schneider weighed in on the wonders of OpenRefine. Clifton Franklund likes RStudio, while Francesco Montanari is a fan of emacs. Rayna Harris nominated videoconferencing as her most useful research tool, while Greg Wilson talks up the benefits of asking for help. Robert Sare has posted on the benefits of using rasterio in earth sciences research. Expect more posts as people contribute further favourites. Even if your tool has already been mentioned, we would still welcome a post about it, as your use of the tool may be different. Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here. Read More ›

My Favorite Tool - Asking for Help
Greg Wilson / 2017-11-28
My favorite tool is asking for help. You may not think of it as a tool, but it’s something I use frequently to solve a wide range of problems, so I think it qualifies. Whether it’s saying, “I don’t know what to do next – does anyone have any ideas?” when teaching, or cold-calling people to see if they’ll host a workshop, asking for help has probably done more for me than Emacs and version control combined. Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here, or see what other tools people have written about. Read More ›

Our Steering Committee Candidates
Belinda Weaver / 2017-11-26
Nine people have nominated to serve on the 2018 Steering Committee of the new, merged Carpentries. The nominees are: Samantha Ahern (UK) Martin Callaghan (UK) Auriel Fournier (US) Amy Hodge (US) Lex Nederbragt (Norway) Raniere Silva (Brazil, but currently in the UK) Juan Steyn (South Africa) Jessica Upani (Namibia) Elizabeth Wickes (US) There is still time to put your name forward. Nominations will close on 1 December. If you are not sure whether you are eligible to stand for election, or whether you can vote in the election, please check out this blog post, which has all the logistics. This post was edited on December 1, 2017 to reflect the complete list of candidates. Read More ›

Call for Applicants: Mentoring Subcommittee Co-Chair
Marian Schmidt, Jamie Hadwin / 2017-11-21
Dear Carpentries Community, How would you like to work first-hand on developing our ever-growing instructor pool? A Mentoring Subcommittee Co-Chair position has just opened! The leader serving in this year-long position (December 2017 - December 2018) will share duties with the co-chair of the mentoring subcommittee, with a workload averaging 1.5 hours/week. The mentoring co-chairs are expected to host the monthly mentoring subcommittee meetings, facilitate opportunities for building connections across our community to better serve our instructors, and manage the weekly instructor discussion sessions hosted on this etherpad. If you would like to be considered, please fill out this short form by December 4th. The current mentoring subcommittee co-chairs will select a new co-chair from the applicant pool, with input from the Carpentries staff liaison and other members of the Carpentries community. Please feel welcome to email Marian at marschmi AT umich.edu with any questions regarding the position. Sincerely, The Mentoring Co-Chairs, Jamie & Marian Apply Here: https://goo.gl/forms/aSzm8Gg7Y4tIboWy2 Expectations: Host one of the two monthly mentoring committee meetings (the co-chair will host the other). Attend one debriefing meeting per month with the co-chair (usually soon after the monthly meeting). Commit an average of ~1.5 hours per week (usually ~2 hours/week during the week of hosting the mentoring committee meetings). When possible, help host or co-host instructor discussions. Communicate challenges and opportunities with the Carpentries staff liaison. Timeline: 1-year position, December 2017 - December 2018. Mentored by both co-chairs of the subcommittee (Jamie & Marian) for the first month. Until February 2018, work closely with the other co-chair (Marian). In February, there will be another call for co-chair applications. Benefits: Collaborate with dedicated members of our community to mentor instructors and build a network within our amazing instructor community. Assist with improving and creating new resources for our instructor and mentoring communities. Meet many other amazing Software and Data Carpentry instructors and instructors-in-training. Gain and practice skills in community organization. Learn more about how the Carpentries function as an organization. Read More ›

Samantha Ahern: Steering Committee Nomination
Samantha Ahern / 2017-11-20
Samantha Ahern: Steering Committee Nomination How am I involved with the Carpentries? I have been involved with Software Carpentry for four years. I participated in a bootcamp back in 2013 and since then have been a helper for a number of bootcamps and have taught in three in total - two in the last year. I am a very newly qualified Instructor, literally completing the sign-off tasks in the last few weeks, but I have been involved in education, and this type of learning in particular, for a long time. Who am I? I am an Educator and Digital Social Scientist, passionate about accessibility - this was noted in the feedback for my HEA Fellowship application. My Fellowship portfolio can be viewed here. I have produced software for my own research and taught others how to do so. I am a member of the Digital Education team at University College London focusing on learner/learning analytics, but I also work with Research IT Services on learning design and online learning creation. Learning analytics refers to the measurement, collection, analysis and reporting of data about the progress of learners and the contexts in which learning takes place. I have been identifying potential data sources for a learner analytics project, identifying potential issues with matching data across systems, and doing some exploratory analysis of relationships between these sources and student success. I am also looking at how data can be used to review learning designs and/or their pedagogic intent. Additionally, I am in the current cohort of the Mozilla Open Leaders Program, and am trying to build a community for the Digital Literacy Playground project, an Open Education Resource (OER) aimed at 16-24 year olds. The playground zone Data and the Academy will focus on research data management, sharing data and using open data. This overlaps with some of the learning focus in Data Carpentry. The Mozilla Open Leaders Program is a 12-week program where you develop skills in building a community around, and leading, open projects. What can I offer? A different perspective. Although my academic background is computing (BSc Comp Sci, MSc Intelligent Systems), I am first and foremost an educator, and I can bring a wealth of pedagogic knowledge and learning design expertise from across sectors to the committee. This includes teaching ICT at secondary schools for eight and a half years, from Year 7 students to Year 13 A-Level. I was an IT Trainer for two years and the Senior Information Security Officer - Awareness for one year. In addition to being a qualified teacher (PGCE Secondary ICT), I also hold a PGDip in ICT in Education. This would provide a learning/learner focus to decision making. It is also important that our materials are accessible to all who may need to benefit from them: to paraphrase Maha Bali, we do not want to be giving apples to people with no teeth. I am able to advise on accessible design and on testing of tools for the creation of learning materials. Read More ›

Work first-hand on developing our ever growing instructor community
Marian Schmidt, Jamie Hadwin / 2017-11-20
How would you like to work first-hand on developing our ever-growing instructor community? A Mentoring Subcommittee Co-Chair position has just opened! The leader serving in this year-long position (December 2017 - December 2018) will share duties with the co-chair of the mentoring subcommittee, with a commitment averaging 1.5 hours/week. The mentoring subcommittee co-chairs are expected to host the monthly mentoring subcommittee meetings, facilitate opportunities for building connections across our community to better serve our instructors, and manage the weekly instructor discussion sessions hosted on this etherpad. If you would like to be considered, please fill out this short Google form by December 4th. The current mentoring subcommittee co-chairs will select a new co-chair from the applicants, with input from the Carpentries staff liaison and other members of the Carpentries community. Apply Here Expectations: Host one of the two monthly mentoring subcommittee meetings (the co-chair will host the other). Attend one debriefing meeting per month with the co-chair (usually soon after the monthly meeting). Commit to an average of 1.5 hours per week (usually ~2 hours/week during the week of hosting the mentoring subcommittee meetings). When possible, help host or co-host instructor discussions. Communicate challenges and opportunities with the Carpentries staff liaison. Transition Timeline: 1-year position (December 2017 - December 2018). Mentored by both co-chairs of the subcommittee (Jamie & Marian) for the first month. Until February 2018, work closely with the co-chair (Marian). Benefits: Collaborate with dedicated members of our community to mentor instructors and create a strong network. Assist with improving and creating new resources for our instructor and mentoring communities. Meet many other amazing Software and Data Carpentry instructors and instructors-in-training. Gain and practice skills in community organization. Learn more about how the Carpentries function as an organization. Read More ›

Amy Hodge: Nomination for 2018 Steering Committee
Amy Hodge / 2017-11-20
2018 Election : Amy Hodge A Little About Me Hi all! My name is Amy Hodge, and I work for Stanford University Libraries. I have a PhD from Yale in Molecular Biophysics and Biochemistry, and spent about 10 years working at science database companies, where I discovered the elegance of SQL and the perfect role for me in enabling people to do better science. Since my undergraduate days I’ve been involved in teaching, mentoring, and software training, and my involvement with the Carpentries is a natural extension of these activities and my desire to enable people to do better science. My Involvement with the Carpentries I got my start with the Carpentries by hosting a Software Carpentry workshop in January 2014. The response from learners was so positive that I hosted three more workshops that summer. The next January I became a certified instructor so that I could increase our capacity to offer these workshops on campus. After teaching my first Data Carpentry workshop at Stanford in April 2015, I quickly realized teaching and hosting simultaneously was not very practical, so I have been teaching versions of the SQL and OpenRefine lessons as stand-alone workshops as my schedule allows. In late 2015 and early 2016, I organized and helped at one workshop for the Libraries and served as a helper at two other campus workshops. From July 2016 to June 2017, I served as an advisor to a Stanford professor who had been awarded an NIH training grant addendum to develop curriculum for reproducibility and rigor. She was incorporating the Carpentries as a major component and sought my advice on doing this. I helped with one workshop, planned/hosted an instructor training, and hosted/helped at another workshop for postdocs. This past summer I helped at an instructor training at Davis and earned my certification as an Instructor Trainer. I also successfully campaigned for Stanford Libraries to sign on as a Carpentries Partner, for which I am the contact. I recently taught at Data Carpentry workshops at two universities in South Africa as well as my first instructor training. Carpentries Future and Growth As someone who has been working on her own to provide these workshops at my institution, I’m interested in how we build local communities. How can we draw in more campus organizations to fund these workshops? How can we get more people interested in contributing once they become instructors? How can we get study groups going after the workshops are over? My recent experiences with teaching in Africa have taught me much about how workshops are run at different institutions and for different groups of learners. I’d like to see us implement more ways that instructors and hosts can learn from each other about how to have a successful workshop. I’m interested in becoming more involved in lesson development and maintenance and discussing ways to get more of our instructors involved in these activities as well. I’d like to see the Data Carpentry curriculum in particular provide more options for the use of data sets in different fields, because I can see the benefits to the learners of having materials that are more accessible for them. Read More ›

Raniere Silva: Nomination for 2018 Steering Committee
Raniere Silva / 2017-11-16
Who am I? My name is Raniere Silva. I’m a Brazilian who currently lives in Manchester, UK, where I work for the Software Sustainability Institute as a Community Officer and help run the Institute’s Fellowship Programme. The programme supports, among other activities, Carpentry workshops and this incredible community in the UK and beyond. Previous Involvement I discovered Software Carpentry in 2013 after reading this blog post, having been redirected to it by this announcement. Months later, I joined the 6th round of instructor training alongside Leszek Tarkowski, Michael Crusoe, Christina Koch, Mark Laufersweiler, Jonah Duckles and many other fabulous instructors whom I would only meet in person years later. In 2014, Fernando Mayer and I taught the first Software Carpentry workshop in Brazil and South America. Later that year, I had the opportunity to teach more Software Carpentry workshops in Brazil with some international visitors: Alex Viana and Diego Barneche. In 2015 and 2016 I served two years on the Software Carpentry Steering Committee. During that time, I got especially involved with the mentoring activities, now being led by Marian Schmidt and Jamie Hadwin, and pushed for the practice of offering more than one session so that volunteers could attend regardless of where they live and their personal commitments. At the end of 2016, I decided it was time to let others contribute to the project as members of the Steering Committee. Kate Hertweck contacted me asking for help with the lesson styles, and most of my contributions to the project this year have been on the technical side. Last week, a friend delivered a Carpentry-inspired workshop in Brazil with very positive feedback from the learners and requests for more similar events. I was part of the steering committee of that workshop, and it motivated me to apply to serve on the Carpentries Steering Committee once again. Strengths In the five years that I have been involved with the Carpentries, the top 3 strengths of the Carpentry programme that I see are: the Carpentries’ name/brand, which is now recognised in many places worldwide; transparency in all Carpentries’ activities, which is very hard to achieve but is always on the priority list; and the community, which is passionate about the project and shares its experiences and lessons learned on a daily basis (for this week’s examples, see this email and this pull request). Weaknesses Of course, there is always room for improvement. My top 3 items for us to be careful about in the next 2 years are: procedures, which need to be documented in a public place, as they are extremely important for the on-boarding of new members of staff and community, but registering this knowledge takes time and in some cases slides down our priority list; communication, since with the recent increase in the number of staff and volunteers we need to share information with more people on various channels while still keeping the number of emails, chat messages and meetings to a minimum - which is almost contradictory - so we need to review our communication strategy; and diversity, since we are very proud to be an inclusive community but, as Greg Wilson mentioned here, we need to review our definitions and work to improve our inclusivity numbers.
Two Years From Now My metrics of success for the Carpentries in two years’ time are: at least 5 staff teams located in at least the following time zones: UTC-8, UTC-5/UTC-3, UTC+1, UTC+8 and UTC+12, to secure easier reach and support for our worldwide community; 30% of partners from the United States and Canada, 30% from Europe, 30% from Oceania, East Asia and Southeast Asia, and 10% from other places (I would love to see more partners from South America, Africa and Asia, but I know that is unrealistic at the moment); clarification of the ways in which volunteers can engage with the community - although teaching workshops will continue to be the primary way to contribute, followed by lesson development, volunteers can contribute in other forms, but they need a clear path to do so; two studies, one in 2018 and another in 2019, of the retention rate of instructors and other volunteers, without duplicating the work being done by the Community Health Analytics Open Source Software (CHAOSS) project; and a repeat CarpentryCon in 2019 (the one for 2018 is already planned). Conclusion Thank you very much for considering me again as a member of the Steering Committee. If I’m elected, I will make community, diversity and sustainability top priorities. I’m happy to answer any questions by email or your preferred form of contact (see here how to reach me). Read More ›

Martin Callaghan: Nomination for 2018 Steering Committee
Martin Callaghan / 2017-11-16
2018 Election: Martin Callaghan So, who are you? I’m Martin Callaghan and I’m a Research Computing Consultant (part research software engineer, part trainer, part consultant and part outreach) at the University of Leeds in the UK, where I provide programming and software development consultancy across a diverse research community, including the Arts and Social Sciences, for Cloud and High Performance Computing. It’s an exciting job where I get to work with researchers to understand their research questions and support them in developing and selecting computational platforms, tools and applications to help them answer those questions. Tell me how you’ve been involved with the Carpentries? I discovered the Carpentries in late 2013, just after I joined Leeds. I became an instructor and since then I’ve taught on around 20 Software Carpentry, Data Carpentry and Library Carpentry workshops at Leeds and elsewhere. I’ve had the privilege to work with many of you over the past few years and I want to continue to enable and support you. I’ve been a co-applicant and lead instructor on four successful grant awards to run bespoke three-day Software Carpentry workshops supporting PhD students and early career researchers in improving their programming skills. Leeds has been an organisational member almost from when memberships became available. Watching the Carpentries grow and develop has been a great opportunity and it’s been amazing to have been a part of this. Recently, I’ve become an instructor trainer. It’s been a great experience to work with two groups of prospective trainers and hear from them how excited they are about becoming trainers and running their own workshops. How would you help develop the Carpentries if elected? Constituency building: I would like to help further develop the organisational membership model. I’ve seen at first hand how the Carpentries teaching model and the lesson materials have made a real difference to people’s research. I would like to extend this impact to smaller and less research-focussed institutions so that they are able to benefit from the Carpentries approach. Community building: Over the past few years, I have seen both Software Carpentry and Data Carpentry evolve and now move into the future as a merged organisation. We’ve seen fantastic developments with Library Carpentry and now HPC Carpentry. I want to support and encourage other research communities to work with the Carpentries family. I’m particularly interested in exploring how we can use the Carpentries approach in undergraduate teaching, Business Schools, and in the Fine, Applied and Performing Arts. Serving our diverse communities: I’ve seen how the Carpentries have been a force for good. There’s been some amazing work in developing the Carpentries in Africa and South America and I want to continue to support this work and help to reach out to new communities both locally and further afield. Continuous pedagogical improvement: In a previous career, I was a high school teacher and teacher trainer. I know the importance of good teaching and learning and rigorous evaluation. If elected, I want to make sure that our ever-improving teaching model stays at the heart of the Carpentries approach. Read More ›

Lex Nederbragt: Nomination for 2018 Steering Committee
Lex Nederbragt / 2017-11-14
2018 Election: Nomination by Lex Nederbragt Who am I? My name is Lex Nederbragt. I am a bioinformatician (senior engineer) at the Institute of Biosciences, University of Oslo, Norway. I also have a 20% associate professor position at the Institute of Informatics at the University of Oslo. My research and teaching involve genomics, bioinformatics and programming for biologists. Previous involvement with The Carpentries I hosted the first Software Carpentry workshop at our university in 2012 (we were very fortunate to have Greg Wilson, one of the founders of Software Carpentry, teach the workshop himself!). I became a Software Carpentry instructor in 2013 and a Data Carpentry instructor in 2016. I have taught at numerous Software and Data Carpentry workshops in Oslo, elsewhere in Norway, and abroad. In 2016, I also became an Instructor Trainer and I have co-taught a couple of Instructor Training workshops since then. Together with Karen Lagesen, I started a ‘Carpentry’ initiative at the University of Oslo which has now grown to a dozen local instructors, another dozen helpers, and many workshops - some full two-day, standard Software or Data Carpentry workshops, but increasingly half- to one-day workshops teaching just one lesson of the Carpentries material. I have contributed to the unix, git, python and make lessons, and this year I became one of the maintainers of the Data Carpentry ‘Wrangling Genomics Data’ lesson. Finally, I am very proud of having been one of the authors of the ‘Good Enough Practices in Scientific Computing’ paper that was published earlier this year. What I would do as a member of the Steering Committee to contribute to the growth and success of the community If elected, I would focus on these areas. Communication: I am a strong believer in transparency and I would strive for members of the new organisation to have easy access to relevant information on what is going on ‘behind the scenes’, both directly (spreading information on the blog/email lists) and by making documents easily available online. Feeling of belonging: a community is stronger if more people feel they belong to it, and although I have no clear answers, I’d like to pose the question of how we can ensure all our members feel at home in the new organisation. Branding: we need to find a strategy for how to ‘sell’ the new organisation (and by what name), or whether it is better to keep the focus on the existing brands of ‘Software Carpentry’ and ‘Data Carpentry’. Lesson material: Software Carpentry lessons are often more polished than the Data Carpentry ones. There is currently a big push to improve the Data Carpentry lessons and release new versions of them, and I would like to see how we as an organisation can further these developments. Additionally, I think the time is ripe to consider starting (or reviving) the development of intermediate Software Carpentry lesson materials - for those with enough experience using what they learned in a workshop to take their skills to the next level. Teaching beyond the Carpentries: many instructors, not least myself, use what they have learned and experienced through the Carpentries to further develop their own teaching in university courses or elsewhere.
I would be interested in trying to build a community of Carpentry-inspired teachers - teachers interested in sharing what they have learned with regard to teaching (under)graduate courses. Software and Data Carpentry have been an incredible force for many people’s careers, including mine, and it is very rewarding and satisfying to be able to give back to the community. Given the opportunity, I look forward to helping shape the new merged organisation in 2018 as a member of the Steering Committee. Read More ›

Nominating for 2018 Steering Committee
Belinda Weaver / 2017-11-13
Three nominations have come in so far for the new 2018 Steering Committee of the merged Carpentries. The nominations received are from Juan Steyn (South Africa), Auriel Fournier (US) and Elizabeth Wickes (US). There is still time to put yours in - nominations close on 1 December. Here is all the information you need to nominate. There will be a Meet the Candidates opportunity on our upcoming community calls on 16/17 November (date and time will vary, depending on time zones). Call 1: November 16, 2pm UTC (see the local date and time in your zone). Call 2: November 16, 11pm UTC (see the local date and time in your zone). Don’t miss out! Read More ›

My Favourite Tool - Videoconferencing
Rayna Harris / 2017-11-13
My favorite tool: Videoconferencing I know most of the blog posts so far in this series have been about the tools people use to conduct research. However, I feel quite strongly that the videoconference systems that allow me to speak to other people have had the most profound impact on my development as a researcher and educator over the past three years. Google Hangouts was the tool of choice for the Software Carpentry Mentoring Subcommittee when we started hosting post-workshop debriefing sessions in 2015. These virtual meetings gave instructors around the globe the opportunity to share their challenges and successes from recent workshops. Even though the discussion focus was on teaching, I always learned something new about the tools we teach - R, Python, SQL, UNIX, and Git/GitHub - through these discussions. BlueJeans was the tool of choice for monthly meetings when I joined the Software Carpentry Steering Committee in 2016. As the most junior person on the Steering Committee, I felt that it was a great privilege to be a part of a global group of people trying to make the world a better place through best practices for computing and teaching. It wasn’t always easy to pick a time of day that accommodated the timezones of all committee members, but we made it work. Finally, Zoom has given me the capacity to spread Carpentry teaching practices to Latin America without leaving the comfort of my living room! Okay, of course, I would prefer to actually travel to teach in person, but that isn’t always logistically or economically feasible. Additionally, I’ve been hosting bilingual teaching demo sessions where instructors can practise teaching with live coding in their native language. It is a beautiful thing to listen to someone teach Python, R, UNIX, or Git while speaking in a language foreign to me. In summary, those are just a few of the reasons why my favorite tools are videoconferencing systems that connect me to like-minded people around the globe so that I can learn more about technology and practices that I can use in my research and teaching. – Rayna Harris, Scientist & Educator / Graduate Student / Behavioral Neuroscience and Genomics, Austin TX Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here, or see what other tools people have written about. Read More ›

Elizabeth Wickes: Nomination for 2018 Steering Committee
Elizabeth Wickes / 2017-11-12
2018 Election: Nomination by Elizabeth Wickes Who am I? I’m Elizabeth Wickes and I work as a Lecturer with the School of Information Sciences at the University of Illinois at Urbana-Champaign. I began this position in mid-2017, having previously worked as a Data Curation Specialist for the University Library at UIUC for 2 years. I have my MSLIS, and worked for 5 years at Wolfram|Alpha before beginning my master’s program. While I enjoy teaching and working with a variety of domains, I am primarily experienced in teaching for digital humanities researchers, librarians, and other non-STEM fields. I constantly challenge myself to make computational and digital tool training interesting, accessible, and valuable to all research domains. My Previous Involvement I have been active within the Carpentries since 2015, when I went through my instructor training, completing my certification in summer 2015. Since then I have been active with our local workshop runs, as an instructor for 7 and a helper for several more. I have also been a lead instructor at several digital humanities workshops that have remixed Carpentries materials, and I have adapted several of the core SWC lessons for use in my classes. In January 2017 I had the opportunity to work for several days at BIDS as part of a reproducible research in Jupyter hackathon. I was excited to meet many of my fellow Carpentry members in person and have a chance to learn from them in a hackathon environment. I completed my Carpentry Trainer certification in August 2017. This training was a fantastic opportunity to revisit the content of my own training and reflect on how that has influenced my teaching. Looking Forward UIUC’s local Carpentries community has grown over the past two years, allowing my role to shift from instructing 2-3 times a semester to mentoring new instructors and focusing my instruction efforts on Instructor Training events, both remote and local. Our community’s strength is in the diversity of roles, domains, and backgrounds of our instructors. The strength of these combined factors shines in our lesson maintenance and creation, consistent growth in learners and instructors, and how everyone can find a home within the work we do. Working with the 2017 global Library Carpentry sprint was a powerful experience of our core values in action. A community can accomplish so much when newcomers are supported by all and many voices come together toward a common goal. I am keen to continue expanding our training opportunities into research domains that have a clear need for digital research training, but don’t have it as part of their standard educational pathway. My hopes for becoming a board member are to continue representing non-STEM domains, working with the assessment group to understand what we are doing well and where we are needed most, and helping the joint mission of Software and Data Carpentry find its voice as a merged group. Our community is a powerful one, and together we can continue to accomplish great things! Read More ›

CAB-Alliance Bioinformatics Workshop in Franceville, Gabon
Nicola Anthony, Katy Morgan, Courtney Miller, Anelda van der Walt / 2017-11-09
Over the past five years, researchers from the University of New Orleans (UNO) in the USA, the Université des Sciences et Techniques de Masuku (USTM) in Franceville, Gabon, and the University of Buea (UB) have been working closely together as part of a larger collaborative initiative to map patterns of genomic and phenotypic variation in rainforest species across central Africa. This international partnership, known as the Central African Biodiversity (CAB) Alliance, is made up of a number of institutions in the US, Africa and Europe and is primarily funded through the National Science Foundation’s Partnership in International Research and Education program. The project’s main goal is to identify areas across central Africa where turnover in genomic and phenotypic diversity in rainforest species is greatest, since these are the areas where we expect species’ capacity to adapt to climate change to be greatest. Working with project partners at the University of California Los Angeles and Drexel University, the group began using next generation sequencing (NGS) to understand some of the environmental drivers of genomic variation within species and how these relationships might change under future climate projections. Once the data analysis workflow was finalized, researchers wanted to find a way for all project partner organizations to engage in the processing and analysis of the genomic data together. The natural next step was to run a bioinformatics workshop introducing NGS technology and data analysis to researchers and students at USTM as well as the scientists at the Research Institute for Tropical Ecology (IRET) and the International Centre for Medical Research in Franceville (CIRMF). The datasets that were used came from two of the three focal species that the team had based their work on: namely the African puddle frog Phrynobatrachus auritus and the soft-furred mouse Praomys missonei. The UNO team (Nicola Anthony, Katy Morgan, Courtney Miller) along with colleagues at USTM (Patrick Mickala, Stephan Ntie, Jean-Francois Mboumba) and UB (Eric Fokam, Geraud Tasse) set out to develop a week-long course that would introduce students to working in a Linux environment and aid in building regional capacity in bioinformatics. In March 2017 Jason Williams introduced Nicola Anthony to the South African Carpentry community to share some lessons learned about running computing workshops in Africa. During preliminary conversations we introduced the UNO/USTM teams to principles of the Software and Data Carpentry workshops such as live coding, sticky notes for getting help or indicating progress, and a variety of other practical things. The workshop ran from 3 - 8 July 2017 and it was a great privilege that one of our South African instructors, Samar Elsheikh, was able to join the team in Franceville for the entire event. Topics that we covered included: data organization and spreadsheets, working in the command line, data visualization in R, next generation sequencing methods, processing restriction site associated DNA (RAD) sequencing, detecting loci under selection and geospatial modeling of genomic variation. A more detailed copy of the program is available here. There were approximately 25 participants including both instructors and students.
Participants were a mixture of faculty, research scientists and graduate students from several institutions in Gabon (USTM, the Centre National de Recherche Scientifique et Technologique and the Centre International de Recherche Médicale de Franceville), Cameroon (UB), South Africa (University of Cape Town) and the US (UNO). A survey was distributed to all participants after the workshop ended to get feedback on the structure and content of the workshop itself. Most of the participants used their own laptops; however, ten laptops were provided by CAB-Alliance with software and programs installed within VirtualBox. We also took a tour of the CyVerse platform, including genomic data processing and cloud computing in the Discovery Environment and through ATMOSPHERE. Unfortunately, USTM does not currently have the infrastructure for Internet access, so four 4G mobile wifi hotspots were provided. Connectivity was still a challenge, but students were able to access cloud computing sites and other resources online. One or two things that worked really well: the VirtualBox application - a virtual machine that included a Linux operating system, all of the software and programs, and all of the data and results files - was used to streamline the workshop. Feedback from our participants indicated that, in addition to learning new techniques, taking time for interpretation of data and results was very helpful. One or two things that we would do differently: reducing the amount of material or the number of sections in the workshop would have been beneficial. It is always a challenge to balance presentations and explanations of concepts with hands-on practical exercises, so it might be better in the future to reduce the amount of material and cut any unnecessary presentations. Feedback from our participants indicated that they would have preferred less lecture-based information and more time to practice the new methods they had just learned. Many participants had difficulty creating an account on CyVerse and registering for an account in ATMOSPHERE; finding a way to simplify the registration process would be very beneficial. What happens next in Gabon? One way to move forward would be to create an online learning community for participants to continue working together and developing their UNIX skills. The Moodle site could be used for this, but perhaps there are better ways to facilitate continued exchange between instructors and participants. Read More ›

Auriel Fournier: Nomination for 2018 Steering Committee
Auriel Fournier / 2017-11-09
2018 Election: Auriel Fournier I am excited for the opportunity to stand for election for the 2018 Steering Committee of The Carpentries. About Me I am currently a postdoctoral research associate at Mississippi State University, based out of the Coastal Research and Extension Office in Biloxi, MS on the Gulf Coast. I am part of the leadership of a large (>45 partner, 200 person) cooperative network that is using structured decision making to inform bird conservation decisions in response to the Deepwater Horizon Oil Spill. I received my PhD in Biology from the University of Arkansas in May 2017 and before that received a B.S. in Wildlife Ecology and Management from Michigan Technological University. I’m an avid gardener, crochet/knitter and lover of dogs. Previous Experience With The Carpentries I have been a certified Software Carpentry instructor since Spring 2015 and a certified Data Carpentry instructor since Spring 2016; I’ve hosted one workshop; I’ve taught four workshops, with another coming up in December 2017; and I’ve been a maintainer for the Data Carpentry R Ecology lessons for about a year. I started to be involved with the Carpentries as a graduate student, seeking a community that could help me grow my reproducible science and programming skills. I have been amazed again and again at how this community has welcomed and supported me, and I am excited for a chance to take a more formal role in its leadership. Looking forward for The Carpentries I’m excited about The Carpentries merger and the opportunity it brings us. We have a strong community, and a large part of my goal on the board would be continuing community efforts to make our membership and our work more accessible to everyone and more diverse. Accessibility would include making our lessons and materials accessible to those with disabilities, including those who are Deaf/Hard of Hearing and Blind. In addition, I hope to help make being part of the instructor community more accessible to those without large financial resources, whether to bring a workshop to their group or to be an instructor, which often requires fronting several hundred to a thousand dollars per workshop while waiting for reimbursement. This is a large burden on many members of our community, especially those who are students or early career scientists, and I want to work to ensure that it is not a barrier to anyone’s participation. I am also eager to bring the perspective of a recent student and still early career scientist to the committee, with the hope of growing the student membership in the community. I know what a powerful force community support can be for a student, and I want that to be available to all interested students. In short, I am passionate about making open, reproducible science truly accessible to all. I love The Carpentry community and would work to ensure that the entire membership of that community is heard, respected and included. Read More ›

Juan Steyn: Nomination for 2018 Steering Committee
Juan Steyn / 2017-11-09
2018 Election: Juan Steyn Hi! I’m Juan Steyn, currently employed at the South African Centre for Digital Language Resources as project manager. I’m an enabler at heart and I want to contribute and give back to the African and international Carpentry community. I especially want to help ensure that the core values of the Carpentries get transferred and integrated into how our workshops are conducted within new loci. Africa has A LOT of potential and I would like to play my part to enable and catalyze this potential into action. My involvement with the Carpentries The first Software Carpentry workshop I attended was organised by Anelda van der Walt towards the end of 2015. This workshop provided me with a foundation to try new things when it came to programming and working with data. In 2016, I completed my instructor training, and during my instructor checkout demo session the “penny dropped”. During this session at 1 AM, it was inspiring to see how people connecting from four different continents shared similar approaches and cared about the same things. At that point, I started to appreciate the potential of the Carpentries’ approach to enable and grow communities from the bottom up. You do not need to be a tenured professor or a manager or anybody “important”. You only need to be yourself and volunteer a bit of your time to make a difference, and you get a wealth of experience and life lessons in return. During 2017, I got more involved with workshop organisation. I also joined the mentoring sub-committee and completed my Trainer training in September. To date, I’ve been involved with more than seven Carpentry events as instructor, host, helper, co-planner, mentor and Trainer. I’m also an advocate for the Carpentries approach, especially in growing computational capacity within the Digital Humanities community in Southern Africa, as well as for its potential to be a vehicle to introduce workshop participants to digital scholarship and e-research. I am also involved in the second round of Carpentry mentorship groups in order to contribute back to the community and further grow initiatives and support structures for Africa. Going forward With the Carpentries expanding into new under-represented environments, cultures and languages, I believe it is important to emphasize the role of mentorship in transferring core values and approaches. In the African context, it is our experience that, firstly, a context-specific understanding is required to host successful workshops. Secondly, it is important to share knowledge, and especially the “ways of doing”, through hands-on mentorship. I believe this is essential to successfully expand the Carpentries community further into new territories. As Bruce Becker recently tweeted: “What did we learn from @aneldavdw at #UC2017? Get up and do something. Your ripple might raise a tide.” So let’s get up and start our ripples … Read More ›

Running effective online meetings with Zoom (or Google Hangouts, or ...)
Belinda Weaver / 2017-11-08
Online meetings are a fact of life for most of us in the Carpentries and other distributed projects. So how do we make them as effective as we can? This post came about after the Twitter discussion generated by this tweet from Titus Brown. Here are some points I have thought of to make things work more smoothly. Feel free to comment. Before the meeting: Familiarize yourself with the agenda for the meeting. Are there things you need to prepare or figure out beforehand? Are there things you are wondering about? If there is no agenda, ask the chair to send one out. Always use a headset. You will hear and be heard much better. The headphones from a smartphone are sufficient. Anything is better than using the built-in microphone on your computer, as it will pick up a lot of noise and distract other people in the meeting. Prepare for the meeting by testing your audio and video setup well before the meeting starts. Can you hear? Be heard? Be seen? Do you know how to mute yourself? This is important, as people can make a lot of distracting noise if they are typing during the meeting. Mute your cell phone in case it rings or pings during the meeting. Move to a quiet place if you can so outside noise does not intrude on others. Please don’t be the person who comes in late, without a headset, and makes a lot of noise from screechy feedback! In the meeting: The chair sets the tone of an online meeting, so it is important, if you are chairing, that you be on time (or, even better, online a little early) and that you are prepared for the meeting ahead. Once people join the meeting, the chair introduces him or herself, explains how the meeting will be conducted, how long it will run, and what expectations there are of attendees. The chair greets each person as that person joins the meeting. Once the meeting starts, the chair calls on each person in turn to say their name, where they are from, and why they are attending; for a Carpentries discussion session, this might be for instructor checkout or a workshop debrief. For other meetings, the motivation for attendance might be quite different, e.g. brainstorming ideas, problem solving, planning or team meetings. We highly recommend that there is a designated notetaker to record discussions and any decisions taken. After the introductory round, we recommend that the chair asks the notetaker to introduce him or herself, and to explain how and where notes from the meeting will be stored and distributed. Notes are recorded on an etherpad or a similar shared medium, e.g., a Google Doc. Any notes posted to the chat window during the meeting should be transferred to the shared medium before the meeting ends so they are not lost when the video conference is closed. To ensure that the meeting runs well, and that everyone gets their turn to speak, the chair asks people who wish to contribute to type the word ‘hand’ in the chat window. The chair then calls on those speakers in turn. Some video conferencing applications such as Zoom have a Raise Hand feature that can also be used, but typing ‘hand’ is usually sufficient. Zoom also offers several views of participants. If you are chairing, Gallery View is probably best so you can see all your participants. Speaker View generally features the current speaker but can also be triggered by ambient noise, which can be distracting. Free tools Google Hangouts and Jitsi Meet are two free tools for online meetings. 
Jitsi Meet works well for very low bandwidth connections and comes with a built-in etherpad and a Raise Hand feature. Three rules for better online meetings Be on time - it is very disruptive when people join an online meeting after it has started. Mute yourself when you are not speaking so that you do not disturb others. Use a headset to minimise noise from your location and to provide clearer sound for attendees. Read More ›

My Favourite Tool - RStudio
Clifton Franklund / 2017-11-07
My favorite tool is RStudio. This integrated development environment (IDE) is really a force multiplier for data analysis and reporting. RStudio provides an excellent interface to the R programming language (that much should be obvious). However, it is much more than that. This application affords me easy workflows and shortcuts to interact with most of the other tools that I regularly use. These include: GitHub/git - for keeping all of my projects under version control and public-facing. make - for automating project builds and updates. bash shell - for when the GUI just isn’t cutting it. pandoc - to create beautiful reports as .pdf, .html, .docx, or .epub files. Like Jupyter notebooks, RStudio fully supports literate programming and its own implementation of the Markdown language. With packages like blogdown or bookdown, RStudio can easily be used to support the construction of static websites as well. Even better, I can also use RStudio to build presentation slides. I tend to use revealjs, but there are several other choices as well. When coupled with Shiny, I can create free, reproducible, and interactive presentations that are available to anyone with a web browser. I rely upon many different tools to get my work done, and all of them can be used without the help of RStudio. But when they work in concert with RStudio, my work gets done in a fraction of the time. – Clifton Franklund, Professor of Microbiology, Big Rapids, MI, USA. Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here, or see what other tools people have written about. Read More ›
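For readers who have not tried this kind of workflow, here is a minimal sketch of driving a literate-programming pipeline from the R console. The file name report.Rmd is a hypothetical R Markdown document; the calls shown are standard rmarkdown and blogdown functions.

```r
# One-time setup: install the packages used below.
install.packages(c("rmarkdown", "blogdown"))

# Knit one hypothetical R Markdown source ("report.Rmd") to two of the
# output formats mentioned in the post; pandoc does the conversion.
rmarkdown::render("report.Rmd", output_format = "pdf_document")
rmarkdown::render("report.Rmd", output_format = "word_document")

# blogdown scaffolds a static website built from R Markdown posts.
blogdown::new_site()
```

The same single source file can also be rendered to HTML slides (for example with output_format = "revealjs::revealjs_presentation", assuming the revealjs package is installed), which is the presentation workflow described above.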

Apply to Become a Carpentry Maintainer
Erin Becker / 2017-11-07
Software and Data Carpentry are currently accepting applications to join the lesson Maintainer team. Carpentry Maintainers work with the community to make sure that lessons stay up-to-date, accurate, functional and cohesive. Maintainers monitor their lesson repository, make sure that PRs and Issues are addressed in a timely manner, and participate in the lesson development cycle including lesson releases. They endeavor to be welcoming and supportive of contributions from all members of the community. More detailed information about what Maintainers do can be found here. Please review this document, paying attention to the time commitment involved, before submitting your application. New Maintainers will be invited to join the existing Maintainer group as we develop formal guidelines and training documentation for onboarding new Maintainers. You can apply to become a Maintainer here. New Maintainers will be mentored in the maintenance process to help them understand the current structure of the lessons, how that structure has arisen, any open topics of discussion around major changes to the lessons, and Git and GitHub mechanics. Applications will be open through Wednesday, November 22, 2017 at 6am UTC. Use this link to see the deadline in your local time. Please get in touch with Erin Becker (ebecker@carpentries.org) if you have any questions. Thank you for your interest in joining the Maintainer team! Read More ›

Skills Training for Librarians: Expanding Library Carpentry
John Chodacki / 2017-11-06
Post from the University of California Curation Center of the California Digital Library In today’s data-driven, online and highly interconnected world, librarians are key to supporting diverse information needs and leading best practices to work with and manage data. For librarians to be effective in a rapidly evolving information landscape, training and professional development opportunities in both computational and data skills must be available and accessible. Over the past couple of years, an international Library Carpentry (LC) movement has begun that seeks to emulate the success of the Carpentries — both the Data Carpentry and Software Carpentry initiatives — in providing librarians with the critical computational and data skills they need to serve their stakeholders and user communities, as well as streamline repetitive workflows and use best data practices within the library. This Library Carpentry community has already developed an initial curriculum and taught more than 40 workshops around the world. We are excited to announce that the California Digital Library (CDL) has been awarded project grant funds from IMLS to further advance the scope, adoption, and impact of Library Carpentry across the US. CDL’s 2-year project will be conducted by its digital curation team, the University of California Curation Center (UC3), and will focus on these main activities: development and updates of core training modules optimized for the librarian community and based on Carpentries pedagogy regionally-organized training opportunities for librarians, leading to an expanding cohort of certified instructors available to train fellow librarians in critical skills and tools, such as the command line, OpenRefine, Python, R, SQL, and research data management community outreach to raise awareness of Library Carpentry and promote the development of a broad, engaged community of support to sustain the movement and to advance LC integration within the newly forming Carpentries organization Why Library Carpentry? Library Carpentry leverages the success of the Carpentries pedagogy, which is based on providing a goal-oriented, hands-on, trial-and-error approach to learning computational skills, and extends it to meet the specific needs of librarians. It is often difficult to figure out what skills to learn or how to get started learning them. In Library Carpentry, we identify the fundamental skills needed by librarians and develop and teach these skills in hands-on, interactive workshops. Workshops are designed for people with little to no prior computational experience, and they work with data relevant to librarians (so that librarians are working with the data most applicable to their own work). Workshops are also friendly learning environments with the sole aim of empowering people to use computational skills effectively and with more confidence. How does this relate to the Carpentries? Two sister organizations, Software Carpentry and Data Carpentry, have focused on teaching computational best practices. The ‘separate but collaborative’ organizational structure allowed both groups to build a shared community of instructors with more than 1000 certified instructors and 47 current Member Organizations around the world. However, as Software Carpentry and Data Carpentry grew and developed, this ‘separate but collaborative’ organizational structure did not scale. 
As a result, the governing committees of both Software Carpentry and Data Carpentry recognized that, as more mature organizations, they can be most effective under a unified governance model. On August 30, 2017, the Software Carpentry and Data Carpentry Steering Committees met jointly and approved two motions which together form a strong commitment to continue moving forward with a merger. As part of this merger, the new “Carpentries” organization will look to increase its reach into additional sectors and communities. The nascent Library Carpentry community recently met and decided to aim to join as a full-fledged ‘Carpentry’ in the coming year. This grant will help LC solidify approaches to learning and community building, while also bringing resources to the table as we embark on future integration of LC within the merged Carpentries organization. How does the Carpentries model work? In the Carpentries model, instructors are trained and certified in the Carpentries way of teaching, using educational pedagogy, and are asked to commit to offering workshops in their regions and to reworking, improving and maintaining lessons. These instructors teach two-day, hands-on workshops on the foundational skills to manage and work effectively with data. The goal is for learners to become practitioners while in the workshop and then continue learning through online and in-person community interaction outside the classroom. With the “train-the-trainer” model, the Carpentries are built to create learning networks and capacity for training through active communities and shared, collaborative lessons. They have used this model to scale with parallel approaches of developing lessons, offering workshops, and expanding the community. The LC community has also used this model, and our grant project aims to extend this further. Next Steps As an immediate next step, CDL has begun recruiting for a Library Carpentry Project Coordinator. This will be a 2-year, grant-funded position. You can apply at the UC Office of the President website. The due date is November 30, 2017. While this position will report to CDL’s Director of the University of California Curation Center (UC3), it will focus on extending LC activities in the USA and working globally to gain capacity and reach for the Library Carpentry community and Carpentries staff. For more information on this project, please feel free to contact CDL’s UC3 team at uc3@ucop.edu. You can also follow UC3 on Twitter at @UC3CDL. To learn more about Library Carpentry, you can visit https://librarycarpentry.github.io and follow on Twitter at @LibCarpentry. We look forward to these next steps for Library Carpentry and a growing network of data-savvy librarians. Read More ›

My Favourite Tool - R
Bianca Peterson / 2017-11-06
R can do anything - from making presentations, analyzing and plotting data to version control (Git). I use it for absolutely everything! It’s not difficult to use and it’s free! – Bianca Peterson, Temporary lecturer (Microbiology) & Project Coordinator (IT), South Africa Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here, or see what other tools people have written about. Read More ›

My Favorite Tool - Emacs
Francesco Montanari / 2017-11-03
My favorite tool is Emacs. Emacs provides an extensible and unified framework with nice interfaces to several other tools. For instance, it allows you to: Keep notes, maintain TODO lists, plan projects, and edit and automatically export documents to many formats (Org-mode). Conveniently edit Python source code and, at the same time, send code regions to a Python shell, permitting piece-by-piece interactive programming (Python mode). Make the GDB debugger a user-friendly and effective tool when programming in C++ (GUD mode). Use Git through a beautiful interface (Magit). Edit LaTeX files through sophisticated packages that synchronize the text buffer with a PDF viewer, pretty-print mathematical expressions directly in the text buffer, automatically handle references and much more (AUCTeX). Access a handy but powerful Lisp interpreter anytime from any text buffer. – Francesco Montanari, Postdoctoral researcher, cosmology, Helsinki. Have you got a favourite tool you would like to tell us about? Please use this form to add a bit of detail and we will do the rest. You can read the background to these posts here, or see what other tools people have written about. Read More ›

Pack Your Bags for Dublin!
Fotis Psomopoulos, Belinda Weaver / 2017-11-02
The Carpentries are excited to announce that the 2018 CarpentryCon will take place from 30 May - 1 June, 2018 at University College Dublin (UCD). Yes, we are going to Ireland for the inaugural CarpentryCon! A huge thank you to UCD for their bid - we are confident that this will be a fantastic venue. We are also grateful to all the other community members who proposed bids to host CarpentryCon. All were compelling, and it was very hard to select one from so many fantastic options. However, Dublin is a great fit for all the things we want: It is a busy travel hub, which should offer lots of easy connections and opportunities for cheap fares. It is a mid-point between Europe and the USA and Canada, and easy to connect to from the southern hemisphere too. As a popular tourism destination, it is well-placed to offer a wide range of budget-priced accommodation options. Lots of tech companies operate there - we will tap them for sponsorship. The host has great experience running this kind of event. What we hope to do for attendees It is important that this be an inclusive event, so we aim to keep registration costs low. We will announce price ranges for registration soon. We also hope to be able to provide travel scholarships to facilitate attendance by those who might not otherwise be able to come. Stay tuned for announcements. To help attendees who need to provide evidence of speaking or presenting in order to get funding for travel, we will offer at least one session where attendees can share how they have incorporated Carpentry techniques into their own research and teaching, and/or how they have grown their local Carpentry community. Contributions may be either talks or posters. Details soon. We would like to get a rough estimate of how many people might want to attend. Please fill in this short, anonymous form to help us gauge the level of community interest in CarpentryCon 2018. What can you expect from CarpentryCon? CarpentryCon will focus on three main themes: Community Building We will bring members of the Carpentry community together with people sharing similar interests from around the globe. Unlike most conferences, our format will be “come and learn”. Sharing Knowledge Community leaders will offer sessions on teaching methods, curriculum development, community organization, and leadership skills so we can grow our next generation of community leaders and champions. Networking Participants will be able to come together informally to meet peers and community leaders and to share stories about challenges and successes. What else? Planning will now ramp up in earnest. Keep an eye on our blog, Twitter and Facebook channels for announcements and updates. The CarpentryCon repo also has a lot of information. Want to tweet about it? Use the hashtag #CarpentryCon2018. Read More ›

My Favourite Tool: OpenRefine
Juliane Schneider / 2017-10-30
My favorite tool is OpenRefine. OpenRefine all the way, baby! I’m the curator of eagle-i, an RDF open access repository of stem cells, viruses, mice, core facilities, lab equipment and people. I use OpenRefine to facet by URI, by date, by label, to clean the data up, to search for anomalies, and to export in CSV, Excel or whatever format is needed for my purpose. I use OpenRefine’s General Refine Expression Language (GREL) functionality to perform complex search and replace, to transform the data, and to break up and combine strings. OpenRefine is the gateway to my deeper understanding of eagle-i, and has led the way to new uses of the data for assessment of facility use and resource discoverability. – Juliane Schneider, Lead Data Curator, Harvard Catalyst, based in Chicago. Have a favorite tool of your own? Please tell us about it! Read More ›
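For readers who have not met GREL, here are two small, hypothetical examples of the kind of expressions Juliane describes (the column names are invented for illustration; a GREL expression is applied to every cell in a column, with the current cell available as value). To trim whitespace and expand an abbreviation in one pass:

```
value.trim().replace("Dept.", "Department")
```

And to combine strings across columns, for example to build a sortable name field:

```
cells["last name"].value + ", " + cells["first name"].value
```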

My Favourite Tool: Jupyter Notebook
Thomas Arildsen / 2017-10-28
My favorite tool is … the Jupyter Notebook. I use it for teaching my students scientific computing with Python. Why I like it: Using Jupyter with the plugin RISE, I can create presentations including code cells that I can edit and execute live during the presentation to demonstrate the various aspects of Python that I introduce. It is a very good compromise between running scripts or typing at the command prompt on the projector screen and just showing static slides with code. – Thomas Arildsen, Associate Professor, Aalborg Have a favorite tool of your own? Please tell us about it! Read More ›

My Favourite Tool: IPython
Kellie Ottoboni / 2017-10-27
My favorite tool is … IPython. IPython is a Python interpreter with added features that make it an invaluable tool for interactive coding and data exploration. IPython is most commonly taught via the Jupyter notebook, an interactive web-based tool for evaluating code, but IPython can be used on its own directly in the terminal as a replacement for the standard Python interpreter. Why I like it You can run Unix commands directly in IPython. For instance, if you want to load a file from another directory, it is convenient to cd into the directory from within the IPython window. IPython has an extensive tab autocomplete for function names, function arguments, file paths, and object names. It comes equipped with “magic” commands: functions that assist in programming and that can be called with a single word starting with %. %paste takes whatever is on your clipboard and formats it nicely so IPython can read it – useful for pasting in large blocks of code. %timeit runs time tests and %lprun runs line profiling automatically. The interpreter saves your command history across sessions. In case you close the window before you’re done, you can fire IPython back up and search through the history. IPython makes it easy to test my code interactively, piece by piece. – Kellie Ottoboni, PhD Candidate in Statistics, Berkeley Have a favorite tool of your own? Please tell us about it! Read More ›
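To give a feel for the magics Kellie mentions, here is a minimal sketch of an IPython session (the directory, the timed expression, and the search term are arbitrary examples):

```python
# These lines run in an IPython shell or Jupyter notebook, not in the
# plain Python interpreter: lines starting with % are IPython magics.

%cd /tmp                    # run a shell-style command inside the session

%timeit sum(range(1000))    # micro-benchmark an arbitrary expression

%history -g csv             # search saved history across sessions for "csv"
```

%paste and %lprun work the same way; %lprun additionally needs the line_profiler extension to be installed and loaded.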

My Favourite Tool: Git/GitHub
Jeff Oliver / 2017-10-25
My favorite tool is … I love Git and GitHub. I only use them for work I care about. Examples include lesson development for R workshops, my recent performance review packet, and collaborative projects on species distribution modeling. Oh, and Software Carpentry workshop websites (obviously). Why I like Git: The version control system and the companion remote host system (GitHub or Bitbucket or CloudForge, etc.) provide a great versioning and collaboration platform, the highlights of which have been enumerated many times over and in such depth that I won’t talk about them here. The reason I like this dynamic duo is that it reinforces best practices. I should say that using version control won’t necessarily make you 100% compliant with everyone’s idea of best practices, but with a little consideration of a workflow, it can go a long way. Here is why: Reproducibility: Ignoring my output folder in pretty much all my Git repositories forces all figures & analyses to be completely reproducible from materials in the folders that are tracked. Offsite backup: Rather than lugging my aging laptop to and fro, pull-add-commit-push allows me to preserve my work in a location accessible from any internet-enabled terminal. This has the added benefit of protecting against natural disasters and the inevitable bricked hard drive (mark my words: death, taxes, and a failed HD are the only certainties in life now). Sharing: Sure, some of what I currently work on is not ready to be released, so I use the private repository option. But when I am ready to share my code and data, it’s literally one to two mouse clicks and my work is open for re-use by the community. The visibility of platforms like GitHub and Bitbucket makes work that much more discoverable. Documentation: Everybody’s favorite part of software development is … not likely writing documentation (granted, there are some of you out there). Because good documentation is imperative for re-use and evaluation, the little reminders from GitHub (“Help people interested in this repository understand your project by adding a README”) further encourage best practices for open research. The support for Markdown rendering on GitHub makes it especially nice for writing professional-looking documentation of your work. Sure, I still struggle sometimes with Git syntax and concepts, but 98% of the time I only use four commands (pull-add-commit-push, remember?) and the Git/GitHub combo reduces the time I spend developing, preserving, and sharing the work I do. – Jeff Oliver, Data Science Specialist, Tucson, Arizona Have a favorite tool of your own? Please tell us about it! Read More ›
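As an illustration of the “ignore the output folder” practice Jeff describes, a repository’s .gitignore might contain something like the following (the folder name output/ is hypothetical; use whatever directory holds your generated files):

```
# Generated artifacts stay out of version control, so every figure and
# result must be reproducible from the tracked source files.
output/
```

With that one line in place, git status and git add never pick up generated files, which quietly enforces the reproducibility habit described above.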

Our long-term assessment results are in!
Kari L. Jordan, Tracy Teal, Erin Becker, Karen Word / 2017-10-18
A discussion of learner outcomes more than six months after attending a Carpentries workshop What concrete changes are people implementing in their computational research practices as a result of completing a Carpentries workshop? Our long term survey report shows that two-day Software or Data Carpentry workshops are effective for increasing skills and confidence, and the adoption of reproducible research perspectives. We see gains in our survey measures for learners’ motivation to continue their learning, change in reproducible research behavior, and frequency of use of computational skills and tools. We find this very exciting, especially since a recent general survey of bootcamps and short-format trainings reports no measurable impact on skill development or research productivity. Software and Data Carpentry have taught workshops to over 27,000 learners in 35 countries around the world. Post-workshop survey reports for Software Carpentry and Data Carpentry have consistently shown that people like the workshops, that they know more about importing data sets into R and Python to work with data, writing functions, and initializing repositories in Git, and that they think they can apply these skills immediately to their work. Our focus has always been on long term change, including: Improving learners’ confidence and motivation to use computational tools, Changing behaviors around reproducible research and effective computational work, and Increasing the frequency and types of computing skills used. Therefore, we launched our first long-term assessment survey in March 2017 to gather quantitative evidence about specific behaviors our learners have adopted and continue to embody six months or more after completing a Carpentries workshop. Assessment specialists on staff and in the community developed an instrument, based on existing instruments, for collecting information regarding learners’ confidence and motivation to use the tools they learned, and behaviors they adopted after attending a Carpentries workshop. Rather than focusing on learners’ skills in particular tools, we focused on assessing learner confidence, motivation and adoption of good research practices, as these elements represent the primary goals of our workshops. Confidence and motivation are important factors for learners to continue their learning. They also promote community building, a significant focus area of the Carpentries. The final survey instrument included items for self-reported behaviors around good data management practices, change in confidence in the tools they learned, and other ways the workshop may have impacted learners (e.g., improved research productivity). Over 530 people who took a Software or Data Carpentry workshop 6 months or more ago responded to our long-term survey. These results show that workshop respondents had a positive impression of the workshop, and the majority felt their skills and perspectives have changed as a result of attending. Results also show that these impactful two-day workshops are effective for increasing skills and confidence. The impact of these workshops is apparent in respondents’ coding practices. The majority of respondents (70%) reported having improved their coding practices by using programming languages like R or Python or the command line to automate repetitive tasks, by reusing code for other purposes, or by using databases to manage large data sets. 
Respondents have continued their learning and incorporated use of these tools into their weekly or daily work. Additionally, sixty-nine percent of respondents have made their analyses more reproducible as a result of completing a Carpentries workshop, by reusing code and making their data and analyses available in public repositories. Not only do these two-day coding workshops increase respondents’ daily programming usage; eighty-five percent of respondents have also gained confidence in working with data and open source tools as a result of completing the workshop. The long-term assessment data showed a decline in the percentage of respondents that ‘have not been using these tools’ (-11.1%), and an increase in the percentage of those who now use the tools on a daily basis (14.5%). Highlights from our long-term survey The majority of our respondents: Gained confidence in the tools that were covered during their workshop (85.3%). Improved their coding practices (63.1%). Received professional recognition for their work as a result of using the tools they learned (64.7%). Respondents also substantially increased their frequency of use of programming languages (R, Python, etc.), databases (Access, SQL, etc.), version control software and/or the Unix shell, incorporating these tools into their regular workflows. Nineteen percent of respondents transitioned from using these tools once a month or less to weekly or daily use, per the figure below. Respondents perceive that the workshop had an impact on their confidence, as well as their productivity, reproducibility and coding practices. Interestingly, respondents also felt that the workshops had a positive impact on their career as a whole, and some received recognition for their work. The figure below shows what impact survey respondents felt for several factors including career, confidence, and continuous learning. Respondents were asked to rate their level of agreement (1-Strongly disagree to 5-Strongly agree) with the statements below. The x-axis labels for the figure are in italics, and correspond to the statement following. Reproducible: I have made my analyses more reproducible as a result of completing the workshop. Recognition: I have received professional recognition for my work as a result of using the tools I learned at the workshop. Productivity: My research productivity has improved as a result of completing the workshop. Motivation: I have been motivated to seek more knowledge about the tools I learned at the workshop. Confidence: I have gained confidence in working with data as a result of completing the workshop. Coding: I have improved my coding practices as a result of completing the workshop. Career: I have used skills I learned at the workshop to advance my career. The figure shows that respondents agree or strongly agree that they gained confidence in working with data (85.3%), made their analyses more reproducible (69.4%), and received professional recognition for their work (64.7%), all as a result of attending a Software or Data Carpentry workshop. From this figure we also see that there are opportunities to improve. Motivation to seek more knowledge seems unchanged, likely because learners who attend our workshops are already motivated. Perhaps these learners simply remain enthusiastic, entering with high pre-workshop motivation scores rather than showing a post-workshop decrease. 
We’d also like to see a positive shift in the trend for learners using the skills they learned to advance their careers, which is why we are implementing round two of the Carpentries Mentoring Program this fall. Interested in reading more? The full report is available, and it provides more detailed information about the motivation behind this survey, respondent demographics, and growth opportunities. Take a look at the report. We will continue to conduct this assessment at 6 month intervals to capture feedback from people who took workshops 6 months or more ago. Additionally, assessment will be the focus of our October community call. Bring your thoughts to the community call October 19th! The surveys used in this work, anonymized data, and R scripts for generating the figures are available in our assessment repository. This report was made possible by community input from Ben Marwick, Belinda Weaver, Naupaka Zimmerman, Jason Williams, Tracy Teal, Erin Becker, Jonah Duckles, Beth Duckles, and Elizabeth Wickes. We thank you all so much for your contributions to the code in this report and the development of our long-term survey! If you have other questions about the data or results, please use the data, re-analyze the results or ask your own questions! What strikes you? Comment below, and tweet us your thoughts at @datacarpentry, @swcarpentry and @drkariljordan using the hashtag #carpentriesassessment. Thanks to the Gordon and Betty Moore Data Driven Discovery initiative for support of Data Carpentry and these assessment efforts. Read More ›

Call for Nominations to Joint Board
Karen Cranston / 2017-10-18
EDITED 2017-11-07: Board of Directors is a legal term and can’t be used for a sponsored project. Changed to “Steering Committee” Call for Joint Carpentries Leadership: Stand for election to the joint Steering Committee of the merged Carpentries organization As most of you know, Software Carpentry and Data Carpentry are merging into a new organization, provisionally called “The Carpentries”. This new organization will officially begin on January 1, 2018, with a hugely talented staff and dedicated community already in place. This is an exciting time, and we are looking for people who want to help direct the new organization by serving as an elected member of the Steering Committee of The Carpentries. The Steering Committee will include both appointed and elected members in order to balance community engagement with the expertise needed to lead a growing non-profit organization. For more information about the responsibilities and composition of the committee, see this issue, part of the merger RFC. Who can run and vote? Following current SCF bylaws, current Carpentries members may vote and serve on the Steering Committee. Election or appointment to the Steering Committee is currently limited to members. The membership is made up of: Every qualified instructor who has taught at least two Software or Data Carpentry workshops in the past two calendar years. Anyone who has done 30 days or more work for the Carpentries in the past calendar year. Anyone who has, in the opinion of the Steering Committee, made a significant contribution in the past year. The signatory for a Silver, Gold or Platinum Member Organization. If you’re not sure if you’re a member, log in to AMY and see if the records show that you have taught in the last two years. If you need records updated or have any questions, please email team@carpentries.org. If you have taught workshops that aren’t registered, please include a link to those workshops. We’ll also be sending out an email to each instructor with their status. How do I stand for election? In order to stand for election, we request that you write a blog post that introduces yourself to the community. The post: must be about 500 words and can be written in any format (question and answer, paragraph, etc.) must be titled “2018 Election: Your Name” must be submitted by December 1, 2017 You can submit your post as a pull request to either the SWC website repository, the DC website repository or by email. In the post, you should explain: your previous involvement with The Carpentries what you would do as a member of the Steering Committee to contribute to the growth and success of the community The posts from last year’s SWC elections contain examples. Candidates will be given the opportunity to share their thoughts with our community, including ideas for continued involvement, at our two community meetings on November 16, 2017. Timeline for election: October 23: nominations open November 16: nominees can introduce themselves on community calls December 1: nominations close December 4-8: community votes on candidates Read More ›

RFCs and lessons learned
Kate Hertweck, Karen Cranston / 2017-10-17
Thanks to everyone for sharing questions and comments on our recent Request for Comments regarding the upcoming merger of Software Carpentry and Data Carpentry. Now that the official response period for the RFCs has ended, the GitHub issues specific to the RFCs will be closed. We received comments from individuals across our community representing multiple stakeholders, including the Software Carpentry Advisory Council (representing member organizations) and the community at large (through the GitHub repository, private comments via the Google Form, and personal communication). Those of us working on the merger have been actively considering and discussing this valuable feedback and identifying areas requiring additional attention and clarification. We will be publishing blog posts over the next few weeks summarizing our thoughts and identifying future actions for each of the RFCs. You are welcome to continue communicating your questions or concerns to us through new issues in the GitHub repo. Below we describe the context behind a few themes and issues emerging from the RFC discussions which may not be immediately apparent but have factored heavily into our approach. Compromise. This restructuring has involved months of discussion and planning by the existing leadership of Software Carpentry and Data Carpentry. While both groups share related visions and have begun to implement joint policies, the origins and governance of each group are quite different. As a result, our approach to the merger is based on compromise and an understanding that no solution will be the perfect fit for everyone. Scalability. Merging two organizations requires thinking carefully about the scalability of our current policies and processes. This includes the potential addition of other lesson organizations following the union of Software Carpentry and Data Carpentry, as well as continuing to reach more learners, train more instructors, and expand into new geographic areas. Staff. As our organizations have grown, we have continued to hire staff to support the community in achieving our mission. With this increase in staff comes an obligation to these employees to provide job security and the ability to make longer-term plans related to their role in the organization. Moreover, the role of community governance (e.g., Board of Directors) can now shift focus from policy establishment/implementation to strategic planning. Financial obligations. Both organizations (and the future merged organization) are non-profit agencies supported by a fiscal sponsor. Additionally, support for our staff and other infrastructure comes largely from institutional memberships, for which we are obligated to provide associated services. Community involvement. The role of the community in leadership and governance varies between Software Carpentry and Data Carpentry, in part because Software Carpentry is a more mature organization and in part due to differences in how the groups were formed. This reorganization is an opportunity to combine the best parts of each organization’s approach to leadership and governance. Specific decisions about governance have been based on previous experiences from both organizations, successful leadership models in other large volunteer projects, and the priorities described above. All of these issues represent perspectives that we have considered during planning, and which we continue to weigh as we incorporate your feedback. 
We understand that this process is resulting in incongruity with existing Software Carpentry governance procedures. We also recognize there are many decisions to be made that the original documents did not anticipate. Therefore, we are endeavoring to uphold the mindset of these guidelines and are using them as the foundation for the new organization’s bylaws. If you have questions or concerns about this process, please feel free to contact Kate Hertweck, the Software Carpentry Steering Committee chair. Please watch for the upcoming summaries of feedback from the RFCs, as well as the call for nominations for the four elected members of the first Board of Directors of the merged organization. Read More ›

My Favourite Tool: R
Paula Andrea Martinez / 2017-10-16
My favorite tool for data analysis is R. I have used it for a few years now, and I feel at ease when I need to work with tabular data. Using R makes my work enjoyable. I can clean my data in one place - RStudio - and then work with it, creating new graphics. I also enjoy learning new things in R. Its constant development gives you the chance to learn a bit more every day, thanks to a huge collaborative community that supports you with quick answers, with examples, and with new packages. – Paula Andrea Martinez, Data scientist / Postdoc / Data Analysis and Bioinformatics. @orchid00 Have you got a favourite tool you would like to tell us about? Please use this form and we will do the rest. You can read the background to these posts here. Read More ›
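In the spirit of Paula’s clean-then-plot workflow, here is a minimal sketch in R. Everything here is hypothetical for illustration: the file surveys.csv and its species and weight columns are invented, and dplyr/ggplot2 are just one common choice of packages for this kind of tabular work.

```r
library(dplyr)    # data cleaning
library(ggplot2)  # graphics

# Hypothetical tabular data: a CSV with 'species' and 'weight' columns.
surveys <- read.csv("surveys.csv")

cleaned <- surveys %>%
  filter(!is.na(weight)) %>%             # drop incomplete records
  group_by(species) %>%
  summarize(mean_weight = mean(weight))  # one summary row per species

ggplot(cleaned, aes(x = species, y = mean_weight)) +
  geom_col()                             # bar chart of the cleaned summary
```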

Blogging for the Carpentries - We Want to Hear From You
Belinda Weaver / 2017-10-16
This is an open invitation to our community members to share their knowledge. We have a global community working in many different disciplines using a vast range of tools. On our blog, we would like to tap into that community experience and then share that hard-won knowledge. This kind of information could help people new to a discipline, or might inspire others to try out a tool they have never used before. Time wasted or time saved? As with all learning, there is an opportunity cost; time spent learning new tools is time stolen from essential research. So the first decision might be whether you can spare the time to learn something new, though that might not be the hardest decision. After all, if a tool pays off in increased efficiency and time savings down the track, then that time is definitely well spent. But which tool should you pick? For what purpose? This is where experience in a discipline is so valuable. After all, anyone working with statistics can probably make a great case for R. We would like to hear from community members willing to share their experiences. Posts on My Workflow from senior researchers in a discipline would be a fantastic resource for newcomers. Posts about My Favorite Tool and Why I Love It would help others decide whether to put in the time to master it. You may be thinking “I am way, way too busy” to do this, so we want to make it easy for you. We have a form with some prompts for you to fill in. Just a few short lines are all we need, and we will do the rest. Worried you have made errors in the form after submitting it? Anxious you have omitted something important? We will let you review the post before it goes out. So what are you waiting for? Please tell us your story today! Read More ›

Carpentries Mentorship Program - 2.0
Erin Becker, Kari L. Jordan, Tracy Teal, Christina Koch / 2017-10-12
We’re starting a new round of mentoring groups, centered on specific lessons Mentorship is an important part of the Carpentry experience. As Instructors, we both teach and mentor our Learners. We also mentor each other as Instructors, learning something new from each other every time we teach and interact with one another. The Mentoring Subcommittee offers guidance to new and continuing Instructors through weekly discussion sessions, where Instructors from the global Carpentry community gather to share their experiences and learn from each other. This is a fantastic opportunity to interact with other Carpentry Instructors from around the world. Many in the Carpentry community have expressed interest in having more extensive and longer-lasting opportunities for mentorship. Based on this, we ran a pilot version of a new Mentorship Program, starting in January 2017. Nearly 100 Carpentry Instructors participated in the program, with 58 Mentees and 34 Mentors in 18 small groups. Groups were put together based on a variety of factors, including common teaching interests and geographies. These groups met once a month to discuss topics of interest to the group members and to help Mentees prepare for their first workshop. In June 2017, we asked participants in the pilot program for their feedback. Participants said that they enjoyed the opportunity to share and learn from each others’ experiences and expertise. They also reported that the experience enabled them to get involved with the Carpentry community and to network with Carpentry Instructors at other institutions. When asked about negative aspects of the program, many participants reported difficulty scheduling meetings with their groups as well as a lack of focus and difficulty in deciding topics to discuss within their groups. Many participants offered concrete suggestions on how the program could be improved, including: offering more guidance to mentorship groups on what to do during the program assigning groups specifically around common interests and goals enabling more integration and communication among groups. As with any pilot program, one of the goals of this program was to identify aspects that could be improved, based on the shared experiences of the participants, so we are very grateful for the feedback we received. We listened to your feedback and have made changes to the program. We are now offering curriculum-specific mentoring: both mentors and mentees can choose which tools they are most interested in discussing from the following list: Git Shell Python R SQL Additionally, groups will focus on either lesson maintenance, teaching workshops, organizing workshops, or community building. This program will run from October 25th to January 10th, 2018, and will culminate in a Virtual Showcase, in which groups will share their work with the broader Carpentry community. So far, 18 people have signed up to participate in this round of mentoring groups. Applications close October 18th, so don’t wait to apply to either be a mentor or mentee. Get involved by attending one of the information sessions being held October 12th at 06:00 UTC and 21:00 UTC. Sign up to attend on the etherpad. You can also join the conversation by tweeting @datacarpentry and @swcarpentry using the hashtag #carpentriesmentoring. Read More ›

All about Membership
Belinda Weaver / 2017-10-08
We are fortunate in the Carpentries to have many member organizations who support our work. However, if we are to continue reaching out to new disciplines and to build communities in under-served countries, we need a broader and more diverse membership base. What are the benefits of membership? Members receive priority access to instructor training and guidance about capacity building at their organization. Once institutions have a pool of local instructors, they can readily run low-cost local workshops that teach foundational computational and data skills to their staff and students. Memberships give Software and Data Carpentry revenue to ensure the ongoing development and maintenance of the lessons demanded by research communities. We work to give your local instructors support, ongoing mentorship and a forum for community lesson development. In addition, we have just launched round two of a targeted mentorship program to help new instructors develop their skills. Want the Carpentries at your organization but not sure how to do that? To help answer any questions you might have, Software Carpentry Executive Director Jonah Duckles is hosting a series of short webinars on membership. We hope to see you there. The next one is at noon UTC on Tuesday, 10 October. (Check your local time and date.) If you can’t make that one, the next will be held at 9 pm UTC on Tuesday, 31 October. Check your local date and time here. Further webinars will be announced on the webinars etherpad. Our membership page is here. Read More ›

Mentoring is Back! Round Two of the Carpentries Mentoring Program begins October 25th
Kari L. Jordan, Belinda Weaver / 2017-10-05
Mentoring groups provide experienced instructors with the chance to help small groups develop confidence in teaching, lesson maintenance and community building The inaugural Carpentries mentoring program was a great success, and we have used the feedback we received from both mentors and mentees to craft a new and improved mentoring experience in round two. The next round will run October 25th - January 10th. According to round one participants, the benefits of mentoring included greater understanding of the challenges new instructors face, more clarity about why we teach what we teach, getting timely responses to questions, and community engagement. Participants felt the program could be improved if mentoring groups had specific goals, and if we gave mentors more guidance on how to run mentoring sessions. We listened to that feedback and have made changes to the program. We are now offering curriculum-specific mentoring: both mentors and mentees can choose which tools they are most interested in discussing from the following list: Git Shell Python R SQL Once a topic has been selected, participants can choose what aspect of mentoring they want for their chosen tool: Lesson Maintenance Contributing to current lesson development Contributing to lesson maintenance Teaching Workshops Developing confidence and skill in teaching Preparing to teach a specific lesson (e.g., Python) Additionally, we plan to offer mentoring on two big issues: Organizing Workshops Logistics of organizing a workshop (e.g. marketing, registration) Logistics of running a workshop (e.g. recruiting instructors, distributing tasks) Community Building Strategies to create and build local communities Tried-and-true events that help foster local community development To help groups get organized we have provided sample mentoring program outlines to help groups use their time together productively. Interested in mentoring? We will hold two information sessions on Thursday, October 12th at 06:00 UTC and 21:00 UTC. Sign up to attend either information session on the etherpad. Applications for both mentors and mentees are open. The deadline to apply to participate in the program is October 18th. Share your excitement about mentoring via Twitter (@datacarpentry @swcarpentry @drkariljordan @cloudaus) with the hashtag #carpentriesmentoring. Read More ›

Trainer Training Announcement
Karen Word / 2017-10-03
As the Carpentry community continues to grow, our instructor training is increasingly in demand! In September, we welcomed 13 new Instructor Trainers who will help us to meet that need. We are now accepting applications for the next group of new Trainers. In this round, we welcome all applicants, but are particularly keen to recruit trainers who can work in Latin America, Africa, Australia and New Zealand. We would also like to recruit new Trainers who are fluent in Spanish. Carpentry Instructor Trainers run instructor training workshops, lead online teaching demonstrations, and engage with the community to discuss and guide the continuing development of the instructor training curriculum, the instructor checkout process, and downstream instructor support. We meet regularly to discuss our teaching experiences and to stay up to date on policies, procedures, and curriculum updates. The Trainers are an eclectic group. Some of us have formal training in pedagogy, some are experienced Carpentry instructors. Some run trainings as part of their jobs, and others pitch in during their own free time. We all share a commitment to propagating evidence-based practices in teaching and to helping new instructor trainees become familiar and comfortable with Carpentry practices and principles. Our Trainer agreement explains what is involved. It describes our expectations for anyone who aspires to become a Carpentry Instructor Trainer. The trainer training process consists of eight one-hour weekly virtual meetings (with a break for the December holidays). In these meetings we will discuss readings on pedagogy, largely drawn from our ‘textbook’, How Learning Works. We will also review the Carpentry Instructor Training curriculum, and discuss ways in which we can both teach and apply best practices to create a welcoming and effective class. After completing the meeting series, new Trainers will shadow part of an online instructor training event and a teaching demonstration session. Trainers-in-training also attend the regular monthly meetings of the Trainer community. This group of Trainers will start meeting in November. They will be eligible to teach instructor trainings by February, 2018. If you are interested in joining the Trainer community, please apply here! Applications will be open until October 17. If you have previously applied and are still interested, you may either re-apply (especially if anything relevant has changed) or just let us know that you are still interested. If you have any questions about the training process or the expectations for being a Trainer, please get in touch with Karen Word. Read More ›

Maintaining Lessons - Community Perspectives
Christina Koch / 2017-10-03
Our September community call on lesson maintenance brought up many good ideas around the lesson maintenance process for Software and Data Carpentry lessons. If you weren’t able to make the call, below is a summary of our discussion and potential avenues for growth. Our discussion focused on two major questions: what do the lesson maintainers do, and what are some of the reasons to be a lesson maintainer? What do lesson maintainers do? Managing Issues and Pull Requests (PRs) A big piece of a lesson maintainer’s job is to respond to the issues and pull requests that are submitted to their lesson repository. Depending on the extent of the suggested change, or the number of submissions, this process can be brief or time-consuming in any given week. There have been more and more contributions over the past few months, as future Software and Data Carpentry instructors are required to submit a change or suggestion as part of their checkout process. Room to grow: General guidelines for the maintainers; how quickly to respond to PRs, when to close them, etc. In instructor training, emphasize that reviewing is an equally important form of lesson contribution, not just submitting new issues/PRs. Coordinate with maintainers when there may be “bursts” of work (lots of instructor training, Bug BBQs). Curriculum Decisions and Feedback An extension of managing a lesson’s changes (via issues and pull requests) is making larger decisions about the lesson as a whole. More significant issues can come up as the lesson grows (for example, in an R lesson, whether to emphasize tidyverse or base R), requiring a decision from the maintainers about which direction to go. Beyond these larger changes, there isn’t always a good way to gather centralized feedback about the lessons after they’re taught. We have discussion sessions, but that information isn’t always communicated back to the maintainers. Room to grow: Possibly provide some kind of “advisory” structure to the individual lesson maintainers for more big-picture decisions; currently being tested by the Data Carpentry genomics maintainers. Communicate clear channels for providing feedback (both good and bad!) about the lessons. Have a process for working on larger changes (possibly separating the lessons into a “stable” and a “development” release, one version of this described here). What are the benefits of being a maintainer? In our discussion, several benefits of being a maintainer arose: Professional credit: We publish the lessons at least once a year and these can be listed as publications on a CV. Maintainers are equivalent to editors of a volume. Maintainership can be an item on a job or tenure application. Improving your Git skills, especially for managing a collaborative project. These are skills that can translate to other areas, including software development and teaching. Interacting with a wide variety of community members (via issues and pull requests). Being able to support something you believe in (teaching data or computing skills) by maintaining the lesson material. Seeing different perspectives on a particular lesson; understanding why it is the way it is. Room to grow: Address some of the previous challenges in order to make maintaining lessons more accessible. Create some standard descriptive wording for use in applications for jobs, tenure, and grants that maintainers can use to highlight their contributions. Publicize our lesson publication information more widely. 
We hope to address some of the growth areas in the next few months; contact Erin Becker or Christina Koch if you have questions or feedback about the future of lesson maintenance. Read More ›

1 - 30 September, 2017: Future of the Carpentries, New Staff Members, Community Service Awards, CarpentryCon
Martin Dreyer / 2017-10-02
Highlights All the details on the joint future of the Carpentries explained. Please feel free to ask any questions that you need answered. The very first Data Carpentry workshop in Ethiopia was held and it was an extraordinary experience. The Lesson Infrastructure Subcommittee had their 2017 September meeting and came up with some resolutions. We are pleased to announce three new staff members who will be working for the Carpentries part time: SherAaron Hurt is the new Workshop Administrator for the Carpentries, Elizabeth Williams has joined as a part-time Business Administrator, and Karen Word has joined the Carpentries and will be the Deputy Director of Instructor Training. If you feel there is a community member who is working extra hard to help our organization, please consider nominating them for a Community Service Award. Tweets Why is Python growing so quickly? Carpentry instructors - some great advice here on making workshops better for people with dyslexia. Want to submit a Carpentries blog post but nervous about GitHub? You can use a form. Know an unsung hero of the Carpentries? Nominate them for a Community Service Award. Want to report on a workshop? Write a Software Carpentry blog post. Read our newsletter, Carpentry Clippings, to keep tabs on our community. Want more Carpentries? Support our growth by becoming a member organization. Report of recent in-person staff meeting for both Carpentries - what we discussed. We now have a discussion group just for #assessment, led by @DrKariLJordan. Our newsletter will appear on Tuesday. Not a subscriber yet? Sign up here. General Everything you need to know about CarpentryCon. Brian’s poetical notes on the Unix shell are available on his Google Drive. The University of Mauritius, along with other sponsors, hosted an HPC workshop in July and had attendees from various backgrounds. The University of Namibia held the second Software Carpentry workshop in Namibia and learned some valuable lessons. Please share your thoughts on the future of the Carpentries. Share your ideas and experience with the Software Carpentry lessons. The Carpentries, the National Node of Bioinformatics Mexico (NNB) and the Ibero-American Society of Bioinformatics (SoIBio) invite you all to participate in the project Carpentry for Latin America. Stencila was used to teach SQL and R at a UBC workshop in Canada. 16 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: October The University of Washington eScience Institute, Lawrence Livermore Labs, The River Club, Aarhus University, The Francis Crick Institute, Oxford University, University of Nebraska Omaha, Karolinska Institutet, UCLA, University of Michigan, University of Pennsylvania, Boston University, European Molecular Biology Laboratory, University of Otago & NeSI, Institute for Theoretical Physics UAM-CSIC, Minnesota State University Moorhead, Rutgers University, Camden, Harvard Medical School. November University of Manchester, University of Missouri - Columbia. Read More ›

Toads in Vancouver: using Stencila to teach SQL and R at UBC
Danielle Robinson / 2017-09-29
One of Stencila’s goals is to create an easy way for people who don’t yet code to learn data science and statistics skills, and to feel comfortable trying out powerful scientific computing languages like R, Python and Julia. We’re doing that by providing interfaces that are similar to the word processors and spreadsheets that they already use. It’s a way for people to “dip their toe” into code - without having to dive into the daunting ocean of IDEs, text editors, packages, version control etc. Earlier this year, we connected with Giulio Valentino Dalla Riva (@gvdr, @ipnosimmia), a data scientist based at the Master of Data Science programme at the University of British Columbia. Giulio is a Postdoctoral Teaching Fellow who teaches courses in statistics and data science for a broad range of students, many with no prior exposure to programming. More about Giulio here. Giulio was interested in piloting Stencila in one of his fall courses on Data Management for Business Analytics: a course for Master’s level business students at UBC’s Sauder School of Business. These students need to develop data science skills, but many had never used languages like R and SQL before. The intuitive Word- and Excel-like visual interfaces in Stencila are a powerful tool for data science education for students familiar with those environments. So we jumped at this opportunity to beta-test Stencila and get the feedback we need to improve the platform for this use case. Over the last two weeks, we watched from afar as Giulio introduced his students to data analysis concepts, R, SQL, and even delivered homework assignments and quizzes with the help of Stencila. Stencila can be used in a number of ways, but our initial focus has been on providing beta-testers with the downloadable Stencila Desktop. But Giulio knew from previous experience that when students are required to download a new program, debugging installation issues can take up half the class time. So he was keen to try out our experimental cloud deployment, which runs Stencila inside Docker containers on a Kubernetes cluster. For this course, Giulio summoned the toads! They’re our new favorite thing - Tiny Open Access Data Samples! These small samples, available on GitHub, bundle awesome open datasets with tutorial-style Stencila notebooks written in Markdown. With the cloud version, students were able to use Giulio’s TOADS to learn how to write SQL queries and plot the results in R, all in the browser, just by clicking a link. Students were able to focus on learning data analysis methods and code, and not worry about how to clone repositories, connect to databases or pass data between languages. “Being able to get right to the code, thinking about the logic behind a query or the way in which data is organised is great!” - Giulio Valentino Dalla Riva This was definitely an early beta test for us and we had a few hiccups! But we learned how to handle 40 people all working on reproducible documents at once! (Thank you to the students for being our beta testers.) The UBC students also tested our new RStudio integration, which makes it possible to view, edit, and save Markdown-based documents using Stencila locally in the browser. Giulio used this to assign homework. Students were able to open .md files in the browser, edit them, and save the changes.
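For readers curious what such an exercise looks like: inside Stencila the SQL and R live in notebook cells, but a rough standalone-R approximation of the TOADS pattern described above (query a small open dataset with SQL, then plot the result in R) might look like the sketch below, using the DBI and RSQLite packages. The database file, table, and column names are all invented for illustration:

```r
library(DBI)

# Open a small SQLite database (a hypothetical file such as a
# TOADS bundle might provide)
con <- dbConnect(RSQLite::SQLite(), "toad-sample.sqlite")

# Run a SQL query and pull the result into an R data frame
counts <- dbGetQuery(con, "
  SELECT species, COUNT(*) AS n
  FROM sightings
  GROUP BY species
  ORDER BY n DESC
")
dbDisconnect(con)

# Plot the query result with base graphics
barplot(counts$n, names.arg = counts$species, las = 2,
        ylab = "Number of sightings")
```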
“For students without prior exposure to programming, it is important to reduce the cognitive overhead as much as possible. RStudio is wonderful, but it may scare some. Being able to work in a browser, in a slick clean interface, and interfacing smoothly with SQLite and R boosted the students’ confidence.” - Giulio Valentino Dalla Riva, PhD We are happy to report a successful test case of Stencila as a basic data science educational tool for students with no coding experience. We learned that students without coding experience were able to jump right into R with the Stencila interface. We also learned that the cloud deployment is a valuable tool for beginners, since there’s nothing to install (though it’s not quite ready for wide use yet). “There were some hiccups, and students did find some bugs. But they were aware of operating on a bleeding edge technology, and that was part of the experience. Overall, I think they were very excited.” - Giulio Valentino Dalla Riva, PhD As with any good beta test, new feature requests came up and we uncovered a few bugs. For example, some students were confused about the syntax for inserting code cells in external languages like R and SQL (e.g. sql()) and said they would prefer a drop-down menu to choose the language. Do you want to make a toad? Tiny Open Access Data Samples are fun for all. We are happy to help you work with Giulio’s toads or check out one you’ve created! Join the conversation and share your thoughts on Stencila’s features, toads, and reproducible documents on our community forum community.stenci.la/. Read More ›

Invitación a Participar / Invitation to Participate
Heladia Salgado, Paula Andrea Martinez, Sue McClatchy / 2017-09-25
Invitación a participar Las Carpentries, el Nodo Nacional de Bioinformática México (NNB), y la Sociedad Iberoamericana de Bioinformática (SoIBio) les invitan a participar en el proyecto Carpentry para América Latina. Las Carpentries han generado material para enseñar a investigadores y estudiantes las habilidades computacionales necesarias para realizar su trabajo de manera eficiente. Actualmente, las Carpentries cuentan con más de una docena de lecciones creadas con técnicas de pedagogía actual. Estas lecciones se han promovido en talleres en más de 37 países. Carpentry para América Latina tiene la intención de promover este movimiento con la comunidad hispana. Tenemos varias actividades en pie en las que todos son bienvenidos a participar: Traducción al español de las lecciones de Software Carpentry y Data Carpentry Revisión del material traducido Mantenimiento de las lecciones traducidas Si eres instructor de Carpentry, participa como instructor en los talleres Carpentry en español en América Latina. Si no eres instructor Carpentry, hablas español y quieres enseñar, participa para certificarte como instructor. Si eres trainer y hablas español, participa de las sesiones de demostración en español. Si quieres escribir un post en español sobre tu experiencia con las Carpentries, comunícate con nosotros. ¡Si tienes otras sugerencias, todas son bienvenidas! ¡Únete a este esfuerzo! Escríbenos a latinoamerica@carpentries.org y participa junto con nosotros. Para unirse a la lista de correo electrónico, visita https://groups.google.com/a/carpentries.org/forum/#!forum/latinoamerica Si estás interesado en más información sobre los avances, visita https://github.com/carpentries/latinoamerica Team latinoamerica@carpentries.org Escrito por Heladia Salgado. Editado por Sue McClatchy y Paula Andrea Martinez. Invitation to participate The Carpentries, the National Node of Bioinformatics Mexico (NNB) and the Ibero-American Society of Bioinformatics (SoIBio) invite you all to participate in the project Carpentry for Latin America. The Carpentries have lesson materials to teach researchers and students the computational skills necessary to perform their work efficiently. Currently, the Carpentries have more than a dozen lessons created with current pedagogy techniques. These lessons have been promoted in workshops in more than 37 countries. The project Carpentry for Latin America intends to promote this movement in the Spanish-speaking community. We have several current activities, including the following, where you are welcome to take part: Translating Software Carpentry and Data Carpentry lessons into Spanish Reviewing translated lessons Maintaining translated lessons Participating as an instructor in the Carpentry workshops in Spanish in Latin America if you are already a Carpentry instructor. Certifying yourself as an instructor if you are not a Carpentry instructor and you speak Spanish fluently. If you are a trainer and speak fluent Spanish, join the demo sessions in Spanish. If you would like to write a blog post about your experience with the Carpentries, get in touch with us. If you have any other suggestions, those are also welcome! Join in this effort! Write to latinoamerica@carpentries.org and participate with us. To join the mailing list, visit https://groups.google.com/a/carpentries.org/forum/#!forum/latinoamerica If you are interested in learning about progress, visit https://github.com/carpentries/latinoamerica Team latinoamerica@carpentries.org Written by Heladia Salgado.
Edited by Sue McClatchy and Paula Andrea Martinez. Read More ›

Software Carpentry Lesson Maintenance: Be Part of the Conversation!
Christina Koch / 2017-09-19
Come share your experience and ideas about maintaining the Software Carpentry lessons We invite all members of the Software Carpentry community to participate in two upcoming events centered around maintaining our lessons: Community Call this week on Thursday, September 21 Task Force for the future of the Software Carpentry lesson organization Community Call We will be discussing lesson maintenance on this week’s community call, happening on Thursday. On the call, we’ll gather community aspirations and concerns about the current lessons and their maintenance. We also will ask for everyone’s input on how we can best support and recognize the valuable work of our maintainers and contributors. See the etherpad for times, agenda and to sign up. Anyone from the community is welcome to join. Task Force As the plans for creating an umbrella “Carpentries” organization continue, it’s time to start thinking about what the Software Carpentry lesson organization will look like moving forward. To that end, I’m organizing a community task force to discuss the future of the Software Carpentry lessons and their oversight as we transition into a lesson organization of the future merged Carpentries. Have you wanted to be more involved with some of the decision-making around Software Carpentry lessons and the merger process? This is an excellent way to do so! It’s only a two-month commitment, so this is also a great opportunity to take on some community leadership responsibility and see what it’s like. Sign up on this etherpad to join us! Read More ›

Request for Comment: Share your thoughts on the future of The Carpentries
Kate Hertweck / 2017-09-19
We are now requesting comments on plans related to The Carpentries! A blog post last week provided history and some context behind the planning still in progress for the eventual merger of Data Carpentry and Software Carpentry into a unified organization, tentatively called The Carpentries. An outline of the planned structure, roles, and responsibilities of The Carpentries is now available, and we request your feedback through a series of Requests for Comment and related GitHub issues by October 6, 2017. Requests for Comment (RFCs, also called Requests for Public Comment) are a tool used by government groups and other organizations to solicit feedback on planned actions which may affect a broad community. So far we have attempted to keep you apprised of the planning process, but want to incorporate community input into the unified vision and plan presented in the following topics: RFC1 Organization and responsibilities of The Carpentries RFC2 Board of Directors RFC3 Membership Council (transition from current Software Carpentry Advisory Council) RFC4 Staff RFC5 Financial organisation RFC6 Subcommittees and task forces RFC7 Lesson Organizations Please head over to the GitHub repository and add your comments to relevant issues by October 6, 2017. If you prefer not to respond on GitHub, or would like to remain anonymous, you may respond to the RFCs using this Google Form. Read More ›

Introducing Elizabeth Williams and Karen Word
Tracy Teal, Elizabeth Williams, Karen Word / 2017-09-19
At our recent in-person staff meeting in Davis, California, we introduced three new members to the team: SherAaron Hurt, Elizabeth Williams and Karen Word. All will be working with the Carpentries part-time. Elizabeth has joined Software and Data Carpentry in a part-time role as Business Administrator to assist with onboarding and supporting Member organizations and general business and financial operations. Here’s what Elizabeth has to say about herself: “After earning a B.S. in Cultural Anthropology at UC Davis, I have worked as a small business manager, a tutor, a bookkeeper, and an organization consultant. I am currently managing the Personality and Self-Knowledge Lab at UC Davis, and I am thrilled to have the opportunity to dedicate my (rather eclectic) skills and passions to the exciting and worthy mission of the Carpentries community.” Elizabeth has recently joined Twitter where she tweets as @ecwilliams8. We are delighted to have Elizabeth join the team and look forward to working with her. Karen Word is a post-doctoral researcher in Titus Brown’s lab at UC Davis. As part of her work in the lab, she is working with the Carpentries on Instructor Training and will be the Deputy Director of Instructor Training. Karen will be involved with many aspects of the instructor training program, including training a new cohort of Trainers (watch for a call for applications soon!). As co-Maintainer of the Instructor Training curriculum (with Christina Koch), Karen will continue to improve and update those materials. She will also be actively involved in other curricular development efforts, including ongoing work on Data Carpentry Genomics and Data Carpentry Social Sciences curricula. Welcome to the team, Karen! Karen writes: “I have built a career on roughly equal parts teaching and research, with happy periods of exclusive focus on each. As an educator, I’ve taught high school and community college, at museums and outreach programs, and have served at the university level as both a TA and Associate Instructor. My scientific research has focused on ways in which organisms respond to environmental change, with emphasis on hormone signaling and metabolism. Most recently I have served (and continue to do so) as a postdoc in the Lab for Data Intensive Biology at UC Davis, where I am working on program assessment for our in-house bioinformatics workshops. I am delighted to be able to bring what I’ve learned through all of these experiences to bear on the Carpentries’ mission.” Instructor training is a huge part of our outreach effort, and we are delighted to have Karen assisting us with this important work. Read More ›

Community Service Awards - 2017 Edition
Christina Koch / 2017-09-18
Is there a Software Carpentry community member you’ve noticed working extra hard to help our organization? If so, consider nominating this person for a Community Service Award! The Community Service Award was inaugurated last year to recognize the crucial role of volunteer contributions to the work of the Software Carpentry Foundation. This award acknowledges individuals whose work, in the Steering Committee’s opinion, significantly improves the Foundation’s fulfillment of its mission and benefits the broader community. For full details, including how to nominate someone and a link to previous awardees, see this page. We especially welcome nominations in the next two months, as the Steering Committee will choose and announce awards in December. Read More ›

New Staff Member
Tracy Teal / 2017-09-13
We are delighted to announce that SherAaron Hurt has accepted the job of Workshop Administrator with the Carpentries. SherAaron is joining our team of workshop coordinators who manage workshop logistics, communicate with hosts and instructors, and respond to general workshop inquiries. SherAaron lives in Detroit, Michigan, USA. She has been very active in the National Society of Black Engineers, and has a strong background in logistics, marketing, and training and managing both staff and volunteers. Not only has she planned and run events both large and small, but she has a Master’s in hospitality management to back up her experience. She is passionate about Software and Data Carpentry’s mission of teaching foundational computational and data skills to researchers, and is keen to help ‘lighten the load’ for instructors and workshop hosts. You’ll be seeing emails from her soon, and you can contact SherAaron at team@carpentries.org. Welcome SherAaron! Read More ›

Lesson Infrastructure Subcommittee 2017 September meeting
Raniere Silva / 2017-09-12
On 5 September 2017 at 14:00 UTC, the Lesson Infrastructure Subcommittee had their 2017 September meeting. This post will cover the topics discussed and their resolutions.

Software Carpentry and Data Carpentry merge: With the upcoming merger, this subcommittee needs to start streamlining its processes and decoupling from the organisations that maintain the lessons. The responsibilities of this subcommittee will stay unchanged: maintain the lesson template, maintain the lesson documentation, maintain the workshop template, keep an overview of what features the lessons will continue to have, stay in contact with lesson maintainers, and stay in contact with staff. The lesson template, lesson documentation and workshop template will have a new home in 2018. If you have questions or want to help with this migration, Christina Koch is the person you should contact. During the migration, we will resolve the divergences between the Software Carpentry workshop template and the Data Carpentry workshop template. If you have questions or want to help with that, Tracy Teal is the person you should contact.

Keyboard key visual look: To improve the look and feel of the lessons and the learners’ ability to use them, we will make the keyboard keys that the learner needs to press look different from the other components of the text, so they are highlighted more effectively. We expect to merge the new CSS and documentation in the next few weeks, and that release, 2018.6, will contain all of the lessons with this new look. More information about this new feature is available at this pull request. Thanks to Brandon Curtis for proposing this idea.

Jekyll/Liquid include for images/figures: To improve readability by providing more uniform image rendering, we will pursue the proposal in GitHub issue styles#161 after we review the lessons’ unit test suite and its use on a continuous integration platform.

Citing the templates: If you are using the lesson template and you want to credit us, please use Software Carpentry: Example Lesson at Zenodo.

Lesson release and hosting scheme: For years, we have wanted to point learners to the latest release of our lessons, but due to technical limitations of GitHub Pages and the challenges multiple branches pose for contributors new to Git (for example, the current branch isn’t obvious when you visit the lesson homepage on GitHub, and maintainers can’t change the target branch of a pull request), we stayed with a single gh-pages branch in the Git repository. Jonah Duckles opened an issue to discuss possible solutions. If you want to contribute to the discussion, please leave your comments on the GitHub issue.

Fully-offline-capable functionality in lesson navigation: GitHub user vuw-ecs-kevin requested that we improve the reader’s experience when people come to our lessons from Zenodo, i.e. from one of our releases. Changes along the lines of vuw-ecs-kevin’s pull request or Raniere’s pull request will be included in the next release of our lessons.

Managing workshop websites and install instructions: This is another old request [1, 2, 3, 4]: edit only one line of index.html and get the correct setup instructions for the workshop. Jonah Duckles opened a new issue to discuss ideas for resolving it. Kate Hertweck, Christina Koch, Raniere Silva and Tracy Teal are going to work on a strategic plan to address this request, taking into consideration the comments on the GitHub issue.
Next steps: We will freeze the lesson template and lesson documentation in October so maintainers have time to work on the next release. The subcommittee will meet again in November to provide an update on some of the topics covered by this post and to discuss new requests from the community.

Acknowledgement: Thanks to Kate Hertweck, Maneesha Sane, Mark Laufersweiler, Naupaka Zimmerman, Paula Andrea Martinez, SherAaron Nicole Hurt and Tracy Teal. Special thanks to Christina Koch for the great notes. Read More ›

Reporting on the Second Software Carpentry Workshop in Namibia
Jessica Upani, Gabriel Nhinda / 2017-09-10
Background After attending this year’s African Instructors Meetup in Cape Town, South Africa, Jessica Upani and Gabriel Nhinda from Namibia started laying the groundwork for another Software Carpentry workshop for Namibia. The previous workshop took place about a year ago as part of a 12-month programme to build computing capacity in Africa. This year, however, the workshop was initiated, organised and taught exclusively by local instructors and helpers: Gabriel and Jessica, with assistance from Ruber. The workshop ran on 18 – 19 August 2017 at the University of Namibia, at the Main Campus in Windhoek, Namibia. About four days before the workshop, 30 people (including instructors and helper) had signed up and we closed the online registration form. The Associate Dean of the School of Computing at UNAM provided support by making a venue available for the event, and catering was sponsored by Talarify. Pre-workshop We held an installation party the day before the workshop started and three people showed up. Here we installed all the required tools and software. Compared to the last time we had a SWC workshop, in 2016, this step was faster and we were more ready to handle different errors and scenarios. Additionally, to try to reduce the number of participants who would show up at the installation party, we e-mailed the download links for all installation instructions, as well as all the necessary data files, for all the participants to download. This meant that the people who showed up were ready for the workshop to begin. Day 1 The instructors and helpers made the final preparations to the venue before the attendees showed up. However, the attendees were delayed and most did not show up at all, partly because day 1 coincided with a festival, and partly because it was also the due date for all university examination scripts. In the end, fifteen people showed up for day one. Gabriel introduced the workshop, and the aims and objectives of Software Carpentry, to the participants and the helper (due to work commitments, Jessica joined us two hours after the workshop started). Gabriel commenced the workshop with the UNIX Shell for the morning session. Two people who were also at the 2016 workshop attended the 2017 workshop. This time we had attendees from Chemistry and Biochemistry, Biological Sciences, Physics and the School of Computing. The lesson was interactive and we went as far as pipes and redirects on day 1 (we continued with “The UNIX Shell” at the end of Day 2, after the Git lesson). The second half of the day was for Python, led by Jessica; it was an awesome class, since people also had specific questions relating to their research work. Some of the participants asked that the instructors and helper remain behind during the break to have a look at the actual work and code they were working on. The day ended at around 17:30, one hour behind schedule; however, considering that we started late, it balanced out. Day 2 Day two was reserved for Python and Version Control with Git (VCG). Jessica started off the day with the Python lesson, picking up from the previous day. Jessica did a marvellous job of explaining the content of the lesson and also keeping the participants aching for more. Some examples of how to use lists and declare variables came from the audience. The version control (VCG) lesson was led by a local developer, Paulus Shituna. Surprisingly, this lesson went faster than expected, possibly because of the smaller number of participants.
Because the VCG lesson finished ahead of schedule, the participants requested that we go as far as possible with the UNIX shell lesson. We worked all the way through Loops but could not complete the shell scripts.

Lessons Learned:
- Sometimes, after you have set the dates for your Carpentry workshop, other events and due dates might affect attendance.
- People tend to assume “African Time”, so be prepared to start your workshop an hour late.
- Before starting the workshop, find out who is using a different operating system from yours, and make sure to cater to all participants.
- Charging a small fee might actually encourage commitment from attendees.
- Just because you have reached your maximum number of participants, don’t close your registration process. Rather, screen those who registered, perhaps by asking a few questions during the registration process.
- Have another instructor or helper time the lessons, to avoid going over the time limit or deviating from the lessons too much; this eats time and results in not completing the lessons.

Observations: Some peer instruction was also observed, as attendees tried to help their colleagues when they saw a red sticky note. For the most part they managed to solve the issue; syntax, indentation in Python, or case sensitivity caused most of the errors. A question was raised as we went through the Python lesson with regard to the examples used in one of the chapters. We had attendees whose first language was not English, or who were not fluent in it, and as such we often had to use alternative explanations to get concepts across. In our case we had one attendee who spoke Portuguese, and one of our instructors was able to provide assistance, albeit in less-than-fluent Portuguese. Some images from Namibia, a country with vast spaces (a total land area of 825,615 km2 and a population of around 2.5 million people) and incredibly beautiful and diverse landscapes. (Collage images from https://pixabay.com, created in https://www.befunky.com.)

Conclusions: Although the workshop was poorly attended, we think it was successful. This is mainly because the people who showed up really wanted and needed the knowledge they acquired during this workshop, for both their studies and their research work. To this end, we are planning to start a study group to discuss Python, UNIX, and any other topics related to applying computing to research. We would like to thank Anelda from Talarify for the sponsorship, the mentorship, and overall doing an awesome job of making sure we had all that we needed for this workshop. Another thank you goes to our helper and participants for making this workshop a success. Read More ›

Software Carpentry Introduces Mauritian HPC Users to Tools for Data Analysis
Anelda van der Walt, Bryan Johnston / 2017-09-10
The Square Kilometre Array (SKA) has been branded as one of the biggest scientific projects to date, and spans not only country borders but also continents. Although the SKA’s primary aim is to address many of the questions around our universe, the spin-offs of this project will touch people from all research disciplines as well as communities around the SKA sites. One of the spin-offs is expanded human and infrastructure capacity in terms of High Performance Computing (HPC) in other African countries. The South African Centre for High Performance Computing (CHPC) is involved in a programme named the HPC Ecosystems Project, which focuses on distributing decommissioned HPC equipment to be used as mid-tier systems at various sites across Africa, followed by training for the system administrators who will run the equipment. Over the past few years the CHPC has been working with countries in Africa to: develop an African Framework on HPC that has been adopted by the SADC Ministerial Committee on Science and Technology; facilitate access for African researchers and students to HPC training programmes in South Africa; provide access to HPC facilities for researchers on the continent; and, through the partnership with the Texas Advanced Computing Center and the University of Cambridge, provide parts of HPC systems to African sites to develop computing capabilities. The first country to host a Software Carpentry workshop in conjunction with the deployment of the donated HPC infrastructure is Mauritius. The event was sponsored by the CHPC, Talarify, and the University of Mauritius. From 19 - 21 July this year, we ran a Software Carpentry workshop for potential users of the new (and first) HPC system at the University of Mauritius. Participants hailed from disciplines such as Bioinformatics, Computational Chemistry, Mathematics, Life Sciences, Engineering, Business, Medicine, and more. A total of 27 participants, mostly postgraduate students and faculty from the University of Mauritius, learned about the Linux Shell, Python, and version control with Git and GitHub. The feedback was generally good (90% of the participants said they would recommend this workshop to colleagues) and several people indicated that they would be interested in becoming instructors. Half of the participants were women. Mauritius is a fascinating country with a total area of around 2,040 km2 and a population of around 1,348,242. Over the past few years it has evolved from a mostly agricultural community to a knowledge economy, with information and communication technology, seafood, hospitality and property development, healthcare, renewable energy, and education and training fast becoming large drivers of the economy. People mostly speak English and French, with the local language, Creole, also in the mix. The country has six universities and many other educational institutions. It was a great opportunity to work with our host, Roshan Halkhoree (Director, Centre for Information Technology and Systems), and colleagues from the University of Mauritius, and we look forward to future collaborations around data and computational capacity building. Read More ›

The First Ever Data Carpentry in Ethiopia
Lactatia Motsuku, Glenn Moncrieff, Margereth Gfrerer, Anelda van der Walt / 2017-09-10
The Ethiopian Education and Research Network (EthERNet) from the Ministry of Education, in collaboration with the German International Cooperation (GIZ) Sustainable Training and Education Programme (STEP), the Education Strategy Center (ESC) and Talarify, organised the first ever Data Carpentry workshop for young academics and researchers in Ethiopia. The workshop was conducted over two and a half days, from 14-16 August 2017, at Addis Ababa Institute of Technology (AAiT). The main aim was to increase data literacy for researchers and establish a community of good research data practice in Ethiopia, in order to increase the presence of Ethiopian researchers in the global research community. Note: the UNESCO Institute for Statistics reports that in 2016 only 1.1% of the global research community were researchers from Sub-Saharan Africa. On average 30.4% of all Sub-Saharan researchers are women, whereas women make up only 13.3% of all Ethiopian researchers. Over 25 participants from all over Ethiopia joined the workshop. 98% of participants were women, representing different research disciplines including animal nutrition, soil sciences, economics, sport sciences and information technology, to name a few. The event was led by Data Carpentry instructors from South Africa with helpers from Ethiopia, and mainly covered lessons included in the Data Carpentry Ecology workshop - better use of spreadsheets, data cleaning in OpenRefine, and data analysis and visualisation in R. Lactatia Motsuku from the South African National Cancer Registry and Glenn Moncrieff from Ixio Analytics recently trained as instructors, and this was their first opportunity to teach as part of the Data Carpentry team. Our instructors’ experience: Lactatia: “As much as this was the first Data Carpentry for Ethiopia, this was the first instructor training for me. Before I joined the Data Carpentry network, I had no idea what I could do to make changes in other people’s lives. It made me happy to see the transition in only three days, i.e. from participants having no idea where to start with data analysis, to facial expressions as they sigh “Oooh, okkayyy” and nod as they realise that there are beautiful, efficient and very effective tools to work with data. Ethiopia is a very nice place, very religious and full of kind people. I really enjoyed the food. It was organic, healthy and delicious. I bow down to the coffee ceremony, and Margareth was a beautiful host; she managed to organise us dinner at 2000 Habesha, a traditional restaurant with a touch of Ethiopian music. It was great to see my colleagues doing The Shoulder dance.” Anelda: “Ethiopia was such a wonderful surprise to me. There are some very ancient traditions and experiences. For example, they have a different calendar that derives from the Egyptian calendar and has a 7-8 year difference from our own calendar. They also regard the day as starting at sunrise, which means 6 am is regarded as 12 o’clock in Ethiopian time. A meeting scheduled for 2 pm might be misunderstood to start at 8 o’clock due to the 6 hour difference between the Western clock and Ethiopian time. When I realised the impact that both the time difference and the calendar difference may have on research and reported data, it was an eye-opener. Metadata in this instance will be critically important so that collaborators and future users of the research data generated in Ethiopia can understand exactly which calendar and what time system was used.
I hope it will be possible for me to return to Ethiopia to learn more about this beautiful country and its people.” Glenn: “This was also my first workshop as a Data Carpentry instructor. I was encouraged by the enthusiasm of the students and their ability to absorb the vast amount of information we shared with them. When their faces began to light up at the realization of the capabilities they were acquiring through the software we were teaching, I realized why Data Carpentry is so important. The kindness of the Ethiopian people, the richness of the culture, and the delicious food were all amazing added extras. Seeing the potential impact of Data Carpentry in Ethiopia inspires me to come back again soon and help to grow the seed that has been planted.” Read More ›

Joint future for Software Carpentry and Data Carpentry
Rayna Harris, Tracy Teal / 2017-09-02
“If you want to go fast, go alone. If you want to go far, go together.” Software Carpentry and Data Carpentry are sister organizations focused on teaching computational best practices to scientists. They are currently independent organizations with their own fiscal sponsorship, Steering Committees, governance model, and bank accounts. However, as is perhaps no surprise, the organizations’ operations have evolved to share memberships, infrastructure for workshop coordination, an instructor training program, and even some staff members. This ‘separate but collaborative’ organizational structure has allowed us to build a shared community of instructors with more than 1000 certified instructors and 47 current Member Organizations around the world. As Software Carpentry and Data Carpentry continue to grow and develop, this ‘separate but collaborative’ organizational structure will not scale. The governing committees of both Software Carpentry and Data Carpentry have recognized that, as more mature organizations, they can be most effective under a unified governance model with reduced operational overhead and streamlined support for curriculum development and maintenance. Over the last few months, a joint group of representatives appointed from (and regularly reporting back to) the governing committees of both organizations has been exploring and moving towards merging the governance and staff organizations, to officially recognize this shared alignment and vision, and to best support the community, member organizations, and curriculum going forward. On August 30, 2017, the Software Carpentry and Data Carpentry Steering Committees met jointly and approved the following two motions, which together form a strong commitment to continue moving forward with the merger, and to eventually hand off governance to a joint Carpentries Steering Committee: Approve merger of Software Carpentry and Data Carpentry: The Software Carpentry and Data Carpentry steering committees approve the merger of the two organizations into a single umbrella organization with associated lesson organizations, with a starting date of January 1, 2018. Approve appointed members of combined board: We appoint Karen Cranston, Kate Hertweck, Mateusz Kuzak, Sue McClatchy, and Ethan White to the board of the umbrella organization. We are now very excited for the next steps in putting together the Carpentries! Even though the two motions passed this week lay the groundwork for a merged organization, there’s still a lot of work to be done on the details of how this will all come together. The two Steering Committees (which remain in charge until Dec 31) will be putting together Requests for Comment, because community input in decisions and structure around everything from governance to curriculum oversight will be key.

A very brief history of Software and Data Carpentry

What does this mean for our instructors? Software Carpentry and Data Carpentry have already unified their instructor training to have one Carpentries instructor certification and program. Hence, the instructor training program will continue as it is. People who are already instructors can continue to teach workshops as they already do! Nothing substantive will change. There may be some updates to email list locations, but it will remain the joint Carpentries instructor community that it already is.
The new Carpentries Board of Directors will include elected positions, and instructors will be the electorate casting votes for those, as they have in the past for the Software Carpentry Steering Committee. More information about the elections will be coming out in October. Please also consider running for a position yourself, to help guide the Carpentries through this next phase! If you’re interested in more information on elections now, please contact Kate Hertweck (k8hertweck@gmail.com).

What does this mean for our Member Organizations? Memberships are already joint between Software Carpentry and Data Carpentry, so there will be no changes to memberships. All signed, pending and upcoming membership agreements will remain valid and will simply change over to the Carpentries after January. We will be transitioning the Software Carpentry Advisory Council to a joint Carpentries council, and we will keep members updated on that shift.

Proposed organizational structure and leadership: Members of both the Software Carpentry and Data Carpentry Steering Committees and staff have been meeting regularly to outline the steps needed for transitioning from two independent organizations to one united organization. The Software Carpentry and Data Carpentry steering committees have approved the following structure, to be effective on January 1, 2018:
- A single umbrella organization (tentatively named The Carpentries) with associated lesson organizations
- A governing Board of Directors composed of 9 members (5 appointed, 4 elected), each serving a two-year term without limits on the number of terms
- An Executive Director who reports to the Board of Directors; initially, this position will be offered to Dr. Tracy Teal
- A Director of Business Development who reports to the Executive Director; initially, this position will be offered to Jonah Duckles
- Software Carpentry and Data Carpentry remaining as distinct lesson organizations with their unique brands

Currently, we are articulating the roles and responsibilities of the unified organization and the associated lessons. Below is a brief summary of the responsibilities we propose will fall under the Carpentries and under the lesson organizations.

Proposed structure and responsibilities of The Carpentries and the Lesson Organizations

Just in case you are wondering who the current Steering Committee members are and who will be on the 2018 Board of Directors, here is a summary for your reference. (M = member of the current merger committee, which serves as the liaison between the community and the Software Carpentry and Data Carpentry Steering Committees during the planning and execution of the merger.)
- 2017 Software Carpentry Steering Committee (all elected): Rayna Harris (M), Kate Hertweck (M), Christina Koch, Mateusz Kuzak, Karin Lagesen, Sue McClatchy
- 2017 Data Carpentry Steering Committee (all appointed): Karen Cranston (M), Hilmar Lapp (M), Aleksandra Pawlik, Karthik Ram, Ethan White
- 2018 Carpentries Board of Directors (5 appointed, 4 elected): Karen Cranston, Kate Hertweck, Mateusz Kuzak, Sue McClatchy, Ethan White, plus four members to be elected

Next Steps in the merger: There are many areas of work to be done before January 1, 2018. Some of the things we are working on include:
- Articulating the bylaws for the Carpentries
- Articulating the policy and leadership within each Lesson Organization
- Posting Requests for Comment on bylaws, structure and policy for community comment, and incorporating feedback
- Approving a unified budget
- Launching a new website
- Electing the 2018 Board of Directors members
- Updating or crafting mission and vision statements
- And so many more details…

Questions, comments, want to learn more? If you have questions, comments, or just want to learn more than what’s already posted here, please get in touch with staff at team@carpentries.org, Tracy Teal at tkteal@datacarpentry.org, Jonah Duckles at jduckles@software-carpentry.org, the Software Carpentry Steering Committee president Kate Hertweck at k8hertweck@gmail.com, or the Data Carpentry Steering Committee at board@datacarpentry.org. Also, feel free to comment on this post, or start a new conversation at https://github.com/carpentries/conversations Community is and will continue to be a key component of this merger, so comments and discussion are always appreciated! Read More ›

Waxing poetical in a Software Carpentry workshop
Brian Ballsun-Stanton / 2017-08-31
Hi everyone! I just finished running my first segment of a Software Carpentry workshop. I massively overprepared last evening, and my notes for the first four lessons of the Unix shell are on Google Drive if anyone would care to reuse them. In any case, one of my students (who had audited one of my other classes) made the following tweet: “Never heard anyone present fundamental computing concepts so poetically as @DenubisX @swcarpentry #MacquarieUni” At which point, Belinda responded: “Liking the sound of this - will you write up a workshop post for us, Brian?” And so here I am, accused of poetry. I think it’s because I semaphored my arms when describing arguments/flags. I stood in front of the group and waved my arms around like I was holding flags, directing trains down different tracks. I also used call and response, asking the audience to complete my statements (after introducing them a few times), which served to introduce nice pauses into the presentation, allowing people to process and ponder. But … none of this is poetry. I think I was accused of poetry because, during the first discussion of command line interfaces (CLIs), I stepped away from the computers and varied my voice and volume and pitch when looking at the nature of computers. And lots of emotive body language. Speaking to the learner during the lunch break, I learned she had tweeted while I was unpacking the why of CLIs in unexpectedly non-mundane (perhaps florid, or purple) detail, sharing my enthusiasm for the ideas with the class. And most people seemed to respond well to that sort of engaging and obviously “I am interested by this topic and I hope to share my excitement with you” framing. Yay! I think the lesson to take away here is: speak passionately when you can; infecting the learners with your passion at the start of the lesson sells them a reason to engage. (And their body language will tell you if the enthusiasm was, indeed, sold. Pay attention to body language.) Read More ›

All About CarpentryCon
Belinda Weaver / 2017-08-30
What is CarpentryCon? CarpentryCon aspires to become a major learning, skill-building and networking event for our global Carpentries community. We want to help Carpenters develop the skills they need to get the careers they want. We want to help them create supportive local communities based on our values of openness and sharing. We also want to give newcomers the chance to network with more experienced people in our community, so that a range of hard-won knowledge can be passed along to a new generation of leaders. Under the theme Building Locally, Connecting Globally, this first-time event will stand on three pillars – community building, professional development, and networking. Community Building By bringing members of the Carpentry community, including instructors, partners, advocates, and staff, together with people sharing similar interests from around the globe, we will discuss tried and tested methods for creating local communities within the larger Carpentry community. Our “come and learn” format will include success stories from people who have built communities across six continents. Sharing Knowledge We will provide current and future Carpentry community leaders opportunities for continued learning and professional development. This will include sessions on teaching methods, curriculum development, and community organization. Professional development sessions might cover technical skills (such as writing R apps with Shiny), pedagogy (such as lesson maintenance), and leadership (such as recruiting and retaining local helpers, or leading projects). Networking We will provide networking opportunities so Carpentry community members can meet peers and share skills and perspectives. Participants can come together both formally and informally to share stories about challenges and successes, and to make new friends. Un-conference While there will be a range of structured sessions, we also want to include some ‘un-conference’-like sessions at CarpentryCon. This will allow for spontaneity, where attendees can decide the direction and discussions they want. Where will it be held? And when? As yet, we don’t know (though it will be some time in 2018). We have just posted a form to allow potential hosts to offer venues. Please send this out as widely as you can, or consider bidding to be a host yourself. We do know that CarpentryCon will be a three-day, high-intensity event. Connecting people from different communities in both industry and academia, CarpentryCon will allow us to celebrate how far we’ve come as a community and make plans to go on to do even greater things. How can you get involved?
- Join the task force headed up by Fotis Psomopoulos and Malvika Sharan
- Spread the word about the venue bid form
- Publicise the event through your channels
- Catch up on what’s been happening
- Volunteer as a helper/organizer at our meetings
Read More ›

Publishing our lessons, Version 2017.08
Kate Hertweck, Rémi Emonet / 2017-08-24
We are pleased to announce the latest publication of Software Carpentry lesson materials, from release Version 2017.08. Although most of our lessons are fairly mature, we had almost 100 new contributors. We have used this opportunity to improve the process through which lessons are released, some parts of which were previewed in this blog post. This release includes linked ORCID identifiers (visible on the Zenodo page next to the author’s name) and has allowed individuals to opt out as contributors. You can learn about the release process on GitHub, and can view this and previous archived versions on the release page.

Publication records:
- Latornell, Doug (ed): “Software Carpentry: Version Control with Mercurial” Version 2017.08, August 2017, https://github.com/swcarpentry/hg-novice/tree/2017.08, 10.5281/zenodo.838760
- Gonzalez, Ivan and Huang, Daisie (eds): “Software Carpentry: Version Control with Git” Version 2017.08, August 2017, https://github.com/swcarpentry/git-novice/tree/2017.08, 10.5281/zenodo.838762
- Capes, Gerard (ed): “Software Carpentry: Automation and Make” Version 2017.08, August 2017, https://github.com/swcarpentry/make-novice/tree/2017.08, 10.5281/zenodo.838764
- Kiral-Kornek, Isabell and Srinath, Ashwin (eds): “Software Carpentry: Programming with MATLAB” Version 2017.08, August 2017, https://github.com/swcarpentry/matlab-novice-inflammation/tree/2017.08, 10.5281/zenodo.838766
- Bekolay, Trevor and Staneva, Valentina (eds): “Software Carpentry: Programming with Python” Version 2017.08, August 2017, https://github.com/swcarpentry/python-novice-inflammation/tree/2017.08, 10.5281/zenodo.838768
- Wright, Tom and Zimmerman, Naupaka (eds): “Software Carpentry: R for Reproducible Scientific Analysis” Version 2017.08, August 2017, https://github.com/swcarpentry/r-novice-gapminder/tree/2017.08, 10.5281/zenodo.838770
- Chen, Daniel and Dashnow, Harriet (eds): “Software Carpentry: Programming with R” Version 2017.08, August 2017, https://github.com/swcarpentry/r-novice-inflammation/tree/2017.08, 10.5281/zenodo.838772
- Devenyi, Gabriel and Srinath, Ashwin (eds): “Software Carpentry: The Unix Shell” Version 2017.08, August 2017, https://github.com/swcarpentry/shell-novice/tree/2017.08, 10.5281/zenodo.838774
- Cabunoc Mayes, Abigail and McKay, Sheldon (eds): “Software Carpentry: Databases and SQL” Version 2017.08, August 2017, https://github.com/swcarpentry/sql-novice-survey/tree/2017.08, 10.5281/zenodo.838776
- Silva, Raniere and Emonet, Rémi (eds): “Software Carpentry: Example Lesson” Version 2017.08, August 2017, https://github.com/swcarpentry/lesson-example/tree/2017.08, 10.5281/zenodo.838778
- Silva, Raniere and Emonet, Rémi (eds): “Software Carpentry: Workshop Template” Version 2017.08, August 2017, https://github.com/swcarpentry/workshop-template/tree/2017.08, 10.5281/zenodo.838780
Read More ›

Feedback of Champions
Belinda Weaver / 2017-08-24
Jonah Duckles and I hosted our first Community Champions call on 22 August (23 August for us southern hemisphereans). Twenty-five people signed up for the call. We had attendees from the US (several locations), the UK, Canada, and the Netherlands, plus me in Australia and Jonah in New Zealand. We also had a range of expertise - some old hands, and some keen to kickstart a brand new community. People shared experiences about what had worked locally - these involved regular drop-in sessions like Hacky Hours, as well as more formal arrangements like local study groups or big events like the three-day Research Bazaars, which combine workshops with more informal sessions such as lightning talks, knowledge bazaars, meetups, stalls, and fun and games. There were 14 Research Bazaar events held in 2017, in locations ranging from Oslo to Tucson and five cities in New Zealand. The first was held in Melbourne in 2015; this spawned 10 in 2016, in countries such as Ecuador, Canada, and Australia. Local activities Mateusz Kuzak from the Netherlands talked about the Study Group that runs at Science Park Amsterdam. The group mostly comprises plant physiology and neurobiology researchers, with biodiversity researchers now joining in as well. More informal meetings are also held bi-weekly in a local cafe, where people can come for help with tools like R, Python, Snakemake, and Git. Mateusz is keen to expand the instructor base in the Netherlands too, with instructor training happening in November. In Brisbane, Hacky Hours are run weekly at both The University of Queensland and Griffith University, with a new HackR Hour at the other Brisbane university, QUT. Software Carpentry instructors and helpers tend to be the key drivers of these events. Queensland universities also collaborate to run Research Bazaar, with successful events in both 2016 and 2017, where 11 workshops were run, including two Software Carpentry workshops and an advanced R class. Meetups are another great networking tool to build community - Brisbane has monthly data science, Python and Hackers/Hack meetups. Australia also has a bioinformatics student group called COMBINE, many of whom train as Software Carpentry instructors. Forging links with groups like these helps with cross-promoting events and community building. At UC San Diego, work to expand the instructor community is underway. Two Software Carpentry workshops per year are offered through the library. The University of Oklahoma runs three workshops per semester in fall and spring, with summers set aside for special requests. Open office hours are also run for four hours a week in two campus locations. Carpentry instructors meet monthly to network and share ideas. The University of Arizona/CyVerse now run an annual Research Bazaar, as well as regular Hacky Hour and PhTea drop-in advice sessions. Work is underway to build a Data Science/Literacy initiative at the university. Three large (50-100 people) Software Carpentry workshops are run annually, with ten smaller, more focused workshops run as well. There is a strong instructor/helper community, with the aim of building a strong community of practice and linking up with local initiatives such as Python or Big Data meetups. At the University of Michigan, “flagship” workshops are run 3-4 times a year, along with workshops sponsored by specific departments/groups. They are interested in creating a pipeline of learners –> helpers –> instructors to ensure the sustainability of the community.
There are R user groups at the University of Florida (UF) and at York University in Canada (which also has a PyData group). UF also has a Carpentries Club for instructors, while the UF Libraries are hoping to fund a community organizer position through an internship or fellowship. UW Madison has ComBEE, a Hacky Hour-style group. Among other activities, they also host both R and Python study groups, which complement the Carpentries workshops they run on campus. People used a range of methods to stay in touch with local groups, with Twitter, Slack channels, email lists, and regular meetups being the most common.

Getting started

Newcomers to community building were keen for tips on creating a community out of nothing. One way to fund workshops is to source money via grant proposals. At the University of Oxford, ideas are wanted on how to turn enthusiasm into actual workshops, since the legwork involved in making workshops happen is challenging. As an outcome of these discussions, we aim to create a playbook for community building. This growing document will outline the successful strategies people have already used. It would include checklists, some best-practice guidelines, and some success stories. The playbook would be made available as an open source tool, but could also be worked up as a paper to publish. This idea got the thumbs up from attendees. To sum up, these are the mechanisms most in use:

- Open Help Sessions (Hacky Hour, PhTea, Digital Scholarship Office Hours …)
- User Groups (R Users, Python Users, discipline-specific meetups)
- Study Groups
- Research Bazaar (also known as ResBaz) events
- ThatCamp

We welcome more ideas. Stay tuned for our next Champions call in November. Read More ›

15 July - 15 August, 2017: Writing a Blog Post, Instructor Training Curriculum, Merger, League of Champions.
Martin Dreyer / 2017-08-18
Highlights

- It should not be a painful experience to write a blog post, so please reach out to us!
- Have a look at the approved motions for the Carpentries merger.
- We are pleased to announce that the Instructor Training Curriculum has been updated and will be released at the end of August.

Tweets

- Our Carpentry Clippings newsletter is out. Read it here if you missed it.
- Contributed to a lesson? Let us know so you get credit.
- Donate to Software Carpentry to support workshops in new places.
- Want to get a Software Carpentry workshop at your institution? Here’s how.
- Show your support for open source data science — become a NumFOCUS member.
- Did you know we have a mailing list to discuss our R lesson + other R-related issues? Post questions there.
- Building a community: 3 months of Library Carpentry.

General

- Please feel free to provide suggestions to our Community Lead via the Google form.
- We invite all our community champions to join the Carpentry Champions call on 22 August to share their knowledge and learn from each other.
- 21 workshops were run over the past 30 days. For more information about past workshops, please visit our website.

Upcoming Workshops:

August: University of Tasmania, University of Oklahoma, University of Michigan, Vanderbilt University Medical Center, University of Namibia, University of Arizona, Tucson, UW Madison, National Center for Supercomputing Applications.

September: University of Würzburg, Ghent University, Oregon State University/CGRB, University of Chicago, University of Southern Queensland, Macquarie University.

October: The River Club, Aarhus University, UCLA, European Molecular Biology Laboratory, Institute for Theoretical Physics UAM-CSIC.

Read More ›

The Totally Bearable Lightness of Being an Instructor ... So Finish that Training!
Juliane Schneider / 2017-08-17
Also, there’s this thing at UCLA going on September 7-8 … Greetings, my Library Carpentry community peeps! Hope your summer is going as well as mine, which means about as fine as the coffee in Twin Peaks, or a Fellini film - take your pick. Several months ago, I wrote a post about the amazing time I had in Portland at csv,conf and at the Library Carpentry Instructor Training that Belinda Weaver and Tim Dennis taught, ably assisted by myself, John Chodacki and a spatula. We had a wonderful cohort of instructors-in-training, all of whom I know would rock a room with their Bash and OpenRefine instructional skills. So I say to thee, oh #porttt cohort, if you haven’t completed the final steps of your instructor certification, get on it, because we need you! Remember how much fun we had in Portland? You can have that again, by holding your own Library Carpentry workshop and building your local community of data-savvy colleagues.

Steps to certification:

1. Contribute to a lesson. Library Carpentry just had a very successful writing sprint, and we have a lot of open issues to resolve, so there are many opportunities to contribute, even if it is spelling corrections or link updates. The downside is that it can be confusing to figure out HOW to do this and which issues are actually available to work on. If you find it daunting to know where to start, or how to get through the GitHub workflow, please - PLEASE - contact me, Tim Dennis or John Chodacki and we’ll help you through it. Promise, it’s eezee peezee!

2. Participate in an online discussion. This is the easy one. Jump online and hang out! Ask questions about doing workshops, ask about specific lessons, or, like me, have an ecstatic geekfest discussion about the glories of OpenRefine, in my case with Kate Hertweck. Here’s the schedule. Sign up for a session and knock this one off the certification list.

3. Teach a short demonstration session. This is easier than you think. (LOUD WHISPER: you are on videoconference doing live coding/demo, so you can use notes and nobody sees you looking at them!) We are mostly nice people, except for me when confronted by MARC XML or a poorly made gimlet. So, you will be fine … unless you do something outlandishly egregious, like perhaps saying “just” a lot or admitting out loud that you didn’t like OpenRefine. Sign up for your demo.

That’s it! If you want what I just typed in more formal and less meretricious language, see the full checkout procedure. I know, it’s SWC, but we’re following the same procedure, so you are in the right place. Now. NOW. To the fun part. Team Spatula Returns! Belinda, Tim, John and I will be at UCLA September 7-8, spreading the word about Library Carpentry and having discussions about community building and running a Library Carpentry workshop (or watching some of our new instructors teach one, we hope). In addition, we will be having yet more fabulous times after hours with all of our new best Library Carpentry friends. As we finalize the schedule for this reunion event, we’ll keep you all informed, and remember, if you are experiencing any kind of roadblock or confusion over your final LC instructor certification steps, please contact one of us. We will be more than happy to help you complete the steps and start planning a workshop! (Hint: if you are still completing certification and are in the LA area and want to get certified in time to be an instructor for the September 7-8 event, we’ll help you get there, and get you your first instructor experience.)
Basically, if you need any support, PING US. We are here to help. And we hope to see some of you soon in Los Angeles. Till next time, Juliane Read More ›

Instructor Curriculum Updated
Belinda Weaver / 2017-08-14
Sixty-seven pull requests were successfully merged during the recent update of the Software and Data Carpentry instructor training curriculum. Most of these PRs were merged during the 24-hour Bug BBQ that ran over 3-4 August. More than 20 instructors and trainers were involved with the update. A few pull requests remain to be merged before the material is ready to be re-released. This should happen by the end of August. The spadework for what needed fixing was done during an Issue Bonanza in July. Erin Becker then created a plan of work for people to tackle during the Bug BBQ. Some new material has been added, for example, Ted Laderas’s contribution on the importance of ‘grit’ - sticking with something even if it is a little bit challenging. Where material was duplicated, that has been addressed as well. After the Bug BBQ, Christina Koch commented: ‘It’s really cool to see how having lots of different people contribute makes the whole thing so much better. We had a new contributor [Ted Laderas] who added great material on error framing / grit, Rayna [Harris] and Lex [Nederbragt] both added really helpful diagrams for Bloom’s taxonomy, there were good additions to the instructor notes, and all of the “little” changes ([fixing] typos, weird sentence structure, etc.) add up to make a BIG difference in the quality and professionalism of the material.’ She went on to say how important it was to ‘organize and tag issues so that there was a lot of good “low-hanging fruit” for people to bite off without getting too overwhelmed. Lots of small issues is definitely the way to go, and it’s important to resolve issues as they’re addressed via PRs so that it’s clear what still needs work.’ Christina would be interested to hear from other people about the pros and cons of using tags to label issues for hackathons, and whether organisers made it clear to participants how to find issues using those tags. The Data Carpentry chatroom provided a central place for people to network and ask questions during the Bug BBQ, and a few people hopped on zoom calls as well to chat face-to-face. Our instructor training curriculum is crucial to growing our instructor community across both Software and Data Carpentry, so it is pleasing to see the material tidied up and, in some places, re-ordered to provide a more logical flow. Thanks to everyone who came along and took part in the Bug BBQ. Your contributions were very much appreciated. Read More ›

Motions approved for Data Carpentry & Software Carpentry Merger
Rayna Harris / 2017-08-07
I am happy to announce that the Steering Committees of both Software Carpentry and Data Carpentry have approved 4 motions regarding the structure and leadership of the merged Carpentries organization. The approved motions are:

Motion 1

The Board of Directors for the combined organization will be composed of 9 members, each serving a two-year term without limits on the number of terms. Five members will be appointed through a process of nomination to the board followed by voting by board members. The other four members will be elected by the membership of the organization.

Background: We anticipate that the role of the Board is governance/steering rather than execution/operations. Appointed members ensure that the Board has the expertise desired for leading an organization with the legal and financial responsibilities of the combined organization, while elected members continue the democratic traditions of SWC and allow interested community members to be part of the leadership.

Motion 2

The combined organization will have an Executive Director who reports to the Board of Directors. Initially, this position will be offered to Dr. Tracy Teal.

Background: The ED is the link between the Board and the operations of the organization. The ED will have autonomy to make decisions about running the organization, given strategic direction from the Board.

Motion 3

The combined organization will have a Director of Business Development who reports to the Executive Director. Initially, this position will be offered to Jonah Duckles.

Background: Business development is critical to the long-term sustainability of the organization. In the merger of two organizations, each with an ED, this clarifies roles and reporting.

Motion 4

Existing subcommittees and task forces will have a point of contact from among the staff, rather than reporting directly to the Steering Committee.

Background: The subcommittees perform important work for the organization. They currently report directly to the SWC Steering Committee, which is inconsistent with a Board responsible for governance, not operations. The subcommittees should instead work directly with staff, overseen by the ED.

Timeline for the merger

See an overview of all the steps we will be taking over the next few months here: https://software-carpentry.org/blog/2017/06/merger.html Read More ›

The Champions League
Belinda Weaver / 2017-08-07
When people sign up for our newsletter, Carpentry Clippings, many mention community building as one of their key interests. We are lucky in the Carpentries to have so many people who want to build local groups. These people are our community’s champions. They are the hard workers who volunteer their time to organise and run local workshops, recruit new instructors and helpers, teach and maintain our lessons, serve on committees and task forces, and generally help further the Carpentries’ mission of skilling up researchers to do more efficient, reproducible science. Many of these champions have fantastic local knowledge, and many provide the backbone without which local events such as Hacky Hours, Research Bazaars (ResBaz), study groups, or communities of practice around certain disciplines or tools would not exist. What they know about community building is enormously valuable. We would like to connect these champions in a network so they can share tactics and expertise. Our ultimate aim is to develop a community building ‘playbook’ so that tried and tested methods can be transplanted easily to new spots around the world as we welcome more and more people into our Carpentries community. To kickstart a conversation, Jonah Duckles (Executive Director, Software Carpentry) and I (as Software and Data Carpentry Community Development Lead) are planning to host a Carpentry Champions call at 8pm UTC on 22 August - check the local date and time in your location. Sign up for the call on our etherpad, which provides a list of talking points and all the connection details. We welcome experienced and aspiring champions alike to come learn from each other about how best to spread the Carpentries approach to teaching and collaborating to new organizations and new places. Come share your stories and learn from each other! We hope to see you there. We will write up the results of the call and post them on this blog. We will also start a GitHub repo to capture all your great community building ideas. Stay tuned for more details. Read More ›

Keep Calm and Write a Blog Post
Belinda Weaver / 2017-07-25
Writing a blog post should not be hard. A blank screen can seem very daunting, but if you populate it with a few questions, such as Who? What? Where? When? Why? and How?, then you have the genesis of a post. Workshop report? Once you’ve answered the questions above, you’re pretty much done. Throw in something that went well or a funny story about something that didn’t, and that’s it. Conference reportback? Ditto. Project you’re working on? Ditto. Perhaps you’ve stumbled across some new tool that you love so much you want to tell the world? Tell us why you like it and what you use it for, and that’s a post right there. If you are not sure how to format your post for our blog, check out our CONTRIBUTING.md file, which now includes instructions on posts. If that all seems too hard, feel free to email your text to me, and I will arrange to post it for you. We want our blog to be useful to our community, and a multiplicity of voices helps with that. Read More ›

Feedback on feedback
Belinda Weaver / 2017-07-20
When I started as Community Development Lead, I posted a Google form to ask people in our community for feedback on what they thought was the main issue I should address. I got a range of responses, from concerns about lesson maintenance to a call for more briefings to help people stay connected. Also raised were instructor retention - how do we prevent burnout so good people don’t leave? - and instructor involvement - how do we increase the numbers of instructors reading our blog posts, signing up for the newsletter, getting involved in community calls and committees like mentoring? ‘Quality control’ of workshops was also raised. Obviously workshops where people teach their own material, use slides instead of live coding, or don’t otherwise follow our pedagogy can damage people’s trust in the Carpentries. But how do we find out about those? And how can we educate our instructors on what makes a good workshop? I was also asked to tweet more - especially to alert people to new blog posts. People also wanted reminders about upcoming events like community calls. Some of these issues were discussed in the two community calls just gone. You can see the discussions on the call etherpad. Some good ideas were raised - annual refreshers to keep instructors informed and engaged, providing text people can use in CVs and job applications on the benefits of Carpentry skills and training, and building a workshop ‘playbook’ to centralise checklists and tips. The Google form is still open - feel free to keep channelling ideas and suggestions to me. If you leave your contact details, I promise I will get back to you to discuss your ideas further. However you are also welcome to post feedback anonymously. One respondent asked us to develop an infographic of the Carpentries to help people orient themselves in the merged organisation. I think this is an excellent idea. However, my design skills could most kindly be described as rudimentary. Anyone want to volunteer? Read More ›

1 - 15 July, 2017: Learner Impact, Instructor Training, Community Building, Author Information.
Martin Dreyer / 2017-07-17
Highlights

- Our pre- and post-workshop surveys show that the Carpentries have a significant impact on learners.
- We are working to improve the instructor training material to ensure we have a good, up-to-date curriculum.
- You do not have to be an expert to be a good instructor.

Tweets

- Are you on our mailing list yet? Sign up here to get our newsletter.
- The teams at @swcarpentry and @datacarpentry keep fighting the good fight!
- Good enough practices in scientific computing.
- Help bring workshops to new countries + communities in 2017 - make a donation to Software Carpentry.
- Read our experiences in ELIXIR with @swcarpentry and @datacarpentry.
- Research libraries and tools curation in digital humanities: new report.
- Wonder what’s going on @swcarpentry? You can see our community calendar here - it will correct for your time zone.

General

- You do not need to be an expert to be a workshop helper.
- We do a lot to build the community within the Carpentries.
- NASA DEVELOP hosted two concurrent workshops and they went very well.
- If you have contributed to any of our lessons in the past, please take a few minutes to give us some basic information.
- 22 workshops were run over the past 30 days. For more information about past workshops, please visit our website.

Upcoming Workshops:

July: University of Mauritius, Imperial College London, Lawrence Berkeley National Laboratory, University College London, McMaster University, Noble Research Institute.

August: University of Southampton, Federal Reserve Bank of Chicago, University of Notre Dame, Space Telescope Science Institute, University of Sheffield, Washington State University, Interacting Minds Centre, Aarhus University.

September: University of Würzburg, Ghent University, Oregon State University/CGRB, European Molecular Biology Laboratory.

Read More ›

Credit for lesson contributors
Kate Hertweck / 2017-07-16
In 2016, Software Carpentry instituted a twice-yearly release schedule for our lessons (see here for previous releases). Lesson releases, representing a snapshot in time of the material’s development, are published through Zenodo and given a DOI for citation purposes. Releases serve a few purposes, primarily to provide a trackable record of the development of our materials and to allow attribution to community members who contribute to lessons. The lesson maintainers are included as editors, while all other contributors who have made changes (as tracked through GitHub) are represented as authors. Given that the time is approaching for a new release, we have begun to reassess how to manage the rather lengthy list of authors for each lesson. Rémi Emonet designed and implemented a fantastic automated method for lesson releases for Version 2017.02 in February 2017. We would like to improve this process by including additional information about our contributors besides their names. Zenodo allows inclusion of an ORCID identifier for each author, which provides a more robust and trackable method of identifying authors than traditional affiliations. If you’re not familiar with ORCID, please head over to their website to learn more and obtain your own identifier. ORCID is a great project that is certainly useful beyond this purpose! A second issue related to lesson authorship is that our list of authors is inherently additive in nature, and represents anyone who has authored commits merged into the lessons. As our lessons have matured, some contributors no longer wish to be included as an author on the release. There are multiple reasons why this might be the case. For example, someone may no longer be affiliated with Software Carpentry, or may have authored a commit early in the lesson development process that no longer appears in the lessons. We have operated under the philosophy that the lessons as they currently exist are the cumulative result of all previous work, and all authors should be acknowledged. However, we also recognize that this is ultimately a personal choice, and our community members should be allowed to opt out of authorship in all future releases if they wish. To reconcile these two issues, we’ve decided to gather some information from our contributors using this Google Form. If you have contributed to lessons in the past, please take a few minutes to provide some basic information. We will be collecting responses until August 1 to include in the next release, although the form will stay open indefinitely. Eventually we may decide to add features to AMY to capture this information, but for now, please help us test this approach by adding your information. Read More ›

First time for everything
Peter Evans / 2017-07-11
This spring I found myself in a new situation: an anxious instructor in front of a room full of anxious students. My last teaching was many years back, as a teaching assistant in graduate school, and then a few lectures as part of an intermediate-level course for graduate students. But those were rather theoretical, blackboard affairs. This was my first attempt at teaching software, and with live coding demos. Didn’t Steve Jobs himself say “never give a software demo”? (Okay, maybe not.) In the end though, I was happily surprised by the experience.

Getting there is half the fun

At the end of the long German winter, a group of us had come together with the idea of holding a useful workshop or two for our institutes - the GFZ German Research Centre for Geosciences, the Potsdam Institute for Climate Impact Research (PIK), and the University of Potsdam (UP), among others. After much discussion, we settled on two two-day workshops to be held in May - one a Python Novice workshop, and the other on R. There was strong interest, and many more people wanted to take part than we could fit. We had a large group of instructors and helpers with diverse backgrounds. One or two had done this before; three of us had just completed the SWC instructor training course, but hadn’t checked out yet. Martin Hammitzsch led us in wisely lowering our sights to something achievable, and in managing much of the administrative work required to get us there. We were fortunate to have a great venue available. (Literally, as it was at the GeoLab meeting space in the base of the Great Refractor, one of the important scientific instruments of the end of the nineteenth century.) Little set-up time was needed, as we were able to distribute preconfigured disk images to all the computers in the room. Both courses were delivered in English. This created a few amusing issues like “Where is the control key on this German keyboard?”

Getting to know you

We began the first morning with all participants giving short “lightning talks”, limited to two minutes and one slide. This was an effort to build a software-oriented community by finding common interests across the many disparate research groups in our cross-disciplinary institutes. The results were an impressive demonstration of the range of talents present here.

The big day

I gave the “Introducing the Shell” introduction. As I got started, I had an insight: every one of us in this room had mastered a second (natural) language. So surely getting started on a couple of computer languages and dialects (bash, git, Python, R) wouldn’t actually present much of a hassle for the participants. I found this thought reassuring, at least - I hope they did too. Things unfolded more or less smoothly over the rest of the workshop. The most satisfying feedback was that we obviously enjoyed guiding the students and benefitted from having strong material. Or that we went slowly and carefully so that everyone could keep up. Much credit is due to the SWC community for giving us such a solid, well-structured base to work from, and to Martin for his leadership in educating us about the goals of SWC, and for his organisational expertise. These workshops were a good learning experience for the instructors (for the participants too!) and we look forward to putting on an even better workshop in 2018. Read More ›

Curriculum › Help Update the Instructor Training Materials
Erin Becker / 2017-07-11
The Carpentry Instructor Training curriculum helps prepare new instructors to teach Carpentry workshops. It also shapes instructors’ teaching practices when they teach in other contexts, helping to spread the Carpentry pedagogical model and evidence-based teaching practices around the world! We last published this curriculum in February. Since then, we’ve taught over 150 new instructors at a dozen training events. We’ve also welcomed ten new Instructor Trainers to our community, with fifteen more to join in September. We’ve learned a lot over the past six months and want to incorporate what we’ve learned before our next publication (scheduled for August 10th). Please join the Trainer community in updating these lessons!

Get involved! If you’ve made a contribution to the Instructor Training materials, you’re already an author. Help make sure the final product is polished and complete by getting involved in the lesson release events. The Instructor Training Issue Bonanza starts Thursday, July 13th at 22:00 UTC and will continue until Friday, July 14th at 22:00 UTC. Click this link to see the event in your local time.

How does the lesson release process work? Here’s a run-down of the process and our timetable for this release:

- Issue Bonanza to identify issues that need to be fixed before publication: July 13-14
- Staff and maintainers organize issues (e.g. add tags and remove duplicates): July 16-20
- Bug BBQ to fix issues identified during the Issue Bonanza: Aug. 3-4
- Publish! Aug. 10

Issues to focus on are in the lesson release checklist. You don’t need to be an expert in the materials - we need people to help search for broken links and typos too! If you’re planning on joining the Issue Bonanza, add your name to the event Etherpad. We’re excited to work with the community to update these materials. Put these dates on your calendar, and we’ll send out reminders and updates too. These lessons belong to the community - help us keep them great! Read More ›

Two Workshops at NASA DEVELOP (or, Python De-Fanged)
Ryan Avery, Kunal Marwaha, Katie Moore, Kelly Meehan / 2017-07-11
On 8-9 June, 2017, Katie Moore, Deputy Data Management Team Lead for the NASA CERES Science Team, and Ryan Avery, Geoinformatics Fellow with the NASA DEVELOP National Program, ran a Self-Organised workshop in Hampton, VA. On 12-13 June, 2017, Kunal Marwaha, a software engineer with Palantir, Kelly Meehan, Geoinformatics Fellow with NASA DEVELOP, and Ryan ran another Self-Organised workshop in Norton, VA. Both workshops focused on building skills in NASA DEVELOP participants, who work on 10-week feasibility projects that demonstrate how to apply NASA Earth observations to environmental concerns to enhance project partner decision-making. In the process, both partners and participants gain a better understanding of NASA’s Earth-observing (EO) capabilities and improve their professional and technical capacity to use EO data. For Katie, Kelly, and me, these were the first workshops in which we taught entire lessons, and we found it to be an extremely rewarding experience.

NASA DEVELOP at Langley

Workshop attendees: Day 1: 19, Day 2: 15.

At the Hampton workshop, learners came from a variety of academic backgrounds and with a spectrum of skillsets. The majority of learners fell within environmental or planetary sciences, but we also had undergraduate and graduate students attending who were studying mathematics, chemical engineering, economics, physics, and social sciences. Because of the types of programs written for DEVELOP research projects, we opted to focus on covering programming fundamentals, leaving out most of the UNIX shell lesson after the working-with-directories section. For Python, we incorporated the Gapminder Variables and Assignment and Libraries lessons into the Inflammation lessons, which was a success. We also covered the Gapminder Plotting lesson, which ended up being a little redundant. The Git lesson excited and motivated most participants but also left some wondering when or if they would ever use Git. In our next workshop in Norton, VA, we changed our delivery of these lessons to include more hands-on examples. For most learners, the pace of the workshop was spot on, with a handful of respondents saying the pace was either too slow or too fast. There were some technical issues with being able to open the Python interpreter from Git Bash. We solved this problem pretty quickly by creating a .bashrc file to point the python command to the python.exe executable file. Instructions to do this have been documented on the Configuration Problems page.
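For the curious, the fix amounts to a couple of lines of shell configuration. This is a minimal sketch rather than the exact file we used: the install path below is an assumption and will differ between machines, and the winpty wrapper is only needed if the interactive interpreter hangs under Git Bash.

```bash
# ~/.bashrc -- let Git Bash on Windows find the native Python interpreter.
# The Anaconda path below is an assumption; point it at your own install.
export PATH="$HOME/Anaconda3:$PATH"

# Git Bash's terminal emulation can make "python" hang when started
# interactively; wrapping it in winpty (bundled with Git for Windows)
# is the usual workaround.
alias python='winpty python.exe'
```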
Another big lesson from this workshop was not to hold intensive two-day workshops during the orientation week of our program, when participants are tired out from getting accustomed to a new job. There was some drop-off in attendance on the second day; however, overall feedback and survey responses were positive, and most learners reported feeling more motivated and less intimidated by the UNIX Shell, Python, and Git. Some of the feedback we got:

Good

- The sticky notes help system worked well and was a good idea
- The pace, the instructors, and the content that was [provided] were fantastic. Thank you again.
- Thank you Ryan and Katie! That was very helpful and very informative.

Improve

- I would have loved a couple extra days to learn R, more bash, and sql!
- None that I can think of right now

NASA DEVELOP at Wise

Workshop attendees: Day 1: 9, Day 2: 9.

The audience was very engaged and committed to staying the full length of the workshop. It included a high school teacher and student from John I. Burton High School (where we hosted the workshop) and the 7 participants of the NASA DEVELOP program at Wise. The NASA participants had already met a week prior, so there was existing camaraderie that made the workshop less formal and more welcoming. Furthermore, this workshop began after a weekend, with participants fresh and ready to tackle new challenges. The audience was new to programming but proficient at typing and navigating their computers via a GUI, which made this workshop a lot easier to teach. Three instructors for nine learners also made a huge difference; for example, one participant was slightly behind for most of the first Python lesson, but was able to catch up each time with one-on-one help from an instructor. We made some changes between this workshop and the one at Hampton: we decided not to teach plotting from the Gapminder tutorial, since we had already covered it in Inflammation, and we provided more challenge exercises to give more hands-on experience and less “lecture” time. Folks were really excited by a few hands-on tutorials; there were several moments where the room was full of “oohs” and cheering. A couple of highlights:

After teaching Git, we pointed people to the GitHub repository for pianobar, a program to run Pandora from your command line. Find the binaries here. Learners cloned the appropriate repository (or installed it with brew), and were able to connect their Pandora accounts and use the command line to create stations and play music. This was a good tie-in with versioning and collaboration and got the learners excited about the world of open source programming.

In the final hour, we ran a couple of exercises:

- Write a function that takes in a number (1-12) and runs the plots for the associated inflammation-xx.csv file. This tied together relevant concepts including if statements, chained functions, and data types. It also showed learners that there are often multiple ways to accomplish a task when programming. (One possible solution is sketched below.)
- Use vi to write a simple questionnaire using Python’s input() function, so that it would interactively ask for your name and how you’re doing and then provide a response which uses your answer. Learners came up with other questions and wrote humorous questionnaires.

We wrapped up with learners trying each other’s programs and had lots of fun!
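For readers who want to try the first exercise at home, here is a minimal sketch of one possible solution in the style of the Inflammation lesson. It assumes the lesson's inflammation-01.csv through inflammation-12.csv files sit in the working directory; the function name is ours.

```python
import numpy
import matplotlib.pyplot

def plot_inflammation(number):
    """Plot per-day mean, max, and min for one inflammation file (1-12)."""
    if not 1 <= number <= 12:
        raise ValueError('number must be between 1 and 12')
    filename = 'inflammation-%02d.csv' % number   # e.g. 3 -> 'inflammation-03.csv'
    data = numpy.loadtxt(fname=filename, delimiter=',')
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    statistics = [('average', numpy.mean(data, axis=0)),
                  ('max', numpy.max(data, axis=0)),
                  ('min', numpy.min(data, axis=0))]
    for position, (label, series) in enumerate(statistics, start=1):
        axes = fig.add_subplot(1, 3, position)
        axes.set_ylabel(label)
        axes.plot(series)
    fig.tight_layout()
    matplotlib.pyplot.show()

plot_inflammation(3)   # plots the statistics for inflammation-03.csv
```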
Feedback was overwhelmingly positive, with 8/8 respondents to the post-workshop survey identifying as promoters. Some of the feedback we got:

Good

- Learned Python!!! Learned how to use the terminal. I’ve used it before but never understood what I was doing and probably couldn’t have done anything useful with it before the workshop. Liked Rodeo & I think it’ll be useful in my DEVELOP project & my other research project. Instructors were really good & easy to follow.
- I was scared of coding before but not anymore so THANKS! I liked learning about the various applications & uses of Python.
- Awesome how it was step-by-step. I knew nothing about coding, but it was explained very well. Apply[ing] the coding… to actual problems was cool to see.
- The information was easy to understand. I liked the challenges after the lessons. I also liked hearing how the things we learned apply to real world applications.
- Ya’ll made coding fun for someone who has had no previous experience. Willing to help when we are stuck. EVERYTHING!
- Very useful & learnt A LOT especially Python. Learning the basics, plotting, functions, etc is very helpful in school & at work.
- I feel like I removed the fangs of the Python in this workshop and now it can’t bite me anymore. Before I was scared of it. OK maybe a bad pun. Intense course in 2 days but learning was steep.

Improve

- With more time it would be nice to go over indexing and how computers read rasters.
- More explanation on Python basics. Possibly slow down the Python coding section.
- This workshop was not long enough for the information given. Possibly tailor to specific projects. Include more exercises.
- I want more than 2 days of Software Carpentry. I want to learn more.
- Maybe just slow down explaining some parts.
- Maybe break down exactly how these programs could work with our specific projects.

Read More ›

Assessment › Analysis of Software Carpentry Workshop Impact
Kari L. Jordan / 2017-07-10
We’ve begun to look at our pre- and post-workshop surveys and are sharing the draft reports to stimulate conversation about our workshops and their impacts on learners. For this first cut, we looked at our post-workshop survey data. From this survey data, we can see that learning to program can be quite intimidating for many learners. About 44.5% of Software Carpentry learners who responded to the post-workshop survey felt that at least one of the tools covered in the workshop they attended was either slightly or very intimidating to them before attending. Luckily for our learners, we have trained, enthusiastic, and considerate instructors who are great communicators. As a result, our learners are leaving Software Carpentry workshops with increased confidence and motivation to perform computing tasks like initializing a repo in Git and importing libraries in R or Python! We have learned so much from the analysis of our post-workshop surveys. We invite you to check out the Analysis of Software Carpentry’s Post-Workshop Surveys report to learn more. Special thank-yous go to Ben Marwick, Naupaka Zimmerman, Erin Becker, and Jonah Duckles. These individuals made valuable contributions to the code that was used to create the figures in this report. All of the data are available in de-identified form in this repository. The source data (csv), report (html) and report source (rmd) are all available for further analysis and exploration. We’d love to hear from you if you look at the data, and pull requests are most welcome if you come up with some interesting analyses. What strikes you after reading the report? Tweet us your thoughts @swcarpentry and @drkariljordan. Read More ›

Open Channels
Belinda Weaver / 2017-07-06
I was writing a talk about community building in the Carpentries for the ANDS Tech Talk series, and found myself surprised at how many things we do. Obviously, workshops are our key community-building activity. They help us reach new learners, give instructors the chance to practise teaching, and draw in new helpers. Many learners and helpers go on to become instructors themselves, which builds further momentum. Conference tie-in workshops, such as the annual bootcamp for the UQ Winter School attendees in Brisbane, are another building block. People have already been at the conference together, so they come to the workshop as a loose group, where they learn tools relevant to their research practice with others from the same discipline. Tie-in workshops help us build out into different academic communities both nationally and internationally, since many go home from our workshops as advocates for Software Carpentry. Conference-goers can find kindred spirits via our meetups page, and busy Carpenters can stay in touch via our newsletter - you can subscribe here or read issues you might have missed. We also have a number of email lists, such as the discuss list, or regional lists such as Australia/New Zealand. Our blog and our Twitter feed provide other avenues to stay in touch, as does our slack channel. People can catch up with those conversations whenever it suits them. For those who want to connect more directly, community calls and instructor discussions provide a way to talk in real time. Find those on our community calendar. We also engage people directly via our lessons on GitHub. People raise all kinds of issues, and discussions can get quite lengthy. But it’s important for us to provide that forum: we’re Software Carpentry - we do what we do in the open. There is also all the great outreach our mentoring volunteers do - running instructor discussion sessions to debrief instructors after workshops or to help new people prepare to teach for the first time. Teaching demo sessions are another way to build community - connecting the growing trainer group to new instructors coming through from instructor training, another key building block of outreach. Sprints and hackathons like Data Carpentry’s Bug BBQ also draw people in, as did our 2015 instructor retreat. And soon there will be CarpentryCon! Expect to see a blizzard of info about this 2018 event soon. Read More ›

Why be a helper at Software Carpentry workshops?
Belinda Weaver / 2017-07-01
“I’m not an expert on R”, “I don’t know any Python”, “I’ve never used Git” - these may be true statements, but they should never stop you from helping out at a Software Carpentry workshop. Even the workshop instructors themselves may not be “experts” - and all the better if they are not! Experts don’t necessarily make the best teachers. Many have lost sight of - or, worse, patience with - the beginner mindset. Software Carpentry’s worldwide community of volunteer instructors includes experts, near-learners and plenty of people in between. What they share is a willingness to teach their peers. And that’s all a helper really needs - the willingness to lend a hand. It’s fine to say you don’t know the answer to something, or to call for help with a question that stumps you. Let the instructors deal with anything knotty that crops up. Most workshop hiccups are much simpler - a typo, issues with the wifi, or learners not being able to locate a downloaded file. Learners might have fallen behind, in which case all they need to get caught up is to be shown the right spot in the online lesson. Or perhaps they overlooked the etherpad link. These are all simple problems that you don’t need to be an expert to fix. Sometimes it’s enough just to be familiar with a Mac or with Windows, so that people using unfamiliar laptops can navigate their way around the operating system. All kinds of help are welcome at Software Carpentry workshops. Perhaps you can paste challenges into the etherpad. Collect sticky note feedback. Write links on a whiteboard. Point people to the best coffee or lunch place on campus. It certainly helps if you’ve had time to review what will be taught. That way, you will be quicker to spot a typo or locate the spot in the lesson where a learner has lost their place. But don’t pretend to know more than you do. Learners will appreciate your honesty. The main thing is to be friendly and approachable. Helping at a workshop is a way to see workshops in action. You may be trying to figure out if you’d like to become an instructor yourself. You might even want to have a crack at teaching a section, knowing there are more experienced instructors in the room should you run into problems. You can certainly learn a lot about instructing by watching others. Many helpers do go on to become instructors themselves, reinforcing their own learning by teaching the material to other people. If you have attended a workshop before, helping may reinforce your own learning. Hearing things explained again can really help consolidate your knowledge. You might also pick up new tricks and tips. A key benefit is helping people get started, feeling you are making a contribution to their learning, and to the learning community within your institution. It might kickstart a new community of practice, or just a networking group for when times get tough and you need to talk your work over with someone who is struggling with the same problems. So what are you waiting for? Find a workshop near you and volunteer. You will be made very welcome in what is a great, global community. Read More ›

15 - 30 June, 2017: Reorganisation Timeline, HPC-in-a-day, Good Enough Practices for Scientific Computing, Open Source Survey.
Martin Dreyer / 2017-06-30
Highlights

- The reorganization timeline has been drafted and is awaiting approval from the respective steering committees.
- Learn about the HPC-in-a-day course that brings the fun back to HPC.
- We are pleased to announce the publication of Good Enough Practices for Scientific Computing.
- We are very excited for Maneesha Sane, who is moving from Program Coordinator to Program Manager of the Carpentries.

Jobs

- Software and Data Carpentry are looking to hire a part-time Workshop Administrator to help set up workshops and ensure that they run smoothly.

Tweets

- Top 10 tips for image acquisition if you are going to do image analysis. Spread the word and happy quantifying!
- Just out: “Good Enough Practices in Scientific Computing”.
- Ten simple rules for collaborative lesson development.
- Good essay on the value of empathy in work. The empathy of @swcarpentry instructors humanizes technical skills for learners.
- How open is your open source? Check out the Open Source Survey.

General

- Our secretary has committed to writing a monthly blog post to ensure better transparency between the Steering Committee and the community.
- Macquarie University held its first-ever instructor training and it went very well.
- 17 workshops were run over the past 30 days. For more information about past workshops, please visit our website.

Upcoming Workshops:

July: ResBaz 2017 - Python, ResBaz 2017 - R, University of Technology Sydney, University of Würzburg, University of Auckland, UQ Winter School, University of Chicago, Lawrence Berkeley National Laboratory, University of California San Francisco, University of Mauritius, Imperial College London, University College London, McMaster Software Carpentry Workshop.

August: University of Southampton, Washington State University.

September: University of Würzburg, Oregon State University/CGRB.

Read More ›

Job Posting: Workshop Administrator
Belinda Weaver / 2017-06-26
With the growth of Carpentry workshops all over the world, we are excited that Maneesha Sane is moving from Program Coordinator to Program Manager of Software and Data Carpentry. As Program Manager, she will continue to be involved in workshop coordination and instructor training and will oversee and ensure the quality and consistency of program operations. She will also work to develop processes, infrastructure and communications to consistently improve the workshop experience for instructors, hosts and learners. To fill some of her workshop coordination responsibilities, Software and Data Carpentry are looking to hire a part-time Workshop Administrator to help set up workshops and ensure that they run smoothly. The successful candidate will join a team of workshop coordinators around the world. In this job, you will manage workshop logistics, help communicate with hosts and instructors, and respond to general workshop inquiries. We are looking for someone with strong organizational and communication skills, who can prioritize competing tasks and work independently. Strong attention to detail is a must. Enthusiasm for our mission of teaching people how to program is also a plus! This is a remote position. The incumbent will be hired and paid as an independent contractor of our 501(c)3 fiscal sponsor, NumFOCUS. The position will begin as part-time, approximately 20 hours a week, but has the potential to become full time. Review of applications will begin on July 17, 2017, and the position will remain open until filled. For more details on the position and information on how to apply, please see the full job posting. Read More ›

Good Enough Practices in Scientific Computing
Greg Wilson / 2017-06-22
We are pleased to announce the publication of “Good Enough Practices in Scientific Computing” by Greg Wilson, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal, which is intended as a complement to 2014’s “Best Practices for Scientific Computing”. As the summary says: Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for reproducibility, but researchers new to computing often don’t know where to start. This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts, are drawn from a wide variety of published sources, from our daily lives, and from our work with volunteer organizations that have delivered workshops to over 11,000 people since 2010. We hope the community will find it useful, and would welcome feedback. Read More ›

Instructor Training at Macquarie University
Belinda Weaver / 2017-06-20
Software Carpentry Executive Director Jonah Duckles and I ran instructor training for new member institution Macquarie University in Sydney on 19-20 June. There were a few firsts at this workshop - Jonah’s first time as an instructor trainer, my first day in the job as the new Community Development Lead for Software and Data Carpentry, and Macquarie’s first-ever instructor training event. We may also have had the biggest-ever workshop etherpad, coming in at 1400+ lines, but that would be a tough claim to substantiate. Certainly the eventual size of the etherpad mirrored the deep engagement of attendees, who all worked very hard. Eighteen people attended the training, most of them from Macquarie (a mixture of postgrad students and IT/research support staff), but there were also attendees from local institutions UNSW and the University of Sydney. Feedback was plentiful - we did both sticky note feedback and One Up, One Down twice - and we were praised for the friendly atmosphere and the energy we brought to the workshop. A talking stick circulated to make sure quieter voices got to have their fair share of the conversation, and this attracted a favourable mention in feedback. Enthusiasm to build community and to get the skills out to others was high, though as one attendee commented a little ruefully in final feedback: “Daily noise gets in the way”. Jonah was forced to multi-task through one session, taking advantage of some audience challenge time to fix problems on the Software Carpentry website, which had disappeared from view because of a Jekyll page build problem. All was luckily fixed before the class raised their heads again. Attendees were keen to put what they had learned into practice, and most enjoyed the live teaching practices more than they had anticipated. As with most workshops, some people wished they had learned some of the material long ago! Jonah and I both appreciated the efforts of our magnificent, well-organised helper Carmi Cronje, who scribed for us tirelessly throughout the workshop. Thanks must also go to Emily Brennan, Project Coordinator in the Office of the Deputy Vice-Chancellor (Research), who worked very hard to make this workshop happen, and Professor Peter Nelson, PVC Research Performance and Innovation, who funded the workshop. Macquarie are very keen to develop strong communities of practice around skills and training, and this workshop was a fantastic first step. I look forward to welcoming the class to the Software and Data Carpentry instructor communities. Read More ›

HPC in a day?
Peter Steinbach / 2017-06-20
Preface

In today’s scientific landscape, computational methods, or the efficient use thereof, can be at the heart of the race for new insights, if not at the heart of the race with the academic competition. Learning how to automate tasks from data analysis to data preprocessing, as taught by the Carpentries, provides the technical concepts to enter this race with an advantage. If you just graduated from a Software/Data Carpentry boot camp and want to go beyond your laptop’s capabilities, the next step in academia is typically to approach the data center of your university or similar. There, a user account application has to be filed for the High Performance Computing (HPC) facilities. After some more formalities, storage and computing time are awarded and you can successfully log into the cluster. And then? Then our carpenter is either given a link to the wiki of the local cluster describing how to use it, or sometimes there is a short course on the mechanics of the HPC cluster and the tools installed on it. That’s it, and good luck.

Clearly, the details of the last paragraph vary from site to site and may be a bit exaggerated. Judging from YouTube, there are a lot of dedicated and highly enthusiastic HPC instructors out there. Even so, there is still a large gap between filing an account application for the HPC machine and running an analysis or simulation campaign autonomously and at scale. The reason for this is that HPC clusters are very complicated installations. Moreover, HPC trainings jump very quickly to the nuts and bolts of HPC, i.e. the number of cores, the size of CPU caches, batch system intrinsics, optimal communication patterns, what profiler to use, etc. As many (if not the majority) of HPC users come from the domain sciences and have hardly ever received a formal education in (parallel) programming or modern computer architecture, this situation leaves many users with despair and hopelessness. Many of them end up copy-and-pasting scripts from wikis or arbitrary places and using these snippets in a mechanical fashion. Simply put: the fun disappears very quickly. With fun lost, creativity will be the next victim, which can be detrimental to the scientific race mentioned above.

hpc-novice

So let’s change this! Let’s bring the fun back to HPC (training) for all. For this purpose, Christina Koch from the University of Wisconsin-Madison, Ashwin Srinath from Clemson University and myself (Peter Steinbach from Scionics Computer Innovation GmbH) started to come up with an hpc-novice curriculum that is inspired by the Software Carpentry spirit and pedagogical methods. Although this set of material is still in its infancy, the idea behind it can be paraphrased as “Help a Carpentry learner use a cluster of computers to speed up their day-to-day data lifting”. Our efforts to brainstorm a possible curriculum are currently fixed in this document. Feel free to dial over and provide comments.

In an attempt to converge on a curriculum based on user feedback, and due to the need for local training at our client, the Max Planck Institute of Molecular Cell Biology and Genetics, I went ahead and came up with a one-day HPC course, which I called hpc-in-a-day. People invited me to report my experiences, so I have dedicated the remainder of this post to that. hpc-in-a-day was conceived as a course for our scientists, as our (client) institute is becoming more and more cross-disciplinary and hence we have mathematicians, physicists, biologists and engineers who all want to use our HPC infrastructure.

As we were about to open our new cluster extension, hardware resources were a bit scarce at the time of the workshop. I started to ask around whether I could get support from the AWS cloud team through my Data Carpentry contacts and some vendors that we work closely with, but no luck. Surprisingly, our Lenovo re-seller pro-com helped us out and put up a temporary cluster of 8 machines just for our workshop. A big thank you to them at this point! Before going in, I prepared pre-workshop assessments to infer the expertise level of the learners. I asked them mostly questions regarding how familiar they were with the terminal and with programming. To give you a feeling for the crowd, 90% of my participants expressed that they use the terminal at least once a day. 45% of all learners mentioned that they would require “google or a colleague of choice” to find out how much disk space they have left on the computer using the terminal. Along the same line of thought, 90% of the participants claimed that they program at least once a day. So I thought I was well prepared and went ahead composing the material. The contents were set up in the following way:

- Two sessions about advanced shell methods (ssh/scp and a file system recap)
- Two sessions on how to submit jobs to the cluster (scheduler basics, submit scripts - a sketch follows below)
- One session on using the shared file system
- Three sessions on the basics of parallel programming with Python (from a serial implementation, to a shared-memory parallel one, to a distributed one)
- One session on using the scheduler for high-throughput computing
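To make the “submit scripts” session concrete for readers who have never seen one: a batch script is just a shell script with scheduler directives in comments at the top. The sketch below assumes a Slurm-style scheduler (the post does not name the batch system we used), and estimate_pi.py is a hypothetical stand-in for whatever program you want to run; directives and module names vary from site to site.

```bash
#!/bin/bash
# submit.sh -- a minimal batch script, Slurm syntax assumed; adapt the
# directives and module names to your own site's scheduler and stack.
#SBATCH --job-name=pi-estimate
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # ask for four cores on one node
#SBATCH --time=00:10:00            # wall-clock limit
#SBATCH --output=pi-%j.out         # %j expands to the job id

module load python                 # site-specific environment module
python3 estimate_pi.py             # hypothetical analysis program
```

On a Slurm system this would be handed to the scheduler with sbatch submit.sh and monitored with squeue; that request-wait-run loop is the pattern the scheduler sessions build up to.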
What went wrong?

First of all, I had to learn the hard way that LimeSurvey also records incomplete surveys. It turns out that 2 of my learners didn’t complete the survey, and those were the ones with hardly any expertise in using the terminal. As the instructor, I need to be more careful with this. Not only did this produce a bi-modal distribution of prior expertise in programming among the participants, but it made structuring the course much more difficult, as the majority of learners were able to work with the terminal.

Second of all, when you are an experienced HPC user and half-time admin, you simply stop seeing the obstacles - many things like file system mechanics are simply done by your fingers and not by your mind anymore. This made my time estimates very inaccurate and tempted me to ditch large parts of some lessons. Judging from this, a 1.5 or even 2 day workshop would be better. A quote from the feedback:

Course was very intensive (or just too fast) for unexpirienst user. Very fast in the beggining with all the ssh connection, without explanation what is it and how to work with it.

Last but not least, it became apparent in the feedback round (learner 1, learner 2, learner 3, learner 4, learner 5, learner 6 and learner 7) that some of the terms which I used to describe the parallel execution of a program (think “threads on cores”) were not mentioned at all in the source code. Apparently, this mental mapping was missing, and so one learner even said:

I’ve also missed examples of how to run programs that already have a –threads option in the cluster.

even though I covered this type of parallelization in detail. That said, I wondered: is Python really the best language to teach parallel programming?

What went well?

In the post-workshop assessment, participants were asked to indicate whether they would recommend the course to others, where a score of 5 refers to very strong agreement and 0 to no agreement. The feedback from my learners averaged a score of 4.5 out of 5! A quote from the feedback:

Otherwise, great course. Thanks for having me.

During the course, I saw that many people just immediately grasped the content that I was trying to convey. Many people immediately asked how to automate job submission, how to profile their Fortran or C++ application, how to find the optimal parameter set for submitting their jobs, and much more. You could tell that some of these questions grew out of their day-job use of HPC clusters. Also, I was personally happy to see that people enjoyed parallel programming as much as they did. I chose the Monte Carlo style estimation of Pi, using 2 arrays of pseudo-random numbers, as the underlying algorithmic problem to solve. I had the impression that people could grasp this rather easily - something I was not sure about beforehand.
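The Monte Carlo estimator is worth spelling out, since it is the thread connecting the serial, shared-memory, and distributed sessions. Here is a minimal sketch, assuming Python's standard-library multiprocessing for the shared-memory step; the actual course material may differ in detail.

```python
import numpy as np
from multiprocessing import Pool

def count_inside(args):
    """Draw two arrays of n pseudo-random coordinates and count the
    points that fall inside the unit quarter-circle."""
    seed, n = args
    rng = np.random.RandomState(seed)   # own stream per worker process
    x = rng.uniform(size=n)
    y = rng.uniform(size=n)
    return np.count_nonzero(x * x + y * y <= 1.0)

def estimate_pi(total=10000000, workers=4):
    """Shared-memory step: split the draws across worker processes."""
    chunk = total // workers
    tasks = [(seed, chunk) for seed in range(workers)]
    with Pool(workers) as pool:
        inside = sum(pool.map(count_inside, tasks))
    # the fraction of points inside the quarter-circle approximates pi/4
    return 4.0 * inside / (chunk * workers)

if __name__ == '__main__':
    print(estimate_pi())   # converges on 3.14159... as total grows
```

Seeding each worker separately matters: forked processes otherwise inherit the same random state and reproduce identical draws, which is exactly the kind of pitfall a parallel-programming session can surface.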
The feedback from my learners averaged to a score of 4.5 out of 5! A quote from the feedback: “Otherwise, great course. Thanks for having me.” During the course, I saw that many people just immediately grasped the content that I was trying to convey. Many people immediately asked how to automate job submission, how to profile their Fortran or C++ application, how to automate finding the optimal parameter set for submitting their jobs, and much more. You could tell that some of these questions grew out of their day-job use of HPC clusters. Also, I was personally happy to see that people enjoyed parallel programming so much. I chose the Monte Carlo style estimation of Pi using 2 arrays of pseudo-random numbers as the underlying algorithmic problem to solve (a sketch of this exercise is given at the end of this post). I had the impression that people could grasp this rather easily - something I was not sure about beforehand. Summary In putting up a Carpentry-inspired HPC course, some things became evident (again):

- My hpc-in-a-day curriculum should be split into 2 lessons (hpc-novice and hpc-parallel) to target people that just want to get their job done on the cluster (hpc-novice) and those that need to go further (hpc-parallel).
- A much clearer communication of the expected expertise of the learners is essential.
- Good teaching of parallel programming and processing can be done before any deep hardware details enter the stage, which is where I see the biggest selling point for this curriculum.
- Working in HPC for years and using these machines should not lead us to believe that we are fit to teach it. We should therefore reduce the material to concepts first.

Further, some HPC centers and even one vendor have already asked me if and how hpc-in-a-day will live on and whether there will be other implementations of it. I personally would love to continue working with Christina and Ashwin as well as any other volunteers out there to do this and potentially bring HPC back home into the Carpentries. There already was one adaptation of hpc-in-a-day by Andrea Zonca.
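As a taste of the exercise mentioned above, here is a minimal sketch in Python, not the actual hpc-in-a-day material: it draws two arrays of pseudo-random numbers, counts the (x, y) pairs that land inside the unit quarter circle, and spreads the counting over a pool of worker processes. All names and parameters are my own choices.

    import numpy as np
    from multiprocessing import Pool

    def count_inside(args):
        """Count how many of n random (x, y) points fall inside the unit quarter circle."""
        seed, n = args
        rng = np.random.RandomState(seed)  # per-worker seed, so processes don't reuse one stream
        x = rng.uniform(size=n)            # first array of pseudo-random numbers
        y = rng.uniform(size=n)            # second array of pseudo-random numbers
        return int(np.count_nonzero(x * x + y * y <= 1.0))

    def estimate_pi(total=10**7, workers=4):
        """Split the samples across a pool of worker processes and combine the counts."""
        chunk = total // workers
        with Pool(workers) as pool:
            inside = sum(pool.map(count_inside, [(seed, chunk) for seed in range(workers)]))
        # The quarter circle has area pi/4, so the fraction of hits approaches pi/4.
        return 4.0 * inside / (chunk * workers)

    if __name__ == "__main__":
        print(estimate_pi())  # prints roughly 3.141...

Because the area argument (hits/samples approaches pi/4) does not depend on how the samples are split across workers, the same skeleton extends naturally from the serial version to the distributed one covered in the course. Read More ›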

New monthly updates from the secretary
Rayna Harris / 2017-06-20
After 6 months, I feel that I’m settling into my role as Software Carpentry Steering Committee Secretary. In an effort to improve communication and transparency between the Steering Committee and everyone else, I am committing to writing or co-writing a monthly blog post to keep you abreast of news from each meeting. Check out these new blog post summaries of the in-person Software Carpentry Steering Committee Meeting and of the in-person SWC/DC merger group meeting. I have also resumed the process of recording all board resolutions, together with those from previous years, here: https://github.com/swcarpentry/board/blob/master/minutes/resolutions.md For your reference, minutes are always archived here: https://github.com/swcarpentry/board/tree/master/minutes These can be a bit detailed or disorganized, so I’ve started to include a summary paragraph at the top of the archived minutes to improve their readability. I’ve begun archiving the minutes for the Community Call meetings in this Carpentries GitHub repo: https://github.com/carpentries/community-calls Please let me know if you have questions or concerns. Thanks! Read More ›

1 - 15 June, 2017: Steering Committee retreat, Community Development Lead, Library Carpentry Instructors, CarpentryCon.
Martin Dreyer / 2017-06-19
Highlights The Carpentries might look a bit different in the future. The Software Carpentry Steering Committee retreat went really well, and the committee has identified five key areas in which it will provide oversight. We are pleased to welcome Belinda Weaver as our new Community Development Lead and wish her all the best! Tweets For a detailed review of computation and data: the Research Software Engineers report. A good read. A Comprehensive Survey on Open Source. Awesome, Tireless, Excitable Library Carpentry Champion Belinda Weaver is the new Software Carpentry Community Development Lead! If you care about research data practices & have #genomics #bioinformatics experience, we’ve got a great opportunity. Hear about how Software Carpentry built community on campus through University of Oklahoma Libraries from Carl Grant and Sarah Clayton. Sheffield University is now a Software Carpentry Partner! General Library Carpentry will have some new instructors soon and we cannot wait to welcome them to the instructor community. The Carpentries are committed to promoting inclusion for all who want to participate in any workshop. Library Carpentry recently joined the Mozilla science sprint and it exceeded our expectations. Our vision for CarpentryCon 2018 is a better, more interactive, community-driven event. 22 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: June IRI Life Sciences, NASA Goddard Space Flight Center, NASA Goddard Space Flight Center, Macquarie University, TGen, Queen Mary University of London, Pacific Northwest National Laboratory, Materials Physics Center, University of the Basque Country Software Carpentry Workshop, UW Madison. July University of Auckland, University of Chicago, University of Mauritius, UQ Winter School. September University of Würzburg, Oregon State University/CGRB. Read More ›

Timeline for the Data Carpentry & Software Carpentry Reorganization
Karen Cranston, Rayna Harris, Kate Hertweck, Hilmar Lapp / 2017-06-17
On June 8 & 9, we met in person to continue discussing how to proceed with the merger of the Software Carpentry and Data Carpentry organizations to best achieve our strategic goals and serve our communities. We are happy to report that it was a very productive meeting. We spent most of the meeting discussing the structure of the future organizational leadership. We have drafted a number of motions that we will present to our respective steering committees for approval at our next meetings. Among the motions, we will propose that the combined Carpentries governing structure be made up of both appointed and elected members. Additionally, we propose that it is in the best interest of the organization to seed the future steering committee with bylaws, which will be enumerated, written, and approved in the coming months. Below is a timeline for the merger process. Various steps in this timeline require approval by both steering committees, and are dependent on the results of previous steps, so this may change as we proceed.

- June: Communicate the in-person meeting outcomes and proposed governance structure to staff and community members.
- July: Present motions to approve the proposed governance structure, executive leadership, and transfer of assets (e.g. finances, cross-organization subcommittees). Enumerate required bylaws.
- August: Present and approve an official motion to combine Software Carpentry and Data Carpentry into a unified organization. Approve nominations for the appointed seats on the steering committee.
- September: Approve a unified budget. Approve the transfer of assets.
- October: Call for candidates for elected members of the steering committee. Approve motions to dissolve the current steering committee and transfer assets as of January 1.
- November: Community call to announce candidates for the elected seats on the steering committee.
- December: Elect the new steering committee members.
- January: New organization with new steering committee commences.

We are grateful for all the input we have received, and will continue to receive, from community members, from staff, and from colleagues outside the Carpentries during this decision-making process. Please stay tuned for updates and feel free to contact us with questions or concerns. Read More ›

The road to CarpentryCon 2018
Fotis Psomopoulos / 2017-06-10
Imagine a global community event where Carpentries members and aficionados come together for a short period to exchange experiences, discuss lessons and debate teaching practices. Visualize a space where keynote speeches are closely complemented by community-driven lightning talks, and round-table panels are supported by ad-hoc informal meetings to share stories about challenges and successes. Finally, to complete the picture, consider a number of satellite events, such as social gatherings, Carpentry workshops and new lesson development. Well, this is our vision for CarpentryCon 2018! During the last meeting of the CarpentryCon task force (2017-06-04), we took several concrete steps towards making this a reality. First and foremost, we started working on a draft agenda for the CarpentryCon event, as well as on a bid guide for potential sites to host the event. The primary considerations in both cases were to incorporate the best experiences from conferences so far, and to cultivate an informal but informative environment. Finally, we selected new officers for this task force: Mateusz Kuzak and Rayna Harris, after their tireless efforts so far, stepped down and passed on their responsibilities to Fotis Psomopoulos and Malvika Sharan (as chair and secretary respectively). Our next steps will focus primarily on finalizing the draft agenda and starting to send out the bid forms to potential sites. We envision CarpentryCon as an event by the community and for the community. This is why we want to involve you in the process and decisions. We also want the event to be inclusive and to attract a very diverse group of people: our learners, instructors, trainers and lesson contributors. The bid guide and agenda documents are open for comments and will be circulated to the community for feedback within the month. We will also host the Community Call in August in order to bring more people into the discussion and planning. So stay tuned! Read More ›

The Endless Sprint
Belinda Weaver / 2017-06-07
The recent Library Carpentry sprint exceeded my expectations, and then some! Around 107 people signed up on our organising etherpad, either to work remotely or to contribute at one of the 13 sites in seven countries around the globe. That’s a big step up from the 2016 numbers of six sites and around twenty people working on the material. The sprint was organised as part of the 2017 Mozilla Global Sprint. Library Carpenters comprised around one-sixth of that sprint workforce, and our more than 850 GitHub events - pull requests, forks, issues raised, commits and merges - outpaced the rest of the field by miles. We used the sprint to amend, update, and extend the existing Library Carpentry lessons, get draft lessons on SQL and Python into better shape, and develop a new lesson on web scraping. We welcomed contributions from librarians and archivists, as well as from other information professionals, not to mention Software Carpentry Executive Director Jonah Duckles himself, who worked on the git lesson alongside Gillian Elliott’s team in Otago. We used our chatroom and this GitHub repository to organise work during the sprint. These were the issues we worked on, with lesson maintainers making themselves available to answer questions either via the chatroom or on Zoom (video call) sessions. The Zoom sessions sometimes ran for hours and proved to be an efficient channel for the different sites and people working remotely to network with others or to resolve tricky questions. Greg Wilson popped up in one Zoom session and we also had a visit from Raniere Silva - both welcome presences at any Carpentries hackathon. We used Zoom for the daily handovers as the baton passed to the next team coming online. The sun never set on the project as momentum moved from early risers New Zealand, into and across Australia, to the Netherlands and the UK, then on to Canada and the US, and back again. US librarians truly committed to the sprint, with seven sites overall, some of which were still going strong when I clocked in on Saturday morning (my time) to say thanks and goodbye (getting snapped in my pyjamas by Elizabeth Wickes in the process). Librarians who had attended the instructor training in Portland run by Tim Dennis and me (half of the newly minted #TeamSpatula) were particularly well represented. Portlanders - we love you guys! There were just too many of you to name here, but more than half the Portland cohort sprinted, and some are still chiming in with contributions now. I will give a shout out to Scott Peterson for being not only a sprinter, but also a site organiser at UC Berkeley. Sprinters were free to work on whatever lessons or issues they liked. In New Zealand and Utah, librarians worked through lessons to familiarise themselves with what for many was new material. What eventuated was not just lesson knowledge, but a sense of community, and a feeling of ‘buy-in’ that this was useful knowledge worth spreading further. Non-coders were free to raise issues, correct typos, suggest fixes and devise scenarios to try to make the material more relevant. That this worked well could be seen in our email in-boxes: GitHub notification numbers were off the charts for the lesson maintainers. Not that we are complaining - the level of engagement was just mind-bogglingly gratifying.
Led by Nora McGregor, the British Library had a team working on git, and Software Carpentry Steering Committee member Mateusz Kuzak led a big team at the National Library of the Netherlands, where the Python and SQL lessons got a solid working over. Owen Stephens and Carmi Cronje led the charge on OpenRefine, and Jez Cope worked on incubator lessons and built a page for reporting our many workshops. James Baker worked on the data intro lesson, helmed a lot of Zoom action, provided a steadying hand, and answered a lot of questions in the chatroom. James also made sure that issues around standardising README and CONTRIBUTING files across the many repositories were not forgotten. Just as in 2016, the indefatigable Cam Macdonell was on board at all hours, and Elizabeth Wickes distinguished herself by being blocked on GitHub - they mistook her hard work for something more spammy and sinister and closed her down for a few hours. Thanks to Thomas Guignard for the web scraping lesson we used as a basis for ours, and to Lauren Ko for hacking away at it and running a site in Texas as well. The Brisbane site hosted Richard Vankongingsveld, who developed the new Python intro lesson. He made good use of Zoom to consult with sprinters in New Zealand and the Netherlands as the sprint got underway. As for me, I baked two cakes, wrangled pull requests, worked on the git and web scraping lessons, reported on progress to Mozilla, dug people out of Git-holes, talked to people on Zoom, haunted the chatroom, answered queries, and matched people to tasks that needed doing. I also tweeted a lot. All in all, it was a thoroughly rewarding two days, and it was truly sad to see it end. We are now into the consolidation phase, with a lot more work ahead. HUGE thanks to everyone who took part. Great effort all round. Never seen Library Carpentry? Here are the links: Git Shell OpenRefine Python Draft new Python SQL Data Intro, Jargon Busting, Regex Web scraping There is a new Data Intro lesson specifically geared towards the needs of archivists: Data-intro-archives An incubator lesson exists within a separate repository for tidying spreadsheet data: Library-spreadsheets Read More ›

Announcing Belinda Weaver as our Community Development Lead
Jonah Duckles, Tracy Teal / 2017-06-07
We’re excited to announce that Belinda Weaver has accepted our offer to become the Community Development Lead. Belinda will join the Carpentries staff later this month; please give her a warm and enthusiastic community welcome! Many of you may already know Belinda from her work all over the community as a Software Carpentry Steering Committee member, a member of the Mentorship and Trainers subcommittees, and a champion and leader of Library Carpentry. As required by our bylaws, Belinda will be stepping down from the Software Carpentry Steering Committee on Friday June 16th, and the Committee will finish the 2017 term with a membership of 6. For those of you who don’t know her yet, look for her introductory blog post later today. In short, Belinda is a very active community member and a delight to work with; we’re incredibly excited to have her contributing to our community full-time very soon! We want to thank everyone who participated in the conversations about creating this staff position, took part in the global search process, and helped with community member interviews. It is always encouraging to see such a vibrant and thoughtful community as we think carefully about how to grow our impact around the world. Belinda will officially begin in the new role on June 19th, 2017 and will be joining us as a full-time staff member. You can reach her at bweaver@carpentries.org and follow her on Twitter at @cloudaus. Read More ›

New Community Development Lead
Belinda Weaver / 2017-06-06
I am very pleased to be starting as the new Community Development Lead for Software and Data Carpentry. Building communities, helping people connect, fostering skills and learning, brokering solutions - these are the things that drive me. Jobs I have had include librarian, repository manager, newspaper columnist, Internet trainer and IT project manager. I ran the library system for the City of London libraries in the 1980s, and have just finished working for a non-profit organisation making eResearch infrastructure available to university researchers. In my spare time, I like to read (a LOT), bake bread (and cakes), see as many films as I can, especially foreign/art house ones, and grow herbs and vegetables in the garden. I also love teaching - whether it be Software Carpentry workshops, Library Carpentry courses, or training new instructors. I dive right into the new job on 19 June, when I will be running an instructor training event with Jonah Duckles at Macquarie University in Sydney. It will be great to see the instructor pool in Sydney increase. I have been a fan of Software Carpentry since first hearing about it in 2014, when I organised the first ever Software Carpentry bootcamp in Brisbane in July of that year. I certified as a Software Carpentry instructor in 2015 (and as a Data Carpentry instructor the same year), and taught at two workshops. In 2016, I taught at eight workshops out of a total of 14 statewide. During 2016, I and other Queensland instructors took Software Carpentry to five cities in Queensland - Brisbane, Townsville, Toowoomba, Gold Coast and Rockhampton - a huge improvement on 2015, when we taught three workshops in Brisbane only. I organised Software Carpentry instructor training in Brisbane in 2016, and certified as an instructor trainer myself in late 2016. I have since helped train an online cohort and a librarian cohort face-to-face in Portland, Oregon (with the wonderful Tim Dennis as my co-trainer). I look forward to training many more new instructors in 2017 and beyond. I currently serve as the Software Carpentry administrator for half of Australia. This means helping people in other Australian states and territories organise workshops. I have also served on the Software Carpentry Steering Committee for eighteen months, but I will step down from that on 16 June as joining the staff makes me ineligible for the Committee. I was one of the organisers of the very successful 2016 and 2017 Brisbane Research Bazaar festivals. ResBaz is a three-day research event to skill up graduate students and early career researchers and help them find their ‘tribe’, whether that be in a discipline such as ecoscience or around tools such as R. From one event in 2015, ResBaz grew to ten in 2016 and 14 in 2017 in places as distant as Oslo, Tucson, Christchurch in New Zealand and Cuenca in Ecuador. Software Carpentry workshops are always a key part of ResBaz festivals, and ResBaz events are a great way to attract more people to Software and Data Carpentry. Along with Sam Hames and Nicholas Hamilton, I started a weekly Hacky Hour drop-in IT advice session for researchers at The University of Queensland. In June 2016, I organised a sprint to update and extend the Library Carpentry material created by Dr James Baker and others in the UK. This was part of the annual 2-day Mozilla Science Lab Global Sprint. More than 20 people in six countries worked on updating the material, and added a new SQL lesson to the existing four. Interest burgeoned. 
Library Carpentry won the British Library Labs award in November 2016, and there have been 30 workshops held worldwide since the 2016 sprint. The recently concluded 2017 sprint attracted 107 people at 13 sites worldwide. New lessons were added (web scraping, introductory Python) and existing lessons were updated. The community is very active, with an ongoing chat room. New members are welcome. Software Carpentry has really taken off in Australia, and the southern hemisphere more generally, with strong communities developing in New Zealand and South Africa as well. In this role, I plan to continue that work, train more instructors, get more partnerships across the line, if possible, and make sure we extend Software Carpentry workshops beyond the capital cities into the regions and into new, under-represented countries and communities. I also hope to improve communications, and to create opportunities for all our instructors and supporters to become more involved, and to feel more valued. But I don’t intend to be just a southern hemisphere community builder - I want to help build Software and Data Carpentry communities worldwide. I hope to spend a couple of months working in the northern hemisphere next spring. If you would like to host me, please get in touch. I am also looking forward to working with the Software and Data Carpentry staff - we all have big plans! Feel free to contact me any time at bweaver AT carpentries.org or ping me on Twitter. I look forward to meeting you all. Read More ›

Instructor Access to Workshops
Erin Becker, Jonah Duckles, Kari L. Jordan, Maneesha Sane, Tracy Teal / 2017-06-06
Working with hosts and instructors on workshop access Read More ›

Summary of the 2017 Software Carpentry Steering Committee Retreat
The 2017 Steering Committee / 2017-05-31
The Software Carpentry Steering Committee met in person on June 7-8 in Davis, California. The committee members took unique journeys through the community to their seats on the steering committee; importantly, we have a unified vision for the future of the Software Carpentry Foundation and the roles of the steering committee. Below, we 1) summarize the motions that were passed at the meeting, 2) describe the five key areas over which the Steering Committee provides oversight, and 3) provide a timeline for implementing our goals. Summary of motions passed at the retreat:

- The Steering Committee has created a board-designated operating reserve ($80,000, equivalent to one quarter’s worth of budget) to provide an assurance of financial solvency.
- The Steering Committee empowers the trainer group to move forward with opening instructor training events to non-member-affiliated individuals.
- The Steering Committee empowers staff to perform tasks in their areas of focus while the Steering Committee retains oversight in the key areas of community, instructor training, curriculum, finances, and hiring staff.

Areas with Steering Committee oversight Community We value the involvement of everyone in our community - learners, instructors, hosts, developers, maintainers, committee members, staff, partners, advocates, trainers, organizers, sponsors, advisors, and helpers. We are committed to creating a friendly and respectful place for learning, teaching and contributing. We will continue to support the Code of Conduct, which is at the heart of our community. Our mission is to continue growing and supporting a diverse and inclusive community. To that end, we have empowered a new Director of Community Engagement to accomplish this mission. Instructor training We are committed to creating a community of practice around instruction. As we train new instructors and instructor trainers, we must also continue to support their professional development through a community of practice. We have empowered the Director of Instructor Training to expand the instructor training program to accommodate the needs of our community. Our strategic plan is to identify new instructor trainees and provide training opportunities while simultaneously working to grow our capacity to offer mentoring and other support to those new instructors. Curriculum The Software Carpentry lessons and workshops are the vehicle through which we teach best practices for scientific computing. Our curriculum evolves over time for many reasons, including changes in the needs of learners and turnover of lesson contributors. Part of our strategic plan is to support community involvement with lesson archival, lesson releases, lesson maintenance, lesson development, and other tasks related to curriculum. Finances The Software Carpentry Foundation is financially responsible for the organization. The Steering Committee will continue to govern the financial model and budget. We have designated a financial reserve to promote healthy and fiscally responsible operations. Hiring staff To achieve our mission, the steering committee will oversee the process of creating staff positions. We will evaluate the extent to which senior staff are carrying out the strategic initiatives. Senior staff will be responsible for evaluating junior staff members.
Timeline for implementing our goals

Immediate concerns (now to six months):
- Resolve issues with the Windows installer
- Reconcile and resolve restructuring with Data Carpentry
- Identify immediate concerns for lesson development and maintenance (with an eye towards a restructured Carpentries)
- Create a curated list of resources for learners to continue building skills following workshops
- Connect with members and community organizers to develop local/regional communities

Medium term (6-18 months):
- Populate the map (in terms of workshops, instructors, members) with an emphasis on diversity
- Create feedback loops for improving lessons, including lesson templates
- Local community building
- Assess inclusiveness of online communities

Long term (18+ months):
- Focus on communities of practice (for learners, instructors, trainers)
- Solidify pathways of involvement for community members, associated with a mastery rubric of skills
- Improve methods of documenting and recognizing contributions to the community (e.g., badging)

Read More ›

Apply to Become a Carpentry Instructor Trainer!
Erin Becker / 2017-05-31
The Carpentry community is growing! This month we welcomed ten new Instructor Trainers to our community. Now we are looking for the next group of new Trainers. Carpentry Instructor Trainers run instructor training workshops, lead online teaching demonstrations, and engage with the Trainer community about how best to train new instructors. Trainers are also actively involved in developing and maintaining the instructor training curriculum. We meet regularly to discuss our teaching experiences and stay up to date on policies, procedures, and changes to our curriculum. The Trainers are an eclectic group. Some of us have formal training in pedagogy, some are experienced Carpentry instructors, others run Carpentry-like trainings as part of their jobs, and others pitch in on their own free time. We all share a commitment to helping new instructor trainees become familiar and comfortable with Carpentry teaching practices and principles. More detailed information about what Trainers do can be found here. Trainers-in-training meet one hour a week for eight weeks to engage in a series of discussions around teaching pedagogy and creating welcoming classroom environments. After completing this part of the training, new Trainers shadow a teaching demonstration and part of an online instructor training event. Trainers-in-training also attend regular meetings of the Trainer community. This group of Trainers will start meeting in July and be eligible to teach instructor trainings by September. If you’re interested in joining the Trainer community, please apply here! Applications will be open until June 14th. If you have any questions about the training process or the expectations for being a Trainer, please get in touch with Erin. Read More ›

Summary of May Community Call: Restructuring the Carpentries
Kate Hertweck / 2017-05-19
Thanks to those of you who attended the May Community Calls, the topic of which was the restructuring of Software Carpentry (SWC) and Data Carpentry (DC) into a single unified (umbrella) organization, with SWC and DC continuing as lesson organizations within the umbrella. As the Chair of the SWC Steering Committee, I am one of the representatives tasked with developing a plan of action for how the restructuring will proceed over the next ca. six months. I am gratified by how much our community cares about its future, and by what we (as a community) believe is important to target as we proceed in planning. The following items highlight major discussion points synthesized from both community calls; a complete set of notes is available on the etherpad. Organizational members represent major stakeholders in our community. Given that a majority of organizations have signed joint memberships with SWC and DC, we expect the transition to a restructured organization will be smooth, with scheduling of workshops proceeding as normal. Representatives of each member organization currently comprise the SWC Advisory Council, and we are currently assessing how to maintain connections with member organizations in a restructured Carpentries. Scalability of a restructured organization is clearly an area of interest throughout our community. An umbrella organization not only combines the assets and advantages of both DC and SWC, but also allows for planning to expand operations into different disciplines, geographic areas, lesson modules, etc. We are mindful of the need to balance opportunities for growth with continuing to provide high-quality workshops. Local communities are a cornerstone of scalable growth. Finding ways to bolster local and regional groups of learners and instructors represents an essential step in maintaining cohesiveness and identity as we continue to promote both the umbrella organization and the individual lesson organizations. Other lesson developers besides DC and SWC (e.g., Library Carpentry) are already thinking about how they might fit into the restructured umbrella organization. Given that the restructured organization was designed specifically with this type of growth in mind, we are currently considering what mutual expectations and onboarding might look like in these cases. Instructor Training is currently in high demand throughout our community, and represents one of the major deliverables to our organizational members. Instructors certified for either DC or SWC are currently eligible to teach workshops for either group. When combined with the focus of instructor training on effective pedagogical practices, the restructuring process should not change how we train instructors. Please note that we have only begun to lay out the basic framework for the restructured organization, and the points above serve to highlight areas of interest from the community calls. I’m excited about taking this feedback to the rest of the group involved in planning, and look forward to continued reporting on this process. Read More ›

Three Instructors, Two Coasts and One Spatula
Juliane Schneider / 2017-05-17
In the space of a year, interest and participation in the Library Carpentry community has exploded like an amoeba that over-ate at an algae banquet and attempted one too many pseudopods. For Library Carpentry, though, this is a good thing; the pseudopods are propelling us forward across institutions, disciplines, and continents. The community, grounded in collaborative tools like GitHub and Gitter (I always want to type Glitter), is coalescing around lesson development and holding new workshops. Why is the buzz so strong? I think it’s a combination of relentless energy from people like Belinda Weaver and Tim Dennis (to name just a few), the acceptance and active encouragement of new people who want to contribute in some way, and the mutual recognition by all of us that in any one thing, we are all absolute beginners, and we all give each other permission to be terrible until we aren’t. I am still terrible at GitHub and the command line and use Tim’s GitHub workflow post every time I work with GitHub - seriously, this is GitHub workflow gold, people. I am less terrible at OpenRefine and will happily show anyone how to rearrange columns, because OpenRefine hides that function in a super weird place. There’s a Library Carpentry Sprint coming up on June 1, with sites worldwide contributing to new lessons and updating/improving existing ones. There’s this powerhouse woman in Australia called Carmi Cronje (what is it you Aussies are drinking, the way you get stuff done?) who is Githubbing the hell out of the prep for it. Go, Carmi! I encourage anyone with an interest in the Library Carpentry community to check out the Sprint and find a way to participate. Remember, Library Carpentry is a no-judgement zone. My cat will judge you, but the LC community will not. West Coast, Library Carpentry Instructor Training Led by the intrepid and hilarious Belinda Weaver and Tim Dennis, with help from John Chodacki and me, 28 very enthusiastic participants were inducted into the Library Carpentry instructor community over two days in Portland, OR. Back when I took the instructor training at UC Davis, I came to it with no experience in instruction whatsoever. By the end of the two days, I was faintly confident in my instructor skills, had learned pedagogical things that seemed obvious but were so not obvious, and jumped right into organizing a week-long Library Carpentry workshop with Tim Dennis, because Tim happened to have the biggest library conference room reserved and nothing to use it for. I have a feeling this group will be doing the same, and the beauty of OpenRefine will resound throughout the land. The best part about observing my first instructor training from the helper side was the abject terror that the first mention of the video exercise produced. And the second best part was how much confidence they all gained by realizing how good they really were, and how supportive, constructive suggestions for improvement actually sustained that confidence instead of undermining it. Community, yo! On the morning of the workshop, Belinda decided she needed a speaking stick, so we claimed a wooden toy spatula from the daycare room next to the classroom. It was a very popular form of speaking stick, and people were using it as a fake microphone, sometimes without thinking about it, which was effective and also hilarious. Here is John Chodacki, using the spatula in a mindful way. This workshop happened because UC3 and csv,conf came up with the support to make it happen, so thanks, UC3 and csv,conf!
East Coast, Boston Library Carpentry Workshop Having conquered the West Coast, Library Carpentry invaded Boston to teach a one-day workshop that included sections on data basics, jargon busting, shell/bash, the command line, regular expressions and OpenRefine. [The workshop was actually held across the river, at MIT in Cambridge.] There was no corporeal spatula, but it was there in spirit. Belinda started off discussing the threat that algorithms present to those of us in the info workforce, and how learning new skills can help us move into new roles. Then we did the jargon busting exercise, which was fantastic! The Boston crowd came up with terms I hadn’t yet seen surface in this section, like flag/parameter/option/argument and Bayesian, and they were enthusiastic users of the etherpad. Some of the best etherpad use I’ve encountered at an LC event. And then there was regex. Oh regex. So useful, so painful to actually look at! But so useful … The challenge with teaching regex is that to truly understand its power, you need to see it work against a block of text or data. I’m currently developing an after-workshop exercise that students can use to really dig into regex and look at why an expression returns the strings that it does. So stay tuned for that (a tiny sketch of the idea appears at the end of this post)! The most interesting section was the bash/shell lesson, with Belinda using the text of Little Women to demonstrate how to count how many times a specific word appears. If you haven’t had the pleasure of watching Belinda do her thing, you need to! I learned so much about effective instruction just by watching her engage with the room. She makes everyone instantly comfortable learning complex new concepts. And, as often happens, a discussion ensued about the context of using bash/shell in libraries. There are definite use cases for using the command line, including file management and text evaluation, but is there a more direct line to library work? Are we missing the definitive use case that would drive home why librarians should use this method of working with a computer, or are we not presenting its core benefits clearly enough? (Hint: I don’t know.) This discussion comes up every time I’ve seen it taught, so it is something to consider (and has been raised as an issue to fix in the sprint). OpenRefine was, as always, a joy to teach, and the Boston audience got me out of a muddle when I lost my head and forgot the GREL string for extracting the JSON from CrossRef. I learn something every time I get up there and do instruction, and am consistently impressed by the kindness of the people in our workshops. We’re in this together! I look forward to taking the next step: learning how to teach OpenRefine wikidata reconciliation. Finally, I’d like to thank our helpers and our hosts, the amazing Kate Nyland of Yale (and NEASIST), Thomas Hohenstein of Boston University, Daina Bouquin of the Wolbach Library at Harvard, Joshua Dull of Yale, and Christine N Malinowski and Olimpia Estela Caceres-Brown of MIT. This happened through the generosity of NEASIST. Woo NEASIST!
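To make the regex point above concrete, here is a small, hypothetical after-workshop sketch in Python; the file name and patterns are illustrative, not taken from the actual lesson. It redoes the Little Women word count from the shell demo and then shows which strings a pattern actually returns.

    import re

    # Any plain-text copy of Little Women will do; the file name is made up.
    with open("little_women.txt", encoding="utf-8") as f:
        text = f.read()

    # Whole-word count of "Jo": the regex analogue of the shell counting demo.
    print(len(re.findall(r"\bJo\b", text)))

    # Seeing exactly which strings a pattern returns is what makes regex click:
    for match in re.finditer(r"\b[A-Z][a-z]+ March\b", text):
        print(match.group())  # e.g. "Meg March", "Amy March"

Running a pattern over real text and inspecting every string it returns is precisely the kind of after-workshop digging described above. Read More ›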

1 - 15 May, 2017: Lesson Printing, Instructor training, Successes of the Carpentries.
Martin Dreyer / 2017-05-15
Highlights Printing our lessons has never been easier! Sign up for the community call about the restructuring of the Carpentries. Tweets Heartwarming to hear about the success of Software and Data Carpentry from Dr Kari Jordan. Share your fun coding applications to inspire learners. General The second in-person instructor training for South Africa took place and was very successful. What are the reasons we volunteer our time so enthusiastically toward the Carpentries? 16 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: May Albert Einstein Science Park, Potsdam, Thomas Jefferson National Accelerator Facility, McMaster Software Carpentry Workshop, Macquarie University, West Virginia University (WVU), University of Arkansas, State Water Agencies, University of St Andrews, UC San Diego Library, Griffith University. June Curtin University, The University of Washington eScience Institute, IRI Life Sciences, Macquarie University, Queen Mary University of London, UW Madison, University of Auckland, UQ Winter School, University of Würzburg. September University of Würzburg. Read More ›

What is the reward for empowering others?
Rayna Harris / 2017-05-14
Following a few round-table discussions with Software Carpentry instructors, I gained some insight into what drives so many of us to volunteer our time so enthusiastically. I came to the conclusion that many of us share a sense of accomplishment when we empower others through mentoring and education. Software Carpentry gives us a fountain-full of opportunities to contribute through community-developed lessons, community-supported workshops, and world-wide community conversations. We start out volunteering a little when there is time, but we soon carve out more and more space in our schedules for Software Carpentry events and activities that provide the opportunity to empower others. I can’t help but wonder if this is driven in part by a lack of reward for empowering others in traditional academic settings. Compare the motivation of learners in a required semester-long introductory class to that of learners in an intense two-day workshop teaching best practices for domain-specific research. Learners in a two-day workshop are more highly motivated. I prefer to teach the highly motivated because I’m more confident that it enhances education and research. Furthermore, we all know that many professors are rewarded for research while paid for teaching (something I know but still don’t quite comprehend). For many of us, Software Carpentry is where we hear most clearly that our teaching really is valued. I wonder who else shares a rewarding sense of accomplishment for empowering others. I’m also curious to know what other commonalities unite our global community. What are your thoughts and comments on the subject? Read More ›

May Community Call: Restructuring the Carpentries
Kate Hertweck / 2017-05-14
The Software Carpentry Steering Committee (SWC SC) announced back in February that we were initiating discussions with Data Carpentry about potentially merging into a single organization. Members of the SWC SC and the Data Carpentry Board have been meeting regularly since then, identifying the advantages and logistical challenges associated with a restructured organization. More recently, we passed a motion approving the following structure: “A Carpentry umbrella organization and a set of Lesson Carpentries. The umbrella organization is primarily responsible for finances, administration, and facilitating policy. Each Lesson Carpentry is primarily responsible for lesson development and maintenance.” This motion represents the first step towards a restructured Carpentries organization, with the respective community governance structures continuing to solidify details over the next few months. We are excited to share our vision and goals with you during the May Community Call. Mark your calendar for May 18 and sign up on the etherpad! Read More ›

Instructor Training in South Africa 2.0
Aleksandra Pawlik, Kari Jordan, Anelda van der Walt / 2017-05-09
One of the main goals of our community is to make Software and Data Carpentry workshops accessible to researchers from all disciplines and all regions of the world. In the past three years, workshops in South Africa and other African countries have gradually picked up, and I think we may soon see exponential growth! This has been made possible thanks to a lot of effort and hard work from a number of amazing people, including Anelda van der Walt. In 2014 we realised that the biggest obstacle to running workshops in Africa was the lack of qualified trainers on the continent. Since then we’ve put significant focus on building a local instructor pool to enable research organisations in the region to run Software and Data Carpentry training. In 2015 several instructors were trained online. Last year North-West University at Potchefstroom hosted the first in-person Instructor Training with attendees from South Africa and other African countries. Just last week, between 30th April and 2nd May, we (Anelda, Kari Jordan and Aleksandra Pawlik) ran another Instructor Training in Cape Town with 28 participants. The whole group was incredibly varied, with expertise ranging from life sciences, engineering, mathematics, and astronomy to library and information sciences. We had an almost 50-50 gender balance (42% female vs 58% male, to be precise). The participants also represented a variety of organisations. Several of them came originally from other African countries and were already discussing ideas for bringing the workshops over to their motherlands. We secured a fantastic venue at the Takealot offices with plenty of space, a coffee machine at our disposal, and a stunning view of Table Mountain. (Photo credit: Maryke Schoonen) Like that view? Well, here’s more. On the second day of the workshop (Monday, 1st May) we were joined by the instructors who were trained last year at the North-West University campus and those trained online in 2016. Their participation in the 2017 Instructor Training was part of a 12-month programme. The programme included a full cycle, starting off with Instructor Training, supporting the checkout process, helping out with organising and hosting workshops, and finally bringing the participants back together for a catch-up a year later. (Photo credit: Maryke Schoonen) Indeed, the programme proved to be a great success, building strong foundations for computational training in South Africa and Africa. It also leveraged the power of our community, with a number of international people stepping in to provide mentorship for newly trained instructors. Last but not least, bringing together the three cohorts of instructors last week in Cape Town was an incredible networking opportunity. We started with an evening mixer on Monday and carried on with more structured networking activities the day after. As a result, there is now a series of workshops planned for South Africa and several other African countries. We are really grateful to everyone involved in growing the capacity for training in computational skills for research in Africa. The 2017 Instructor Training and co-located events were made possible thanks to North-West University’s eResearch Initiative, Talarify, DIRISA, Takealot and DHET through the Rural Campus Connectivity Project II. Read More ›

How to print our lessons?
Raniere Silva / 2017-05-08
From the time that I joined Software Carpentry, I remember two feature requests that often showed up on our list of issues. The first was links to help navigate the lessons, and the second was a way to print the lessons. Adding these two features was challenging because of the limited information that Jekyll (or Pandoc, which we used for a while) makes available as metadata. In addition, we wanted a “real” solution, instead of a workaround that would stop working in a few months - although, as software developers, we knew that with every release of Jekyll, or any other library we use, there is a chance that our pipeline will break. At some point, Greg Wilson managed to implement the navigation using Jekyll Collections, introduced in Jekyll 2.0.0, after the last review of our template. But until this month, instructors and learners had to print each episode of our lessons individually. We are happy to introduce our “All episodes in one page”.

How to Use

1. Access the lesson you want to print.
2. In the navigation bar at the top, open the “Episodes” menu.
3. Click on the last option, “All in one page (Beta)”. If the option isn’t available yet, it will be soon - we are still merging the pull request that includes the new feature.
4. In your web browser, select the “Print” option and proceed to print the lesson as you would any other document. If your web browser offers the option to save the document as PDF, you can use it to read the lesson off-line.

Bonus: If you are using Chrome 59 or higher, you should be able to get the PDF of the lesson using

    $ chrome --headless --disable-gpu --print-to-pdf http://swcarpentry.github.io/your-favorite-lesson/aio/

Next Steps Our CSS rules for the print version need some improvements. If you want to contribute to them, contact us.

Technical Details After playing around with Pandoc, XPath, and other methods of producing the lesson as PDF and EPUB, we decided those approaches were incompatible with our pipeline, which depends on GitHub Pages, and that we should try to come up with a solution that uses only Liquid. Since part of each lesson is stored in the YAML header and needs to be compiled by Jekyll, we ended up with a difficult challenge. Fortunately, we can use JavaScript not only to animate web pages but also to query servers for data (episodes, in our case) and inject the new data into the page. For this reason, the solution we implemented is 100% powered by JavaScript and isn’t available to web browsers without a JavaScript engine. The code that generates the all-episodes-in-one page is

    window.onload = function() {
      var lesson_episodes = [...]; /* List of episode paths, elided in the original post. */
      var xmlHttp = []; /* Required since we are going to query every episode. */
      for (var i = 0; i < lesson_episodes.length; i++) {
        xmlHttp[i] = new XMLHttpRequest();
        xmlHttp[i].episode = lesson_episodes[i]; /* To enable use of this later. */
        xmlHttp[i].onreadystatechange = function() {
          if (this.readyState == 4 && this.status == 200) {
            /* Parse the fetched episode and inject its <article> body
               into the placeholder element reserved for it. */
            var article_here = document.getElementById(this.episode);
            var parser = new DOMParser();
            var htmlDoc = parser.parseFromString(this.responseText, "text/html");
            var htmlDocArticle = htmlDoc.getElementsByTagName("article")[0];
            article_here.innerHTML = htmlDocArticle.innerHTML;
          }
        }
        episode_url = ".." + lesson_episodes[i];
        xmlHttp[i].open("GET", episode_url);
        xmlHttp[i].send(null);
      }
    }

Suggestions to improve the JavaScript are welcome. Read More ›

Plans for Windows Installer
Raniere Silva / 2017-05-04
In March, we had two email threads on our discuss mailing list related to nano on Windows machines. The first thread was about “nano not found”, a bug in our installer that we never managed to trace. The second thread was about some “misbehaviour” of nano on Windows, and the suggestion to use Atom as the default text editor. The suggestion to use Atom started a long discussion in which instructors described their reasons for using nano and why Atom is inadequate, which led us to start investigating ways to install nano properly on Windows. On May 3rd, Kate Hertweck, Maneesha Sane, Naupaka Zimmerman, Rémi Emonet and I met online to draft a plan, taking into consideration all of the feedback provided in the past month, to solve the problem with nano on Windows.

Plan We will work to use the Git for Windows SDK to compile our own installer that will include bash, Git, nano, SQLite and make. In the future we can work to include man pages and Jekyll.

Acknowledgement Thanks very much to everyone who contributed to the discussion on the mailing list, GitHub issues and other places. We now have a great resource for anyone who wants to investigate other options for different projects.

FAQ

What is Software Carpentry looking for? A novice-friendly command-line text editor for use (primarily) during the shell and Git episodes of Software Carpentry workshops that works across Windows, macOS and Linux distributions. This command-line text editor must be easy or transparent to install along with the other tools we ask learners to have before showing up.

Where can I review the background materials that were considered in the development of this plan? The discuss mailing list, the survey, “Best strategies for Windows installation”, “Add installation instructions for Atom”, “Replace Git Bash with Cygwin”, “Replace Git Bash with conda”, “Replace Git Bash with MSYS2”, and the Community Call.

Which versions of Windows will the installer support? The installer will support Windows 7, Windows 8 and Windows 10. Microsoft’s extended support for Windows 7 ends on January 14, 2020. We don’t have data about how many learners are using Windows 7, so we believe it would be unfair not to include Windows 7 in the first release of our installer.

Do we have a date when the new installer will be available? Not yet. Software Carpentry staff and Steering Committee are looking at efficient and sustainable ways to implement the recommendations.

Is any change necessary if you already use your own custom installer or teaching environment? No. If you are already using an installer you’ve created for your own systems or environment, you do not need to make any changes. Read More ›

1 - 30 April, 2017: Library Carpentry, Instructor training, relevant to Learning.
Martin Dreyer / 2017-04-30
Highlights Great news for Library Carpentry: we won the British Library Labs Award 2016. The University of Puerto Rico hosted an Instructor Training workshop that was special in many ways. Library Carpentry, along with Mozilla Science Lab, will hold a lesson sprint in June 2017. Tweets Have you considered making a donation to Software Carpentry? Author Carpentry: a research training initiative complementary to Software and Data Carpentry. A great article for a Software Carpentry workshop. To folks in Software Carpentry: please take the survey in this blog post and sign up for the community calls! General The Carpentries stay relevant and optimized for learning because they are integrated into the community. Please share your thoughts on the software used to teach the shell lesson. 12 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: May Pacific Northwest National Laboratory, NIH Bldg. 10 Library, The University of Leeds, Scion Research and NeSI, University of Pennsylvania, University of Edinburgh, Oklahoma State University, University of Alberta, McMaster Software Carpentry Workshop, Macquarie University, West Virginia University (WVU), University of Arkansas, State Water Agencies, University of St Andrews, Griffith University. June IRI Life Sciences, Macquarie University, Queen Mary University of London, UW Madison, University of Auckland, UQ Winter School. September University of Würzburg. Read More ›

Library Carpentry sprint in June
Belinda Weaver / 2017-04-26
Lesson maintenance is necessarily an ongoing task, especially with fledgling lessons, where workshops taught reveal gaps or issues with existing material. Development of new lessons may also spring from the same source. Who hasn’t thought, after teaching: “Wouldn’t it be great if we could add x?” But lesson maintenance and new development are probably best done as a shared activity, which is why the Library Carpentry community has signed up to work on lessons intensively during the 2017 Mozilla Science Lab global sprint on 1-2 June. Check out our sign-up etherpad for information about our plans. This is a call for librarians - and archivists too - to join the sprint. You don’t need to be a coder to help out. Perhaps you can be a site host - finding a room for people to work together. Or maybe you have ideas about what should be tackled, or even a scenario or a dataset that could be used in a workshop. Perhaps you have a task you have always wanted to automate? Or maybe there is something about how you work that frustrates you? Chances are that other people find the same snags in what they do, and would welcome a new way out of that frustration maze. Ideas are gold for sprinters, so we really welcome ideas on what to teach and how to teach it. At this stage, we plan to work on consolidating our SQL and web scraping lessons and building a new Python lesson. We would also welcome input on our more established lessons, such as Data Intro and OpenRefine. See the full list here. How can you get involved?

- Join the sprint - even if only to say hi or suggest things we should do (we love feedback)
- Add to the issues we are tackling here
- Pop into the chatroom and tell us what you want/don’t want
- Publicise the sprint in your library or workplace
- Encourage people to join us for some or all of the sprint
- Follow Library Carpentry on Twitter and help spread the word

So far, we have five sites committed to hosting people working on the sprint - two in the UK, one in Australia, one in the Netherlands and one in the US - but we are hoping Canada, South Africa and New Zealand will make it eight (and there is always room for more). People are also signing up to work remotely, which is good as we are seeing new people joining up. Please get on board! Read More ›

Instructor Training in Puerto Rico
Rayna Harris, Sue McClatchy, Tracy Teal / 2017-04-24
On March 24-25, Rayna Harris, Sue McClatchy, and Tracy Teal co-taught an instructor training workshop at the University of Puerto Rico (UPR). This was a very special workshop in many ways, and we are excited to share some of the highlights with you. Also, be sure to check out an accompanying blog post by Humberto Ortiz Zuazaga about the combined Replicathon and Instructor Training events. Unique aspects of the workshop Our instructor training event was co-located with a “Replicathon”, a 2-day hackathon built around reproducing the analyses published in some recent high-profile journal articles. Having two simultaneous events really gave the feeling of a “critical mass” for building a community of researchers who are passionate about using and teaching reproducible research practices. Rayna and Sue have been working together on the mentoring committee for 2 years, but didn’t meet face-to-face until March 24. One of the really amazing features of the Carpentries is that sometimes your closest colleagues live thousands of miles away. About the trainees Eleven of the twelve trainees were faculty members from various UPR campuses, and one trainee was from a private company. The trainees were born and raised in Puerto Rico, Venezuela, Colombia, and Ukraine, so English was everyone’s second language. All the trainees were very excited to meet other faculty members with similar challenges and opportunities. They were all very motivated to enhance their teaching skills and implement new tools and techniques in their classrooms. Together, they have great ideas for building a data-literate community, and they really care about the success and progress of their students. Modifications to the curriculum We planned on starting the workshop with an introduction to the Carpentries and Carpentry teaching practices because we knew that none of the attendees had participated in a Carpentry workshop. We had to modify this plan slightly when we were asked if our trainees could attend Tracy’s keynote lecture for the Replicathon during the time slot we had devoted to the introduction to the Carpentries. In the end, Tracy’s keynote did an excellent job highlighting the history, vision, mission, and accomplishments of the Carpentries. It sparked a lot of enthusiasm and provided a great foundation for the rest of the training. You can view our workshop schedule here: https://smcclatchy.github.io/2017-03-24-ttt-UPR-RP/ At the end of the workshop, we took a few minutes to go around the room and have everyone say what they were excited about for the future. (This exercise is pretty standard for the weekly instructor discussion sessions, but it is not part of the instructor training curriculum.) Since most of the discourse during a workshop happens in the Etherpad, it was great to hear something positive from everyone. They also showed real enthusiasm for building communities and teaching, and you could tell that all the trainees had a positive experience. Trainees’ response to the curriculum The trainees really enjoyed getting feedback from their peers, which served to expand their networks and improve their teaching skills. They also said that the material on motivation and demotivation resonated particularly well. We received a lot of suggestions from the group on how to improve the typical workshop lessons to make them more approachable (such as having an overview that is separate from the agenda, and having a rationale for each lesson in addition to the questions and learning objectives).
The trainees pointed out a few places where we used idioms that did not translate well and had to be explained. This is an ongoing topic of discussion, and we are working to evaluate the lessons to minimize the use of idioms. Advice for new instructors We wish we had known then what we know now. Here are a few pieces of advice for new instructor trainers: Have trainees pick their lesson for live coding before going home on day 1. The live coding exercise can be particularly challenging when the trainees don’t fully prepare. By asking them to choose the lesson at the end of day 1, everyone can go into the exercise a little more prepared. When soliciting responses in the Etherpad, type everyone’s name on a new line so that the trainees know where to put their response and so that the instructors can gauge how trainees are progressing with the challenge exercise. Introductions are crucial. As the instructor, be sure to articulate your qualifications for teaching the curriculum (which are different from the qualifications you would articulate when teaching R or Python). Also, the trainees really want to meet the other trainees, so be sure that they all introduce themselves to each other. Acknowledgements Thanks to Erin Becker, Jonah Duckles, Kari Jordan, Maneesha Sane, and Greg Wilson from Software Carpentry and Data Carpentry for helping make instructor training an awesome thing. Thanks to the group of instructor trainers for collaboratively building the train-the-trainer curriculum. Thanks to the Carpentry community members for your enthusiastic support of events like these. Thanks to Humberto Ortiz Zuazaga, Yamir Torres, Jose Garcia-Arraras, and Patti Ordóñez for welcoming us into their community in Puerto Rico. Read More ›

Software tools for the Unix shell: Survey and April community call
Kate Hertweck / 2017-04-14
The Software Carpentry Code of Conduct includes a somewhat tongue-in-cheek reference to nondiscrimination based on “choice of text editor.” While a choice of software may seem trivial, this particular discussion has led to more than one heated exchange among programmers. In fact, the Discuss list in March was very active concerning issues with the Windows installer, which provides access to the nano text editor through Git for Windows (used in both the Software Carpentry Shell and Git lessons). That discussion eventually transitioned to consideration of alternative software tools we might use, and moved over to GitHub, where you can read more about the opinions and options voiced by participants. To help us understand what our community thinks about the software we use for teaching the shell lesson, we’ve developed a short survey to gather information. Please share your thoughts! You may find it useful to peruse the summary and links below before taking the survey. The Community Call for April will be dedicated to a discussion of software tools used while teaching workshops, with a specific focus on the shell lesson. Please join us on Thursday, April 20 in either of two sessions to hear about the results of the survey and share your thoughts on what tools we should introduce to learners. The rest of this post is a quick summary of the discussion regarding the installer. We’ve had increasing reports lately of the installer failing to function as expected, especially in relation to nano. Tracy Teal administered a quick survey and found that, despite these issues, folks tend to like the installer. Given that a few years have passed since this tool was built by our community, it’s worth revisiting how we could improve the tools we use for the benefit of both instructors and learners. The following suggestions have been offered as ways to resolve the logistical problems we’ve been facing lately while teaching the shell at workshops. While we’re specifically discussing software used to teach the shell lesson, we also acknowledge that some of these tools could be used to install other software as well.

Windows installer: modify or update it to reduce installation problems at workshops. On top of the standard suite of tools provided by Git for Windows, the installer ensures learners have access to nano, SQLite, and make, all easily accessible on the PATH (it’s worth noting that the latter two tools are only used in workshops specifically about them, which have not often been taught in the recent past).
Create a new custom package that includes nano, using one of: Git for Windows SDK; MSYS2 (discussion here); conda (discussion of how it would work in workshops here); or Cygwin (installation instructions for workshops, with extra discussion here).
Atom: a stand-alone text editor, which would abandon the use of nano altogether, although this is problematic for workshops focusing on the use of HPC resources.

The challenge associated with using particular software tools during our workshops is that we must balance multiple (sometimes conflicting) needs on the part of both instructors and learners. Some of the considerations that are especially relevant for the shell lesson include:

Ease of use for learners: novice programmers should be able to use it without too much effort, and it should be able to handle a breadth of activities throughout the workshop.
Ease of installation: learners should be able to install the software without much (if any!) trouble, and it should work as expected for the duration of the workshop.
Minimal installation: some options above allow installation of multiple tools at once, which makes getting the workshop set up easier.
Propensity of learners to continue use: if learners will continue developing their coding skills, the tools we show them should still be useful months after the workshop.
Similarity across platforms: installation of nano is a non-issue on Mac/Linux computers, and a basic workaround for the lack of nano on Windows is using notepad (which opens easily from the command line in Git for Windows). This workaround, however, means that learners on different platforms use two different commands while editing files, which can be confusing for learners and instructors alike.
Favored by instructors: instructors need to be comfortable with the tools they are using to teach. If a tool is not their top choice of software, they should at least be comfortable enough with it to help learners troubleshoot.

If you have other thoughts on this conversation, please take the survey or feel free to comment on the relevant GitHub discussions linked above.
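To make the “two different commands” problem concrete, here is a minimal sketch of a workaround an instructor could hand to learners; the edit function name and its fallback behaviour are our own illustration, not part of the Windows installer or of any lesson. Added to a learner’s ~/.bashrc, it lets everyone type the same command regardless of which editor their machine provides:

```bash
# Illustrative sketch only -- not part of the Windows installer or any lesson.
# Use nano when it is available; otherwise fall back to notepad, the
# Git for Windows workaround described above.
edit () {
    if command -v nano >/dev/null 2>&1; then
        nano "$@"      # macOS, Linux, or Windows setups that provide nano
    else
        notepad "$@"   # Git for Windows without nano
    fi
}
```

Usage during a lesson would then be the same on every platform: edit draft.txt. Read More ›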

Optimised for Learning
Anelda van der Walt / 2017-04-07
In 2014 I fell in love with Software Carpentry. I wasn’t quite sure what it was that was so appealing about the workshops, but I knew we had to run more of these in South Africa. I had been following the activities and blogs of Software Carpentry since 2012, when I started working at a next generation sequencing (NGS) facility as the lead (and only) bioinformatician. Working with our clients I found that most of them had no idea how to deal with NGS data and very often didn’t know how or where to access computing infrastructure that would allow them to analyse their data. But in 2014 we finally ran our first Software Carpentry workshop in Cape Town. It was a big one. Over 100 participants including instructors and helpers. Two rooms. Teaching Python, version control (git), and the shell at “beginners” and “intermediate” levels. The buzz over coffee time gave me the feeling that we had done something good. There were people from several provinces all over South Africa, from a plethora of research disciplines, institutions, and career stages - all talking, buzzing, exclaiming about their research and how they didn’t realise there were others at their own institutions, in their own fields, but also in very diverse fields, grappling with similar challenges in terms of research computing, data analysis, and capacity development. Since then I have been thinking about the Carpentry workshops and community a lot. I was trying to identify the core elements that made me love it so much; that made me join a mentoring session at 01:00 in the morning, or stay away from my kid for a week or longer to run workshops in strange parts of the country without being paid for it. What was it about this community and initiative that made it almost addictive to be a part of? By chance I started reading Thank You For Being Late – An Optimist’s Guide to Thriving in the Age of Acceleration by Thomas L. Friedman last week, and something I came across in his book made me think… Friedman refers to a conversation he had with Eric Teller, CEO of Google’s X research and development laboratory. Teller was explaining to him that the rate of change of technology is outstripping the rate at which we, as human beings, can adapt. No surprises there - we’re all trying to catch up with life. But Teller also said in his interview with Friedman, according to the book, that the only way in which human beings can “come to equilibrium” again and find peace in this fast-changing world is by enhancing our adaptability (the rate at which we can adapt). And that, Teller said, is “90% about optimizing for learning”. By chance again, I was in a meeting today where we were discussing the establishment of a newly funded Research Centre. Amongst other things, we were talking about the type of training that should be delivered by this Centre. I explained the value of building on what has been done in Software/Data/Library Carpentry and how that could be tied in with community building (see e.g. Mozilla Science Lab Study Groups) to facilitate post-workshop (continuous) learning which would not be dependent on the single “expert” instructor who graciously visits from afar to impart his/her expert knowledge, after which he/she returns to their institution, never to be heard of again because life (or the next workshop) swallows them up. That made me realise… The Carpentry model has been around since 1998. It has been adapted and adapted again to stay in touch with what is needed by the learners, not with what can be taught by the teachers.
It is optimised for learning, and it is optimising our learners for learning as well. The Carpentries stay relevant because they build on the shoulders of giants, learning from everyone in the community, be it learner, host, instructor, helper, or the leadership. And then, even more importantly, they act on what they learn - Issue Bonanzas, Bug BBQs, Lesson Sprints… No-one has to invent their own PowerPoint slides with stolen images and arbitrary examples the night before the workshop. We can start from where the last iteration of teaching stopped. We can join conversations about workshops that took place yesterday, last week, last month, in several countries and on various continents. We learn from people in STEM fields as much as we learn from the Social Sciences and Humanities, and we apply what we learn across domains and across the traditional “researcher”/“support staff” barrier. You would be crazy not to want to use what is available and build on it in a world where disruptive technologies make the world look significantly and uncomfortably different within 5-7 years (according to Teller in Friedman’s book). Thank you to everyone in the Carpentry community for allowing us to stand on your shoulders. We’re not only teaching our learners some recipe for data analysis; we’re teaching them to learn, and at the same time, we’re learning to keep on learning ourselves. Read More ›

What's new in Library Carpentry
Belinda Weaver / 2017-04-06
There is a lot of Library Carpentry activity at the moment. Our big news is that Library Carpentry won the 2016 British Library Labs Award for teaching and learning. The prize money has since been used to fund Library Carpentry workshops in the UK, with one in Sheffield just completed, and another upcoming in Taunton in June. Tim Dennis (UCSD) and I are running Software Carpentry instructor training for 30 librarians in Portland, Oregon, on 4-5 May, after csv,conf. I am then teaching Library Carpentry at MIT with Juliane Schneider (ex-UCSD, now working for Harvard) on 15 May. We have helpers for that workshop from Boston University, Harvard, and Yale. We are running a Library Carpentry project again this year as part of the annual Mozilla Science Lab global sprint on 1-2 June. This is our sign-up etherpad. So far, we have five sites committed to hosting people working on the sprint. People are also signing up to work remotely. We will stay in touch via the chatroom and daily hangouts. We plan to work on the SQL, Python, and web scraping lessons, rebuild our main Library Carpentry page, and work out a way to record and track workshops better. We are also hoping to make Library Carpentry work for archivists, and we are currently seeking feedback on what archivists want through this form. I have been invited by the Australian Society of Archivists to teach a Library Carpentry workshop as a tie-in to their annual conference in September, so I hope to incorporate some of the suggestions already made. Here is more information on what we hope to tackle in the sprint. There have been about 15 Library Carpentry workshops since last year’s global sprint provided the impetus for this new community. Our chat room is very active and issues are constantly debated on GitHub. Australia, Canada, the US, the Netherlands, the UK, and South Africa are the main places for activity, with New Zealand about to jump on the bandwagon. There is an upcoming workshop in South Africa in May, as well as plans for one in Ottawa later in the year. Hopefully there will be many more workshops in the US after the Portland instructor training mints 30 new instructors. How can you get involved? Join the sprint - even if only to say hi or suggest things we should do. Pop into the chatroom. Request a workshop through the contact form on this page. Follow us on Twitter. Read More ›

Our first work cycle - Prometheus
Erin Becker / 2017-03-26
We’re wrapping up our first work cycle! Here’s what we accomplished over the past six weeks and what we’re still working on. To help with any of these projects, please get in touch! Planning for Data Carpentry Ecology Lessons Release What did we do? This cycle we started the process for Data Carpentry’s first lesson release - for our Ecology lessons. During the Issue Bonanza (3/16-3/17), fifteen energetic members of our community and staff submitted a total of 134 issues focused on items related to our lesson release checklist. Data Carpentry staff and maintainers will be going through these issues in preparation for our Bug BBQ (4/6-4/7). Stay posted for announcements about the Bug BBQ! What are we still working on? Over the next cycle, we’ll be working on documenting the process for future lesson releases and planning for our next lesson release (the Data Carpentry Genomics lessons). How can you help? Sign up for the upcoming Bug BBQ to help polish up the Ecology curriculum for publishing. Even if you can’t “attend”, please help clean up issues in those repos! We’d love to hear your thoughts about this new lesson publication process. Please send any feedback to Erin (ebecker@datacarpentry.org) or Tracy (tkteal@datacarpentry.org). Streamlining Process for Instructor Training What did we do? This cycle we focused on simplifying processes, increasing our capacity for training new instructors, and giving our Trainers opportunities to share expertise and learn from each other. We simplified our processes for scheduling instructor training events and tracking the progress of trainees through checkout to make more efficient use of our volunteer Trainer time. We set up regular meetings for our Trainer community, including discussion meetings to share expertise about teaching instructor training events. We started training ten new instructor Trainers, who will be joining the Trainer team this summer. We also developed documentation for communicating with partners about training events and followed up with pending instructor training applicants. What are we still working on? We’re working on building our capacity for offering training to non-partner-affiliated individuals and improving our documentation for Trainers running instructor training events. How can you help? If you’re interested in becoming an instructor Trainer, please contact Erin (ebecker@datacarpentry.org). New hire We’ve received a large number of highly qualified applicants and are working on scheduling first-round interviews. Keep an eye out for more news! Setting an Assessment Strategy What did we do? This cycle Data Carpentry overhauled our pre- and post-workshop surveys to include measurements of self-efficacy and skill with R or Python. We’ll be piloting these surveys over the next few months. We also developed and released a long-term follow-up survey for learners who attended workshops six months ago or more. What are we still working on? In October we released a report on Data Carpentry’s impact on learners. We’re now working on a report for Software Carpentry. Stay posted! How can you help? If you were a learner at a Carpentry workshop over six months ago, please fill out our new survey by April 4th to be entered in a drawing for a Data Carpentry swag bag. Lesson Contribution Guidelines What did we do? We surveyed community members about their experiences with contributing to Carpentry lessons and asked for ways that we can make this process more straightforward. We received 54 responses with a wealth of suggestions.
What are we still working on? We organized the feedback we received and are working on understanding the best way to implement these suggestions. How can you help? If you’d like to be part of the team developing new documentation and resources for lesson contributions, please contact Erin Becker (ebecker@datacarpentry.org). Our next cycle - Cycle Deimos - March 27th through May 19th I hope you’ll agree that we accomplished a lot over the past six weeks! Our next cycle is also looking to be action-packed and exciting. Stay tuned for an announcement of what’s coming up! As always, if there’s something you’re excited about and would like to see, post your idea to our Conversations repo or get in touch. This post was originally posted on the Data Carpentry blog. Read More ›

Get Involved With Mentoring
Christina Koch / 2017-03-08
Are you a new Software or Data Carpentry instructor? Do you remember what it was like to be a new instructor? Are you interested in improving your own teaching skills? Do you want to connect with other instructors to share teaching ideas and experience? The mentoring committee is a group of Software and Data Carpentry community members who organize initiatives to support instructors, and we’re looking for new members. The mentoring committee meets once a month to discuss current activities and new ideas, and we help organize the weekly instructor discussion sessions hosted on this etherpad. We would love to expand our activities and for all of our activities to be a true community-driven and community-owned effort. To make that happen, we need community members to join us! Anyone can be involved – the only criterion for membership is an interest in how we can better support instructors and connect them with each other. Here are some ways to get involved: Join the mentoring mailing list and/or attend the next committee meeting (to be announced on this etherpad). Sign up to help with some of our main endeavors as described on this page. If you have questions or suggestions about the mentoring committee, Christina Koch, outgoing committee chair, will be holding two informal FAQ sessions on Monday, March 13. There is information about these on the community calendar. We will also be selecting new positions in the committee next week – if you’d like to vote (or stand for a position!) please get in touch. Read More ›

1 - 28 February, 2017: Inclusivity, Career Pathway Panel, Community Development Lead, BaseCamp.
Martin Dreyer / 2017-02-28
Highlights The Carpentries stand for inclusivity and freedom of movement for each and every volunteer, researcher, or professional in industry. If you have been affected by any action or policy that is preventing you from taking part in the Carpentries in any way, please contact us. The second Career Pathway Panel will be held on 1 March 2017; please register. Tweets Have you considered making a donation to Software Carpentry? ResBaz 2017 was a big success! Thank you to all who participated. Need a helping hand to get started with coding? Have a look at our website. Have you subscribed to our monthly newsletter? Jobs Software Carpentry and Data Carpentry are hiring a Community Development Lead. General The Carpentries have adopted a new work process to improve progress on projects; it is based on Basecamp’s six-week work cycle. 26 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: March University of Technology, Sydney, EPSRC & MRC Centre for Doctoral Training in Regenerative Medicine, University of Manchester, McGill University, Winona State University, University of Freiburg, University Of British Columbia, University of Colorado Boulder, University of Oxford, Dartmouth College, Oklahoma State University, Brock University, The University of Washington eScience Institute, Technical University Munich, Washington State University. April Massey University Auckland & NeSI. Read More ›

A Year to Build a Software and Data Carpentry Community at the University of Florida - The Impact of a Local Instructor Training Workshop on Building Computing Capacity
Matthew Collins, François Michonneau, Brian Stucky, Ethan White / 2017-02-22
This January was the one-year anniversary of our effort to bring regular Software Carpentry and Data Carpentry workshops to the University of Florida. These workshops are aimed at helping students, staff, and faculty gain the computing skills they need to be successful in our data-driven world. The Carpentries are international organizations that provide materials, instructor certification, and organization of multi-day workshops on basic software development and data analysis tools. In January 2016 a Software Carpentry instructor training workshop held at the University of Florida Informatics Institute provided the start of our efforts. Since then, instructors trained here as well as experienced instructors already in the UF community have held four workshops, reaching 98 participants, including 70 students, 14 staff and 11 faculty. The participants received training in programming languages like R and Python, version control with Git and GitHub, SQL database queries, OpenRefine, and Excel spreadsheets. [Figure: graph of participants’ status at UF and a word cloud of the departments our participants hail from, made with https://www.jasondavies.com/wordcloud/] Such a robust and recurring workshop pattern is uncommon in the Carpentries community (but not unprecedented), and it is the result of the generosity and volunteerism of staff, faculty, students, and organizations at UF. Together we recognized that members of the UF community did not have enough opportunities to get hands-on experience with the software development and data analysis tools they need to be effective researchers, employees, and future job-seekers. In response, we have established a highly collaborative process for giving our fellow UF community members, whether they are students, staff, or faculty, this opportunity. Our Year of Workshops Though UF has a longer history with the Data and Software Carpentry communities, the start of this current program was an instructor training workshop held in January 2016 at the UF Informatics Institute (UFII). Dr. Ethan White provided funds (through a grant from the Gordon and Betty Moore Foundation) for UF to become a Software Carpentry Foundation affiliate member and to run an on-site training for instructors. Fourteen people from UF attended the 2016 workshop, five came from other Florida institutions, and four from elsewhere in the US and Canada. As a result of this workshop, eight participants from UF became newly certified instructors for Software or Data Carpentry. Today there are a total of 10 active instructors at UF. Several existing instructors, including Matthew Collins from the Advanced Computing and Information Systems Lab and Dr. François Michonneau from the Whitney Laboratory for Marine Bioscience, with the help of the newly trained instructors, then approached the director of the UF Informatics Institute, Dr. George Michailidis, for logistical support to run a Software Carpentry workshop in March 2016. While that workshop was very successful, only 16 of the 31 participants who signed up attended. We did not charge a registration fee, so we believe that many people simply did not show up when another commitment arose. For our second workshop, held in August 2016 just before the start of the semester, Alethea Geiger from the UFII worked with the UF Conference Department to set up an account and a registration page that accepted credit card payments. We were able to charge a $30 registration fee, which allowed us to pay for lunch during the workshop.
This amount appears to strike a good balance between using a registration fee to encourage attendance and cover catering costs, and not imposing a serious financial hardship on participants with limited funding. However, the Conference Department web site did not let us smoothly deal with waitlists and capacity caps, and over the first weekend we had more than 35 people sign up for the workshop. In order to accommodate everyone, the Marston Science Library generously offered a larger room for the workshop. Everyone who registered attended this workshop. In October 2016, we held our third workshop, using the Data Carpentry curriculum. At this workshop we had the honor of having Dr. Kari L. Jordan as a participant. Dr. Jordan had recently been hired as the Data Carpentry organization’s director of assessment, and this was her first experience at a workshop. The registration process worked smoothly this time, and we were able to use the UFII conference room for the workshop and catering. Our most recent event was another Software Carpentry workshop held at the UFII in February 2017. What it Takes This group’s volunteered time, the coordination and support of three existing instructors, and the logistics supplied by the Informatics Institute have made it possible to reliably host Carpentry workshops. It currently takes about 8 hours for the lead instructor to arrange instructors, helpers, and announcements and to respond to attendee questions. The staff at the UFII spend another 8 hours managing registration and preparing the catering. Instructors spend between 4 and 12 hours preparing to teach, depending on whether they have taught the lesson before. Helpers who are already familiar with the content of the lessons usually don’t need further preparation, but new helpers spend 4 to 8 hours reviewing lessons and software installation instructions. Combined, each workshop takes about 40 person-hours of preparation and over 80 person-hours to host. With the exception of the UFII staff time, this time is all volunteered. How do we keep people volunteering? There are a number of factors that go into maintaining volunteers’ motivation and momentum. We didn’t plan these in advance, but now that we have them in place, we recognize them as the reasons we can continue to keep our community engaged and excited about putting on workshops.
Instructor density - have enough instructors to get 3-6 people at each workshop without burdening anyone’s schedule.
Instructor cohesion - just like we suggest learners attend workshops with a buddy, instructors who come to instructor training from the same department or discipline immediately make their own community of practice.
Instructor mentorship - a core group of senior instructors to guide initial workshops (note the plural), so new instructors can focus on the teaching experience without the logistical burdens.
Professional staff - find staff who organize workshops as part of their job to share the overhead of coordinating logistics.
Institution-level support - a single research lab or department doesn’t have enough people to do this on its own; doing it for the whole institution fits the needs of everyone and shares the work.
Follow-through - have supporting events and communities available for people to keep learning and keep their experience with the Carpentries fresh in their minds when it comes time to look for more instructors and helpers.
Community Building After the Workshops Some of the instructors have also been involved in creating and helping communities of learners on campus grow outside of workshops. Dr. Michonneau started a Meetup.com group for the Gainesville community focused on R. M. Collins is an advisor to the UF Data Science and Informatics student organization, which holds about 12 evening workshops each semester focused on building data science skills for UF students. In spring 2017 Dr. Daniel Maxwell, Informatics Librarian for the Marston Science Library, re-invigorated the UF R Users mailing list and is holding weekly in-person drop-in sessions. These venues allow former workshop participants to continue learning the skills taught in the Carpentry workshops. They provide a space where participants can ask questions of and interact with their peers when they start using the tools taught in the workshops for their own research. This ongoing communal engagement is proving to be a key factor in making sure workshop participants continue to develop their abilities. UF’s Impact on the Carpentry Community UF has a long history and deep connections to the Carpentries. Data Carpentry was originally imagined during the 2013 COLLAB-IT meeting between the IT members of iDigBio (a large NSF-sponsored project centered at UF) and the other NSF biocenters. The attendees of this two-day workshop found that one important need shared by the biocenters was a training program for researchers, focused on the novice, to develop software skills and data literacy for analyzing their data. Some attendees were involved with Software Carpentry and decided to develop a curriculum based on Software Carpentry’s teaching principles. Dr. White, as well as iDigBio staff including Deborah Paul, Dr. Michonneau, and M. Collins, were instructors, helpers, and attendees at the prototype Data Carpentry workshop held in May 2014 at the NESCent facility at Duke University. The second official Data Carpentry workshop was put on by the iDigBio project right here at UF. Since this first engagement with the Carpentries, many other members of the UF community have participated in Software and Data Carpentry workshops across the country. Not all have participated in this most recent effort to run workshops here on campus, and some have moved on to other institutions, but they have all contributed to UF being a valued organization in the Carpentries community.
In addition to building its own workshop infrastructure, UF is helping to advance the Carpentry programs in the US and globally. Dr. White is a founding Data Carpentry steering committee member, a member of the Software Carpentry Advisory Council, and has developed a semester-long course based on Data Carpentry materials that he has taught twice as WIS6934 through the Department of Wildlife Ecology. Through the iDigBio project and support from Dr. White, M. Collins and D. Paul have taught workshops in Nairobi, Kenya and Santa Clara, Costa Rica before the Biodiversity Information Standards conferences in 2015 and 2016. M. Collins has also served as a mentor to instructors trained during the South African instructor training and, along with D. Paul, has more recently become a member of the formal Carpentry mentorship program providing ongoing support to new instructors across the country. Going Forward The success of our group has been the result of the serendipitous meeting of interested UF community members, an existing international teaching community, and informal funding and infrastructure support. We are now looking for a way to formalize UF’s commitment to building capacity in informatics skills for its staff, students, and faculty through an ongoing structure. To start this process, a consortium of labs and institutes at the University of Florida has combined resources to sponsor a joint Gold Partnership with Software and Data Carpentry going forward. The UF partners are Dr. White’s lab, the UF Biodiversity Institute (via Dr. Pamela Soltis), iDigBio (via Dr. Soltis), and the UF Informatics Institute (via Dr. Michailidis). This partnership will provide annual instructor training opportunities to grow the instructor community. To continue the rest of the key parts of our success, we still need:
A UF department or institute to adopt the goal of informatics capacity building for the UF community.
An individual to be given the task of coordinating this goal across UF.
Continuous funding and resources to provide for a pipeline of people capable of meeting this goal.
We believe UF has a unique opportunity to create a sustainable effort that cuts across individual departments and research labs. While existing on-the-books courses and department-specific programs are available, we have shown that there is a need for hands-on, community-led informatics skill development for everyone on campus, regardless of affiliation or discipline. By approaching this need at the university level we can maintain the critical mass of expertise and motivation to make our staff more productive, our students more employable, and our faculty’s research more innovative.
Acknowledgements The following people have been active members of the UF instructor community and have volunteered their time in the past year by participating as instructors or helpers during the recent workshops:
Erica Christensen (*) - Ernest Lab, WEC
Matthew Collins - Advanced Computing and Information Systems Lab, ECE
Dave Harris (*) - White Lab, WEC
Allison Jai O’Dell (*) - George A Smathers Libraries
Sergio Marconi (*) - White Lab, WEC
François Michonneau - Martindale Lab, Whitney Laboratory for Marine Bioscience
Elise Morrison (*) - Soil and Water Sciences, IFAS
Deborah Paul (*) - Institute for Digital Information, Florida State University
Kristina Riemer (*) - White Lab, WEC
Henry Senyondo (*) - White Lab, WEC
Miao Sun - Soltis Lab, FLMNH
Brian Stucky (*) - Guralnick Lab, FLMNH
Shawn Taylor (*) - White Lab, WEC
(*) Trained at the January 2016 UF instructor training workshop
The following entities have contributed material support to our workshops or the Carpentries communities:
Advanced Computing and Information Systems Lab, Electrical and Computer Engineering
Ernest Lab, Wildlife Ecology and Conservation
Soltis Lab, Florida Museum of Natural History
University of Florida Biodiversity Institute
University of Florida Informatics Institute
White Lab, Wildlife Ecology and Conservation
We would also like to thank the incredible support provided by Alethea Geiger, Flora Marynak, and Deb Campbell at the UF Informatics Institute. They have managed the space, catering, registration, and financial aspects of our workshops for us, and their services are the main reason we can provide so many workshops. Read More ›

Beginning the conversation: Potential merger with Data Carpentry
Kate Hertweck, Rayna Harris / 2017-02-22
Newcomers to our community frequently request clarification regarding the distinction between Software Carpentry (SWC) and Data Carpentry (DC). SWC and DC are two independently established and operated organizations that share a common goal of promoting education in reproducible science skills, both in data literacy (DC) and software development (SWC). Despite our separate organizational structures, SWC and DC maintain close ties, and have begun moving over the last year or so towards increased connectivity. Staff interact with leadership and members of both communities, and are sometimes shared hires between both groups. We offer joint institutional memberships. We’ve worked together to implement shared policies, and have released statements which reflect our commitment to shared values. Given the increasing levels of integration between SWC and DC, the SWC Steering Committee passed a resolution at a recent meeting to “begin discussions with representatives from DC leadership about a potential merger” between these two currently independent organizations. We are excited about the potential these discussions hold in forging a strong, cohesive community that will continue to promote the goals of both organizations, and look forward to sharing our ideas with you in coming months. Read More ›

Carpentries Career Pathways Panel - Marianne Corvellec, Bernhard Konrad, Aleksandra Pawlik
Lauren Michael / 2017-02-22
Wednesday, Mar 1, 3pm PST / 6pm EST / 11pm UTC / 9am AEST (next day) On Wednesday, March 1, the Carpentries will host the second of three Career Pathway Panels, where members of the Carpentry communities can hear from three individuals in careers that leverage teaching experience and Carpentry skills. (Note: The date of this second panel was shifted from the originally proposed date of Feb 22 due to scheduling considerations.) Anyone who has taught at a Carpentry workshop in the last three months is invited to join, and should register by Monday, February 27 in order to be invited to the call. Registration is limited to 20 people per session, so please only commit if you are sure you will attend. Attendees can register for any number of these sessions. Each session will last one hour and will feature a different set of panelists. The final session will occur on Tuesday, March 21 at 3pm PST (panelists TBA). For the March 1 session, we are excited to be joined by the panelists below! Marianne Corvellec Marianne earned a PhD in statistical physics in 2012. She now works as a data scientist at CRIM, a semi-public research centre in Montréal, Canada. She specializes in data analysis and software development. Before joining CRIM, she worked at three different web startups. She speaks or teaches at local and international tech events on a regular basis. Her interests include data visualization, signal processing, inverse-problem approaches, assessment, free software, open standards, best practices, and community management. Bernhard Konrad Bernhard attended a SWC workshop in 2012 during his graduate studies, and was immediately fascinated by the world of opportunities and productivity that these software tools opened up. He has taught a dozen workshops since, and started to work on software-related personal side projects. Bernhard then went to Insight Data Science, a data science fellowship in Silicon Valley. After interviewing with a few companies and after a complicated work permit process, he started his job as a Software Engineer at Google in early 2016. There, he develops internal tools for engineering productivity. Aleksandra Pawlik Aleksandra Pawlik is the Research Communities Manager at the New Zealand eScience Infrastructure (NeSI). Before joining NeSI in 2016 she worked for three years at the University of Manchester, UK, for the Software Sustainability Institute, where she led the Institute’s training activities. Software and Data Carpentry have always been a big part of her professional activities and have allowed Aleksandra to develop a range of skills, understand the research ecosystem, and meet a number of amazing and inspirational people. Read More ›

Job Opportunity: Community Development Lead
Tracy Teal, Jonah Duckles / 2017-02-21
Software Carpentry and Data Carpentry are hiring a Community Development Lead! We are excited to announce a full-time staff position leading our community development activities! Software and Data Carpentry have an active global community of researchers, volunteers, learners, and member organizations. This person will cultivate and grow this community, developing communication strategies and opportunities for the community to connect with and support each other. You will become an active member of our team of staff and will work with people around the world to advance our mission of building a community teaching digital research skills to make research more productive and reproducible. As the Community Development Lead, you will oversee Software and Data Carpentry’s community engagement efforts to develop and support the community, creating pathways for participation and increased communication. You will lead blog, newsletter, and social media efforts, help develop online resources, participate in the mentorship subcommittee, and help facilitate the development of regional groups. You will also have the opportunity to guide efforts to reach underserved communities and to be involved in instructor training. For details, including a full job description and the application procedure, please see the Jobs page. This is a joint Software and Data Carpentry position and is cross-listed on both websites. Read More ›

How we're getting things done
Erin Becker / 2017-02-16
Adopting work cycles The Data and Software Carpentry staff have been working together to make progress on projects that are important for our community. To help us do this, we’re trying out a new work process based on Basecamp’s six-week work cycle. You can read their blog post if you’re interested in the details of how a work cycle is structured. We’re picking a small handful of projects to focus on for each six-week cycle, with each staff member working on one or two projects. For each project, we’re setting realistic goals we know we can accomplish before the end of the cycle and holding ourselves accountable to meeting those goals. We’re spending the first two weeks of the cycle planning those goals, dividing up the work into teams, and setting timelines to make sure we stay on track. We envision this workflow having some specific advantages, including: Reducing clutter and letting us focus on making progress. Making it ok to say “we can’t tackle this right now, but that’s an important project; can we do it next cycle?” Making it possible for busy community members to be involved without having to commit time indefinitely. (No commitments after the cycle ends!) Bringing staff time and resources together with community enthusiasm. Giving us a structure for regularly communicating what we’re working on with the community at large. Providing passionate community members more opportunities to get involved. We’re still working out some of the details of how working in cycles will work for us, but we’re excited to share our plan for the first round. If there’s something you’re excited about for the next round, let us know! If you’d like to join (or organize) a team for one of the next few cycles, let us know! Please post an issue on our conversations repo or email ebecker@datacarpentry.org. Our first cycle - Cycle Prometheus (January 23rd - March 17th) Our first cycle started at the end of January and goes through the middle of March. Here’s what we’re hoping to accomplish in our first cycle. Planning for Data Carpentry Ecology Lessons Release Tracy, François Michonneau, and Erin are working on Data Carpentry’s first lesson release! In addition to starting the process for releasing our Ecology lessons, we’re also working on setting up a process for future lesson releases. Based on Software Carpentry’s success with the Bug BBQ last year, we’re planning an Issue Bonanza to coordinate community effort on preparing the lessons for release. Keep your eyes peeled for announcements and ways you can contribute! Streamlining Process for Instructor Training Erin and Maneesha are continuing Greg’s instructor training work and are updating the instructor training program’s processes for organizing training events and tracking trainee progress from training through checkout. We’re simplifying how we schedule instructor training events and putting together resources for instructor trainers. We’re also streamlining the process of tracking instructor trainees to make more efficient use of our staff and volunteer time. Lastly, we’re exploring our needs for new instructor trainers and planning the recruitment and training process. If you’re interested in becoming an instructor trainer, please email Erin so we can keep you in the loop about future plans. New hire Tracy, Jonah and Kari are working on a new hire for Software and Data Carpentry. The posting is coming Monday, February 20th, so keep your eye out for more information!
Setting an Assessment Strategy Kari is developing a strategy for both near-term and long-term assessment of Data Carpentry workshops. She’s putting together new pre- and post-workshop surveys for learners at Data Carpentry workshops that will be piloted starting in April, as well as a long-term assessment for learners from previous workshops to be piloted by mid-March. She’s also cleaning up code and formalizing a template for regular quarterly data releases on assessment efforts. We need more Data Carpentry workshops to pilot our new surveys! Please consider organizing a workshop at your institution in April. Let us know what we can do to support you in getting a workshop set up. Please email Maneesha. Lesson Contribution Guidelines Erin, Mateusz Kuzak, Aleksandra Nenadic, Raniere Silva and Kate Hertweck are working on making it easier for new instructors and other community members to contribute to lesson development. We’re reaching out to the community to understand roadblocks people may have with the development process, and then developing new documentation and resources to help reduce these barriers. We’re collecting feedback from all of the various discussion threads and GitHub issues. Please keep commenting there, and stay tuned for more opportunities to give us feedback! Continuing Work We’re also continuing to work on our many ongoing projects, including (but not limited to): publishing our monthly newsletter, running our blogs, maintaining our websites and lessons, coordinating workshops and instructor training events, teaching at workshops and instructor training events, hosting discussion sessions and instructor teaching demos, speaking publicly about Data and Software Carpentry, running our Virtual Assessment Network, organizing our Mentorship Program, and serving on the mentoring subcommittee, trainers group, and bridge subcommittees. If you’re interested in helping with any of this ongoing work, or would like to make suggestions about what to tackle in our next cycle, let us know! Please post an issue on our conversations repo or email ebecker@datacarpentry.org. Our next two cycles will be: Cycle Deimos - March 20th through May 12th Cycle Phobos - May 15th through June 23rd Read More ›

Standing for Inclusivity
Carpentries Staff and Steering Committees / 2017-02-02
Our goal as Software and Data Carpentry is to build a community teaching digital research skills. We teach essential computing skills to thousands of researchers in academia and industry each year. As an international organization we rely on the volunteer efforts of hundreds of researchers and professionals from around the world. Our volunteers come from diverse backgrounds, countries of origin, and beliefs. These individuals generously donate their time with the goal of helping to speed the discovery of new knowledge and the creation of new technology. Actions and policies that arbitrarily restrict the movement of peoples based on their beliefs, national origins, race, ethnicity, sexual orientation, gender identity, or any other intrinsic class contradict one of Software and Data Carpentry’s core values: providing inclusive and supportive environments for all researchers. These harmful policies send a message to the highly trained individuals who participate in and teach workshops that they and those like them are not welcome in the country where they collectively volunteer the majority of their time. They also put traveling volunteers at risk of being stranded far from their homes and families with little or no warning. These restrictions negatively impact our ability to teach others, collaborate, and conduct scientific discourse, and they affect the advancement of research of all types. We stand with those that have been harmed, both directly and indirectly, by any such actions or policies. If you are a researcher who is stranded and could use a local contact, contact us, and we will work to connect you with volunteers in our global network. Read More ›

Moving Forward
Erin Becker / 2017-01-31
As of January 30th, Greg Wilson has stepped down from his role as Director of Instructor Training to start a new position as Shopify’s Computer Science Education Lead. Instructor training will continue under the guidance of Erin Becker, Data Carpentry’s Associate Director, and Maneesha Sane, Data and Software Carpentry’s Program Coordinator. Erin has a strong background for this role from her postdoc at the University of California, Davis, studying the effectiveness of training methods for transforming instructional practices. She has been involved with the Carpentry community as an instructor trainer, a member of the Mentorship Subcommittee, and leader of the effort to form an instructor mentoring program. Maneesha is the Carpentry Program Coordinator, and serves as an active Carpentry instructor and member of the Mentorship Subcommittee. Maneesha’s hard work behind the scenes keeps Carpentry workshops running smoothly. She will now bring her expertise to coordinating instructor training events. Erin and Maneesha have worked actively with Greg to ensure a smooth transition. We are conducting instructor trainings as scheduled, and are planning new events with Member Organizations. We will continue our efforts to train and support instructor trainers and build the instructor training program. If you have any questions about instructor training, including the status of your institution’s planned training event, please contact us at admin@software-carpentry.org. Read More ›

15 - 31 January, 2017: JupyterCon, Steering Committee 2017, North-West University, Programming skills.
Martin Dreyer / 2017-01-31
Highlights We are pleased to present to you the 2017 Steering Committee. North-West University becomes the first African Partner of Software and Data Carpentry. Tweets Have you considered making a donation to Software Carpentry? JupyterCon announced for 2017! Programming skills can help improve your research efforts. General We have set out a rubric to rank requests for online instructor training to ensure the spaces are filled. The Career Pathways Panel has started and will have a session every month; please join us. Have you signed up for our monthly newsletter? 18 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: February University of Oslo, Simon Fraser University, New York Academy of Sciences, New York Academy of Sciences, University of Toronto, University of Texas at Arlington, Conestoga College, AMOS / MSNZ Conference, UF Informatics Institute, University of Auckland, The University of Queensland - Python, The University of Queensland - R, Federal Reserve Bank of Kansas City, Library of Congress, The Jackson Laboratory, Boise State University, Queen’s University, University of Ottawa. March University of Colorado Boulder, Brock University, https://konrad.github.io/2017-03-29-munich/. Read More ›

Announcing the 2017 Steering Committee
Jonah Duckles / 2017-01-30
I’m pleased to announce our new steering committee for 2017. The new steering committee comprises: Kate Hertweck, Rayna Harris, Christina Koch, Mateusz Kuzak, Karin Lagesen, Sue McClatchy, and Belinda Weaver. Turnout and Voting Tallies We had 31.6% turnout, with 176 of the 557 eligible voters casting a ballot in this election. Official election results are available from electionbuddy. We thank our outgoing steering committee members I want to thank our outgoing steering committee members for their service. They’ve helped us to grow the impact that Software Carpentry can have in the world in a stable and sustainable way: Jason Williams, Bill Mills, and Raniere Silva. Thank you, gentlemen! Hope to see you around in subcommittees and at workshops in the future! Read More ›

Carpentries Career Pathways Panel: Raniere Silva, Geneviève Smith, Tiffany Timbers
Lauren Michael / 2017-01-20
Tuesday, January 24, 7am PST / 10am EST / 3pm UTC / 2am AEDT (next day) On Tuesday, January 24, the Carpentries will host the first of three Career Pathway Panels, where members of the Carpentry communities can hear from three individuals in careers that leverage teaching experience and Carpentry skills. Anyone who has taught at a Carpentry workshop in the last three months is invited to join, and should register ahead of time in order to be invited to the call. Registration is limited to 20 people per session, so please only commit if you are sure you will attend. Attendees can register for any number of these sessions. Future, monthly, panel sessions will occur on different days and at different times. Each session will last one hour and will feature a different set of panelists. For the first session, we are excited to be joined by the panelists below! Raniere Silva Community Officer at the Software Sustainability Institute, UK. I’m Brazilian, have just completed my year living abroad, and my background is in applied mathematics. Most of the time I select Python as the tool I will use to solve my tasks, but I’m jealous of those who use RStudio. My dream is that South America hosts as many Carpentries workshops (Software Carpentry, Data Carpentry, Library Carpentry, …) as the US, UK, and Australia. Geneviève Smith I’m the Head of Data Science at Insight, where we run training programs for quantitative PhDs who want to move into careers in data science, data engineering, health data, and AI. Prior to joining Insight I did a postdoc and earned my PhD in Ecology, Evolution & Behavior from UT Austin. My research focused on the role of competition in structuring ecological communities of species through a combination of field-based experiments and theoretical modeling. During my time in grad school I participated in multiple Software Carpentry workshops, volunteered at a few, and trained to be an instructor. Those experiences were critical in my development as a coder and helped me gain confidence while building evidence of my computational skills. Tiffany Timbers Tiffany Timbers received her Bachelor of Science in Biology from Carleton University in 2001, following which she completed a Doctorate in Neuroscience at the University of British Columbia in 2012, focused on the genetic basis of learning and memory. After obtaining her doctorate, Tiffany carried out data-intensive postdoctoral research in behavioural and neural genomics at Simon Fraser University (SFU). During this time, she also gained valuable experience teaching computational skills to students and scientists through her work with Data and Software Carpentry and the SFU scientific programming study group, and by teaching a course on computation in the physical sciences at Quest University. Tiffany began her current teaching role in the University of British Columbia Master of Data Science program in the summer of 2016. Read More ›

South Africa's North-West University Becomes Software and Data Carpentry’s first African Partner
Anelda van der Walt / 2017-01-20
In November 2014 the first large-scale Software Carpentry event was run in South Africa as part of the eResearch Africa conference in Cape Town. Since then, 15 more Software, Data, and/or Library Carpentry events have been run by the Southern African community across many disciplines and several institutions. The North-West University has been heavily involved in further developing the Southern African Carpentry community. In 2015 NWU led the development of a 12-month proposal that kicked off in April 2016 with the first South African in-person instructor training event. Since 2015 NWU has been involved in four internal Software and Data Carpentry events as well as four events run at other Southern African institutions. The university currently has five qualified instructors as well as two more preparing for check-out. Instructors hail from diverse disciplines such as genomics, digital humanities, chemistry, and IT. At the end of 2016 the NWU entered into a gold partnership with Software and Data Carpentry. The partnership marks the beginning of a new phase of capacity development around computing and data at the university; it is the culmination of months of hard work, exciting workshops, and interesting conversations with colleagues from all over the world. The NWU Chief IT Director, Boeta Pretorius, has been the main sponsor of Carpentry activities around the university and hopes that the partnership will help to develop and enhance computational research skills amongst NWU researchers and postgraduate students while developing increasing numbers of local instructors. The training events have been run as part of the NWU eResearch Initiative, which commenced in 2015. We look forward to continuing our collaboration with Software and Data Carpentry and with you, our community! Read More ›

Software Carpentry Steering Committee Candidates 2017
Jonah Duckles / 2017-01-15
2017 Steering Committee elections and January Community Call We are pleased to announce the 2017 Steering Committee elections, which will take place January 23-27. All members will receive a ballot via email to cast their vote via electionbuddy. Be sure to look at the membership list and let us know if you feel you’ve been left off by mistake; membership details are here. Community Call As a preface to the election, we are using our January Community Call to give the candidates time to introduce themselves. We have seven outstanding candidates to fill the seven Steering Committee seats. Our governance rules require us to hold the election, albeit uncontested, but we also feel this is a vital way for the community to exhibit commitment to the organizational leadership. Please sign up to attend one of two meetings on January 20 to hear about our candidates’ plans for 2017. What would you like to see the Steering Committee accomplish? What are you excited for Software Carpentry to tackle this year? 2017 Steering Committee Candidate blog posts Linked under each candidate’s name is their blog post about their plans for Software Carpentry if elected. Kate Hertweck Rayna Harris Christina Koch Mateusz Kuzak Karin Lagesen Sue McClatchy Belinda Weaver Read More ›

1 - 15 January, 2017: CarpentryCon, Steering Committee Elections, Rubric for Online Instructor Training, TalkPython.
Martin Dreyer / 2017-01-15
##Highlights If you are interested in attending or helping with the planning of CarpentryCon, please sign up! Steering Committee elections will take place on January 23-27, and all members will receive a ballot via email. Please look at the membership details to ensure you get the email. ##Tweets Have you considered making a donation to Software Carpentry? TalkPython episode #93: Spreading Python through the sciences with Software Carpentry. Huge thank you to the instructors who joined the new mentoring program! 8 best practices to improve your scientific software. Sign up for the Collaborations Workshop 2017. First DTL programmers meeting for 2017 scheduled for January 20 in the Netherlands. ##General We have set out a rubric to rank requests for online instructor training to ensure the spaces are filled. Our executive director Jonah Duckles was interviewed by TalkPython in December 2016; listen to the interview here. Please sign up for the January community call meetings on January 20 to hear the Steering Committee candidates’ plans and visions for 2017. Have a look at the candidates’ blog posts for the 2017 Steering Committee. 12 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: January LCG-Aula 1-2, Centro de Ciencias Genómicas, UNAM, University of Michigan, NERC / The University of Bristol, Langley Research Center, Imperial College London, Neuroscience Research Australia, ResBaz Hobart, Python Stream, ResBaz Hobart, R Stream. February Simon Fraser University, New York Academy of Sciences, University of Toronto, University of Texas at Arlington, AMOS / MSNZ Conference, UF Informatics Institute, University of Auckland, Federal Reserve Bank of Kansas City, Boise State University. Read More ›

Announcing the Lesson Infrastructure Subcommittee Calendar
Raniere Silva / 2017-01-13
One of the issues that we have with the styles, lesson-example, and workshop-template repositories is that some issues or pull requests just sit around for a long time because of a lack of ownership of those repositories. Following on from the Proposal for a Lesson Infrastructure Subcommittee, which will try to solve that issue, we would like to announce the Subcommittee’s calendar for 2017.

- February: Lesson Infrastructure Subcommittee meeting.
- April: Lesson Infrastructure Subcommittee meeting.
- May: Repositories freeze for release.
- June: Lesson release and Lesson Infrastructure Subcommittee meeting.
- September: Lesson Infrastructure Subcommittee meeting.
- October: Repositories freeze for release.
- November: Lesson release and Lesson Infrastructure Subcommittee meeting.

If you are the maintainer of a Software Carpentry or Data Carpentry lesson and want to vote on issues or pull requests on the styles, lesson-example, or workshop-template repositories, please email Kate Hertweck or Raniere Silva by January 27th, 23:59 Pacific Time. Once we have the list of Lesson Infrastructure Subcommittee members, we will find a suitable meeting time between February 6th and February 10th. We will continue to welcome issues and pull requests to styles, lesson-example, and workshop-template, keeping our lessons a community effort. Read More ›

Software Carpentry on TalkPython
Raniere Silva / 2017-01-05
On January 3rd, 2017 TalkPython published a podcast interview with our Executive Director, Jonah Duckles, that was recorded on December 6th, 2016. If you have friends who are interested in learning about Software Carpentry, whether or not they are podcast fans, the recording is a great way for them to discover what we are and how we operate. In the past, our staff were interviewed on other podcasts, by Open Science Radio and Podcast.init, which podcast fans should also check out. Read More ›

Announcing the CarpentryCon Proposal
Alix Keener, Rayna Harris, Greg Wilson / 2017-01-04
To date, we have not yet had an opportunity to bring together all the members of our communities. Enter: CarpentryCon! We are proposing a two-and-a-half day conference, tentatively to be held in May 2017 at the University of Michigan. We envision an event that brings members of the Carpentry community, including instructors, partners, advocates, and staff, together with people sharing similar interests from around the globe. We will have a “come and learn” format that is different from most conferences, with sessions on topics such as teaching methods, curriculum development, community organization, and leadership skills. Opportunities will be provided for participants to come together informally to share stories about challenges and successes. There will be at least one session where attendees can share how they have incorporated Carpentry techniques into their own research and teaching, and/or how they have grown their local Carpentry community. The final list of sessions will be determined by the program committee in consultation with the community, balancing “who wants to learn what?” with “who’s willing to teach what?”. Interested in attending or getting involved with planning? We’d love to hear from you! Add your name to the list on the etherpad and sign up for one of the planning calls on Monday, January 9. See more details at our working document. Read More ›

Rubric for Open Instructor Training
Greg Wilson / 2017-01-03
The Software Carpentry Foundation’s Steering Committee recently resolved to run four open online instructor training sessions per year in order to help support people whom we otherwise might not be able to reach. Since there are likely to be many more applications than spaces, we have developed a rubric for ranking requests for training. This applies only to people who are applying for spots in open training sessions: people who are receiving training through institutional partnership agreements will continue to be nominated by their institution as before. Note that as a condition of being trained, people must: Abide by our code of conduct, which can be found at http://software-carpentry.org/conduct/ and http://datacarpentry.org/code-of-conduct/. Complete three short tasks after the course in order to complete certification. The tasks are described at http://swcarpentry.github.io/instructor-training/checkout/, and take a total of approximately 8-10 hours. Help teach a Software Carpentry, Data Carpentry, or Library Carpentry workshop within 12 months of the course.

Personal (given) name: [________]
Family name (surname): [________]
Email address: [________]
GitHub username: [________]

What is your current occupation/career stage? Please choose the one that best describes you.
[_] Prefer not to say [_] Undergraduate student [_] Graduate student [_] Post-doctoral researcher [_] Faculty [_] Research staff (including research programmer) [_] Support staff (including technical support) [_] Librarian/archivist [_] Commercial software developer [_] Other: [________]

Affiliation: [________]
Location: [________]
[_] This is a smaller, remote, or less affluent institution.
Software and Data Carpentry strive to make workshops accessible to as many people as possible, in as wide a variety of situations as possible. Award +1 for being outside Europe/UK/US/Canada/Australia/New Zealand, or +1 for being in a smaller/remote/less affluent institution within EU/UK/US/Can/Aus/NZ.

Areas of expertise:
[_] Chemistry [_] Civil, mechanical, chemical, or nuclear engineering [_] Computer science/electrical engineering [_] Economics/business [_] Education [_] Genetics, genomics, bioinformatics [_] High performance computing [_] Humanities [_] Library and information science [_] Mathematics/statistics [_] Medicine [_] Organismal biology (ecology, botany, zoology, microbiology) [_] Physics [_] Planetary sciences (geology, climatology, oceanography, etc.) [_] Psychology/neuroscience [_] Social sciences [_] Space sciences
Other areas of expertise: [________]
Award +1 for being in economics or social sciences, arts, humanities, or library science (domains where we wish to expand).

[_] I self-identify as a member of a group that is under-represented in research and/or computing, e.g., women, ethnic minorities, LGBTQ, etc. Details: [________]
[_] I have been an active contributor to volunteer or non-profit groups with significant teaching or training components other than Data and Software Carpentry. Details: [________]
Optionally award +1 for each response (maximum of +2).

How often have you been involved with Software Carpentry or Data Carpentry in the following ways?
[_] Helper [_] Instructor [_] Workshop host [_] Learner [_] Workshop organizer [_] Contributed to lesson materials
Score +1 for each previous involvement up to a maximum bonus of +3.

Previous formal training as a teacher or instructor:
[_] None [_] A few hours [_] A workshop [_] A certification or short course [_] A full degree [_] Other: [________]
Description of your previous training in teaching: [________]
Award +1 for “a certification or short course” or “a full degree”.

Previous experience in teaching (please include teaching experience at any level from grade school to post-secondary education):
[_] None [_] A few hours [_] A workshop (full day or longer) [_] Teaching assistant for a full course [_] Primary instructor for a full course [_] Other: [________]
Description of your previous experience in teaching: [________]
Award +1 for “teaching assistant for a full course” or “primary instructor for a full course”.

How frequently do you work with the tools that Data Carpentry and Software Carpentry teach, such as R, Python, MATLAB, Perl, SQL, Git, OpenRefine, and the Unix Shell?
[_] Every day [_] A few times a week [_] A few times a month [_] A few times a year [_] Never or almost never
Award +1 for “every day” or “a few times a week”.

How often would you expect to teach at Software or Data Carpentry workshops after this training?
[_] Not at all [_] Once a year [_] Several times a year [_] Other: [________]

How frequently would you be able to travel to teach such classes?
[_] Not at all [_] Once a year [_] Several times a year [_] Other: [________]

Why do you want to attend this training course? [________]
What else should we know about you? [________]
Award -3 to +3 based on responses to “why do you want to attend” and “what else should we know”. Read More ›
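To make the arithmetic of the rubric concrete, here is a minimal sketch of the scoring logic in Python. It assumes a hypothetical application dictionary whose field names are purely illustrative (they are not part of any real Carpentries form or system), and it leaves the -3 to +3 judgment on the free-text answers to a human reviewer, passed in as a separate argument.

```python
# A sketch of the ranking rubric described above, under the stated assumptions.

AFFLUENT_REGIONS = {"Europe", "UK", "US", "Canada", "Australia", "New Zealand"}
TARGET_DOMAINS = {"Economics/business", "Social sciences", "Humanities",
                  "Library and information science"}  # domains where we wish to expand

def score_application(app, free_text_score=0):
    """Return the rubric score for one application (a plain dict)."""
    score = 0
    # +1 for being outside the listed regions, or for a smaller/remote/
    # less affluent institution within them (one point, not two).
    if app["region"] not in AFFLUENT_REGIONS or app["smaller_institution"]:
        score += 1
    # +1 if any declared area of expertise is a target domain.
    if TARGET_DOMAINS & set(app["expertise"]):
        score += 1
    # Optionally +1 each for under-representation and for prior volunteer
    # teaching work, to a maximum of +2.
    score += min(2, int(app["underrepresented"]) + int(app["volunteer_teaching"]))
    # +1 per kind of previous Carpentry involvement (helper, instructor,
    # host, learner, organizer, lesson contributor), capped at +3.
    score += min(3, len(app["previous_involvement"]))
    # +1 for formal teacher training: a certification, short course, or degree.
    if app["teacher_training"] in {"A certification or short course", "A full degree"}:
        score += 1
    # +1 for having been a TA or primary instructor for a full course.
    if app["teaching_experience"] in {"Teaching assistant for a full course",
                                      "Primary instructor for a full course"}:
        score += 1
    # +1 for frequent use of the tools we teach.
    if app["tool_use"] in {"Every day", "A few times a week"}:
        score += 1
    # Reviewer judgment on the free-text answers, clamped to [-3, +3].
    score += max(-3, min(3, free_text_score))
    return score
```

For example, an applicant from a smaller institution who has been both a helper and a learner, and who uses R a few times a week, would score 1 + 2 + 1 = 4 before the reviewer's free-text judgment is added.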

1 - 31 December, 2016: Instructor Training, Community Service Awards, Career paths, Steering Committee Elections.
Martin Dreyer / 2016-12-31
##Highlights We are pleased to announce the first ever Library Carpentry instructor training in 2017. The very first Software Carpentry community service awards have been awarded. Greg Wilson will be taking up a position as Computer Science Education Lead at Shopify, and will continue to work as a volunteer with the Carpentries. We are excited about the upcoming series of panel discussions on the variety of career paths available to our community members. ##Jobs The Department of Physics and Astronomy at UCL (University College London) is looking for a physics teacher with IT skills. ##Tweets Have you considered making a donation to Software Carpentry? 8 best practices to improve your scientific software. A minimum standard for publishing computational results in the Weather and Climate Sciences. How to write a reproducible paper. ##General The 2017 Steering Committee Candidates have shared their stories with us: Sue McClatchy, Rayna Harris, Kate Hertweck, Christina Koch, Mateusz Kuzak, Karin Lagesen, Belinda Weaver. Python is not only for coding serious programs; you can also use it to create art. Communication is key to any successful foundation; we would like to improve communication between the community, staff, and Steering Committee. Despite some severe weather conditions in New Zealand, NIWA had a very successful workshop. The Steering Committee’s year in review by Rayna Harris. A special thank you to everyone involved in the instructor discussion sessions during the year. Some insight on how SWC and DC can lead you to new and exciting places. 7 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: January 229th AAS Meeting, Oklahoma State University, NERC / The University of Bristol, University of Nebraska - Lincoln, University of Oxford, The University of Washington eScience Institute, Michigan State University, University of California Berkeley, University of Oklahoma, University of Connecticut, Software Carpentry @ UW Madison, LCG-Aula 1-2, Centro de Ciencias Genómicas, UNAM, University of Michigan, NERC / The University of Bristol, Langley Research Center, Imperial College London, Neuroscience Research Australia, ResBaz Hobart, Python Stream, ResBaz Hobart, R Stream. February Simon Fraser University, University of Toronto, University of Texas at Arlington, AMOS / MSNZ Conference, University of Auckland, Federal Reserve Bank of Kansas City. Read More ›

Career Pathways Panel Discussions
Lauren Michael, Christina Koch, Erin Becker / 2016-12-28
The Carpentries are excited to announce an upcoming series of panel discussions designed to help our community members become informed about the variety of career paths available to computationally literate members of their fields. Panel discussions will be held virtually in the months of January, February and March (tentative dates below), with each session featuring 3-4 senior community members in Carpentry-related professions, including tenured faculty, communicators/consultants, research software engineers, and industry scientists. Panelists will discuss how their career path led them to their current positions, including obstacles or challenges they may have faced and how they overcame those barriers. Audience members will have the opportunity to submit questions for panelists, and time will be reserved for free-form Q&A. To ensure that all attendees have the opportunity to participate, attendance will be limited to 20 participants who have attended a debriefing within the last 3 months. To attend, please add your information to this form. We are currently in the process of recruiting panelists and would love to have recommendations from the community! If you know of someone who would be a good panelist, please recommend them here by Monday, January 9. Anyone with questions can send an email to Lauren Michael (organizer) at lauren1.michael-at-gmail-dot-com. Tentative Dates Tuesday, Jan 24 - 7am PST / 10am EST / 3pm UTC / 2am AEST (next day) Wednesday, Feb 22 - 3pm PST / 6pm EST / 11pm UTC / 10am AEST (next day) Tuesday, Mar 21 - 3pm PST / 6pm EST / 11pm UTC / 9am AEST (next day) Read More ›

2017 Election: Belinda Weaver
Belinda Weaver / 2016-12-23
Hello everyone. I am standing again as a candidate for the Steering Committee, having served on it this year for the first time. About me I have been involved with Software Carpentry for about three years. I organised the first workshop in Brisbane in 2014. I certified as an instructor myself in 2015 and taught at two workshops that year. In 2016, I taught at eight workshops, all of which I either organised or helped to organise. During 2016, other Queensland instructors and I have taken Software Carpentry to five cities in Queensland - Brisbane, Townsville, Toowoomba, Gold Coast and Rockhampton - a huge improvement on the number of workshops run in 2015 (three). Instructor training I organised Software Carpentry instructor training in Brisbane in 2016, and have since seen our local community of instructors grow to 16. I certified as an instructor trainer myself a week ago - very exciting! I look forward to training new instructors in 2017. Admin I currently serve as the Software Carpentry admin for half of Australia, with Damien Irving in Hobart taking the other half. This means helping people in other Australian states and territories organise workshops and keeping AMY up to date with what we’ve done. ResBaz I was one of the organisers of the very successful 2016 Brisbane Research Bazaar (ResBaz), a three-day research festival for graduate students and early career researchers. Software Carpentry workshops in R and Python were taught there. I am currently helping to organise the 2017 Brisbane ResBaz, which promises to be huge and which has already attracted a number of sponsors. I will teach Software Carpentry there. I also helped the ResBaz folks in Arizona with ideas for their inaugural ResBaz. Library Carpentry In June 2016, I organised a sprint to update and extend the Library Carpentry material created by Dr James Baker and others in the UK. This was part of the 2-day Mozilla Science Lab Global Sprint. More than 20 people in six countries worked on updating the material, and added a new SQL lesson to the existing four. More lessons are in the works and this has almost become a third ‘Carpentry’ - interest is burgeoning, and it won the British Library Labs award in November. There have been about a dozen workshops run since the sprint. Find out more here. I have taught two full Library Carpentry workshops this year as well as teaching parts of it at other events. The community is very active, with an ongoing chat room. New members are welcome. Guiding I did some ‘guiding’ this year - mentoring recent Software Carpentry instructor trainees through the final stages of checkout. I tried this approach on attendees at the Brisbane instructor training and it was effective in getting people to finish (I think 17 out of 20 certified). I then assisted Anelda van der Walt’s South African instructor trainee cohort - running lesson discussion and practice teaching sessions to help them finish. I plan to do the same thing to help attendees at the recent online instructor training I taught check out as instructors. Communications My one disappointment this year has been my inability to take forward work I proposed on improving Software Carpentry communications. I was simply swamped by the tsunami of interest in Library Carpentry (16 requests to teach it, and counting) and that ended up gobbling up a lot of my time. I did promote Software Carpentry tirelessly through tweets, but campaigns I hoped to run did not eventuate.
I am still interested in taking that work forward and would love to hear from others with experience in that area who might like to participate. For 2017 I think it is important to have Steering Committee representation from Australia (and the southern hemisphere more generally). Software Carpentry has really taken off here, and I think I have proved to be an effective community builder for it. In 2017, I would like to continue that work, train more instructors, get more partnerships across the line, if possible, and make sure we extend Software Carpentry workshops beyond the capital cities into the regions. I also plan to work more with colleagues in New Zealand and South Africa on building their communities. I also intend to help Library Carpentry continue to expand, and will be running instructor training for librarians in Portland, Oregon, in May 2017. Read More ›

2017 Election: Karin Lagesen
Karin Lagesen / 2016-12-23
I have been a member of the Software Carpentry Steering Committee for two years, first as secretary and then as vice chair. My involvement with the SCF started in 2012, when I attended a workshop in Oslo, Norway. I signed on as an instructor in 2013 and became an instructor trainer this year. I have around 10 workshops, including instructor training, under my belt. As a member of the SC, I have mainly focused on Software Carpentry operations and on the development of instructor training and instructor trainers. In addition to serving on the Steering Committee, I am also a member of the mentoring committee. Although I have not been able to be as active in the mentoring committee this year, I have previously worked on developing programs focusing on integrating new instructors and on helping new instructors teach their first workshop. I have a PhD in bioinformatics from the University of Oslo, and am currently employed at the Norwegian Veterinary Institute and the University of Oslo. My background is in both computer science and molecular biology. Since I have formal training in both fields, I am frequently the one to translate the biological problem into a computational one. I have often been called upon to teach people with little to no training in computer science how to do their bioinformatics analyses. This means introducing them to Unix, to command-line work and to basic programming. Working in such multi-disciplinary situations has made me very aware of how hard it can be to move into a field far removed from your core area of expertise. This makes the values and skills that Software Carpentry teaches particularly important to me. If re-elected, I will focus on building and maintaining cohesiveness, consistency and continuity within Software Carpentry and its committees, as well as with other organizations. Software Carpentry is growing rapidly, spreading into both new domains and new geographic areas, and is facing great changes as a consequence. In addition, the Director of Instructor Training and founder of Software Carpentry, Greg Wilson, has decided to leave the employ of the organization. We are also integrating new instructor trainers into Software Carpentry. Such transitions can be challenging, and I will work to provide continuity and consistency to the organization to ensure that things run smoothly. Last, but far from least, we are part of what has become a “Carpentries” ecosystem, and I believe that working together with and towards a common structure for the “Carpentries” would be of great benefit to all parties. Feel free to contact me on Twitter (@karinlag) or by email (karin.lagesen@gmail.com). I occasionally blog at blog.karinlag.no. Read More ›

2017 Election: Mateusz Kuzak
Mateusz Kuzak / 2016-12-23
Hi, I’m standing for election to the 2017 Software Carpentry Steering Committee. Me & Carpentries My background is in life science. I currently work at the Netherlands eScience Center in Amsterdam. Apart from being a developer on various research software projects, I’m involved in the Center’s training activities and software development best practices. I got sucked into the Software Carpentry world through the Elixir Data Carpentry Pilot organised by Aleksandra Pawlik. During a hackathon in Helsinki, together with Francois Michonneau and a few other people, we kickstarted the ggplot2 part of the R-ecology-lesson. From day one I learned a lot about the mechanics of SC, lesson development and the logistics of the workshops. Instructor At the end of 2015, I attended instructor training in Manchester and became a certified instructor shortly after. Since then I have instructed at 7 SC and DC workshops around Europe and a few more eScience Center workshops based on Software or Data Carpentry materials. Currently, I’m working towards becoming an instructor trainer. Mentoring Recently I started joining instructor discussions as an observer, and I’m planning on becoming a host next year. I also joined the recently announced mentorship program. Library Carpentry With a small group of people in Amsterdam, we hosted a site for the Mozilla Science Global Sprint and joined groups around the world in the effort to migrate Data Carpentry lessons to Library Carpentry. I have been contributing to LC materials since then. Netherlands and Europe I see how powerful the Carpentries training model is and how important it is to establish Carpentries communities in the Netherlands and Europe. I have been relentlessly promoting SC, from local Research Software Engineers meetings (DTL programmers meetings) to conferences (BioSB). I helped establish the eScience Center and Software Carpentry partnership, and work together with SURF (the collaborative ICT organisation for Dutch education and research) on wider partnerships within the Netherlands. ELIXIR is very close to finalising its partnership with SC/DC, and next year multiple SC and DC workshops will be organised for life science researchers in ELIXIR nodes. It has become apparent that there is a need for a bioinformatics lesson. Together with people from other ELIXIR nodes, we are planning to test drive and contribute to the Data Carpentry Genomics Workshop through a series of hackathons in the Netherlands, Portugal and the UK. Plans for Steering Committee I think it’s very important that there is someone on the committee to give the European perspective. The culture here differs from North America, and the scale is different too: it is easier to connect and partner with other countries and European Union projects. I plan to help establish the partnership with SURF and multiple smaller partnerships around the Netherlands. I will also continue connecting various European projects with the Carpentries initiative. One example is establishing SC workshops as part of H2020 European Training Programs (ETPs); the eScience Center is currently contributing to two of those. I know how important it is to build a sustainable local instructor community, and I am aware of how low the instructor certification rate is. I hope to contribute to improving it through the mentorship program but also through local study groups. Read More ›

2017 Election: Christina Koch
Christina Koch / 2016-12-22
To begin, I present: Notable events in Christina’s Software Carpentry career: 2013-April: Graduate from grad school (Master’s degree in mathematics). 2013-May: Attend Software Carpentry workshop (and see git and Python for the first time). 2014-January: Teach first Software Carpentry workshop. 2014-March: First time travelling abroad for Software Carpentry. 2014-May: Hear about a job through the Software Carpentry blog. 2014-October: Get the job (which is still my job). 2014-November: Become a lesson maintainer. 2015-December: First time leading an instructor training. 2016-January: Switch roles, from lesson maintainer to mentoring committee chair. 2016-June: Teach first Data Carpentry workshop. To this list, I would like to add: 2016-December: Stand for the Software Carpentry steering committee. And hopefully: 2017-January: Begin serving on the Software Carpentry steering committee. As you can see, Software Carpentry has meant a lot to me over the past 3-4 years. I’ve received so much from the community, and would now like to give back in a new way - as a member of the steering committee. As a member of the steering committee, I would primarily aim to: work together with my fellow committee members, the Software Carpentry leadership, and the community at large to build a shared vision and direction for Software Carpentry. create policies and structures that enable community members to realize this vision for Software Carpentry, providing both the freedom to try new things, and the necessary oversight to guide their efforts. Personally, some of my visions and goals for Software Carpentry include: Provide clear avenues for supporters and community members to connect with the Software Carpentry organization and with each other. Maintain a pool of high-quality instructors and provide opportunities for instructors to share knowledge and grow in their teaching skills. Initiate opportunities for community members to learn more about topics like accessibility, diversity, and discrimination. However, I am most interested to hear what other members of the community value about Software Carpentry and what their goals would be for the organization and its members. If you read the above timeline carefully, you’ll see that I went from my very first Python script and git repository to teaching other people in under a year. I’d be applying that same “get-started” motivation to my work on the steering committee - learning the ropes as quickly as possible so that I can direct my energy back towards the community right away. If any of this sounds interesting to you, I encourage you to vote for me, or even better, to join me and stand for the steering committee yourself! P.S. For those who are interested in knowing a little more about me, I work as a Research Computing Facilitator for the Center For High Throughput Computing at the University of Wisconsin - Madison. I love to read, have recently gotten into knitting, and this winter, have tried my hand at curling for the first time. You can say hi or follow me on Twitter at @_christinaLK. Read More ›

Instructor Training Intercontinental
Aleksandra Pawlik / 2016-12-22
The end of the calendar year is the usual opportunity to look back at the past 365 days. The summary of mine would be “Instructor Training Intercontinental”. I have always considered myself incredibly lucky to work with Software and Data Carpentry, and in 2016 I certainly hit another jackpot. I ran 7 Instructor Training workshops, in 6 countries on 3 continents. But the highlight was, as it has always been with SWC and DC, the people. January 2016 kicked off with Instructor Training for ELIXIR in Switzerland, which I ran together with Tracy Teal, Data Carpentry Executive Director. It was a baptism of fire for me, as it was the first time I taught without Greg there to step in and save me in case I messed things up. So I am very grateful to Tracy, who helped me get through the Instructor Training in Lausanne without causing any damage to the attendees. In fact, almost all of them have now completed their checkout, and ELIXIR’s collaboration with Software and Data Carpentry is growing further. Next, I headed off across the world to Australia and New Zealand. A veeeery long flight, and straight from the winter wonderland of Switzerland I found myself in sunny Brisbane, surrounded by the supporting arms of Belinda Weaver who, little did both of us know, became the co-creator of the guiding programme for a group of trainees I taught 3 months later. In Australia, at the University of Queensland, I learnt that the excitement of sharing what I know about teaching with a group of enthusiastic people effectively helps with the worst jetlag ever (Belinda, thanks again for the sleeping pills!). The excitement and jetlag carried on over to Melbourne, where I taught trainees just as they were getting ready for the Research Bazaar. Alistair Walsh guided me through the best coffee sources on campus and beyond. He also helped present the Cognitive Load Theory module during the Instructor Training and talked about “flow state” whilst I was trying to stay awake. The jetlag finally let go when I crossed the Tasman Sea to arrive in Auckland, where the New Zealand eScience Infrastructure (NeSI) team organised the first in-person Instructor Training in the land of the Kiwi bird (and fruit). The participants came from institutions all over the country and I am now working directly with most of them…see below for details. Whilst I was hovering around the bottom of the map, Anelda van der Walt was working incredibly hard putting together “A Programme for the Development of Computational and Digital Research Capacity in South Africa and Africa”. Part of her plan was to run the first in-person Instructor Training in South Africa (and in Africa) with myself as the trainer. Meanwhile, I fell in love with New Zealand so deeply that I wanted to live there, and so I had to make one of the hardest professional decisions ever: I decided to leave the UK, and thus my job with the Software Sustainability Institute (SSI) and the eScience Lab team at the University of Manchester, and join NeSI. I spent 3 years with the SSI and the Manchester team, and they taught me everything I know and more. It was also there that I became a Software and Data Carpentry instructor, a steering committee member and a trainer. But New Zealand is just like on the attached photo and I couldn’t resist it. Fortunately, in my new job I remain connected with the communities and projects I had worked with before.
Preparing to move my life to another continent was pretty much full on, but Anelda and I really wanted to make the Instructor Training in Africa happen. So in April 2016 I landed in Johannesburg (no jetlag!) and two days later I was with an incredibly inspiring and impressive group of trainees. Most of them were based in South Africa, but we also had participants from Kenya and Namibia. We realized that most of them would be Software and Data Carpentry pioneers in their home institutions, or even countries, and thus would need a lot of support from the Carpentry communities. Long story short, thanks to the leadership of two amazing women, Anelda and Belinda, with engagement from several others, we saw one of the best outcomes of having an inclusive international community: guiding and supporting the newcomers through the initial stages of joining the initiative and helping them make their membership sustainable. Right before I left Great Britain, in May 2016, I ran Instructor Training organised by the Software Sustainability Institute for trainees from UK universities as well as organisations such as the Met Office. We hosted it in Edinburgh, the first place I lived when I moved to the UK in 2008. I co-ran the workshop with my SSI colleague Steve Crouch, who has recently been training up more instructors in the never-saturated UK research market. Another SSI-er and the UK SWC and DC Admin, Giacomo Peru, was there as well to help, and the SSI’s Director, my super-cool boss at the time, Neil Chue-Hong, showed up too. Good times. Fast forward: it is December 2016. I am about to hit the road towards the NZ South Island (surrounded by Santas and Christmas trees glowing in the roasting sun…this still doesn’t compute). The intense past months took their toll - I certainly became less effective, erm…significantly less on the ball with my emails, and not always managing to keep up with everything that is going on in Software and Data Carpentry (is there anyone who can?). I fell off the grid (which isn’t that difficult in New Zealand). But Software and Data Carpentry is quite a boomerang (see what I did just there?), so I think I am crawling back on track. Wow. It’s been a year. Read More ›

Christmas Instructor Discussion
Raniere Silva / 2016-12-22
On December 20th we hosted our last two Instructor Discussion Sessions of 2016. We started 2016 with Post-workshop Instructor Debriefing Sessions that ran twice (at different times) every two weeks; during the year we changed the name to reflect the modifications made to better accommodate their place in the Instructor Training pipeline, as well as the increase in offerings (now twice every week). The Steering Committee is very grateful for all the work from the Mentoring Subcommittee as well as all the Instructor Discussion Session hosts, without whose help it would be impossible to provide all those hours of shared experience among members of our community. The morning session on December 20th was hosted by me and Marian Schmidt. Markus Ankenbrand and Felipe Ferreira Bocca shared with us their experience of teaching at the University of Würzburg, Germany and the University of Campinas, Brazil, respectively. Three future instructors attended from Europe and participated in a conversation about our R lesson. One of our future instructors asked us for advice on how to engage advanced learners who attend a workshop. This is probably one of the top five advice requests that we receive in the Instructor Discussion Sessions, and this time we were gifted with the attendance of Greg Wilson, who is the best person to help that future instructor. In short, because it is impossible to put on paper any conversation with Greg, since he is always inspiring you to climb higher: Greg recommended inviting advanced learners to help their peers at the current workshop and to be a co-instructor at the next one. The “watch one, do one, teach one” philosophy, from Paulo Freire, is what brought us, Software Carpentry, here, and we are counting on your help to push it forward. The night session on December 20th was hosted by Kate Hertweck and me. Unlike the morning session, we didn’t have any instructors for debriefing, but we had seven motivated future instructors from Europe and the United States. Kate marvellously led that session, answering questions about workshop organisation, the instructor training checkout procedure, and our lessons. In both sessions we had questions about the challenges in our lessons. We have more challenges than any instructor can cover during the average duration of a workshop. Instructors should select the challenges they will use at the workshop based on the information they have about their learners. If you think that we should drop or replace any challenge, please create a pull request on GitHub or email us. We will return with the Instructor Discussion Sessions in 2017 after a short Christmas and New Year break. If you are looking for the next Instructor Discussion Session to complete your Instructor Training, keep an eye on this blog at the beginning of 2017. We look forward to seeing you next year in one of the Instructor Discussion Sessions; for now, enjoy the holiday season. Merry Christmas and Happy New Year. Read More ›

Rayna Harris's Year in Summary 2016
Rayna Harris / 2016-12-19
A year in review I want to thank you for giving me the privilege to serve on the Software Carpentry Steering Committee. Here is a very brief recap of the year. Click this link to read my 2017 election post. A year ago, I said I would focus my efforts on: integrating data from the mentoring and debriefing sessions with the assessment surveys to understand the degree of workshop effectiveness discussing the above information with lesson maintainers, who can decide if lessons need revision or not integrating the above with instructor-trainers and instructor-mentors to improve lesson delivery and ultimately student success streamlining the above processes so that new trainees can easily be incorporated into these leadership roles Here is my recap of what we as a community have accomplished When subcommittees and subcommunities have overlapping representation, it facilitates the transfer of ideas, concepts, and data across groups. I like to be in multiple groups so that I can better understand and identify shared challenges and opportunities. I’m excited to see some new and revamped programs for 2017 that are the result of sharing ideas with members of the community. I’m really happy with the mentoring program, the communication revamp, monthly meetings, discussion sessions, the newsletter, and the featured Data Carpentry blogs. Lesson maintenance has undergone some changes for the better this year. Click here to read the October 2016 summary from Kate Hertweck. I unexpectedly jumped wholeheartedly into the effort to integrate insight from the mentoring committee into the instructor trainer curriculum. Shortly after being elected, I joined the ranks of the 20 instructor trainers. We had a crash course in training trainers and then were launched into teaching. I must say, I can now relate even better to the new instructors out there. I participated in 5 instructor training workshops and hosted about a dozen discussion sessions. The instructor training program is supported by membership agreements and grants, and I’m grateful we have the support and infrastructure to transform research and education. We have developed the means to integrate new members into the community and into leadership roles. The community is always inventing and championing new ways to do what we do better. There are many pathways and vehicles for success in this organization, but sometimes it’s hard to navigate. We are working on improving all forms of communication, and we appreciate your patience and enthusiastic participation as we have navigated these growing pains. A little more reflection In the instructor training manual, one section compares self-directed learning and instructor-guided learning. I believe I am constantly learning through both mentored and self-directed learning. I am fortunate to have mentors in this community who guide me with wisdom but also give me the support to explore uncharted territory. I think it’s important to know what balance of the two styles works for you. You could even make an analogy that our community learns and grows in a similar manner, through mentored and self-guided learning. This year, new staff joined the community to bring expertise and facilitate our growth in a mentor-guided fashion. I love that volunteers spearhead many novel activities in this community, but it’s great to balance this with the wisdom of colleagues who have experience in the area.
I think the Bug BBQ and the Instructor Discussion sessions are two awesome examples where many individuals in the community took leadership/mentorship roles to enhance our curricula and instructor training program. We are also seeing the great benefit that Data Carpentry staff hires have had on our curricula and workshops; their knowledge has really helped us to implement some of the practices we value and preach but were not always successful at implementing. Communication is important and worth the effort. I’m trying to improve my communication in many ways. I’m glad I’m in the good company of hundreds of people who also want to improve their communication, teaching, and research. I like that we can all make progress on this together as a community of geographically diverse members. I like hearing from you, and meeting in real life (IRL) is so awesome! I met Greg, Jonah, Belinda, Jason, Kate, Bill, and Maneesha in Cold Spring Harbor. I met Christina in Annapolis, Maryland. Kate and I have used our departmental seminar resources to visit each other in Tyler and Austin, Texas. I met instructor trainees in Texas, Oklahoma, Maryland, Toronto, Washington D.C., Arizona, and Seattle. Hopefully I’ll meet more of you in 2017! Check the meetup etherpad and add your travel plans. Thank you! In conclusion, I want to thank you, the community, for giving me the opportunity to serve on the steering committee in 2016. Y’all have made significant contributions to my growth as a scientist. I volunteered a liiiiittle more time and mental effort to SWC than I anticipated, but it was so worth it. Either way, I want to take a moment to acknowledge that I am supported by a fellowship from The University of Texas at Austin Graduate School, which gives me considerable flexibility in my schedule to balance research in the Hans Hofmann Lab with volunteering for this amazing organization. Read More ›

2017 Election: Kate Hertweck
Kate Hertweck / 2016-12-19
I am excited to stand for election for the 2017 Software Carpentry Steering Committee (SWC SC). I hope you’ll consider supporting me in this year’s election so I may continue to serve our community. Previous experience with Software Carpentry: Some folks might consider it insanity for a tenure-track assistant professor to volunteer so much energy to the Carpentries, but in reality, the time I’ve spent on the SWC SC has been essential to my development of an effective model of leadership. The following points highlight my contributions to our community: Social media: Last May, I created a Facebook page for the Carpentries and have since posted the majority of content. Mentoring and discussion sessions: My SWC service originally started with the mentoring subcommittee; I’m pleased to have continued this work over the last year primarily by hosting discussion sessions. Lesson maintainers: I spent the early part of 2016 observing and learning about our lesson development and maintenance process. A few months back, I began leading this group through a process of reorganization and am excited to formally submit subcommittee proposals accordingly. Bridge committee and community calls: As part of my SWC SC responsibilities, I served as a liaison to the Bridge committee with Data Carpentry. This group took over organizing community calls in the latter part of the year, and I’ve been pleased to attend and participate in these community interactions. SC resolutions: I am gratified to have led discussions among members of the SC that resulted in what I believe are positive changes for our community. First, I helped the SC devise a policy to assist in continuity of the SC following elections, which was passed via community vote in October. Second, a motion I proposed at the end of November (which was subsequently passed by the SC) will continue availability of open online instructor training for members of our community at institutions that do not currently have access to training through joint organizational memberships. I believe both of these projects ensure stability and continued growth for SWC. Future goals for Software Carpentry: In retrospect, it took a solid six months for me to “hit my stride” as a member of the SC, and I’m eager to capitalize on this inertia by continuing for another year. My experiences working with the SC and other committees described above have reaffirmed my commitment to providing strong leadership with the following goals: Clarifying responsibilities in leadership: SWC has matured a great deal since the first SC was elected two years ago. I am keen to solidify the specific roles best filled by not only the SC, but also the Advisory Council, staff, and other community leaders. Communication and community engagement: Our organization faces many challenges in terms of communicating with an intellectually diverse and geographically dispersed community. I want to do a better job of relaying information about choices made at the SC and subcommittee level back to the community, and balancing community input with steady movement towards decision making and progress as an organization. Streamlining operations with Data Carpentry: I am very excited to have worked closely with Data Carpentry staff this year, and am looking forward to continuing to integrate our operations to our mutual benefit. More about me: My current position is Assistant Professor of Biology at the University of Texas at Tyler.
I teach plant taxonomy, genomics, and bioinformatics, and also mentor undergraduates and graduate students in independent research. I’m happy to chat about knitting, science, teaching, and faculty life on Twitter or my blog, and you can learn more about my research on GitHub or my research/teaching website. Read More ›

Software Carpentry workshop in severe conditions
Wolfgang Hayek, Aleksandra Pawlik / 2016-12-19
Usually the main struggles preceding Software and Data Carpentry workshops involve the laptop setup, signposting the room and making sure the washrooms are unlocked. But a recent workshop in New Zealand, held at the National Institute of Water and Atmospheric Research (NIWA), faced much more extreme conditions. The day before the workshop New Zealand was hit by what is now known as the Kaikoura earthquake. Wellington, where the NIWA offices are, was also affected. Fortunately, NIWA buildings remained safe and the hosts, Fabrice Cantos and Wolfgang Hayek, decided to go forward with the workshop. However, the earthquake turned out not to be the only force of nature that affected the area. Wellington experienced some extreme weather, with gale-force winds and very heavy rain that caused flooding in the city. Despite all these issues, the workshop still had almost full attendance and received very good feedback. 23 researchers participated in the event, including NIWA staff from Wellington and other NIWA branches, as well as 3 external participants from the University of Canterbury. The instructors (Andre Geldenhuis of Victoria University Wellington; Alexander Pletzer, Craig Stanton and Wolfgang Hayek, all of NIWA) taught the core Software Carpentry curriculum as well as an added-on introduction to HPC using Fitzroy, the supercomputer operated and maintained by New Zealand eScience Infrastructure. Every participant could use a personalised account on Fitzroy to follow the HPC session. 2 remote participants at NIWA Auckland joined for the Python session. Teaching was done through a video-conference system by sharing the presenter screen as well as room audio and video, which worked well. Despite the severe weather conditions and the overall post-earthquake concerns, the atmosphere was very relaxed, and participants asked numerous questions in all sessions. The workshop will be followed up next year with workshops on programming with C, C++, and Fortran, as well as parallel programming. Read More ›

2017 Election: Rayna Harris
Rayna Harris / 2016-12-19
I am really excited to stand in election for the 2017 Software Carpentry (SWC) Steering Committee so that I can continue to serve this amazing community. ##Software Carpentry Involvement I was first exposed to Software Carpentry by April Wright, who suggested that I attend the Instructor Training Workshop at UC Davis in January 2015. In 2015 I co-taught workshops at UT Arlington and New Mexico State University, co-organized the Austin-based Instructor/Helper Retreat, and served on the Mentoring and Assessment Subcommittee. In 2016, I was elected to the SWC steering committee and was certified as an instructor trainer. You can read my year end summary from 2016 here. Without a doubt, my favorite Software Carpentry activity is hosting instructor discussion sessions. I learn so much when you share your experiences, and it has helped me become a better instructor and scientist. More importantly, synthesizing these community-shared experiences gives me an immense amount of energy and inspiration for continuing to engage in community-driven research education. ##Vision for 2017: open science and reproducible research The words “open science” and “reproducible research” occupy most of my headspace these days. In January, I’m going to a curriculum development hackathon for Reproducible Research using Jupyter Notebooks and then to a Moore Foundation Early Career Researcher Symposium to synthesize our ideas about reproducibility. I’m excited to see the progress we make in this area and I look forward to more discussions on these topics. I have so many ideas in this realm that I’m happy to discuss with anyone who will listen. ##Vision for 2017: championing mentorship Successful research-driven education programs find the perfect balance between instructor-guided and self-guided learning. We all fall at different places on the spectrum of self-taught to classroom-taught programmers and teachers. I’m excited that our organizations are championing mentorship programs to enhance and extend our existing training program. This provides multiple pathways to building better educators and scientists. I’m fortunate to have multiple mentors and role models in this community both locally and globally, and I’m looking forward to hearing more mentoring success stories. ##Vision for 2017: integration across levels I hear the phrase “integration across levels” every week in seminars, papers, and discussions. I believe the concept is quite relevant to Software Carpentry today. My continued vision is to promote integration across the organization and community so that we are better aware of each other’s challenges and achievements. We’ve noticed some leaky pipes in our communication across levels, but improving this is a priority. Thanks Thank you for considering me for re-election to the Steering Committee. Software Carpentry has contributed vastly to my growth as an educator and scientist, and I look forward to contributing back to this excellent community in 2017 and beyond! Read More ›

Teaching Support IT Job at UCL Physics and Astronomy
Ben Waugh / 2016-12-15
We’re looking for someone with IT skills, and interests in teaching and physics, to work with us at the Department of Physics and Astronomy at UCL (University College London). This is a system manager job with an emphasis on supporting our teaching, and will involve a wide range of responsibilities, including managing a Linux cluster and interfacing PCs to lab equipment as well as providing first-line support for the university Windows environment. The application deadline is Monday 2nd January 2017. Applicants must already have the right to work in the UK. Duties and Responsibilities A system manager is required to support Teaching and Learning in the Department, and to provide some support to administrative support staff. The balance is likely to be approximately 75% support for teaching and learning, and 25% other support. The Department has over 180 desktop computers in teaching labs. Most of these are installed with the standard UCL Windows Desktop environment, also available to students in other cluster rooms across the University, and permit each student to record their experimental work, analyse data, and engage in e-learning activities while in the lab. Additionally, we have around 30 individual PCs interfacing with lab equipment. We also have a Linux cluster available for student use: this currently has 19 PCs running Scientific Linux, supported by rack-mounted servers in a separate machine room, but this is likely to expand. Programming is an important skill for any scientist, as well as for many graduates who go on to work in other fields, and the computing strand of our degree programmes is continually being reviewed and updated. All Physics undergraduates learn programming in Python in their first term and, depending on their choices, may learn Mathematica, Matlab and Java in subsequent terms. Other courses, not focussed specifically on programming, are also increasingly making use of e-learning technology and some computing to carry out calculations and aid understanding of scientific concepts. These courses all rely on expert technical support to ensure that the relevant software is installed and configured correctly. Professional Services staff in the department use Windows PCs for a variety of administrative tasks using locally installed software and services provided centrally by UCL. Key Requirements The successful applicant will have a proven ability to communicate and collaborate effectively with people of varying levels of technical knowledge, a demonstrable interest in and knowledge of physics and education, and excellent technical skills. A deep knowledge of either Linux or Windows is required, along with some experience of the other operating system and the willingness to learn more. Applicants should have knowledge and experience of some of the relevant technologies and tools. These include scientific software, programming, e-learning systems, networking, and deployment and configuration of computing hardware and services. Further Details Click here for full details and application form. Read More ›

Next Steps
Greg Wilson / 2016-12-14
Software Carpentry has accomplished an amazing amount over the past six and a half years, but a new opportunity has come up for me here in Toronto, and after a great deal of thought, I’ve decided to pursue it. At the end of January 2017, I will be taking a position as a Computer Science Education Lead at Shopify, where I will help with their CS education partnerships. I’m excited to have a chance to work for change locally, but also look forward to continuing to be involved in the Carpentries as a volunteer, and to many more discussions of teaching, open science, and how awful Git truly is. It has been a privilege working with you all: watching you turn a handful of lessons into a global organization empowering tens of thousands of researchers has been the best experience of my career. Thank you for everything. Read More ›

Community Service Awards 2016
SCF Steering Committee / 2016-12-13
We are very pleased to announce the recipients of the Software Carpentry Foundation’s first Community Service Awards, which are given annually to recognize work that improves the Foundation’s fulfillment of its mission and benefits the broader community. After becoming an instructor, Christina Koch immediately started taking on extra responsibilities: she has taught countless workshops, been a lesson maintainer, played a major role in the mentoring committee, and become an instructor trainer. Adrianna Pinska participated in her first workshop as a helper in November 2014, and then became certified as an instructor; since then she has organized workshops in conjunction with events like PyConZA, going from four participants at the first to a full house most recently. Jon Pipitone was our unofficial system administrator from 2010 to 2016; in that time, he managed our servers, took care of backups and mailing lists, and generally kept the lights on. Jon’s work was not directly visible to most of our community, but it was essential to keeping us afloat. Please join us in congratulating all three! Read More ›

Don't forget to submit your post to stand for the 2017 Steering Committee
Kate Hertweck / 2016-12-12
If you’ve considered running for the 2017 Software Carpentry Steering Committee, please submit your blog post announcing your intentions by December 23. More information about what to include in your post can be found in the original announcement. Read More ›

Feedback on Communications
Tracy Teal, Erin Becker / 2016-12-12
Software and Data Carpentry have at their core a collaboration-driven ethos, and communication is key to that collaboration. We’re reaffirming our commitment to open and transparent communication, because we know we can do better! We want to give community members opportunities to talk to each other, to staff and to the Steering Committees, to get updates on efforts and activities, and to generate ideas and participate in discussions. So, first, we want to hear from you! What ideas do you have about communication? What do you want to hear from us? What channels do you like to use for communication? Do you like email lists or forums that include every topic, or ones on particular questions, domains or regions? What do you like about communication now? What don’t you like? We’re going to be working on communication channels and strategies to promote and support these ideas, and continue to make the Carpentries a community that you are excited to be a part of, so please let us know what you think! Please respond as a comment to this post, or in our “conversations” repository on GitHub (we’re considering these our suggestion boxes) if you have particular topics. Thanks for your feedback! To be true to our ethos and effective in our mission, we need to be able to communicate effectively about both aspirations and ongoing efforts so that we can learn from each other, identify critical issues, recover quickly from mistakes, evaluate ideas and commitments, and make strategic decisions. As a community, we communicate in many ways and for different purposes. Community members take initiative to coordinate activities. Staff and committee members seek community input. Staff and committee members report actions and deliver products to the community. We know that we need effective ways for: Community members to propose ideas for new work or directions for ongoing work. Community members to organize work efforts around a particular issue. Community members to stay up-to-date on work going on in the community, including work done by staff members, Steering Committees and subcommittees and unofficial groups of community members. Staff to jointly decide on priorities, form productive collaborations and keep up-to-date on progress of projects. We also know that there may be other communication needs we have as an organization that we haven’t yet considered. We invite anyone who has experience in communications, in building open communities, or who simply has thoughts about these issues to contribute as we work to develop a thoughtful, efficient and transparent communications strategy. We envision this blog post and our new “Conversations” repository as a first step in developing this strategy. To take part in the conversation about developing a communication strategy, please respond to this post or to the GitHub issue. As we work to develop a communications strategy, Carpentry staff will actively monitor this thread and follow up on issues. Read More ›

Instructor Training for Library Carpentry
Greg Wilson / 2016-12-08
We are pleased to announce that we are partnering with csv,conf (a community conference for data makers everywhere) to run an instructor training class specifically geared for people interested in Library Carpentry. The class will take place in Portland, Oregon, on May 4-5, 2017; for details, please see the full announcement. Read More ›

Making art with Python: Projects after Software Carpentry
Eleanor Lutz / 2016-12-08
This March I signed up for a Software Carpentry class and learned Python for the first time. I had a great time at the workshop, and I wanted to share one of the first Python projects I completed thanks to Software Carpentry. I originally signed up for the workshop for my PhD research in mosquito behavior. I needed to automate some video analysis tasks, and several friends recommended learning Python. I ended up making the video analysis work (thanks Software Carpentry!), but this blog is actually about a Python art project that I worked on right after finishing the class. One of my Python matplotlib animations, based on the public commons image Arabesques: mosaïques murales XVe. & XVIe. siècles. Coming into the class I had a little coding experience, but not much. I’d taken an introductory Java class four years ago (and barely used it since), and learned GitHub and HTML/CSS while making maps as a designer at Mapbox. Dave Williams and Jes Ford, the Python instructors for my class, did an amazing job of making the class accessible to beginners like me. I particularly appreciated how they took the time to set up their own computers to look exactly like what a beginner would see - no shell aliases or custom installs and appearance settings. In our class we worked on graphing the example “inflammation dataset” using matplotlib. I was impressed by the Seaborn graphics library shown in the class, and I wanted to see how far I could get using vanilla matplotlib for an art project. I needed a practice project to get better at Python, and as a former designer one of my favorite challenges is making art out of limited tools. For my project I wanted to make a browsable color palette website like Adobe Kuler, but with animated examples for every color palette. This was fairly straightforward once I figured out that matplotlib.patches will plot any shape given a list of points. After that I just needed to define a set of shapes and pass their location and size to each frame of the GIF. (I also turned the project into an open source Git repository, for anyone who wants to take a closer look.) Another Python animation, based on the public commons image Mosaik aus der Moschee des Galaon el Alfi auf der Citadelle zu Kairo (Friedrich Hessemer 1842) I learned a lot about Python while making this, so for me it was a really useful practice project. For example, by generating hundreds of figures and unintentionally causing a huge memory leak, I learned that matplotlib doesn’t have automatic garbage collection. It was also useful to carefully decide which parts of the project to write in Python, and which to write in HTML/CSS/Javascript. In the end my final color palette browser website uses a mix of original HTML/CSS/Javascript, automatically generated Javascript written in Python, and GIF images generated in Python. This was a fairly basic coding project, but I wanted to share it with the Software Carpentry community to show that I actually learned something useful from the class. I ended up really enjoying Python, and the structure of the class helped me quickly get a handle on the basics. Finally, I want to acknowledge everyone at Software Carpentry who helped me during the March 2016 workshop: Dave Williams, Jes Ford, Ariel Rokem, Allison Smith, Rick Riehle, Emilia Gan, Bernease Herman, Bryna Hazelton, Chris Suberlak, and Jeremey McGibbon. Thanks for all of your work helping beginners in coding! Read More ›
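To make the technique concrete, here is a minimal sketch, assumed rather than taken from Eleanor’s repository, of the approach she describes: matplotlib.patches draws a shape from a list of points, each animation frame is a re-drawn figure saved to disk, and figures are closed explicitly to avoid the memory leak she mentions. The palette, shape, and file names are all invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; we only keep the saved files
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

# hypothetical palette and shape; the real project defines many of each
palette = ["#264653", "#2a9d8f", "#e9c46a"]
diamond = [(0.5, 0.1), (0.9, 0.5), (0.5, 0.9), (0.1, 0.5)]

for frame in range(12):
    fig, ax = plt.subplots(figsize=(2, 2))
    scale = 0.5 + frame / 24.0  # grow the shape a little on each frame
    points = [(0.5 + (x - 0.5) * scale, 0.5 + (y - 0.5) * scale)
              for x, y in diamond]
    ax.add_patch(Polygon(points, color=palette[frame % len(palette)]))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.axis("off")
    fig.savefig("frame_%02d.png" % frame)
    plt.close(fig)  # matplotlib will not reclaim open figures on its own;
                    # skipping this line reproduces the memory leak above
```

The saved frames can then be assembled into a GIF with an external tool such as ImageMagick (convert frame_*.png animation.gif).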

2017 Election: Sue McClatchy
Sue McClatchy / 2016-12-05
Hi, I’m a bioinformatician at a research lab in rural Maine, U.S.A. My path here has been winding, varied, and fraught with good luck. I had the good fortune to find Software Carpentry some years ago, and in my travels have never found anything quite like it. I’m honored to be part of this community and now want to give back by contributing my experience and expertise. Previous involvement I’ve been a certified Software Carpentry instructor since the spring of 2015. Since then, I’ve organized and taught 6 workshops, and have plans to teach 5 more in early 2017. I serve on the mentoring subcommittee and lead discussion sessions for new and experienced instructors. Presently, I’m working toward becoming an instructor-trainer, and expect to teach instructor training early in 2017 in addition to the 5 aforementioned workshops. This year I secured 4 years of partnership funding between The Jackson Laboratory (http://www.jax.org) and the Carpentries. The first year of partnership is funded by an internal grant, with succeeding years funded by a grant from the U.S. National Institutes of Health (NIH). How I can contribute I bring an uncommon perspective from formal teacher training and an 8-year career as a K-12 teacher in the U.S. and Latin America. Diverse abilities and cultures in the classroom are the norm for me, and I have much to share with instructors about how best to meet everyone’s needs - the first thing being a focus on learners’ needs. My training instilled the idea that the learner, not the instructor, is the most important person in the room. A learner-centered approach to instruction responds well to diversity and arms the instructor with tools to adapt instruction to new situations and new people. Many of these tools work equally well with both child and adult learners, and work across cultures and abilities. I entered biomedical research in the early 2000s and have contributed my teaching expertise to the field since then. I’m well-acquainted with the need for improved computing and data analysis skills in research and know that the Carpentry approach promotes greater research productivity and happiness. I’ll bring this understanding to instructor training and mentoring to bolster instructional expertise within the Carpentry community. I will help Software Carpentry to expand its training and instructional footprint into new regions, especially in Latin America, by teaching and by mentoring instructors there. In January 2017 I will teach workshops at the Talleres Internacionales de Bioinformática (International Bioinformatics Workshops) in Cuernavaca, Morelos, México. I intend to follow this with further training in Latin American countries, and to support those already teaching in these countries. I will contribute grant-writing expertise to grow and sustain Software Carpentry by identifying and pursuing new sources of funding from foundations, government grants and institutional partnerships. I’m presently working on an NIH grant proposal and have shared the proposal with key NIH staff. I’m especially interested in pursuing funding that will broaden Software Carpentry into different communities than those it already represents well. My goals are the following: Bolster instructional capacity and expertise by training and mentoring new instructors in all disciplines. Broaden Software Carpentry’s reach into largely untapped regions, especially in Latin America. Build Software Carpentry into a sustainable, well-funded organization that reaches a diverse audience.
More about me, if you’re so inclined: find me on LinkedIn, GitHub, and via an occasional tweet from @SueMcclatchy. I also have a minimalist instructor training blog. Read More ›

UCSF is Hiring
Ariel Deardorff / 2016-11-30
The UCSF Library’s Data Science Initiative is hiring! We are looking for a biomedical researcher with an entrepreneurial spirit and a passion for programming in R/Python, bioinformatics, data curation, statistics, data visualization (or all of the above) to serve as the Scientific Lead for our Data Science Initiative. We are taking a broad approach to data science, and are looking for someone who will work to identify the data science needs of the UCSF research community, help build a Library-based hub for data science activities, develop programs and events, and teach workshops and classes. To find out more about this position please visit https://aprecruit.ucsf.edu/apply/JPF01144. Read More ›

15 - 30 November, 2016: Instructor Training, UCSF Library, Code of Conduct, Announcement List, Steering Committee minutes.
Martin Dreyer / 2016-11-30
Highlights We are grateful that open instructor training is going well, with participants from all over the world. Jobs The UCSF Library’s Data Science Initiative is looking for a biomedical researcher; please visit the UCSF recruitment site. Tweets Remember to abide by the Code of Conduct. In case you missed it, the low-volume announcements list is up; please sign up. Think about replacing awk with bioawk; you can contribute to its development. General Have a look at the Programming with GAP lesson on our website. The Steering Committee has uploaded the minutes of their third quarter to the Software Carpentry Foundation page as well as to GitHub. Please contact Raniere Silva with any critiques or suggestions. 19 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: December: Instituto Tecnológico de Costa Rica, University of Campinas, Oxford University Department of Biochemistry, University of Victoria, UCSF, University of California, San Francisco, Deakin University. 2017: January: 229th AAS Meeting, NERC / The University of Bristol, The University of Washington eScience Institute, LCG-Aula 1-2, Centro de Ciencias Genómicas, UNAM, NERC / The University of Bristol, ResBaz Hobart, Python Stream, ResBaz Hobart, R Stream. February: AMOS / MSNZ Conference. Read More ›

Minutes of Steering Committee Meeting
Raniere Silva / 2016-11-21
The Steering Committee is pleased to announce that the minutes of its third quarter are now linked on the Software Carpentry Foundation page, as well as on the README of the board’s GitHub repository. If you have critiques or suggestions related to the minutes, please send them by email to our Secretary, Raniere Silva, at raniere@rgaics.com. In the fourth and final quarter, the Steering Committee will focus on wrapping up and documenting procedures to make the transition to the next Steering Committee smoother than it was this year, work that began with the Amending Steering Committee Election Procedures that you voted on last October. We are very excited by all of our community’s achievements this year (more details will be available later this year in our annual report), and we look forward to the amazing members of our community who will stand in this election to help shape the future direction of Software Carpentry. Read More ›

Open Instructor Training
Erin Becker, Greg Wilson / 2016-11-19
(Originally posted on the Data Carpentry blog.) After workshops and conferences, we frequently get questions from people who are interested in teaching with the Carpentries. We’re overjoyed by this interest and excited to bring more committed and enthusiastic instructors into our community. Unfortunately, until recently, we haven’t had the resources to open up our instructor training program, and have been holding training events primarily with Partnering institutions. In response to this sustained community interest, Data and Software Carpentry re-opened applications in July for anyone interested in instructor training, regardless of affiliation. This two-day intensive training covers aspects of pedagogy critical for teaching our target audience, including creating useful formative assessments, motivating learners, dealing with cognitive load, and understanding how subject-matter expertise is developed. We also teach signature Carpentry instructional practices, including live coding and the use of sticky notes to track learner progress. Within three weeks of calling for applicants we received 169 applications for 60 available seats. Applications came in from 22 countries spread across all continents except Antarctica. The Carpentry team reviewed applications on the basis of applicants’ previous involvement with the Carpentries, previous teaching experience and/or formal pedagogical training, and commitment to teaching workshops. In addition to these criteria, we looked for applicants from locations outside of our already established communities or with training in domains that are underrepresented among our current instructor pool, such as the social sciences and digital humanities. We were able to make offers to applicants from 13 countries representing the full geographical breadth of those who applied. Two training sessions have now been held, a third is taking place this week, and the fourth is scheduled for the second week of December. The feedback from the first two sessions has been very positive: we have had to adapt some of our exercises to account for the fact that the trainees are participating online rather than being physically co-located, but other than a few web conferencing glitches, things have gone surprisingly smoothly. If you were not selected for this round of instructor training, don’t lose heart: we have kept everyone’s application in the queue, and hope to revisit our offerings in the new year. If you have colleagues who are also interested in teaching for the Carpentries, consider asking your institution to Partner with us! Partnering institutions receive multiple benefits, including reserved seats in instructor training events and discounted workshops. We are very grateful to everyone who applied, and hope that you will continue to be involved in the community. We welcome contributions to our lessons, which are all developed collaboratively by our community, and encourage you to help at or host a Carpentry workshop at your institution. Read More ›

Programming with GAP
Alexander Konovalov / 2016-11-18
Software Carpentry is more than just a set of workshops and lessons. It is also a way to develop lessons, one that we have used successfully to create a lesson on Programming with GAP. GAP is an open source system for discrete computational algebra. It provides a programming language with the same name; thousands of functions implementing various algebraic algorithms; and data libraries containing extensive collections of algebraic objects. The GAP distribution includes detailed documentation; further materials on learning GAP and on using it to teach a variety of courses are available on the GAP homepage here. Throughout the history of GAP, its development has been supported by a number of grants, one of these being the EPSRC project EP/M022641 “CoDiMa (CCP in the area of Computational Discrete Mathematics)”. This is a community-building project centred on GAP and another open source mathematical software system, SageMath. CoDiMa activities include annual training schools in computational discrete mathematics, which are primarily intended for PhD students and researchers from UK institutions. A typical school starts with a Software Carpentry workshop covering basic concepts and tools, such as working with the command line, version control and task automation, continues with introductions to the GAP and SageMath systems, and finishes with a series of lectures and exercise classes on a selection of topics in computational discrete mathematics. This naturally led to the idea of establishing a Software Carpentry lesson on programming with GAP. I started to develop it in 2015 for our first training school in Manchester. Since I had never been to a Software Carpentry workshop before and had not yet completed instructor training at that point (it is currently in progress), it was extremely beneficial for me to come as a helper to the first ever Software Carpentry workshop in St Andrews in June 2015 and gain insight into the Software Carpentry teaching methodology. I took inspiration from the core Software Carpentry lessons, in particular from those on the UNIX shell, Python and R. All of them have a central story which runs through almost every episode. For the GAP lesson, I imagined a common situation: a research student with no prior experience of working with GAP (and perhaps little or no experience of programming at all) faces the task of finding their way around the huge library of GAP functions in order to study some research problem. Along the way, they start to work with the GAP command line to explore algebraic objects interactively; then use the GAP language to write some simple scripts; then create their own functions. More advanced topics, such as extending GAP with new methods for existing types of objects, or even new objects, or organising your code in the form of a GAP package, are not so obvious to beginners, so I attempted to create a lesson that shows the direction in which their skills should develop, and that also covers the importance of testing their code. I started by picking a research-like problem that would nicely expose all the needed techniques and explain the mindset required to deal with it. A good candidate was the problem of calculating the average order of an element of a group, which I once saw Steve Linton use to quickly demonstrate some GAP features to a general scientific audience.
I tried to expand on this problem in my talk in Newcastle in May 2015 (see the blog post here), and thus the choice was made. The resulting lesson leads the learner along the path from working in the GAP command line and exploring algebraic objects interactively, to saving GAP code into files, creating functions and regression tests, and further to performing a comprehensive search using one of the data libraries supplied with GAP, and extending the system by adding new attributes. Along this path, the learner becomes familiar with the basic constructs of the GAP programming language; ways to find necessary information in the GAP system; and good design practices for organising GAP code into complex programs (for a more detailed lesson overview, see my blog post here). Of course, it is not possible to cover everything in a course only several hours long, but it fits really well into a week-long CoDiMa training school like this. It prepares the audience to hear about more advanced topics during the rest of the week: debugging and profiling; advanced GAP programming; the GAP type system; distributed parallel calculations; examples of some algorithms and their implementations, etc. Also, staying for the whole week of the school, everyone has plenty of opportunities to ask the instructors further questions. What next? The lesson on GAP can be seen here, and it has been published via Zenodo here. So far, I am aware of it having been taught only twice (both times by myself) at two annual CoDiMa training schools in computational discrete mathematics. I can surely teach it myself, but is it written clearly enough to be taught by others? Is it possible for the reader to follow it for self-study? Is there any introductory material missing, or is there interest in having more advanced lesson(s) on other aspects of the GAP system? If you would like to contribute to its further development, issues and pull requests to its repository on GitHub are most welcome! Also, we invite collaborators interested in developing a lesson on SageMath: please look at this repository and add a comment to this issue if you’re interested in contributing. Read More ›
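For readers who want to try the lesson’s running example without installing GAP, here is a rough Python analogue, a sketch using SymPy’s combinatorics module rather than GAP itself, of computing the average order of an element of a group; the choice of SymmetricGroup(4) is purely illustrative.

```python
# A Python/SymPy sketch (the lesson itself uses GAP) of the running
# example: the average order of an element of a finite group.
from sympy.combinatorics.named_groups import SymmetricGroup

G = SymmetricGroup(4)  # the symmetric group on 4 points, of size 24

# sum the order of every element, then divide by the size of the group
average_order = sum(g.order() for g in G.elements) / G.order()

# S4 has 1 identity, 6 transpositions, 3 double transpositions,
# 8 three-cycles, and 6 four-cycles, so the sum of element orders is
# 1 + 6*2 + 3*2 + 8*3 + 6*4 = 67, and 67/24 is roughly 2.79
print(average_order)
```

Enumerating every element like this only works for small groups, which is exactly why the lesson builds up from such direct calculations to comprehensive searches over GAP’s data libraries.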

Software engineer position at The Jackson Laboratory
Sue McClatchy / 2016-11-15
A scientific software engineer position is available immediately at The Jackson Laboratory in Bar Harbor, ME. This position reports to the Associate Director of Jackson’s Computational Sciences Scientific Computing (CSSC) team, which is primarily responsible for developing software applications for scientific research programs. The individual in this position is responsible for developing software applications and systems to support genetics and genomics research, including but not limited to web-based technologies and systems. A senior-level candidate will lead development of complex projects, from high-level requirements, involving teams that may include other software developers, bioinformatics analysts, statisticians and scientists. The ideal candidate for this position has a BS or higher degree in computer science or bioinformatics and/or significant related job experience in the biomedical field or bioinformatics. Experience in identifying and developing software applications in the biomedical sciences and/or bioinformatics and implementing systems for analyzing large-scale scientific data, e.g. Next Generation Sequencing (NGS) data, is preferred. Experience with data exploration and visualization is a plus. For more information, please see this position posting (https://jax.silkroad.com/epostings/index.cfm?fuseaction=app.jobinfo&jobid=220385&company_id=15987&version=2&source=ONLINE&JobOwner=968882&startflag=1). Read More ›

Systems Biology Postdoc Position with The Jackson Laboratory
Sue McClatchy / 2016-11-04
The Churchill Lab (http://churchill-lab.jax.org/website/) at The Jackson Laboratory is seeking a Postdoctoral Fellow in systems biology. Our group applies a systems approach to study the genetics of health and disease. We develop new methods and software to improve the power of quantitative trait locus mapping and high throughput sequence analysis. We are especially interested in the genetics of aging and metabolic disorders. The Jackson Laboratory (http://www.jax.org) in Bar Harbor, Maine, USA, is recognized internationally for its excellence in research, unparalleled mouse resources, outstanding training environment characterized by scientific collaboration and exceptional core services - all within a spectacular setting adjacent to Acadia National Park. The Jackson Laboratory was voted among the top 15 “Best Places to Work in Academia” in the United States in a poll conducted by The Scientist magazine. Exceptional postdoctoral candidates will have the opportunity to apply to become a JAX Postdoctoral Scholar, a selective award addressing the national need for research scientists who are accomplished in the broadly defined fields of genetics and genomics. The award includes an independent research budget, travel funds, and a salary above standard postdoctoral scale. Applicants must have a PhD (or equivalent degree) in quantitative biology or another quantitative discipline such as computer science, physics, or applied mathematics. Experience in statistical genetics and gene expression analysis is strongly recommended, and applicants must have a commitment to solving biological problems and good communication skills. Expertise in scientific programming languages including R or Python is recommended. The successful candidate will work on the genetics of aging or metabolic disorders. Please contact Dr. Gary Churchill directly at gary.churchill (at) jax.org using the subject line “SWC postdoc ad”. Read More ›

Research Scientist Position at The Jackson Laboratory
Sue McClatchy / 2016-11-04
The Carter Lab at The Jackson Laboratory is seeking an Associate Research Scientist in the genetics of Alzheimer’s disease. Our group (https://www.jax.org/research-and-faculty/research-labs/the-carter-lab) is developing novel computational methods to derive biological models from large-scale genomic data. The strategies we pursue involve combining statistical genetics concepts such as epistasis and pleiotropy to understand how many genetic factors combine to control disease-related processes in neurodegeneration. We are therefore seeking an individual with expertise in epistasis analysis as it pertains to studies of Alzheimer’s genetics in humans. The Jackson Laboratory (http://www.jax.org) in Bar Harbor, Maine, USA, is recognized internationally for its excellence in research, unparalleled mouse resources, outstanding training environment characterized by scientific collaboration, and exceptional core services - all within a spectacular setting adjacent to Acadia National Park. The Jackson Laboratory was voted among the top 15 “Best Places to Work in Academia” in the United States in a poll conducted by The Scientist magazine. Broad skills in statistical genetics, the genetics of human disease, and Alzheimer’s etiology are required. Applicants must have a commitment to solving biological problems and communicating these solutions. Applicants should have a PhD in the computational sciences, and postdoctoral experience related to bioinformatics and computational biology, particularly as it relates to Alzheimer’s disease. Candidates should have a record of scientific achievements including journal publications and conference presentations. Please contact Dr. Greg Carter directly at gregory.carter (at) jax.org using the subject line “SWC research scientist ad”. Read More ›

Computational Genetics Postdoc Position with The Jackson Laboratory
Sue McClatchy / 2016-11-04
The Carter Lab (https://www.jax.org/research-and-faculty/research-labs/the-carter-lab) at The Jackson Laboratory is seeking a Postdoctoral Fellow in computational genetics and systems biology. Our group is developing novel computational methods to derive biological models from large-scale genomic data. The strategies we pursue involve combining statistical genetics concepts such as epistasis and pleiotropy to understand how many genetic and environmental factors combine to control disease-related processes in animal models and human studies. We are especially interested in dissecting the genetic complexity of autoimmune disease, neurodegeneration, and cancer. The Jackson Laboratory (http://www.jax.org) in Bar Harbor, Maine, USA, is recognized internationally for its excellence in research, unparalleled mouse resources, outstanding training environment characterized by scientific collaboration and exceptional core services - all within a spectacular setting adjacent to Acadia National Park. The Jackson Laboratory was voted among the top 15 “Best Places to Work in Academia” in the United States in a poll conducted by The Scientist magazine. Exceptional postdoctoral candidates will have the opportunity to apply to become a JAX Postdoctoral Scholar, a selective award addressing the national need for research scientists who are accomplished in the broadly defined fields of genetics and genomics. The award includes an independent research budget, travel funds, and a salary above standard postdoctoral scale. Applicants must have a PhD (or equivalent degree) in quantitative biology or another quantitative discipline such as computer science, physics, or applied mathematics. Experience in statistical genetics and gene expression analysis is strongly recommended, and applicants must have a commitment to solving biological problems and good communication skills. Expertise in scientific programming languages including R, C/C++, Ruby, Perl, or Java is recommended. Expertise in cancer genetics, immunology, or neurological disease is desired but not required. Please contact Dr. Greg Carter directly at gregory.carter (at) jax.org using the subject line “SWC postdoc ad”. Read More ›

RStudio Training and Consulting Directory
Greg Wilson / 2016-11-02
RStudio maintains a directory of people who provide training and consulting for R using their flagship product. If you have taught R for Data Carpentry or Software Carpentry, have an established training and/or consulting practice, and can provide some positive references from previous clients, please contact Garrett Grolemund about adding your profile. Read More ›

A Reproducibility Reading List
Greg Wilson / 2016-11-01
Prof. Lorena Barba has just posted a reading list for reproducible research that includes ten key papers:
1. Schwab, M., Karrenbach, N., and Claerbout, J. (2000). Making scientific computations reproducible. Comp. Sci. Eng. 2(6):61–67. doi:10.1109/5992.881708
2. Donoho, D., et al. (2009). Reproducible research in computational harmonic analysis. Comp. Sci. Eng. 11(1):8–18. doi:10.1109/MCSE.2009.15
3. Yale Law School Roundtable on Data and Code Sharing (2010). Reproducible research. Comp. Sci. Eng. 12(5):8–13. doi:10.1109/mcse.2010.113
4. Peng, R. D. (2011). Reproducible research in computational science. Science 334(6060):1226–1227. doi:10.1126/science.1213847
5. Diethelm, K. (2012). The limits of reproducibility in numerical simulation. Comp. Sci. Eng. 14(1):64–72. doi:10.1109/MCSE.2011.21
6. Stodden, V., et al. (eds.) (2013). Setting the default to reproducible. ICERM report of the Workshop on Reproducibility in Computational and Experimental Mathematics (Providence, Dec. 10–14, 2012). https://icerm.brown.edu/tw12-5-rcem/
7. Sandve, G. K., et al. (2013). Ten simple rules for reproducible computational research (editorial). PLOS Comp. Bio. 9(10):1–4. doi:10.1371/journal.pcbi.1003285
8. Leek, J., and Peng, R. (2015). Opinion: Reproducible research can still be wrong: Adopting a prevention approach. PNAS 112(6):1645–1646. doi:10.1073/pnas.1421412111
9. Liberman, M. (2015). Replicability vs. reproducibility — or is it the other way around? http://languagelog.ldc.upenn.edu/nll/?p=21956
10. Goodman, S. N., Fanelli, D., and Ioannidis, J. P. (2016). What does research reproducibility mean? Science Translational Medicine 8(341):341ps12. doi:10.1126/scitranslmed.aaf5027
The papers themselves are great, but what really adds value is the way they're ordered, analyzed, and connected. If you're trying to make sense of all this, or trying to help others do so, it's a great place to start. Read More ›

Tracy Teal on Research in Action
Greg Wilson / 2016-11-01
The latest podcast from Research in Action features Dr. Tracy Teal, Executive Director of Data Carpentry. In 33 minutes (plus a couple of bonus clips), Tracy talks about the mission of Data Carpentry, how it came to be, and how people can get involved as learners, instructors, and lesson developers. It’s a great introduction for newcomers, and has a lot of tidbits that long-time participants will find fun and interesting as well. Read More ›

Close Cousins
Greg Wilson / 2016-10-30
Our process for developing and maintaining lessons has grown and changed over time. Simultaneously but separately, an organization called the Programming Historian has crafted a diverse set of open, reusable lessons on computing skills for people working in the digital humanities (DH), and their process is different from ours in some interesting ways. The main elements of our approach are: A first version is created by: someone writing something on their own (or translating something they’ve written before), a group of people getting together at a hackathon to create a roadmap, or someone driving an open design process of the kind used for our new Python lesson or Zack Brym’s new visualization lesson. The lesson is put in a GitHub repository. Everyone is invited to submit enhancements and changes by filing issues and/or submitting pull requests, and to comment on other people’s submissions. This is a doubly-open process: both the submissions and the reviews are tied to the GitHub usernames of their creators, which in turn are usually tied to their actual identities. One or two people act as the lesson’s maintainers (a term we borrowed from open source software projects). Their role is primarily editorial: they either review PRs themselves or make sure that other people review them, and have final say over whether changes are merged or not. We publish our lessons twice a year by tidying them up and then archiving them at Zenodo, which gives each of them a DOI. Everyone whose work has been merged into the lesson is listed as a contributor, and the maintainers are listed as editors (because that’s a role everyone in academia understands). The strengths of this approach are that the community maintains the lessons (we’ve had about 400 distinct contributors in the past three years), while the editor-vs-contributor distinction allows us to recognize people who are doing extra work. Its weaknesses are that big changes are more difficult to make than they would be if there was a single author, and there’s no incentive for people to do reviews: someone’s name doesn’t show up in the bibliographic record for a lesson if “all” they did was craft hundreds of lines of thoughtful feedback. In contrast, the Programming Historian’s model is: A would-be author submits a proposal for a lesson, which is reviewed by two assigned reviewers as well as the general public. If the lesson receives a green light, the author writes it (using PH’s template) and submits it for peer review. The lesson is then reviewed as if it were a research publication. The review is doubly open, but only the original author (or less commonly, authors) make fixes in response. Once the lesson is done, it is published on the PH website. It is also published in the more traditional academic sense: the Programming Historian has status as an online journal, so their lessons are indexed in the usual scholarly way. The strengths of this approach are the review process and the fact that authors get credit in a way that academia finds digestible. Its main weakness is maintenance: while people may submit errata or make other comments, lessons continue to be maintained by their original creators, which can be problematic as other demands on their time grow, or as platforms and APIs change beneath the lesson’s feet. Could we hybridize these approaches to create something with the strengths of both? Could the Programming Historian start accepting updates via pull requests and adding people whose changes have been accepted to the lesson’s byline? 
And could we start using a more formal review process, either as lessons are being designed or when major changes are proposed? And in parallel, what should we both do about giving people credit for their work? Someone who writes thoughtful, detailed reviews of a lesson deserves to be recognized, but how should we count and weight that? Lots of groups are exploring exactly this question with regard to academic publications, software, and data; which of their answers could and should we borrow? If you’re interested in discussing this, please add your thoughts to this GitHub issue some time in the coming weeks. Read More ›

New Book: Tidy Text Mining with R
Greg Wilson / 2016-10-29
A new online book has recently been published that may be of interest to our community: Tidy Text Mining with R This book provides resources and examples for people who want to use tidy tools from the R ecosystem to approach natural language processing tasks. The intended audience for this book includes people who don’t have extensive backgrounds in computational linguistics but who need or want to analyze unstructured, text-heavy data. Using tidy data principles can make text mining easier, more effective, and consistent with tools already in wide use like dplyr, broom, and ggplot2. Topics covered in the book include how to manipulate, summarize, and visualize the characteristics of text, sentiment analysis, tf-idf, and topic modeling. The authors are still in the writing process and will be actively developing and honing the book in the near future, but it already contains many developed examples of using tidy data principles for text analysis. Julia Silge is a data scientist at Datassist where her work involves analyzing and modeling complex data sets while communicating about technical topics with diverse audiences. She has a Ph.D. in astrophysics, as well as abiding affections for Jane Austen and making beautiful charts. Julia worked in academia and ed tech before moving into data science and discovering R. David Robinson is a data scientist at Stack Overflow. He has a Ph.D. in Quantitative and Computational Biology from Princeton University, where he worked with Professor John Storey. His interests include statistics, data analysis, genomics, education, and programming in R and Python. If you are the author of a book that is related to Software Carpentry or Data Carpentry’s mission, and would like to announce it here, please get in touch. Read More ›

The Rest Is Yet To Come
Greg Wilson / 2016-10-29
I co-taught an instructor training workshop earlier this week, then taught a second one on my own a couple of days later. I made some pretty big mistakes in both: I kept interrupting my co-instructor in the first, while in the second, I told far too many stories, made jokes about hipsters and Javascript programmers when I’d told participants not to belittle people in class, and shut down discussion a couple of times when I had no authority to do so. I have another workshop this week. I’d like to do better, so I’m going to give myself two sets of three sticky notes each day. (Sticky notes are the duct tape of teaching…) Each time I stray from the schedule, I’ll take one down from the first set; each time I tell a story, I’ll take one down from the second. It’s no guarantee that I’ll do better, but not doing something proactive pretty much guarantees that I won’t. It’s never fun to find out that you still have work to do, particularly on something you’ve been working on for years. When it happens, I tell myself the same thing as Ben Orlin. (For more of Ben Orlin’s wonderful work, see Math With Bad Drawings.) Read More ›

What the Carpentries Mean To Me
Daniel Chen / 2016-10-26
October 26 marks my 3rd GitHub cakeday. It also marks three years since my first Software Carpentry workshop as a learner. The icing on the cake (haha?) is that it’s also Open Access Week. My first computer science course was in high school. I got through the class with a healthy amount of struggling, but I never thought I’d make it in computer science because some of my fellow classmates got through the class so effortlessly. My rationale at the time was: if this is what it takes to be good at computer science, I’d never make it. I graduated with a BA in psychology/behavioral neuroscience, and minors in biology and computer science. Computer science? Didn’t I just say I would never do this again? Yes. But when I took my first computer science class as a junior in college, I realized that the class itself was relatively effortless for me. Why? I’d seen all of this before. I’d learned about conditional statements and loops in high school! The fact that the class used Python and not NetLogo/Scheme was a matter of syntax. I already knew how to think procedurally. I can make the argument that I was never really going to go into computer science to begin with; medicine and medical school were always my main goal. But the fact that I did not program from sophomore year in high school to junior year in college can be traced back to my feelings of inadequacy in high school. We experience or see this discouragement all the time; just talk to Greg. Fast forward to October 26, 2013, when Justin Ely and Dave W-F taught my Software Carpentry workshop. I had already been dabbling in Linux and Python over the years, and had just started using Git for my Master’s thesis, so I opted to take the ‘intermediate’ workshop. I learned bits of new things during the workshop, but my main takeaway was: “I can teach this too!”. I had my first TA position teaching intro epidemiology and intro biostatistics at the time, and found teaching extremely fun and rewarding. After the workshop, I emailed the admins, booked a bus to Boston, and ‘randomly’ showed up as a helper for an MIT workshop in January 2014 led by Aron Ahmadia and Randy Olsen. I’ve been teaching since then, and I absolutely love it. It wasn’t until I had taught a few workshops that I realized I was starting to master the topics I was teaching. Each workshop I taught made me more familiar with the material. As a side-effect, it became easier for me to pick up the next new concept to enhance my own knowledge. This ‘new’ knowledge can be conveyed to my own students or applied to my own work. Now, three years since my first workshop, I look back at how much I’ve grown as a graduate student, an instructor, and a person. Everything I know today can be traced back to my first workshop; the same can be said of all of my professional connections, and the great sense of belonging I have when I attend conferences. For that, I’m eternally grateful to the community. That’s what the Carpentries mean to me. Read More ›

Call for Candidates for the 2017 Steering Committee
Kate Hertweck / 2016-10-24
Software Carpentry will hold its annual election for the Steering Committee of The Software Carpentry Foundation on January 23-27, 2017. Please consider standing in this election to help shape the future direction of our community. The roles and responsibilities of members of the Steering Committee are available here. If you are a qualified instructor who has taught at least twice in the past two years, or have done non-teaching work for the community, you can both stand for election and vote. Please visit the list of current members to see who is eligible to stand and vote in our election. If you believe you qualify as a member but are not listed there, please contact us as soon as possible. In order to stand for election, we request that you write a blog post that introduces yourself to the community. The post must be about 500 words long, may be written in any format (question and answer, paragraphs, etc.), must be titled “2017 Election: Your Name”, and must be submitted by December 23, 2016. You can submit your post as a pull request to this repository or by email. In the post, you should explain your previous involvement with Software Carpentry and what you would do as a member of the Steering Committee to contribute to the growth and success of the community. Candidates will be given the opportunity to share their thoughts with our community, including ideas for continued involvement, at our two community meetings on January 19, 2017. Read More ›

Programming as Theory Building
Greg Wilson / 2016-10-23
I was recently reminded of a thought-provoking but often-overlooked essay by Peter Naur from 1985 called “Programming as Theory Building” (scan here, plain text here). He suggests that, “…programming properly should be regarded as an activity by which the programmers form or achieve a certain kind of insight, a theory, of the matters at hand. This suggestion is in contrast to what appears to be a more common notion, that programming should be regarded as a production of a program and certain other texts.” His thoughts on what programmers actually do, especially when modifying programs, seem directly relevant to most research software development. In particular, when the Jupyter Notebook and R Markdown are discussed as ways to make research more reproducible, I wonder if part of that is to encourage the programmer to make her theory of what she’s doing explicit. Read More ›

Library Carpentry is One Year Old
Greg Wilson / 2016-10-22
The indefatigable James Baker recently wrote a blog post summarizing what’s happened with Library Carpentry in the past year. It summarizes their lessons, their workshops, how Library Carpentry is managed, and much more. Announcements and initial discussion take place on Gitter, and new members are welcome to join–please check them out. Read More ›

A Comparison of Online and In-person Instructor Training Workshops
Rayna Harris / 2016-10-22
I have co-taught three instructor training workshops this year (one online and one in-person with Christina Koch, and one online with Greg Wilson). Overview Too long to read? Here’s an overview in tabular form.

Feature | In-person | Online
networking | excellent | poor to very good
Etherpad use | poor | excellent
displayed on screen | the web? slides? | webcast of the instructor and other learners
time commitment | prep + class time + travel + paperwork | prep + class time
private communication | in-person | Slack & email
technical difficulties | medium | high

Networking In-person: I think the most amazing feature of an in-person workshop is the networking. There is something about face-to-face conversation during activities and over coffee, beer, and dinner that really solidifies personal relations. I became a certified instructor in January 2015 at a UC Davis instructor training workshop taught by Greg Wilson, Tracy Teal, Bill Mills, and Aleksandra Pawlik; I consider all four of these people to be close colleagues now, and I’m going to be visiting a handful of other workshop participants in November when I visit California. Also, I regularly talk to SWC/DC instructors online, but there is something about meeting them in person that feels like it’s the first time we’ve met face-to-face. Online: As for networking in the online workshops, I would say it’s a mixed bag ranging from poor (for anyone in a room by themselves) to very good (for groups of learners in the same room). The first time I noticed a lack of networking in the online classes was during a debriefing session earlier this week. It wasn’t until 45 minutes into the call that I realized that two of the participants were learners in one of my workshops! I’m glad I finally made the connection, but it made me realize how hard it is to recognize people when you only see a tiny image of them on your screen while you are teaching. On the other hand, I do see that the online workshops help foster cross-disciplinary networking between individuals at the same institution but from different departments, so that’s awesome. See link to tweet from the MSU group. What does this mean for next time? I think next time I teach online, I’ll ask each learner to walk up to the camera to introduce themselves during one of the lunch or coffee breaks so I can better associate a name and face and make a more personal connection with each learner. Etherpad Use The great thing about the Etherpad (in my opinion) is that it allows everyone to answer every question. You can put answers in the chat (like a quick yes/no response) or in the main body (like personal experiences or faded examples) rather than just calling on one person to answer. In-person: During the last in-person session with Christina, we often asked students to answer questions out loud rather than using the Etherpad, and one of the comments at the end was that this format meant that a lot of people’s thoughts/opinions were never heard. Online: During the last online session with Greg, we used the Etherpad extensively, so I felt like participation was really high. Also, the extensive note-taking allowed Greg to visualize participation during one of the exercises he had to miss. What does this mean for next time? I think next time I teach in person, I will use the Etherpad chat a lot more, especially for quick yes/no responses. What is displayed on the big screen? In-person: During the in-person class I taught, I struggled with what to display on the projector. I had to use it, right?
I bounced back and forth between various webpages (the Etherpad, the lesson page, fun images, videos), but the whole time I felt like this was ineffective. It made me wish that I either had slides to use or could avoid using it altogether. See link to tweet about mistakes as pedagogy. Online: During online workshops, the big screen projector is used for the webcast so that learners can see the instructor and the learners at other sites. I set up a whiteboard right behind my chair for drawing concept maps and other illustrations. Each student has their laptop open to the Etherpad, and they can easily open links to webpages or videos that we give them. Since the students never see anything projected except my face and my whiteboard, maybe this means that I can do the in-person classes without using the projector…. What does this mean for next time? Next time I teach in person, I’m going to try not using the projector at all on Day 1 and will encourage Etherpad note-taking, and then I’ll use the projector on Day 2 only for live coding. For online workshops, I highly recommend using a whiteboard if you can write clearly enough that it can be read. Time Commitment In-person: Even though I really enjoy traveling, saying yes to teaching a workshop in a different city is a huge time commitment. Instead of just saying yes to teaching from 9-4, I’m saying yes to being in a different city for 24 hours a day. I also have to devote a lot of additional time to planning the travel, traveling, and getting reimbursed for expenses. Online: When I teach online, I really only have to commit time to prepping for the lessons and teaching them. I can make breakfast in my own home, eat dinner with friends, and even make it to meetings and lab meetings that happen on the same day in the same building. See link to Instagram photo from one of Rayna’s teaching rooms. What does this mean for next time? All things considered equal, this factor alone makes me much more willing to say yes to co-teaching online rather than in-person workshops. Private Communication In-person: When co-teaching in person, you can easily communicate privately with your co-instructor during the lesson and during the breaks. You never have to worry about whether you are muted or whether the learners can hear you, and making decisions about whether or not to change the lesson plan on the fly is pretty easy. Online: When co-teaching online, you have to have yet another application open on your computer for private communication. I like Slack a lot for communicating, but it was a little odd when I was screen sharing and some Slack notifications appeared on the screen. Email also works, but then I can get distracted by other emails, so this is not ideal. Technical Difficulties In-person: I would rate the potential for technical difficulties in the classroom as medium. There’s always a chance that the projector system isn’t optimal or that the internet connection is poor, but usually an expert in the room can come up with a solution or temporary fix on the fly. Online: This is a real pain and can eat into your teaching time. I’ve encountered all sorts of issues, including bad sound, bad video, poor Etherpad accessibility, and inability to screen share, among other things. I don’t have an answer for this. What does this mean for next time? I can’t help but wonder if we should cut 15 minutes of material from the syllabus for online workshops in anticipation of these technical difficulties. Summary All in all, each format has its pros and cons.
The data have shown slightly better success for in-person workshops, but online workshops are successful too! I like teaching both, so I’m gonna keep teaching both in-person and online courses. Read More ›

Ten Simple Rules for Digital Data Storage
Greg Wilson / 2016-10-20
We are pleased to announce the publication of a new paper whose author list includes several members of our community: Edmund M. Hart, Pauline Barmby, David LeBauer, François Michonneau, Sarah Mount, Patrick Mulrooney, Timothée Poisot, Kara H. Woo, Naupaka B. Zimmerman, and Jeffrey W. Hollister: “Ten Simple Rules for Digital Data Storage”. PLOS Computational Biology, Oct 20, 2016, http://dx.doi.org/10.1371/journal.pcbi.1005097. Their ten rules are:
1. Anticipate How Your Data Will Be Used
2. Know Your Use Case
3. Keep Raw Data Raw
4. Store Data in Open Formats
5. Data Should Be Structured for Analysis
6. Data Should Be Uniquely Identifiable
7. Link Relevant Metadata
8. Adopt the Proper Privacy Protocols
9. Have a Systematic Backup Scheme
10. The Location and Method of Data Storage Depend on How Much Data You Have
We hope you find it useful, and encourage you to follow in their footsteps and write down what you know so that others can learn from your experience. As always, we are smarter together. Read More ›
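As a flavor of what rules 3 and 4 look like in practice, here is a trivial, hypothetical sketch (the file names and the cleaning step are invented, not taken from the paper): the raw file is never modified, and the derived copy is written in an open format, plain CSV.

```python
import csv
from pathlib import Path

RAW = Path("data/raw/measurements.csv")      # original file, treated as read-only
CLEAN = Path("data/clean/measurements.csv")  # derived copy, safe to regenerate

CLEAN.parent.mkdir(parents=True, exist_ok=True)
with RAW.open(newline="") as src, CLEAN.open("w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # example cleaning step: make missing values explicit, not blank
        writer.writerow({key: (value.strip() or "NA")
                         for key, value in row.items()})
```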

Community Call on Assessment
Kari L. Jordan / 2016-10-20
Discussion of our workshop survey results Read More ›

Cambridge Instructor Training 19-20 September 2016
Steve Crouch, Laurent Gatto, Karin Lagesen, Greg Wilson / 2016-10-20
Last month, Steve and Karin taught an instructor training workshop at the University of Cambridge, sponsored by the R Consortium. The event was organized by Laurent Gatto, a Software Sustainability Institute Fellow, with help from Paul Judge and Gabriella Rustici from the University of Cambridge Bioinformatics Training facility. 25 trainees from a diverse set of backgrounds spent two days getting to know each other and learning how to teach, and several have already completed the checkout process. You can read a full write-up on the SSI website, and we hope to be able to organize a repeat in the new year. Read More ›

Software Carpentry at Oklahoma State
Jamie Hadwin / 2016-10-19
I recently instructed Git for the third time at a Self-Organised workshop on the Oklahoma State University main campus. I enjoy instructing and helping with the Software Carpentry workshops (and hopefully will get to do a Data Carpentry one soon), and each workshop is always different from the last, so I was excited to participate again. Because of a scheduling conflict, we only had the lab for five hours on Day One. In five hours (with two 15-minute breaks), we went through the entire bash shell lesson without many hiccups. One thing we did create for the bash shell lessons was a series of three-question quizzes we launched live on Socrative before each break. The learners gave good feedback about this and felt it broke up the lessons well and provided a good review of concepts they had just learned. On Day Two, we started Python, but because of some delays, we started behind. Once we got going, the lesson went well (not very many typos!) and we were able to get through a few episodes in the Python lesson, but we weren't able to cover much of the material we had hoped to teach. We also decided not to issue the Python quizzes because we were behind. However, at this point, the learners seemed satisfied with their progress thus far! The coffee and donuts may have helped, too! Side note: many of the learners wanted more experience in Python. This was also reflected in the surveys. We are going to take this into account with future workshops, in that we should plan to be more Python-heavy. Because we weren't able to get to a lot of the material, we are offering, within the next two weeks, an additional afternoon Python workshop to everyone who was at the Python session. Luckily, all of our learners were local. After lunch, we started on Git - my lesson! For the third time teaching Git at a SWC workshop, it went the best it has ever gone! I was enthusiastic to get started. I had typed up my own notes and had a PowerPoint slide of images I wanted to use to demonstrate Git concepts. I even had some Git jokes up my sleeve to keep the class on its toes. I've found that each time I teach Git I get more comfortable, to the point that I can go "off script" and throw in extra jokes, tidbits of information, etc. What really helped the Git lesson go smoothly is that I had a Git "partner-in-crime" who has been helping me with the collaboration portion of the lesson since I first began teaching Git. If you break the class into pairs and are using two instructors to go through the local and remote steps, I find it helps to spend a minute or so making sure each member of each pair knows which instructor they are supposed to follow. We did this by writing "Partner A", my name, and the word "local" on the whiteboard. We also wrote "Partner B", my partner's name, and "remote" on the board. I told the pairs that everyone sitting to my left (holding up my left hand) was a "Partner A", and everyone sitting on my right was a "Partner B." Then I had all the "Partner A" learners raise their hands when I asked who was to follow me. Then my partner had all the "Partner B" learners raise their hands when she asked who was to follow her. Side note: I had also frequently used the word "local" in the lesson leading up to this point to describe the repositories on our own computers, so I think this helped establish the difference between "local" and "remote" well before we got to the GitHub/remote repository portion of the lesson. We started the Git lesson at 1:05 p.m.
and ended the conflict portion of the lesson at 3:45 p.m. This gave us 15 minutes to briefly go over things like open science, licensing, and citing & hosting, as well as to encourage the learners to take the post-workshop survey while we answered questions and ended the day on an uplifting note. At the end, everyone was happy the workshop was over, but I think ending the workshop on Git is a good idea because the material is less "type-heavy" and more interactive. And again, the jokes! Find some Git jokes here. If you're interested in what happened leading up to the workshop, you can continue reading below. I want to mention that SWC and DC probably operate a little differently at OSU than at most other institutions. Over the past year, we've built a solid "Carpentry team" consisting of six certified instructors, a plethora of helpers, and a list of students, faculty & staff who want to become certified in the near future. We all meet monthly, and because of this, my experiences with planning, coordinating, and instructing this workshop will probably be a little different from most.
Planning: When the team met up at the beginning of the academic year in August, we knew we wanted to host a workshop fairly quickly. After looking over several dates, we decided on October 13-14 because classes were cancelled on the 14th for OSU's fall break. We thought this might give students, faculty & staff an opportunity to attend the workshop all day on Friday. However, when we went to book the computer lab we normally use for the workshops, it wasn't available until noon on the 13th. After a little bit of discussing, we decided to go ahead and do a five-hour session for Day One, and then the normal 9am-4pm schedule for Day Two.
Coordinating: Because I've coordinated one of our local Self-Organised workshops before, I helped a team member who was going to be more active with the coordination duties get this workshop set up. Once he filled out the SWC workshop request form, I helped him get an EventBrite registration page set up and recorded registrant information in a shared spreadsheet, while he worked on building the workshop webpage and Etherpad and communicated with Maneesha Sane at SWC.
Pre-workshop: About a week before the workshop, all the instructors and several of the helpers met up to go over the game plan for the workshop. The instructors discussed how we were going to prepare our lessons, what material to make sure to include, and what we could leave out if we were running short on time. In previous workshops, we had one or two learners who fell behind the class and sometimes slowed the pace down to ask questions, so we picked a few more experienced helpers who said they were willing to help with any learners who might be falling behind. We suggested that helpers each focus on one section of the room so that they wouldn't be running into each other going from one side of the room to the other. Finally, we set some "expectations" for instructors and helpers, e.g., it is OK for the helpers to let the instructor know if they need to slow down or speed up based on their observations of the pace of the class. At this time, we also looked at our list of registrants from our EventBrite site to determine what fields of study or departments they were coming from and to gauge their experience levels and what type of data they might be working with. I would definitely suggest putting a few demographic questions in with your EventBrite registration.
If you made it this far, I hope this was all helpful advice if you’re getting ready to coordinate your own workshop! Read More ›

Machine Learning with Python
Greg Wilson / 2016-10-17
A new book has recently been published that may be of interest to our community: Introduction to Machine Learning with Python. Data-driven approaches have taken over many empirical sciences and many business applications. Machine learning algorithms are one of the most important tools for extracting knowledge and making decisions based on complex datasets. This book takes a practical approach to machine learning, using Python and the scikit-learn library. Starting from the basics, it explains how and when to use machine learning, discusses common methods, and points out pitfalls for beginners. Every method and example comes with code in the form of Jupyter notebooks. The book requires a basic understanding of the Python programming language and some familiarity with NumPy. Experience with matplotlib is helpful to gain a better understanding of the visualizations. Andreas Müller received his PhD in machine learning from the University of Bonn. After working on computer vision applications at Amazon for a year, he joined the Center for Data Science at New York University. He is a maintainer of and core contributor to scikit-learn, and has authored and contributed to several other widely used machine learning packages. Sarah Guido is a data scientist who has spent a lot of time working in start-ups. She loves Python, machine learning, large quantities of data, and the tech world. An accomplished conference speaker, Sarah attended the University of Michigan for grad school and currently resides in New York City. If you are the author of a book that is related to Software Carpentry or Data Carpentry's mission, and would like to announce it here, please get in touch. Read More ›
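To give a feel for the book's hands-on approach, here is a minimal sketch in its spirit (our own example, not an excerpt from the book) that trains and scores a k-nearest-neighbors classifier with scikit-learn:

```python
# A small scikit-learn workflow: split the iris dataset, fit a
# k-nearest-neighbors classifier, and report held-out accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```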

October 2016 Maintainers' Meeting
Kate Hertweck / 2016-10-13
This week's meeting of our lesson maintainers made some great progress in streamlining our decision-making process, and we hope to begin implementing some of the changes discussed in the next few weeks. The major highlights are:
Creation of a developers' subcommittee: We're creating a new subcommittee that will be in charge of decision-making for template and style changes common to all lessons. Each maintainer will still be responsible for PRs/issues specific to their lesson. I will contact folks who indicated interest in participating, and am pleased we have folks from both SWC and DC; if anyone not currently acting as a maintainer would like to take part, please contact Kate Hertweck.
Instructor notes: Quite a few of us are supportive of the proposed template for standardizing instructor notes; these headings will be added to the example lesson this week, and we'll begin implementing across our core lessons over the next few weeks. I'll engage the mentoring subcommittee to see if there are folks interested in these conversions, since they've spent a lot of time talking to new instructors and can offer some great insight.
Minor changes to styles/lessons/workshop-template: There were no objections to a few outstanding changes, so these will be merged. Greg and François will wrap up the unresolved issue of inconsistency in paths to data files.
Support for additional human languages: We still don't have a workable solution for supporting lessons in languages other than English. At the very least, it would be nice to have a statement somewhere indicating our feelings on the matter, as we receive queries about this every few months. This will be one of the first items tackled by the new developers subcommittee.
Defining core lessons: Quite a few folks were enthusiastic about moving from the inflammation R and Python lessons to lessons based on the gapminder data. The consensus was that quite a bit more work would be required before this could be an "official" decision. This is another issue that will be discussed by the developers, but will obviously require more communication with the lesson maintainers.
As always, we're grateful to the lesson maintainers for everything they do, and we hope these changes result in less email and more productivity. Please let me know if you have any questions or concerns. Read More ›

In Memoriam: Hans Petter Langtangen
Greg Wilson / 2016-10-11
Hans Petter Langtangen’s books Python Scripting for Computational Science and A Primer on Scientific Programming with Python taught many of us how to do numerical computing with Python. He passed away yesterday after a long struggle with cancer; while I only had the privilege of meeting him in person twice, we corresponded frequently during Software Carpentry’s early years, and he was always helpful, insightful, and enthusiastic. He will be missed. Read More ›

Vote Next Week to Amend Steering Committee Election Procedures
Kate Hertweck / 2016-10-10
Last month, the Steering Committee announced a special election to amend the procedure for Steering Committee elections. The election itself will take place next week (Oct 17-21, 2016). All members will receive a ballot via email to cast their vote regarding this revision via electionbuddy. Please take a moment to check the current membership list and contact us by email to let us know of any omissions. As per our bylaws, our membership includes any certified instructor who has taught at least twice in the past two years (i.e., since October 2014) or has made other significant contributions to Software Carpentry in the opinion of the Steering Committee. Read More ›

Beth Duckles on the Practice of Measuring
Greg Wilson / 2016-10-10
Dr. Beth Duckles, who did a valuable study of our instructor community earlier this year, gave a talk at the recent Measuring the Impact of Workshops workshop titled “The Practice of Measuring”. It’s a very useful 50 minutes, particularly for those of us who have backgrounds in the physical rather than social sciences. Read More ›

Request for Review: ESIP's Software Guidelines
Greg Wilson / 2016-10-05
ESIP (the Federation of Earth Science Information Partners) has been developing research code/software guidelines for the earth observation and geosciences communities, and would appreciate feedback on the current draft before the end of October. If you have suggestions or feedback on interoperability, Jupyter notebooks as code or documentation, the proposed progression model, or sustainability and adoption/reuse, please chime in. (Look in the right margin of the browser page for the hypothes.is controls that will let you add and view comments.) As a taste of what they're doing, here is a table of their stakeholders' use cases and desired outcomes:
Funder. Use case: As a funding agency, we're interested in evaluating the software projects we fund. Desired outcome: A functional evaluation system based on accepted metrics.
Project Manager / Principal Investigator (manager in practice). Use case: As a manager, I'm interested in using the rubric/progression as a learning tool to help improve the development practices in my research group. Desired outcome: A checklist or other informal assessment to help the research group meet the funder's expectations and to determine the next steps for training or related activities in the research group.
Principal Investigator. Use case: As a PI, I would like a tool to assess our progress and to ensure we're meeting our funder's expectations for a software project, based on the readiness level stated in the original proposal and as defined by the funder. Desired outcome: A checklist or other informal assessment to help the research group meet the funder's expectations and to determine the next steps for training or related activities in the research group. This informal assessment would also provide aid for formal reviews.
Science Software Developer / Researcher who codes. Use case: As a science software developer, I'm interested in using the recommended practices to improve my own workflow and skillsets. Desired outcome: A checklist or mentoring activity to help guide me towards training options to meet my research and skillset goals.
Developer. Use case: As a developer, I would like community-supported guidelines to support requests to change our current dev team practices. Desired outcome: A checklist or informal assessment to encourage my manager or PI to allow the development team to adopt appropriate practices.
Grad Student / Post-Doc / Researcher interested in continuing code education. Use case: I've taken the introductory courses and want to continue to improve my skills but don't know what steps to take next, and I'd like guidance based on my skillset. Desired outcome: A checklist or mentoring activity to help guide me towards training options to meet my research and skillset goals.
Research Community. Use case: We want to provide educational materials or other support for community members to meet their goals regarding research software implementation and career growth. Desired outcome: A set of guidelines for technology assessment, and the framework for using those guidelines as educational tools. Read More ›

Python as a Second Language
Greg Wilson / 2016-10-04
Donny Winston, Joey Montoya, and I taught a one-day class for Lawrence Berkeley National Laboratory on Python as a Second Language last week. As its introductory blurb says, "This lesson is an introduction to programming in Python for people who are already comfortable in some other language such as Perl or MATLAB." The notes are still very much under development, but having delivered it twice, we're pretty confident that it can actually be delivered in one day. We would be very grateful for feedback: please file issues in the GitHub repository to let us know what you think, to add more exercises and bullet points, or anything else. As well as delivering new(ish) material, we experimented with having one of the instructors teach via video conferencing with local helpers in the morning, while on-site instructors taught in the afternoon. Some of the feedback included:
Positive:
On-line with local help worked very well.
Mixed mode worked well for the first section because the material was easier, might have been more difficult for second half.
Easy to follow, well written exercises.
I thought remote instructor was great…having local instructors was a big part of that though.
Plotting super helpful.
Etherpad being read-only may have helped, so people didn't mess it up.
Cool, dense content, helpers are very knowledgeable.
Negative:
Sometimes fast.
Typing is hard to follow as it scrolls off screen.
Pytest can't install.
For online instructor, dual screens may be useful, I'd like to see the Notebook longer.
A bit fast at times, particularly due to the auto scrolling of the screen.
Confusing if you fell behind for a second and the teachers would overwrite, rather than start new cell.
I would have liked to see how python is more typically used, such as IDEs, command line, etc.
Tell people that terminal functionality is needed in advance.
A bit rushed through matplotlib, would have liked more practice plotting.
Based on this feedback and what we heard in the previous round, we have moved the material on command-line scripts into the "Extras" section: there wasn't time to get to it, and it requires yet another install for Windows users. Read More ›
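For readers who haven't seen the notes, a sketch of the kind of plotting exercise such a lesson covers (hypothetical; the data file name is assumed rather than taken from the lesson materials):

```python
# Hypothetical exercise in the lesson's style: load a CSV of daily
# measurements with NumPy, then plot the per-day mean with matplotlib.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("inflammation-01.csv", delimiter=",")  # file name assumed
plt.plot(data.mean(axis=0))   # mean across rows (patients) for each day
plt.xlabel("day")
plt.ylabel("mean value")
plt.show()
```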

Congratulations to Our New Instructor Trainers
Greg Wilson / 2016-10-04
We are very pleased to welcome Karin Lagesen and Anelda van der Walt to our team of instructor trainers. Karin and Anelda have been very active in all of our activities–Karin is now in her second year on the Steering Committee, and Anelda has been the driving force behind our growth in South Africa–and we are very grateful that they are willing to give even more to our community. Read More ›

And Now There Are Three
Greg Wilson / 2016-10-04
A new book has just been published that covers much of the same material as Software Carpentry, and a great deal more: Paarsch and Golyaev’s A Gentle Introduction to Effective Computing in Quantitative Research: What Every Research Assistant Should Know. It covers almost everything I would want to see in a one-semester course for new research students: the Unix shell, data organization, the basics of Python, data analysis, “geek stuff” (including hardware and algorithm analysis), numerical analysis, some worked examples, Python extensions, and preparing manuscripts with LaTeX. By trying to cover so much, this book necessarily spreads itself thin: I don’t think anyone who isn’t already familiar with Make or Git would be able to use them after these brief introductions. That said, this book deserves a place alongside Haddock and Dunn’s Practical Computing for Biologists and Scopatz and Huff’s Effective Computation in Physics, and I think anyone contemplating a graduate-level computing course would do well to explore it. Read More ›

Two Studies of Online Communities
Greg Wilson / 2016-09-30
Two recent papers may be of interest to this community. The first is from Adam Crymble at The Programming Historian, a distributed group of digital humanities scholars that has built some excellent tutorials on software tools. Its title is Identifying and Removing Gender Barriers in Open Learning Communities, and its abstract reads: Open online learning communities are susceptible to gender barriers if not carefully constructed. Gender barriers were identified in The Programming Historian, through an open online discussion, which informed an anonymous user survey. The initial discussion pointed towards two barriers in particular: a technically challenging submission system and open peer review, as factors that needed consideration. Findings are put in context of the literature on gender and online communication, abuse, and online learning communities. The evidence suggests that open online learning communities such as The Programming Historian should work actively to promote a civil environment, and should listen to their communities about technical and social barriers to participation. Whenever possible, barriers should be removed entirely, but when that is not feasible due to financial or technical constraints, alternatives should be offered. Its findings are that the tools they use—that we use—may be a significant barrier to contribution: "Initial comments in the open conversation made it clear that the choice of venue (Github) was a gender-barrier, as Github is associated with male geek coding culture." On the other hand, "…both men and women were overwhelmingly positive about open peer review (29 like, 6 neutral, 3 dislike, 9 skipped - no gender difference), with the caveat that moderating by an editor who stepped in to prevent 'nastiness' was crucial to a successful system of open peer review." The second paper, by Ford, Smith, Guo, and Parnin, is "Paradise Unplugged: Identifying Barriers for Female Participation on Stack Overflow": It is no secret that females engage less in programming fields than males. However, in online communities, such as Stack Overflow, this gender gap is even more extreme: only 5.8% of contributors are female. In this paper, we use a mixed-methods approach to identify contribution barriers females face in online communities. Through 22 semi-structured interviews with a spectrum of female users ranging from non-contributors to a top 100 ranked user of all time, we identified 14 barriers preventing them from contributing to Stack Overflow. We then conducted a survey with 1470 female and male developers to confirm which barriers are gender related or general problems for everyone. Females ranked five barriers significantly higher than males. A few of these include doubts in the level of expertise needed to contribute, feeling overwhelmed when competing with a large number of users, and limited awareness of site features. Still, there were other barriers that equally impacted all Stack Overflow users or affected particular groups, such as industry programmers. Finally, we describe several implications that may encourage increased participation in the Stack Overflow community across genders and other demographics.
It found five barriers to contribution that are seen as significantly more problematic by women than by men: lack of awareness of site features; feeling unqualified to answer questions; intimidating community size; discomfort interacting with or relying on strangers; and the perception that they shouldn't be "slacking". Surprisingly, "fear of negative feedback" didn't quite make this list, but it would have been the next one added if the authors weren't quite so strict about their statistical cutoffs. The authors are careful to say, "…we are not suggesting that only females are affected by these barriers, or that these barriers are primarily due to gender, but rather that five barriers were seen as significantly more problematic by females than by males." Read More ›

Perth Software Carpentry - A Tale of Three Trainers
Andrew Rohl, Matthias Liffers, Andrea Bedini / 2016-09-30
Andrew
Perth's journey into Software Carpentry began when Andrew Rohl attended eResearchNZ in 2014, for which he has to thank Nick Jones of NeSI for financial support. There he met the director of the Mozilla Science Lab, Kaitlin Thaney, and learned about the Software Carpentry movement. Fast forward to the end of the year, and David Flanders was arranging a "Train the Trainer" Software Carpentry course in Melbourne - and he even had funds to cover travel costs for those selected to attend! Andrew convinced Raffaella Demichelis, a fellow computational chemist, to also apply, and they were both fortunate enough to be chosen. They enjoyed the training in February 2015, followed by the first Research Bazaar conference, and returned to Perth enthused. Then the reality that they were now expected to teach Western Australia's first Software Carpentry course hit! With only two instructors in WA, and unaware of any other people who had even attended a Software Carpentry workshop, they searched for helpers and found Rachel Lappan and Chris Bording. Rachel had just attended a Software Carpentry workshop in Queensland as part of the UQ Winter School, and Chris had started Software Carpentry training. Andrew and Raffaella convinced two other computational chemists to help: Bernhard Reischl and Marco De La Pierre. With the team now sorted, the first Software Carpentry workshop in WA was held on July 20 and 21, 2015, and was a big success. With 24 attendees, the workshop proved popular, but the team knew they could aim higher. They decided to sign up for ResBaz 2016 and to accommodate 80 registrants, running both R and Python streams concurrently. They desperately needed more instructors and helpers! Out of the blue, Lukas Weber, an Australian from WA studying in Switzerland, got in touch and agreed to come over as an instructor. Chris Bording had also finished his training and so was able to teach. But the team still wasn't big enough!
Matthias
Matthias Liffers had been working in research data management for a couple of years and thought that something was missing in the training offered to researchers. It was all well and good to provide fancy eResearch facilities, but the learning curve to move from spreadsheets to processing huge datasets on supercomputers was just too steep. Matthias learned about the Software Carpentry movement by word of mouth and thought it was an excellent way for researchers to start learning these computing skills. He attended eResearch Australasia in late 2015 and discovered from Belinda Weaver and David Flanders that not only had Software Carpentry already been run in Western Australia, but that this Andrew Rohl chap was already planning a ResBaz in Perth! Amusingly, Matthias had already worked with Andrew, but the topic of researcher training had never come up. Matthias then got in touch with him, offered his assistance in running ResBaz, and quickly found himself whisked to Melbourne to attend the Software Carpentry instructor training with Aleksandra Pawlik. As an experienced librarian, Matthias already had a good decade of training experience, but the discussions on pedagogy really changed the way he thought about the training he had already delivered. Another concept Matthias learned about at eResearch Australasia was HackyHour - an informal get-together that served the dual purpose of networking and providing post-training support to SWC/ResBaz attendees.
It was in getting HackyHour off the ground that Matthias met Andrew's new team of research computation specialists.
Andrea
Andrea Bedini's journey started when he was a postdoc at the School of Mathematics and Statistics at the University of Melbourne, where he would spend his days giving lectures and coding Monte Carlo simulations. Andrea strongly supports the idea that science should be open and reproducible, and found himself spending perhaps more time than he should thinking about how to put those concepts into practice. Around mid 2014, one of his students, Noon Silk, decided to organise an Open Science Workshop where students and researchers could learn everything about GitHub, IPython notebooks, and SageMathCloud. Andrea didn't hesitate to give a hand! The workshop was just awesome. While organising the workshop, Andrea and Noon were introduced to a group of people at Melbourne Uni who were working on a similar initiative, Software Carpentry. In this way Andrea met David Flanders, Damien Irving, and Fiona Tweedie. They had been running HackyHours at a local bar for a while, and they were all very busy organising the first Research Bazaar conference planned for February 2015. Just before the conference, the group had arranged for Bill Mills (then Community Manager for Mozilla Science Lab) to come over from Toronto to run a Software Carpentry instructor training course. Andrea attended both the instructor course and the conference, also helping Alberto Pepe and Nathan Jenkins with the classes on Authorea (a collaborative paper-writing tool, which is also awesome - check it out). Raffaella and Andrew also attended the training and ResBaz 2015 in Melbourne - that was a missed connection! Andrea was feeling he wasn't enjoying his postdoc position any more, and his wife suggested he look for a career change in her hometown, Perth. Little did he know that, at the same time, at Curtin University in Perth, Andrew Rohl was hiring a team of computational specialists with the hope they would support his efforts in increasing the presence of Software Carpentry in WA. The rest, as they say, is history… Read More ›

Software Carpentry Workshop Attendance: a New Zealand Perspective
Tom Kelly, Mik Black, Sung Bae, Wolfgang Hayek, Aleksandra Pawlik / 2016-09-28
Having taught and helped at a series of workshops over the past few months, Tom Kelly, PhD Candidate in Genetics at the University of Otago, wrote up some of his reflections on issues related to workshop attendance. This spurred further discussion via email among the New Zealand instructors. We decided to put these thoughts together in the hope that they could help other sites struggling with attendance problems. Please note that these are the authors' views, and thus they should not be treated as representative of their home institutions. Tom formed his opinions after teaching at various workshops in Australia and New Zealand, including Research Bazaar 2015 at the University of Melbourne, Research Bazaar 2016 at the University of Otago, and workshops at the University of Otago, the University of Canterbury, and NeSI. There have been several other workshops in New Zealand facilitated by NeSI, in Auckland, Wellington, and Palmerston North, over the past year.
Main point
Over the course of several workshops we've had relatively minor problems with "no-shows" (people signing up and not attending) or "drop-offs" (people not returning for future days or sessions). However, in the case of the oversubscribed workshops it was still somewhat frustrating. This has led to discussions about how we might address attendance issues to ensure that others who would have attended the entire workshop, but ended up on the waiting list, do not miss out on places.
Issue 1: No-shows
At our most recent workshop at the University of Otago, 21 of the 25 people who signed up attended. At previous workshops this had also been a bit of an issue, with no-shows as high as 25% in Christchurch in February 2015. I know that this issue is not specific to these sites or to New Zealand itself. Shortly after I got involved in Software Carpentry, I had a chance to talk to Bill Mills, who was visiting from Canada to help boot up the workshops and train some instructors. Bill said that they usually see 25-30% no-shows in the US/Canada, so our attendance figures are not too bad compared to other free Software Carpentry events.
Issue 2: Departures over time
A larger concern to me is the number of participants who attend the beginning of a multi-day workshop and do not return for the final sessions. Some participants may leave midway because it just doesn't work for them (thankfully, this is rare). Some will be interested only in a particular session, such as biologists who may attend only for the R module, even if we encourage them to attend the full course. With others it may be difficult to address, particularly if they don't leave any feedback on why they left. We can assume, though, that some participants have to leave early due to other commitments such as running lab experiments or childcare responsibilities. So at our recent workshop at the University of Otago we tried splitting it into three shorter days, rather than two full ones.
Approach 1: Registration Fee
We discussed no-shows further with Bill Mills when we were doing ResBaz/SWC in Melbourne and Christchurch. He mentioned a solution suggested by Software Carpentry of applying a small registration fee to make sure those who register actually attend, or cancel giving the organisers some notice. Based on the experience of several hosts, this usually results in no-show numbers dropping to below 5%. Whilst this is certainly an option to consider, in many local contexts this would not be possible.
There are complications with local university regulations. Some universities charge venue fees unless the event is free, run at cost, or for the benefit of staff and students. Another problem with charged events (even with a small fee) is that it may create disparity between research groups, where some attendees are funded by their lab and others need to foot the bill themselves due to financial or administrative constraints. Eventbrite makes it easier for the hosts in terms of handling payments and registrations, but within the university system it would create issues for labs that want to pay (through Eventbrite) for their members to attend - not insurmountable, but extra hassle. There are also some cultural aspects. For example, New Zealand may differ from other places where ticketed events have been tried. We don't have a tipping culture; one of the largest home-supplies supermarket chains has the slogan "Where Everyone Gets a Bargain", and another grocery supermarket chain proudly announces that it has "NZ's lowest prices". These stores are wildly successful. It can be said that many of us see this as a good deal rather than appearing cheap, particularly among the university student population. Many people here view a bargain or freebie positively, so I don't think the event is under-valued by being free. However, it would be interesting to see if any other NZ sites have tried a paid ticketed event to boost attendance rates, and how this compares to other countries.
Approach 2: Catering
Another suggestion to raise numbers is providing catering (possibly paid for with registration fees), which we tried at the Dunedin Research Bazaar last February. However, we had issues with overcatering for those who did not stay for lunch, and we still had dwindling numbers by the last day. I think the "come back for day 3" rate was higher in our most recent Otago workshop due to the combined Git+Bash sessions on days 1 and 3. Unfortunately some participants did still give the impression of only wanting to attend the R (or Python) session, but most seemed to give the rest a shot. And even the catering was not enough of an attraction. Dwindling numbers seem to be a bigger problem with longer (3-day) events, but there are higher costs for catering an event this long. Reducing the length of each day was another approach we've tried, as discussed here.
Approach 3: Blacklist
Another approach could be the hosts checking actual attendance and keeping a record of people who habitually don't show up without giving notice. They would then only be able to sign up if there were places left right before the event. At the recent University of Otago workshops no one missed out due to no-shows. Generally, we manage to let most of our waiting list in with cancellations anyway. It would be interesting to know if particular people (or groups, or institutions) are signing up and not coming recurrently, but a blacklist (as some SWC sites have used) may be an overreaction. This is a rather drastic solution and needs to be treated with care. There are many understandable reasons why a participant may not be able to attend at the last minute which would be difficult to monitor, such as illness or bereavement. There may be students who choose (or feel pressured) to prioritise their experimental research over the workshop on the day. It's likely preaching to the choir to even mention how counterproductive this lack of training can be in the long run.
However, a punitive approach such as a blacklist is not an appropriate way to encourage engagement in our workshops over research activities. We consider a blacklist a last resort beyond the current first-come-first-served sign-up system, to be considered only if people are repeatedly missing out. Perhaps a whitelist of people who missed out last time would be less punitive? We could either bump them to the top of the waiting list or email them about the workshop in advance of the public announcement. This would give potential participants more incentive to sign up even if the current workshop is full, and may give a better indication of how much interest there would be in a future workshop.
Approach 4: Overbooking
This approach is notoriously used by some airlines. Many of you might have experienced a frustrating time at the gate when it turned out that you didn't actually have a seat even though you had a ticket. Things then get exciting as the airline tries to bribe the passengers with allocated seats to give them up (for cash) and take a subsequent flight (possibly the next day). Neither Software nor Data Carpentry is aiming to go that way, but it may be tempting for hosts to allow a high number of sign-ups (say 45) on the assumption that there will be a 20-30% no-show rate, particularly if a larger venue and additional helpers are available. At larger (parallel-session) events, such as ResBaz, feedback has been overwhelmingly positive about the inclusion of 'helpers'. They can somewhat mitigate issues with a larger group, offering one-on-one assistance when needed and getting participants back on track so they can follow along after falling behind because of technical problems. We encourage helpers to be proactive at larger events, or those covering more advanced content, checking on participants when they become withdrawn or quiet rather than waiting for the sticky notes. The larger the group, the wider the range of paces and learning styles. If participants have raced ahead of the content, this is also a good opportunity to encourage them to work with their neighbours, try out extension challenge questions, or discuss how the tools involved could be applied to their work. However, one problem that Wolfgang Hayek, an NZ instructor based at NIWA and NeSI, has seen is that venues get very crowded if turnout is large, with attendees complaining accordingly in their feedback. Sticking to the recommended number of attendees is definitely a good idea. For example, in Wolfgang's experience, the Wellington Victoria University ResBaz was a very relaxed event, at least partly due to its lower attendance. The Git session that Wolfgang taught there was a lot more interactive than sessions he had taught at other events, which made it quite enjoyable for everyone (many questions were asked and issues discussed, and attendees participated more in the hands-on sections). While it is clear that we want to maximise the efficiency of these events, there is a positive side to lower attendance, too.
Approach 5: Establishing rapport with the participants
Another alternative to the carrot and the stick is trying to establish close communication with the participants. Mik Black of the University of Otago said that being very hands-on also helped with attendance numbers, particularly when co-ordinating the larger ResBaz event with parallel sessions. He sent repeated emails to registrants with reminders to tell the hosts if they couldn't come, as there was a waiting list.
That was somewhat effective, but there were still no-shows (plus drop-offs after the first day). It also worked at the time because Mik needed to email about other ResBaz details anyway (venue, schedule, laptop setup, etc.) - he wasn't just spamming them with "are you still coming?" every two days. Sung Bae of the University of Canterbury (and previously NeSI), who has hosted and taught at a number of workshops across New Zealand, developed a habit of going around the participants with the guest list and making them a name tag on the spot (checking attendance at the same time). Sung found this helpful for building personal connections (it helped him remember their names, too), and he also produced attendance lists from the events he led. It could possibly help mitigate the number of drop-offs on the subsequent days of a workshop. We recognise that there is no silver bullet to sort out attendance issues. However, there are various ways these problems can be mitigated. The experiences from Software and Data Carpentry workshops may also translate to other training that many members of this community run. Read More ›

SWC: First Impressions
Leo Browning / 2016-09-28
This post is a simple telling of the beginning of my experience with SWC, and hopefully the first of many encounters with SWC as a community as well as a learning experience. I attended a session a couple of months previously and was very impressed by my experience. I had a chat with the other attendees and found that their experience as complete beginners was as positive as that of someone who, like me, had some experience with Python, Git, and the shell. I have always held a firm belief that digital literacy in research and education is both vital and sadly lacking; my own experience was entirely self-taught throughout a university education, so it is no surprise that my fellow attendees and I were drawn to the SWC workshop to fill that gap. SWC addressed the digital gap in research, and had me so hooked that I just had to stay involved. Over the next couple of months I looked for any opportunity: I incorporated the novice Python material into an independent workshop that I ran on Python for data pipelines, and now I have just instructed my first SWC session. The more involved I get, the more I feel like SWC is something that I want to be involved in, because it addresses an important need in research, and because it does so in a way that is accessible and tailored by its community. I think my experience with SWC is not unique, and that hundreds of people around the globe have been in the same boat as I was. In fact I am sure it is not unique, as every contact I have seems to be with people as interested as I am. Some of my favorite examples of "of the people, by the people, for the people" places on the internet are the Stack Exchanges, Reddit maker communities, and Wikipedia. And without fail, when I have gotten involved with these communities, they have all been underpinned by a strong sense of community purpose tailored to a specific need. Although SWC seems to me to be a mix of online and in-person community, I look forward to adding it to my list of the best of the best as I continue to be involved. I would love to hear particularly interesting or inspiring first experiences of other new or long-time members; I am sure that there are many! Read More ›

Teaching Programming to the Blind
Greg Wilson / 2016-09-23
Andreas Stefik (who discusses what we know about the usability of programming languages in this entertaining podcast) has worked extensively on computing education and programming tools for the visually impaired. When asked earlier this week how to teach programming to the blind, he sent the response below. We're grateful for his comments, and for Evan Williamson's recent pull request to improve the accessibility of our lessons. If you are making any presentations, be sure to provide the PowerPoint slides to the blind individual in advance if you can. PowerPoint is the "most" accessible format, but if you have any images, you need to manually specify "alts" inside the presentation. It's not hard, but most people don't realize PowerPoint has this feature. When actually presenting material, for any kind of diagrams, I find it helpful (if my audience is blind) to practice oral description of the images ahead of time. This is sometimes tricky in code, especially for things like linked structures or trees. So, if you are explaining those kinds of concepts, just be aware that it might take some practice. I've practiced this for years in my own presentations, but still find it challenging sometimes for highly visual content (e.g., we taught 3D gaming to blind people this summer, which was a real challenge). Same goes with code. If the person doesn't read code coming in, screen readers don't even output all of the special characters without special modes turned on (e.g., verbosity mode in JAWS). For example, if I have: a = a - b it might say "a equals a b" (notice the missing minus). Point being, depending on the experience level of the person coming in, and how comfortable they are with their screen reader, they might need some help getting used to the quirks. When presenting, you sometimes have to actually say the special characters or they won't know they need to be typed. If you are using tools for programming, a great many out there don't work for the blind. The best you can do here is make sure you get the tools to the person in advance if you know they work. If you don't, you can either ask or at least have a fallback. A basic text editor and the console usually work on most systems, although that doesn't mean that kind of setup is easy to use. We have some stuff that might help, but it depends on what you are teaching and your specific needs. Different languages can cause major issues for blind individuals. I could go into detail, but imagine things like white space in Python. Or, imagine hearing statements like, "for left paren int i equals semicolon i less than ten semicolon i plus plus right paren left brace" in C. Both can cause headaches for various reasons. Find out about their specific needs beforehand if you can and if they are willing to tell you. If they just need magnification and large-print materials, this stuff is a lot easier. If they are totally blind, then Braille can be helpful. But, crucially, you need to know whether they know Braille, and if so, which kind. Braille standards have changed in recent years, and it matters for computer code because of the special characters. I'm not a Braille expert, but if this is an issue on your end, I can get you info from some experts. Finally, one thing I almost always recommend doing beforehand, just to make sure you have a little bit of context, is to download a screen reader and give it a shot. On Windows, grab NVDA, or on Mac, just press APPLE F5. Even spending an hour going over a tutorial can help give you a little context.
Spending an hour programming blind on your own won’t make you an expert, but it’s such a different way of programming that it might help give a glimpse into that world. Read More ›

Teaching at the Board
Chris Hamm / 2016-09-20
This post originally appeared on Chris Hamm's personal blog. Software Carpentry is a non-profit organization that teaches basic computer skills. The lessons for these courses assume no prior knowledge among the learners. I am a certified instructor for Software Carpentry and its sister organization, Data Carpentry. I have taught two Software Carpentry workshops at the Federal Reserve Board in Washington, DC, and they were very different from one another. Both workshops were successful, but for very different reasons. This post explores why (I think) they were so different. The first course was held in late June; the learners were all relatively new employees, and the level of the lessons (shell, R, git, and more R) was appropriate for their skill level. The students were engaged, followed the materials, and it was an excellent workshop. The second course was held in late August, and the learners for this workshop were all seasoned employees who had worked at the Fed for 2-3 years. Only 30 minutes into my R lesson, I could tell that I did not have the students. I've taught this lesson ~10 times for Software Carpentry, I know the material very well, and I consider myself a good teacher. This was the first time I did not have any questions or learners in need of assistance. Something was up. I called an audible. I paused the lesson and started a discussion with the students to understand why the lesson was falling flat. The learners conveyed that they were all experienced with R and that this material was far too simple for them; their level of expertise was above that targeted by the Software Carpentry lessons. My co-instructor and I decided to alter the lessons so the learners could get something out of the course. I ditched the basic R lesson and went into more data manipulation, installing packages from source, interfacing R and SQLite, using the ProjectTemplate package, and how RStudio integrates with git. My co-instructor changed her materials to focus more on data manipulation via the shell. We were lucky to be able to make adjustments and come up with new lessons, but this is not and should not be the standard for Software Carpentry lessons. It is important for potential learners to recognize that we teach basic computer skills and to read workshop descriptions before signing up. Sticky notes (on which the learners write something that we could improve and something that worked well for them) from my lessons are below:
Intro to R
Worked well:
Thanks for noticing & adjusting to the level of the class; interested in for tomorrow: call stack, tidyr, split-apply-combine, vectorization
I really enjoyed the small SQLite tutorial / everything was very clearly explained
awesome explanation and super fun problem solving
good changes at end
loved interactive instruction
I really benefited from doing exercises. It's helpful to try things out yourself.
Needs work:
Too basic at first (but got better!)
Git
Worked well:
Really liked going through ProjectTemplate w/git. I'm at the point in code, etc., where I'm thinking a lot more about organization etc.
Although it was a little bumpy it wasn't bad at all. On your own you run into errors and it was valuable to learn how to remedy them. Thanks for coming.
I liked the shell parts & git
Overall, I'm very glad I took the course. I'll definitely adopt the RStudio, git, ProjectTemplate workflow.
Great interaction / response to feedback from Day 1. very useful.
having never used git, step by step
Good git stuff, found it very helpful
explanation of git basics were great, the repetition of commands was nice
the git / unix classes are great. Very helpful.
Needs work:
Git demo became repetitive in the middle
I might start with git and RStudio, then move to git in Linux because there are more visual clues to whats going on in RStudio
More of a "Fed Board" problem but it would've been cool to work with the repositories already created in my section
Why can't we start at 9
Dimming the lights might make it easier to see the big screen. Thank you.
Slow down the lecture please
No complaints
Read More ›

Systems Biology Postdoc Position with The Jackson Laboratory
Sue McClatchy / 2016-09-19
The Carter Lab at The Jackson Laboratory is seeking a Postdoctoral Fellow in computational genetics and systems biology. Our group is developing novel computational methods to derive biological models from large-scale genomic data. The strategies we pursue involve combining statistical genetics concepts such as epistasis and pleiotropy to understand how many genetic and environmental factors combine to control disease-related processes in animal models and human studies. We are especially interested in dissecting the genetic complexity of autoimmune disease, neurodegeneration, and cancer. The Jackson Laboratory in Bar Harbor, Maine, USA, is recognized internationally for its excellence in research, unparalleled mouse resources, outstanding training environment characterized by scientific collaboration and exceptional core services - all within a spectacular setting adjacent to Acadia National Park. The Jackson Laboratory was voted among the top 15 “Best Places to Work in Academia” in the United States in a poll conducted by The Scientist magazine. Exceptional postdoctoral candidates will have the opportunity to apply to become a JAX Postdoctoral Scholar, a selective award addressing the national need for research scientists who are accomplished in the broadly defined fields of genetics and genomics. The award includes an independent research budget, travel funds, and a salary above standard postdoctoral scale. Applicants for both positions must have a PhD (or equivalent degree) in quantitative biology or another quantitative discipline such as computer science, physics, or applied mathematics. Experience in statistical genetics and gene expression analysis is strongly recommended, and applicants must have a commitment to solving biological problems and good communication skills. Expertise in scientific programming languages including R, C/C++, Ruby, Perl, or Java is recommended. Expertise in cancer genetics, immunology, or neurological disease is desired but not required. Read More ›

Show Me Your Model
Greg Wilson / 2016-09-18
As far as I can tell, there are no published studies showing that version control is better than mailing files around or sticking them in shared drives. I believe it is–I wouldn't work on a project that didn't use version control–but nobody's ever gathered data, compared it to a model, and published the result. One reason, I think, is that we don't know how to measure the productivity of programmers. "Lines of code per hour" clearly isn't right: good programmers often write less code, or spend their time on the parts of problems that have the highest thinking-to-coding ratio. Without some operationalization of "better" and "worse", it's hard to rank or compare alternatives. This problem came up again when I tweeted, "If anyone has data showing Excel is more error-prone than MATLAB/Python/R once you normalize for hours spent learning it, plz post." It's clear from the responses that most people on Twitter believe this, but I'm not really sure what "this" is:
1. There are more errors in published results created with Excel than in results created with scripting languages like MATLAB, Python, and R. OK, but given that many more people use Excel, that's like saying that people in China have more heart attacks than people in Luxembourg.
2. Results calculated with Excel are more likely to be wrong than results calculated with scripting languages. This is what I had in mind when I tweeted, and I don't think the answer is obvious. Yes, there are lots of examples of people botching spreadsheets, but there's also a lot of buggy code out there. (Flon's Axiom states, "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code.") And even if this claim is true, correlation isn't causation. I think that people who do stats programmatically have probably invested more time in mastering their tools than people who use spreadsheets. The (hypothesized) differences in error rates could easily be due to differences in time spent in reflective practice.
3. People who are equally proficient in Excel and scripting languages are more likely to make mistakes in Excel. This formulation corrects the flaw identified above, but is nonsensical, since the only meaningful definition of "equally proficient" is "use equally well".
4. Spreadsheets are intrinsically more error-prone than scripting languages because they don't show errors as clearly, they're harder to test, it's harder to figure out what calculations are actually being done, or they themselves are buggier than scripting languages' math libraries. These are all plausible, but may all be red herrings. Yes, it's hard to write unit tests for spreadsheets, but it's possible: Felienne Hermans found that 8% of spreadsheets included tests like if(A1<>5, "ERROR", "OK"). I'd be surprised if more than 8% of people who do statistics in Python or R regularly write unit tests for their scripts, so the fact that they could is irrelevant.
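For what it's worth, a check like the one Hermans describes translates directly into the kind of unit test a script author could write. Here is a minimal sketch using pytest, in which column_total is a hypothetical stand-in for whatever a real analysis script computes:

```python
# test_analysis.py -- run with `pytest test_analysis.py`.
# `column_total` is a hypothetical stand-in for a real analysis step.
def column_total(values):
    """Sum one column of measurements."""
    return sum(values)

def test_total_matches_known_answer():
    # The spreadsheet form of this check is if(A1<>5, "ERROR", "OK").
    assert column_total([2, 3]) == 5

def test_total_of_empty_column_is_zero():
    assert column_total([]) == 0
```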
Will Julia be better enough than the languages we’re using now to justify the hundreds or thousands of programmer-years it will take to build a comparable ecosystem? What about requiring people to do code reviews when they review papers–is that a better place for them to spend their time than having them pair-program as they’re developing their own code? We make decisions like this all the time, but victory seems to go to the loud and the lucky more often than to the righteous. Leslie Lamport once said, “Writing is nature’s way of letting you know how sloppy your thinking is.” Experimental design has the same effect: it forces you to clarify what questions you’re asking and how you’re answering them. So instead of asking if anyone has data comparing Excel to programming languages, I should have asked, “What experiment would you run to decide whether spreadsheets are more or less error-prone than programs?” Answers to that would be very welcome. Read More ›

The Discussion Book
Greg Wilson / 2016-09-10
Hot on the heels of Small Teaching (which we reviewed last week) comes Brookfield and Preskill’s The Discussion Book. Its subtitle is “50 great ways to get people talking”, and that’s exactly what it delivers: one succinct description after another of techniques you can use in classes or meetings to get everyone talking productively. Each one is covered in three or four pages with the headings “Purposes”, “How It Works”, “Where and When It Works Well”, “What Users Appreciate”, “What to Watch Out For”, and “Questions Suited to This Technique”. I’ve used some of these before, like Circular Response, Think-Pair-Share, and Justifiable Pressure. Others seem less practical to me, but given how incisive everything else in this book is, I’m probably mistaken. Overall, it reminded me of Lemov’s Teach Like a Champion, and I think it deserves to be just as widely read. Read More ›

Community Service Awards
Greg Wilson / 2016-09-10
The Software Carpentry Foundation relies on volunteer efforts to achieve many of its goals. It is now inaugurating a Community Service Award as a way for its Steering Committee to recognize work which, in its opinion, significantly improves the Foundation’s fulfillment of its mission and benefits the broader community. Details are available on this page; nominations are welcome at any time, and we will make the first awards before the end of this year. Read More ›

September Carpentries Community Call
Tracy Teal, Kate Hertweck / 2016-09-09
Our next Carpentries Community Call (formerly called Lab Meeting or Town Hall meeting) will be Thursday, September 15 (September 16 Aus/NZ). These meetings will now be held monthly, on the third Thursday of every month. It would be great to see instructors there! These calls are a great chance to connect with other Carpentry instructors and get updates and information on important and interesting topics for the community. Times: 7am Pacific / 10am Eastern / 2pm UTC / 12am (Sept 16th) Sydney, and 4pm Pacific / 7pm Eastern / 11pm UTC / 9am (Sept 16th) Sydney. Topics this month will include: the new lesson template; the Policy Subcommittee and an update on the CoC; IRB approval and updates on assessment; highlighting manuscripts from our community; and the election on rules for the Software Carpentry Steering Committee. Head over to the etherpad to let us know you’ll be attending one of the two sessions. Read More ›

17 August-12 September, 2016: Steering Committee, Google Summer of Code, rOpenSci, Small Teaching, Ten Simple Rules.
Martin Dreyer / 2016-09-09
Highlights: We are amending the Steering Committee election procedures. Some members of our community co-authored research papers and would appreciate some feedback. A very successful Google Summer of Code 2016 has come to an end. Openness can still lead to a frustrating situation. You can now log in to the etherpad to let us know if you will be attending the monthly Carpentries Community Call. Tweets: Excel might be to blame for your conversion mistakes. Software Sustainability Institute fellowship programme 2017 applications are now open. Learn how to make your research analysis better and reproducible at PyConZA 2016. Read more about Academic archetypes. What would you like to have at the Brisbane Research Bazaar (ResBaz)? A good read on how to be a Modern Scientist. If you have innovative ideas for open science, the Arnold Foundation may be the place for you. Vacancies: rOpenSci is looking to employ a postdoctoral scholar to help with grant research. General: For our instructor training course, we would like to put together ten simple rules on how not to engage your students when instructing. University of California, San Diego’s first Library Carpentry workshop was recently run, and it was a huge success. Small Teaching, as suggested by James Lang, may make a big difference in your teaching. The Discussion Book can be a helpful tool to get people talking productively. 17 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: September: University of Colorado, McGill University, Griffith University, University of Chicago, University of Waterloo, Scripps Institution of Oceanography, UCSD, European Molecular Biology Laboratory, University of British Columbia - EOAS, University of Southern Queensland, James Cook University, Townsville, Nathan Campus, Griffith University, University of British Columbia. October: Simon Fraser University, Aristotle University of Thessaloniki, Simon Fraser University, The River Club, Queensland Brain Institute, University of Colorado Boulder, UW Madison, University of Würzburg. 2017: AMOS / MSNZ Conference. Read More ›

Post-doc Position with rOpenSci
Karthik Ram / 2016-09-07
The rOpenSci project based at the University of California, Berkeley seeks to hire a postdoctoral scholar to work on the research activities funded by the grant titled “Fostering the next generation of sustainable software and reproducible research practices in the scientific community”. The project develops open source software to promote reproducible research practices in the scientific community. The postdoctoral scholar will focus on a research topic aligned with their own interests in order to better understand and improve scientific software practices. Possible topics include but are not limited to: (1) defining and evaluating sustainability for research software; (2) improving the development process that leads to new, sustainable, reusable, and impactful software; (3) developing recommendations for the support and maintenance of existing software: engineering best practices, transparency, governance, business and sustainability; (4) building large and engaged user communities; (5) understanding and recommending policy changes to software credit, attribution, incentive, and reward, including issues related to multiple organizations and multiple countries, such as intellectual property, licensing, and mechanisms and venues for publishing software, and the role of publishers; (6) improving education and training; and (7) studying careers and professional and institutional changes to support sustainable software, such as promotion and tenure metrics, job categories and career paths. We expect the postdoc will disseminate their findings in the form of blog posts, technical reports, workshop and conference talks and papers, journal papers or software products, as appropriate for the work and the applicant’s own career goals. While experience in developing academic software may be helpful, many projects in the areas above need not require software development experience. The candidate can expect to work closely with mentors from the rOpenSci project aligned with their interests, such as Dr. Karthik Ram and Dr. Carl Boettiger (UC Berkeley), Dr. Jenny Bryan (University of British Columbia), and Dr. Daniel S. Katz (University of Illinois), as well as other members of the rOpenSci and UC Berkeley communities. This position will be based at UC Berkeley, but arrangements for working remotely may be available. This research is funded by the rOpenSci project through a grant from the Helmsley Trust. The initial appointment will be for 1 year, full-time (100%), renewable for another year based on adequate progress made in the first year. The position comes with a competitive postdoc salary plus benefits, as well as a generous yearly allowance for computing equipment, conference travel, and other research expenses. The expected start date is November 2016. Please contact Dr. Karthik Ram (karthik.ram@berkeley.edu) with any informal questions about the position before applying. Qualifications: applicants must possess a PhD or equivalent degree by their start date (more details in the full job ad linked below). Applicants from the natural or social sciences, computer science, statistics or related disciplines are all welcome. To apply, submit the following items online at https://aprecruit.berkeley.edu/apply/JPF01132. Read More ›

We Still Can't Have Nice Things Together
Greg Wilson / 2016-09-05
Last year I used YAML and Norway to explain why we can’t have nice things. We’ve just stumbled over a problem that has forced us to re-do some of the work we did to publish our lessons a couple of months ago, and which illustrates how frustrating openness can still be in practice. Are you sitting comfortably? Then let’s begin. GitHub can publish repositories as websites. If the user’s ID is gloom, and the project’s name is despair, then the GitHub repository’s URL is http://github.com/gloom/despair. If that repository has a branch called gh-pages, GitHub automatically creates a website at http://gloom.github.io/despair. You will never find a more wretched hive of scum and villainy than the web. As a result, sites and browsers need to take precautions, some of which affect us. Many sites (including GitHub) encourage people to use HTTPS (which is secure) rather than HTTP (which is not). In particular, newly-created repositories on GitHub will only serve GitHub Pages websites over HTTPS, and older sites are being pushed to switch over as well. This is often done using redirection: if you go to http://whatever (insecure HTTP), the website automatically redirects you to https://whatever (secure HTTPS). If a browser loads a page using HTTPS (secure), and that page then tries to load CSS stylesheets or Javascript files using plain old HTTP (insecure), the browser won’t do it. GitHub uses Jekyll to convert Markdown and HTML to published pages. If Markdown or HTML files in the gh-pages branch have the right kind of header, GitHub doesn’t publish them as-is. Instead, it uses a tool called Jekyll to translate them. Jekyll reads variables from a file in the project’s root directory called _config.yml and makes them available to pages as they’re being translated. For example, if the configuration file defines a variable called title, pages can refer to site.title. This lets people avoid repeatedly repeating information repeatedly. Our web pages need to know where to find their CSS and Javascript. Our lesson pages and workshop website pages have to refer to the CSS and Javascript we use to style them. The simplest way to do this is to use absolute references from the root like this: <link rel="stylesheet" type="text/css" href="/css/pretty.css" /> The only part of this that matters for present purposes is the href URL. It looks like an absolute path (i.e., it starts with a slash), so web browsers will automatically put the name of the website’s domain in front of it. For example, if the website is http://woe.com, and the page is http://woe.com/misery.html, then the browser will convert /css/pretty.css to http://woe.com/css/pretty.css. But wait: if the GitHub repository’s URL is http://github.com/gloom/despair, its website is published at http://gloom.github.io/despair. The last part of that URL — despair — isn’t part of the domain name, so the browser cuts it out when following absolute references. For example, imagine that the GitHub Pages website contains a page called index.html, and that page has the CSS link above to pretty.css. The browser will convert the URL to http://gloom.github.io/css/pretty.css, which is wrong, because the despair part of the path has been chopped out. Oops. OK, let’s just add the domain name. One way to solve this is to use full URLs for resources instead of absolute paths. For example, instead of loading /css/pretty.css, our web page could explicitly refer to http://gloom.github.io/despair/css/pretty.css.
That’s easy… except that we want to share page templates between many different websites, each of which has a different base URL. More specifically, we want to have a single HTML file (let’s call it _layouts/page.html) that specifies our pages’ fonts and color scheme, places the logo in the right place, and so on. We don’t want to have to edit that page for each website, because then we’d have to re-do all our edits each time we wanted to make a style change that affected all our sites. Variables to the rescue. We’re not the first people to run into this problem, so GitHub provides some help. When GitHub runs Jekyll to convert our pages, it gives Jekyll all the variables we define in our repository’s _config.yml file, plus a bunch of variables that GitHub automatically defines for us. One of these is called site.github.url, and its value is exactly the URL we want: the github.io sub-domain plus the project name, i.e., the base URL of our website. In our running example, the value of site.github.url is http://gloom.github.io/despair. Our layout can then use: <link rel="stylesheet" type="text/css" href="{{site.github.url}}/css/pretty.css" /> to refer to things. The double curly braces tell Jekyll to insert the variable’s value, so the link here becomes what we want. Or not. Unfortunately, GitHub always sets site.github.url to be the HTTP version of the site’s URL, rather than the HTTPS version. Boom: if the page is loaded via HTTPS (secure), the URL for the CSS is just HTTP (insecure), so the browser refuses to load it, and the page appears without any styling. It gets worse. There’s another problem here. We don’t want our pages to have URLs that start with gloom.github.io — we want them to start with optimism.org, because that’s the name of our website. GitHub lets us do this using something called a CNAME. In brief, we can tell GitHub that we want gloom.github.io to pretend to be optimism.org, so that: (1) if someone goes to http://gloom.github.io, they are automatically redirected to http://optimism.org; and (2) if someone goes to http://optimism.org, the pages are served from http://gloom.github.io, but the URL still appears to be http://optimism.org. Oops: if Jekyll used the variable site.github.url when creating the web pages, all the URLs for CSS and Javascript in those pages will start with http://gloom.github.io/despair. If the browser thinks it’s going to https://optimism.org (with secure HTTPS), then it has two reasons to refuse to load the CSS: those files are coming from insecure URLs (HTTP instead of HTTPS), and they’re coming from a completely different domain. Let’s load the styles from a fixed domain. But hang on: there’s nothing wrong per se with loading files from another domain. Why don’t we do something like this for our CSS: <link rel="stylesheet" type="text/css" href="https://content.org/css/pretty.css" /> The difference here is that the URL always refers to a fixed site (in this case, content.org) and always uses HTTPS. As long as that site has a valid certificate for HTTPS, the browser will quite happily load this file. And since the URL is independent of which website is hosting the page, the configuration file can define a variable like site.content_url to be a fixed value, and everything can refer to that and it will all just work and we can go home. But suppose we want to do some more work on the subway ride home. We make a change to a page, run Jekyll to convert the page to HTML, open it in the browser—and the CSS doesn’t load, because we’re offline.
This isn’t a big problem for people who are creating workshop websites (which is by far the most common use of our templates). It is a problem for people who want to contribute to lessons, though, since they will often want to preview their changes locally, and may well be doing that work on a plane or while otherwise disconnected. Let’s define our own variable. All right, let’s try another approach. Suppose each of our websites defines a variable called site.baseurl in its configuration file to be the name of the project with a leading slash. All of our web pages can then refer to things using: <link rel="stylesheet" type="text/css" href="{{site.baseurl}}/css/pretty.css" /> which Jekyll expands to something like: <link rel="stylesheet" type="text/css" href="/despair/css/pretty.css" /> If we access the page using HTTPS (secure), everything is fine, because this now looks like an absolute path below the name of the domain. If we access the page using HTTP (insecure) and are redirected to HTTPS, this is still fine (for the same reasons). And if we are using a CNAME, and have mapped http://gloom.github.io to http://optimism.org, then: http://optimism.org/despair/index.html is mapped to http://gloom.github.io/despair/index.html. The browser translates the reference inside that page from /despair/css/pretty.css to http://optimism.org/despair/css/pretty.css. The web server then finds that file at https://gloom.github.io/despair/css/pretty.css, which is exactly what we want. Yay! We’re done! We can— Wait. What about offline work? When we run Jekyll locally to preview pages, it starts up a little web server at http://localhost:4000, and tells you “please go to this URL to preview your pages”. That URL is wrong if we are using this site.baseurl trick: we actually need to go to http://localhost:4000/despair to get everything. Interlude: What’s standard may not be right for everyone. Defining site.baseurl is the standard workaround for the problem we’re trying to solve, but it’s not a good solution for us. First, many of our users are newcomers to HTML templating, web servers, and pretty much everything else we’ve been discussing. If we rely on site.baseurl, people will (quite reasonably) follow Jekyll’s instructions to go to http://localhost:4000, get a “page not found” error, and wonder what they’ve done wrong. (This is not speculation.) Second, if we rely on site.baseurl, then everyone who creates a new workshop website will have to edit that site’s _config.yml file as well as its index.html file. Given what we’ve seen in instructor training workshops, that will significantly increase people’s frustration quotient. Overriding variables. Here’s another approach. When Jekyll runs on GitHub, it reads its configuration from _config.yml, and only from _config.yml. When we run it on our desktops, though, we can tell Jekyll to read several configuration files, each of which can re-set variables set in previous files. We can therefore create a second configuration file called _config_local.yml (or any other name we choose) and have it define site.baseurl to be the empty string. When we want to preview locally, we pass Jekyll extra parameters telling it to read this configuration file as well, and all the URLs are then correct for a local build. This works — until someone just runs jekyll serve on the command line as they would normally do (and as all the online documentation tells them to). Boom: the CSS isn’t loaded. Again, this isn’t speculation (though it probably affects fewer people).
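To make the overriding-variables idea concrete, here is a minimal sketch; the file name _config_local.yml comes from the description above, and a real lesson’s configuration would contain more settings than this:

    # _config_local.yml: extra settings read only when previewing locally
    baseurl: ""    # pages see this as site.baseurl; empty, so links resolve under http://localhost:4000/

The local preview is then started with both configuration files, later files overriding earlier ones:

    jekyll serve --config _config.yml,_config_local.yml

Running plain jekyll serve reads only _config.yml, which is exactly the failure mode just described.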
Let’s use relative URLs. What if we don’t use absolute URLs at all? What if we use relative URLs everywhere? If a page is in the root directory of our website, it can refer to the CSS files using: <link rel="stylesheet" type="text/css" href="./css/pretty.css" /> If a page is in a sub-directory, it can use: <link rel="stylesheet" type="text/css" href="../css/pretty.css" /> i.e., use “..” instead of “.” as the first part of the path to the CSS file. That will always work; the trick is to get the path to the root directory of the website into each page. A sensible system would automatically give us a variable with the path to the project’s root directory. Jekyll doesn’t, but we can define a variable for ourselves in each page’s header. If the page is in the root directory, page.root is “.”; if it’s a level down, page.root is “..”, and so on. The layout pages can then link to the CSS using: <link rel="stylesheet" type="text/css" href="{{page.root}}/css/pretty.css" /> Requiring every single page to define a particular variable when almost all of those pages will give it the same value feels like sloppy programming practice. Luckily for us, Jekyll provides a way to set a default. If we add this to _config.yml:

    defaults:
      - values:
          root: ..

then every page gets a variable called root with the value “..”. This almost does what we want: when we compile the Markdown file melancholy.md, we are creating a page melancholy/index.html in the output directory, so that its URL is http://gloom.github.io/despair/melancholy/. (By convention, a URL that ends with a slash / is assumed to refer to a directory, and the file we actually want is the index.html file in that directory.) Thus, all of our pages are one level below the root directory in the output directory, so they all want page.root to be “..”. But there’s one exception: the home page of the lesson itself. This page is ./index.html, i.e., it’s the index.html file in the root directory of the whole lesson, so its page.root needs to be “.” rather than “..”. We can handle that by explicitly defining page.root in index.md, which overrides the default set in _config.yml (a sketch follows at the end of this section). Once we’ve done that, our pages, layouts, and included HTML fragments can all use {{page.root}}/this/that to refer to whatever they want. It’s not ideal — we’ll have to explain it to people who’ve used Jekyll before, and if we ever create deeper directory hierarchies, it will quickly become as complicated as the alternatives we’ve discarded — but it’s good enough for now.
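For reference, the index.md override mentioned above is just an entry in that page’s front matter; a minimal sketch (any other front-matter fields the page already uses would stay as they are):

    ---
    root: .
    ---

Jekyll exposes front-matter values as page variables, so this one page’s page.root becomes “.”, while every other page keeps the default “..” from _config.yml.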
How this got into production. The new template that we deployed in June 2016 uses site.github.url. We recognized the problem with HTTP vs. HTTPS early on, so the standard layouts shared by all the lessons do this: <link rel="stylesheet" type="text/css" href="{{ site.github.url | replace_first: 'http:', 'https:' }}/css/pretty.css" /> i.e., they convert the http prefix given in site.github.url into https. That solved the problem for pages served from github.io domains, but not for domains using CNAME: GitHub even says that they don’t support HTTPS and CNAME domains (paragraph 3). I didn’t spot this because I didn’t think to test pages on CNAME’d domains: once it worked for HTTPS on GitHub, I assumed it would work everywhere. I should have known better. Hacks like turning http into https always break, and if one of my GSoC students had tried to put something like this into production, I would have told them to think again. The real lesson from this episode is that we still can’t have nice things — or rather, we can’t have them all at once. GitHub Pages are a great way for people to build simple little websites. Templating tools like Jekyll are great too, and HTTPS is essential, but when you try to combine them, you wind up with this. If we really want people to do open research, we have to make openness a lot less frustrating. Read More ›

Small Teaching
Greg Wilson / 2016-09-05
Elizabeth Green’s Building a Better Teacher changed how I think about teaching, and sparked some good discussion in our community. Therese Huston’s Teaching What You Don’t Know had a similar impact a few years earlier, and now there is James Lang’s Small Teaching. As its title suggests, Lang’s book focuses on little things that teachers can do right now to improve their teaching, rather than on big, systemic changes that might have larger impact, but which require larger effort (and probably buy-in from other people). To be included, a practice had to have some foundation in the learning sciences, have been shown to have impact in real-world situations, and have been used or observed by the author. His suggestions are all either brief (5-10 minute) classroom or online learning activities, one-time interventions in a course, or small modifications in course design or communication with students. Most importantly, they require minimal preparation and grading. Frequent low-stakes quizzes to prompt recall, interleaving different material, having students write a one-minute thesis or draw a concept map, making the assessment criteria clear, setting aside time for self-explanation and peer explanation—none of these should be new to anyone who has been through our instructor training course, but Lang does an excellent job of organizing them and connecting them back to research and theory. We do less than half of what Lang recommends in our workshops. I’m going to start suggesting Small Teaching as an auxiliary text in our training, and I hope that a year from now, some of our instructors will be able to tell us how these techniques have worked for them. Read More ›

Google Summer of Code 2016 ended
Raniere Silva / 2016-09-05
As announced in April, we had some Google Summer of Code students working with us this year. Manage workflow for Software Carpentry and Data Carpentry instructor training: Chris Medrela, under the mentoring of Greg Wilson and Piotr Banaszkiewicz, worked on AMY, implementing the instructor training workflow that is already in use now that we have reopened instructor training. Result-aggregation server for the installation-test scripts for Software Carpentry and Data Carpentry: Prerit Garg, under the mentoring of Piotr Banaszkiewicz and Raniere Silva, worked on a web server that receives and stores information provided by the installation script. Other projects under the NumFOCUS umbrella: This year our projects were under the NumFOCUS umbrella. We thank NumFOCUS for their support, and we want to highlight their other Google Summer of Code projects. Dynamic Topic Models for Gensim: Bhargav Srinivasa, under the mentoring of Lev Konstantinovskiy, Radim Rehurek and Devasena Inupakutika, worked on Gensim, which now supports Dynamic Topic Models. Upgrade to datapackage.json standard for EcoData Retriever: Akash Goel, under the mentoring of Henry Senyondo and Ethan White, worked on EcoData Retriever, which is now compatible with the datapackage.json standard when saving the scripts that retrieve the data requested by the user; EcoData Retriever also now works with Python 3. Improving the state of Optim.jl for JuliaOpt: Patrick Kofod Mogensen, under the mentoring of Miles Lubin, worked on JuliaOpt, which now has a faster implementation of the Simulated Annealing solver; JuliaOpt’s documentation also now includes many examples. Presolve Routines for LP and SDP within Convex.jl for JuliaOpt: Ramchandran Muthukumar, under the mentoring of Madeleine Udell, worked on JuliaOpt’s MathProgBase, which provides high-level one-shot functions for linear and mixed-integer programming; Ramchandran’s work focused on presolving linear and mixed-integer programming problems, an important step in improving benchmarks. Categorical Axis for matplotlib: Hannah Aizenman, under the mentoring of Michael Droettboom and Thomas Caswell, worked on matplotlib to reduce the code that users need to write when working with categorical data. Read More ›

Feedback Sought on Two Papers
Greg Wilson / 2016-09-02
We would be very grateful for feedback on two papers co-authored by members of our community: Taschuk & Wilson: “Ten Simple Rules for Making Research Software More Robust”. Wilson, Bryan, Cranston, Kitzes, Nederbragt, & Teal: “Good Enough Practices for Scientific Computing”. Each paper has a link at the top to send us email; we look forward to hearing from you. Read More ›

Election Announcement: Amending Steering Committee election procedures
Kate Hertweck / 2016-09-01
The Steering Committee will be holding a special election on October 10-14 regarding the following amendment to Steering Committee elections: “Following an election, the new Steering Committee will meet jointly with the previous Steering Committee for no less than 60 days. For meetings during the first 30 days, the new SC will not have voting privileges. After 30 days, voting privileges are transferred to the new SC.” We believe this amendment is necessary to provide continuity in leadership following elections as the Steering Committee transitions to its new members. We envision the timeline as follows: November: elections announced (90 days prior); January: candidate applications due, and lab meeting to discuss candidates; February: elections and first joint meeting, with the new committee nominating officers; March: second joint meeting, during which the new committee elects officers and the old committee attends but no longer has voting rights; April: new committee continues normal operations. An elected member, therefore, serves from the February in which they are elected through March of the following year. This amendment does not change the time period during which an elected Steering Committee has voting privileges. To see how this amendment fits into existing governance rules, please go here. If you have questions regarding this amendment, you are welcome to add them to this pull request. Given that the current Steering Committee is only the second to be elected by the community, we are committed to continuing to develop procedures and guidelines that best serve our community. Please stay tuned for information on how to vote! Read More ›

Ten Ways to Turn Off Learners
Greg Wilson / 2016-08-19
PLOS has published a very useful set of articles called Ten Simple Rules that covers everything from effective statistical practice to winning a Nobel Prize. I’m just as interested in what not to do and what mistakes to avoid, so as part of our instructor training course, I’d like to put together a list of ten simple ways you can turn off your learners. My first five are listed below; if you’d like to add your own, comments would be very welcome. Sneer at what they’re doing right now by saying things like, “OMG, you’re using a spreadsheet!?” or, “If it isn’t open, it isn’t real science.” Most scientists have been doing first-rate work for decades with their existing tools and practices; we may think we now have better ones, but telling them they’ve been wrong all these years isn’t likely to make them listen. Trivialize their difficulties by saying things like, “Oh, it’s easy, you just fire up a VM on Amazon, install this variant of Debian, and rewrite your application in a pure functional language.” This stuff is genuinely hard; talking as if it’s not (and implying along the way that they must be stupid if they don’t get it right away) isn’t going to motivate them. Choose exciting technology. “There’s this cool new language I’ve been meaning to try…” should send the listener running: “new” usually means “rapidly changing” and “poorly documented”, and while that may be fun for the 5-15% who like computing for its own sake, it’s just an extra load for the majority. (See also Dan McKinley’s talk Choose Boring Technology.) Insist on doing everything the right way. You don’t draw architectural blueprints before you paint a wall. Similarly, you don’t need a cross-referenced design document (with appendices) for a twenty-line script that merges two bibliographies. Insist that people use a different operating system or package because it’s more convenient for you. They have to deal with the intrinsic cognitive load of the actual lesson material; don’t also impose the extraneous load of new keyboard shortcuts and unfamiliar menus. Read More ›

Teaching Library Carpentry to Librarians at UCSD
Juliane Schneider / 2016-08-17
I sort of knew what I was getting into. I’d done the excellent instructor training in February at UC Davis, which is a good thing, because I didn’t know the first thing about instruction. I didn’t know the first thing about organizing workshops, either, but I figured what the hell; my colleague and partner-in-crime Tim Dennis had reserved the big conference room in our library, which is really hard to get. If you’re in academia you know the first rule, which is never waste a reserved prime conference room. With lots of prompts and help from Tim, we put together a Library Carpentry workshop at UC San Diego. Since the instructors were all from the institution (UC San Diego), the host/instructor issue wasn’t much of an issue. The workshop website was https://ucsdlib.github.io/2016-07-18-UCSD/ The workshop ran from July 18-22, and because some library staff would be unable to attend the entire workshop, I allowed people to register by the day. The schedule ran: Day one: Fundamentals/Regex; Day two: Bash/shell/command line; Day three: Git/Github; Day four: Open Refine; Day five: Office hours. There were 40-50 people on days one and four, and about 30-40 on days two and three. We had five or six people come to office hours for help with Open Refine after the workshop. I taught the Fundamentals/Regex and Open Refine sessions, Tim Dennis and Reid Otsuji taught Bash/Shell, and Matt Critchlow taught Git/Github. We had at least two helpers per day who were indeed very helpful. What We Learned While Preparing It takes a lot of time to think through what you need as far as room setup, materials, refreshments, and publicity. I wish I’d made a list of what we needed each day, as requirements differed throughout the week (the first day needed pens and paper, for example). You can rarely overestimate the amount of coffee needed when librarians and coding meet. Also, savory snacks are a hit, although I think someone lost a hand in the scramble for Babybel cheese. What We Learned While Instructing Etherpad, etherpad, etherpad! Mention the etherpad! There were a few obstacles to getting our attendees to take notes collaboratively. First, I think the fact that our helpers were taking notes gave the notes an ‘official’ air that we didn’t intend. Secondly, especially during the Bash class, people were saying that with Bash, Notepad, a browser, and the etherpad all open, it was too much. I think that next time we’ll make sure to have a second screen with the etherpad on it, so that people can see that others are taking notes and can reference it without having it on their own screen if that is inconvenient. The helpers shouldn’t huddle together in one spot in the room. Scatter them around the room so that they spend their time interacting with the students, not each other. Audience Things The audience was very engaged, helping each other and answering questions. From some of the comments, though, I think we may have erred on the side of caution in the Open Refine lesson by pausing too long to let everyone catch up. Next time, I’m going to try to strike a better balance in the speed of the lesson and more actively encourage students to help each other (and encourage the practice of staying off peripheral devices and suspending Pokémon play during the class). Particular Things In the Open Refine session, I am going to try to create some exercises in order to break up the demo.
I think that three hours of demo is hard for an audience to take in even if they’re following along in the tool, so perhaps creating a second, ‘test’ dataset that can be used for exercises will drive home the concepts while allowing some hands-on experimentation and co-learner discussion about the tool’s context and use outside the class. The librarians who made up the workshop participants struggled to find context for the Bash/Shell and Git tools in their work. Matt made the excellent observation that while librarians are great at using tools, they often lack experience in using a computer ‘as a computer’, so to speak, and I think this lack of experience makes it more difficult to see how Bash and the Shell can be used in their current tasks. Git, on the other hand, has the interesting problem of being a tool for collaboration, which usually takes two people, or maybe one person and their doppelganger. Learning the steps of setting up a repository took so much time that we were not able to explore the collaborative workflow effectively. My thought is that if we taught it again, especially within the library, we could find people interested in the setup, and have Matt hold a pre-class to set up a repository for each department. Then, in the class proper, we could concentrate on the collaborative aspect of Git/Github and let each department work with its own repository during the class. This would emphasize the collaborative uses of the tool and perhaps uncover use cases for the various departments. Of course, this would only work with a workshop that was institution-centered. We’d rounded up a bunch of whiteboards for the classes to use, but we never really incorporated them. If, like us, you find yourself unable to get a second screen up for the etherpad, encourage the groups of students to use the whiteboards. They can be used to work out errors or roadblocks, write down commands, and record ideas which can then be transferred to the etherpad. Also, remind students that if they get an error message while working with a tool during an exercise or challenge, they can search for that error message and find solutions, which they can then record on the etherpad or whiteboard. Giving Back to the Community Things After the workshop, all of us were inspired to improve current lessons and create new ones. We sat down together the week after the workshop and Matt led us through the approved Software Carpentry method of adding materials to the Library Carpentry repository via Git. Some of the suggested new lessons are R for Librarians and an advanced Open Refine class concentrating on using Regex and GREL. We also want to work on the Git/Github lesson and the Open Refine lessons that currently exist. The Grand and Glorious Conclusion We had a great time teaching the workshop! There were no brawls, nor were we pelted with muffins or laptops, so the students seemed to find it useful, which was reinforced by the comments and sticky-note feedback we received. We got several inquiries about when we were going to hold another one, so ongoing Library Carpentry instruction is definitely a need at UC San Diego. Note: The original Library Carpentry repository created by Dr James Baker is here. Reworked and updated lessons from the recent global sprint are here. They are the lessons prefixed with ‘library’. Read More ›

1-16 August, 2016: Assessment Deputy Director, Policy Subcommittee, Code of Conduct, Workshop Resources, Bug BBQ, and Vacancies.
Martin Dreyer / 2016-08-15
Highlights: Dr. Kari L. Jordan has been appointed as Data Carpentry’s new Deputy Director of Assessment. Join the Policy Subcommittee and/or provide feedback on the Reporting Guide for handling potential Code of Conduct violations. Data Carpentry has put together a terrific resource for workshop organisation. Read about our recent Bug BBQ that was held in the lead-up to publishing our lessons. Vacancies: NCSA at the University of Illinois is looking for applicants for the position of Training Coordinator. NumFOCUS is looking for a full-time Projects Director. Tweets: @CISTIB is now hiring research software engineers. Find the Carpentries on Facebook. High Energy Physics software training for the 21st century, inspired by the principles of Software Carpentry. You can now also order Software Carpentry apparel and accessories from Cafe Press. General: Could we use the principles discussed in Michael Kölling and Fraser McKay’s Heuristic Evaluation for Novice Programming Systems to create an evaluation system for our lessons? How well do you understand open source licenses? A survey has been set up to investigate. Did you know that the skills you learn from Software Carpentry might help you change careers? James Cook University’s first Library Carpentry workshop was recently run by two newly qualified instructors. The University of Toronto Libraries also recently hosted a Library Carpentry workshop. Seymour Papert - one of the inspirations for Software Carpentry - recently passed away. 14 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: August: Colorado State University, Michigan State University, Western Sydney University, University of Oklahoma, Vanderbilt University Medical Center, Interacting Minds Centre & Cognition and Behavior Lab, Aarhus University, Johns Hopkins University, Department of Earth and Planetary Sciences, University of Namibia, University of North Texas, Federal Reserve Board, University of Tasmania, University of Wisconsin, Madison. September: Griffith University, University of Chicago, European Molecular Biology Laboratory, University of Southern Queensland, James Cook University, Townsville, University of British Columbia. October: Simon Fraser University, Aristotle University of Thessaloniki, The River Club, University of Colorado Boulder, University of Würzburg. 2017: AMOS / MSNZ Conference. Read More ›

2016 Bug BBQ Summary
Tiffany Timbers / 2016-08-15
At the beginning of the summer, the Software Carpentry community joined forces to hold its first ever Bug BBQ. The goal of this event was to squash as many bugs in our core lessons as possible before we published and shipped the new version (2016.06) of the lessons. In addition to getting a large amount of work done as quickly as possible, we also aimed to use this event to engage and connect with our world-wide community. In anticipation of the event, we worked with the lesson maintainers to identify and create specific milestones (issues and pull requests) that needed to be resolved before we could publish the new lesson versions. On the day of the event, our community worked hard to address these milestones, as well as to proofread and bug-test the lessons. The Software Carpentry community embraced the Bug BBQ event. We had 7 local sites spread across North America and Europe, as well as many more people participating remotely across the globe. On the day of the Bug BBQ alone, we observed a tremendous increase in the number of submitted, merged and rejected pull requests per day compared to the previous month (analysis courtesy of Bill Mills). The new version (2016.06) of the lessons has now been published, and details about who contributed, along with citation information, can be found here. We would like to thank all who contributed to the new versions of the lessons, including those who participated before, during, and after the Bug BBQ. Our materials are far from perfect, but we’re very proud of what our community has built. The Bug BBQ was organized by the Software Carpentry Mentoring Sub-Committee. The committee welcomes feedback and ideas for future Bug BBQs and other community events. To get in touch with us, please email us at mentoring@lists.software-carpentry.org. Read More ›

Training Coordinator Position at NCSA
Greg Wilson / 2016-08-10
The Computational Science and Engineering program and the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign seek applicants for the position of Training Coordinator. This position will report jointly to the Director, CSE and the Assistant Director for Scientific Software and Applications at NCSA. The Training Coordinator will enable cutting-edge research involving the use and/or development of advanced digital software and hardware across all disciplines by delivering, coordinating, and administering training programs for students and researchers at the University of Illinois and area institutions, including industry. The individual in this position will play a key role in the research life of the campus, identifying important computational and data science skills and technologies and demonstrating how they can be used to solve problems in computational science and engineering and other disciplines. For details, please see the full job posting. Read More ›

Resources for Running Workshops
Erin Becker / 2016-08-08
A successful workshop is the result of coordinated effort among many different types of participants, including instructors, helpers, hosts, learners and staff. Software and Data Carpentry offer two types of workshops: Self-Organised and Centrally-Organised. These workshop types differ in terms of instructor training requirements, fee structures, and participant responsibilities, with local hosts and instructors at Self-Organised workshops taking on administrative responsibilities normally handled by Carpentry staff. Instructors (both new and experienced) and workshop hosts often have questions about their roles in workshop logistics, especially about how their responsibilities differ between Self-Organised and Centrally-Organised workshops. To help clarify the roles played by the different participants, and the differences between Self- and Centrally-Organised workshops, we’ve put together some resources to guide participants through the workshop organizational process. These resources are available on Data Carpentry’s “Host a Workshop” and “Self-Organised Workshops” pages and include: checklists for instructors, for hosts of Centrally-Organised workshops, for hosts/lead instructors of Self-Organised workshops, and for lead instructors of Centrally-Organised workshops; email templates for communicating with co-instructors, helpers, and learners; an accessibility checklist; a list of necessary equipment; and a troubleshooting page. We want these resources to be as useful as possible to our instructor, helper, and workshop host community. If you find that anything is unclear or incomplete, or if you would like to suggest an additional resource, please email ebecker@datacarpentry.org. Read More ›

Code of Conduct and Call for Volunteers for Policy Subcommittee
Erin Becker / 2016-08-08
The Carpentries are proud to share a common Code of Conduct (CoC), which outlines acceptable standards of behavior for our community members and those interacting with the Carpentries at in-person events and in online spaces. Historically, however, we have not had an official process for reporting potential Code of Conduct violations or for adjudication and resolution of reported incidents. Thanks to input from our community, we recognize that defining these procedures is an important step in ensuring that any such issues are dealt with transparently in order to keep our community welcoming and safe for all. Members of the Carpentry Steering Committees and staff have been working on defining these policies, and have put together a Reporting Guide and Enforcement Manual for handling potential CoC violations. These documents are based on valuable insights gained from previous community discussions of this issue (especially here and here). While we have made every effort to represent the views voiced in these discussions, ultimately, the CoC impacts every member of our community. To ensure that these policies meet the community’s needs, we would like your input. The Carpentries are convening a joint Policy Subcommittee. Members of this group will be responsible for serving as advocates for the CoC, moderating Carpentry listservs, adjudicating reported CoC violations, and developing and enforcing related policy as needed. If you are interested in serving the Carpentry community as a Policy Subcommittee member, please use this form to tell us about yourself, your involvement with the Carpentry community, and the valuable skills and perspectives you would bring to the Policy group. Applications will be open until Monday, August 15th at 5pm Pacific (Monday midnight UTC). Regardless of your interest in joining the Policy Subcommittee, we invite all of our community members to give us feedback on the CoC Reporting Guide and Enforcement Manual. These documents can be found here as a Google Doc. The finalized policy will take into account community comments, so please add your voice to the discussion! If, for any reason, you would be more comfortable communicating your comments privately, please feel free to email me, DC’s Associate Director Erin Becker (ebecker@datacarpentry.org), and I will ensure that your voice is represented in the discussion. The upcoming Lab Meeting will include a discussion of these issues. We encourage all community members to attend and share their thoughts. The Lab Meeting will be held Tuesday, August 16th at 1pm UTC and 10pm UTC. We greatly appreciate the diverse insights our community members have brought to this discussion so far and look forward to hearing more from you as we continue to engage on this important topic. Read More ›

Seymour Papert 1928-2016
Greg Wilson / 2016-08-02
Seymour Papert passed away on Sunday at the age of 88. I never had the privilege of meeting him, but Software Carpentry would probably never have existed if I hadn’t stumbled across his inspirational book Mindstorms. You can read more about his life and work here; when you’re done, please go and help someone learn something—I think he’d have liked that. Read More ›

NumFOCUS Project Director
Greg Wilson / 2016-08-02
NumFOCUS (the organization which shelters Software Carpentry, Data Carpentry, and several other open science projects) is seeking to hire a full-time Projects Director to develop and run a sustainability incubator program for NumFOCUS fiscally sponsored open source projects. This is the first program of its kind, with the opportunity for enormous impact on the open source ecosystem. The lessons learned from this program will be public, which means it has the potential to change how all open source projects are managed. For more information, please see the job posting. Read More ›

How Well Do Developers Understand Open Source Licenses?
Greg Wilson / 2016-08-02
You are invited to participate in a survey on software licensing designed to investigate how well software developers understand common open source software licenses. We are looking for software developers that have built or are currently building on open source software in their projects (and I am personally interested in hearing from people building open source software for research). The study is being conducted by Prof. Gail Murphy (murphy@cs.ubc.ca) and graduate student Daniel Almeida (daa@cs.ubc.ca); participating in the anonymous online survey will take approximately 30 minutes. If you are interested in participating, please go to: https://survey.ubc.ca/surveys/danielalmeida/software-licensing-survey/ If you have any questions, please contact us at daa@cs.ubc.ca. Read More ›

Heuristic Evaluation for Novice Programming Systems
Greg Wilson / 2016-08-02
I have recently been reading and enjoying a new paper by Michael Kölling and Fraser McKay titled “Heuristic Evaluation for Novice Programming Systems”. In it, the authors say: With the proliferation of competing systems [for novices], the problem [of evaluation] has become more complicated. Not only should we ask the question whether such kinds of tools are helpful at all (which many instructors strongly believe them to be, even in the absence of hard evidence), but we need to decide which of a significant number of competing systems is “better” for a given task in a given context. Educators have to make choices, not only between using an educational IDE or not, but between a number of direct competitors. Studies evaluating the actual learning benefit of the use of a specific system are rare. This is not for lack of interest or realisation of the usefulness of such studies, but because they are difficult to conduct with a high degree of scientific reliability… Running two groups (experiment group and control group) in parallel is usually difficult to resource: the teacher almost doubles the workload and has to avoid bias. It also introduces an ethical problem: If we expect one variant to be superior, and the setting is an actual examined part of a student’s education, then we would knowingly disadvantage a group of students. However, if we run the two trials sequentially, it becomes very difficult to compensate for possible other factors influencing the outcome, such as difference in teachers or populations. They then propose 13 heuristic criteria by which programming systems presented to novices can be evaluated: Engagement: The system should engage and motivate the intended audience of learners. It should stimulate learners’ interest or sense of fun. Non-threatening: The system should not appear threatening in its appearance or behaviour. Users should feel safe in the knowledge that they can experiment without breaking the system, or losing data. Minimal language redundancy: The programming language should minimise redundancy in its language constructs and libraries. Learner-appropriate abstractions: The system should use abstractions that are at the appropriate level for the learner and task. Abstractions should be driven by pedagogy, not by the underlying machine. Consistency: The model, language and interface presentation should be consistent – internally, and with each other. Concepts used in the programming model should be represented in the system interface consistently. Visibility: The user should always be aware of system status and progress. It should be simple to navigate to parts of the system displaying other relevant data, such as other parts of a program under development. Secondary notations: The system should automatically provide secondary notations where this is helpful, and users should be allowed to add their own secondary notations where practical. Clarity: The presentation should maintain simplicity and clarity, avoiding visual distractions. This applies to the programming language and to other interface elements of the environment. Human-centric syntax: The program notation should use human-centric syntax. Syntactic elements should be easily readable, avoiding terminology obscure to the target audience. Edit-order freedom: The interface should allow the user freedom in the order they choose to work. Users should be able to leave tasks partially finished, and come back to them later.
Minimal viscosity: The system should minimise viscosity in program entry and manipulation. Making common changes to program text should be as easy as possible. Error-avoidance: Preference should be given to preventing errors over reporting them. If the system can prevent, or work around an error, it should. Feedback: The system should provide timely and constructive feedback. The feedback should indicate the source of a problem and offer solutions. The full explanation of each criterion runs to half a page or more, and includes references to the research literature to clarify and justify it. As I read through these, a few things struck me: (1) Most of the tools we teach in Software Carpentry score very poorly on these criteria. The Unix shell and Git, for example, are not engaging, are definitely threatening (in the sense that users quite reasonably fear the consequences of making a mistake), do not present level-appropriate abstractions or make system status clearly visible, (definitely) do not have human-centric syntax, and so on. (They do well, however, on edit-order freedom: both tools encourage tinkering and allow users to leave tasks partially finished and return to them later.) (2) On the other hand, Excel and OpenRefine score quite well: they’re engaging, there’s little redundancy, they present tabular data as tables (which programming languages could have started doing thirty years ago—but that’s a rant I’ll save for some other time), they make system status very visible, support edit-order freedom, and so on. (3) Together, #1 and #2 make me think that there should be another couple of heuristics: authenticity (i.e., do practitioners use it in their daily work) and upper bound (i.e., how far can you go with the tool before you have to switch to something else). Git and the Unix shell score highly on both, as does OpenRefine, but Excel does less well. Tools like Scratch come up short on both counts: while it’s a wonderful way to teach programming to newcomers of all ages, most people quickly outgrow it. (4) Having invented two more heuristics, though, I can’t help but wonder whether doing so is actually rationalization. I’ve said many times that if you can’t win, you should change the rules: it’s entirely possible that if (for example) Git had scored highly on Kölling and McKay’s heuristics, I wouldn’t have thought to suggest others. (5) It’s interesting to compare RStudio and the Jupyter Notebook using these heuristics. In both cases, I think that when they do poorly it’s because they are containers for a purely-textual programming language: for example, neither does particularly well on the “human-centric syntax” heuristic, but that’s not their fault. I think RStudio does better than the Notebook overall, primarily because of its interactive debugger and continuous redisplay of the state of the workspace. One final thought: it would be really interesting to have a similar set of heuristics for evaluating lessons. Some criteria would transfer directly (e.g., engagement, being non-threatening), but others are thought-provoking: what are the equivalents of error avoidance and edit-order freedom for teaching materials? If anyone knows of a rubric like this, I’d be grateful for a pointer. See also this post from Mark Guzdial that identifies five principles for selecting a programming language for learners: connect to what learners know; keep cognitive load low; be honest; be generative and productive; and test, don’t trust. Michael Kölling and Fraser McKay: “Heuristic Evaluation for Novice Programming Systems”.
ACM Transactions on Computing Education, 16(3), June 2016, 10.1145/2872521. Read More ›

Data Carpentry's New Deputy Director of Assessment
Greg Wilson / 2016-08-02
Data Carpentry has just announced that Dr. Kari L. Jordan will be joining them as the Deputy Director of Assessment. Kari holds a PhD in STEM Education from Ohio State University, and has worked most recently at Embry-Riddle Aeronautical University in Florida, where her postdoctoral research focused on understanding factors that influence faculty adoption of evidence-based instructional practices. To learn more, please see the post on the Data Carpentry website or follow Kari as @drkariljordan on Twitter. Read More ›

Library Carpentry in Toronto
Greg Wilson / 2016-07-30
On July 28-29, a group of volunteers from the University of Toronto’s libraries ran a two-day workshop for thirty-five of their fellow librarians. People came from as far away as Sudbury, Ottawa, New York City, and even Oregon to spend two days learning about: regular expressions, XPath and XQuery, OpenRefine, programming in Python, and scraping data off the web. While there were inevitably some hiccups getting software installed on learners’ machines, everything ran pretty much on schedule, and the instructors got through most of the material they had planned to cover. What was particularly nice was the way the modules fit together: the Python lesson closed by showing people how to write programs using regular expressions, while the scraping lesson referred back to the XPath material. Kim Pham, Leanne Trimble, Nicholas Worby, Thomas Guignard, and a large roster of helpers did a great job organizing and delivering this event. The best part was the email that arrived an hour after it finished: Hey Kim (and the rest of the Software Carpentry Team), I just wanted to let you know that straight after the workshop, I went back to my office, scraped data off of a website and into OpenRefine and solved a problem that’s been plaguing me for a month. THANK YOU for such a great workshop, it’s already useful!! Read More ›

Library Carpentry workshop at James Cook University, Townsville
Jay van Schyndel / 2016-07-29
We held a two-day Library Carpentry workshop at James Cook University, Townsville on 14–15 July, 2016. The workshop was a first on several fronts: the first course run by Software Carpentry trainers Jay van Schyndel and Daniel Baird from the JCU eResearch Centre, and the first Library Carpentry course run at JCU. Collin Storlie, the local QCIF eResearch Analyst, agreed to help out too. How was the workshop proposed? Clair Meade from the JCU Library contacted Jay van Schyndel (eResearch Centre) enquiring about Library Carpentry. After some discussion, Clair quickly found 12 interested librarians. With encouragement from Belinda Weaver at QCIF, Jay and Clair organised the workshop. Given the mid-semester break, finding an empty room proved difficult, but luckily we found two empty rooms. With the rooms booked and dates sorted, it was time to create the workshop web page and start advertising. Very quickly we had 17 registrations. Excitement was building!! Day 1 of the workshop Jay van Schyndel, Collin Storlie and Clair Meade started early setting up the room. Jay and Collin quickly learnt that librarians are great at organising morning tea. The trolley came with tea, coffee, a hot water urn, biscuits, cheeses, and dried fruit. A great start to the day - thanks to Clair for organising the morning and afternoon tea. Very quickly the librarians arrived and started setting up their laptops. The morning started well, with the jargon busting session being a good ice breaker. The section on Data Structures was covered quickly, as librarians already understand the importance of well-organised and -structured data. This gave Jay more time to focus on Regular Expressions. Some feedback received on the session: "exercises were really useful in reinforcing the ideas we are learning :)" "Enjoyed the exercises it helped understand the different regexes. Struggling to connect the information we are learning and how I will be able to use it." "Cheat sheet in the hand would be nice. Love the interesting websites and good exercises." After lunch we started on the Shell. Most people were using Git Bash, with a few people using Terminal on OS X. Some feedback received from this session: "awesome power of grep/pipe - shame some people had to leave because the 'finale' was great but people were also tired." "very powerful!! So much to learn!" "playing with the program was great. Next time do a bit of a demo of what's happening first, then get us to play along." We did run over time trying to cover everything. It's a fine line between covering the material and ensuring the audience can keep up. Jay was very appreciative of the assistance provided by Collin and Clair during day 1. Day 2, New Room The new room was actually much better, as there were screens spread around the room and everyone sat at smaller tables. Daniel Baird presented the session on Git. Here is some of the feedback. "Git session: very useful + lots learnt. Blog session: not relevant for me, maybe too basic." "Github: Good, able to follow, starting to make a bit of sense. Git: ditto, starting to make sense can see benefits." "Good foundation of what it is and how it works. How can we use it in the library?" The last session was OpenRefine, presented by Collin Storlie. The librarians quickly saw the usefulness of this tool. Here is some feedback received: "Great workshop!! Well explained and paced. Good to leave the window open so that we can see the steps taken. (and not open another window and tab to the other)" "Found this tool very interesting and keen to test out back in the office. Pace and delivery of the lesson was great and easy to follow." "sessions structured well and very useful stuff included :) Open refine looks great!" In summary, Library Carpentry was well received at JCU. After reviewing the feedback, it is plain that most people found the training beneficial in giving them new skills to assist with their daily tasks. We will happily run another course in the future. Jay and Daniel are now planning to run their first Software Carpentry course at JCU since qualifying as instructors. Collin will assist. A previous Software Carpentry R workshop was taught at JCU in 2015 by fly-in trainers Sam Hames and Paula Martinez from Brisbane. Read More ›

1 - 28 July, 2016: Lesson Publication, Instructor Training Open, Creating New Material, Revamped Lesson Template, and Instructor Testimonial.
Martin Dreyer / 2016-07-28
## Highlights Congratulations and thanks to everyone who contributed to Version 2016.06 of the Software Carpentry lessons, which has just been published. You can now partner with Software and/or Data Carpentry to develop new lessons. Get in touch to discuss your ideas. Please take note of our new lesson template. ## Instructor Training Apply now for online instructor training. Some frequently asked questions about the reopening of instructor training have been addressed. Why attend a Software & Data Carpentry instructor training workshop? About ten hours is all you will need to qualify as an instructor after the initial training event. ## General The University of Otago held a three day workshop which allowed attendees to also have some "productive" hours afterwards. Robin Wilson asked how best to show code changes during teaching without confusing students or bombarding them with too much information. The University of Auckland held a successful two day Genomics R workshop despite some random challenges. The University of Auckland also held a Python-based winter bootcamp and received overwhelmingly positive feedback. A full house was taught by newly qualified Australian instructors under the guidance of Belinda Weaver. Read about the suggested inclusion of real-world challenges. Three workshops took place in Brazil, in Florianópolis, Campinas, and São Paulo. The feedback was positive and attendees learned a lot. The organisers are currently looking for sponsors to promote the First Brazilian Software Carpentry workshop for Women in Science and Engineering. 17 workshops were run over the past 30 days. For more information about past workshops, please visit our website. Upcoming Workshops: July: Colorado State University. August: RDA-CODATA Research Data Science Summer School, Universidade Federal do Paraná, Dept of Science, Information Technology & Innovation, Federal Reserve Board, University of Tasmania, Compute Ontario Summer School - University of Ottawa, Colorado State University, Cornell University, Vanderbilt University Medical Center, Stony Brook University, Interacting Minds Centre & Cognition and Behavior Lab, Aarhus University, University of Edinburgh, University of Namibia, University of Rhode Island, Coastal Institute. September: European Molecular Biology Laboratory. October: Aristotle University of Thessaloniki, The River Club, University of Colorado Boulder Read More ›

How Software Carpentry can help you switch careers
Marco Fahmi / 2016-07-27
You've heard how Software Carpentry is important not only for getting your research done but also for making it better and thus more likely to be published. It might even get you a job in your favourite field of research. But for those of us who do not like to do the same thing forever, Software Carpentry skills can also help with switching to another career. First, though, let's start from the beginning and look at what Software Carpentry is meant to do. Software Carpentry is primarily meant to teach you important analytical skills for thinking computationally: how to break down research activities into simple steps, how often to repeat certain tasks (and the conditions to start and stop them), and how to deal with special cases. In addition, Software Carpentry introduces you to a suite of tools and languages that allow you to quickly and easily translate these activities into software code that you can run on your computer, run on the cloud, share with others, or even publish. However, none of this learning is specific to research and researchers. Anybody can benefit from a Software Carpentry course, except, perhaps, a computer scientist … ;-) One can easily imagine educators using Software Carpentry to better analyse the performance of students in their schools. Journalists can use the skills to analyse data for investigative articles. Non-profit organisations can use them to crunch data on how well they serve their clients. In fact, a large number of fields and professions are increasingly dependent on empirical data in ways that are only now possible, due to the increased ability to collect data electronically, the availability of volumes of open data through the Internet, and an increased emphasis on quantitative measurement of such things as performance and effectiveness. Some of these fields, such as the financial services industry, have long been leaders in this domain for obvious reasons (money!). For the rest, this is all still very new, and those responsible for making quantitative evaluations often lack even rudimentary programming and coding skills. This is a great opportunity for anyone who has done Software Carpentry to explore new ways to put those skills to good use. Not only do they have the know-how to apply these skills to another industry, they also have the hands-on experience (from applying the skills to their own research) to provide expert advice on what tools to use, how to use them, and the pitfalls one needs to avoid. Are you a Software Carpentry instructor? Even better. Perhaps you could organise carpentry sessions for people in other industries. Just like in research, the point is not to turn learners in those industries into computer programmers (and the instructor is not meant to be a software guru). The purpose is to provide the necessary programming literacy to produce better and more reliable outcomes. Marco Fahmi is a Brisbane-based data scientist and former research project manager, with a strong interest in open data and data journalism. He tweets as dataronin. Read More ›

More on Instructor Training
Greg Wilson / 2016-07-26
Since we announced yesterday that we are re-opening applications for instructor training, several questions have come in by email and on Twitter. We've answered the general ones below, and will update this post as others arrive. Why are you running open-enrollment classes? Our primary goal is to increase our reach: many geographic regions and research disciplines are sparsely represented in our instructor pool, or not represented at all, and we are always conscious of the need to maintain and improve diversity. How does this relate to instructor training for partner organizations? Partner organizations are guaranteed a certain number of instructor training slots as part of their agreement with us, and those trainings are given priority over open-enrollment offerings. How many classes are being offered, and how many spaces are available? We are currently planning to run two classes, each for about 30 participants. Where will the training take place? These two classes will take place online, and will run over two full days. We will choose dates and times to accommodate as many people as possible. How are participants going to be selected? We will select participants based on previous involvement with Software and Data Carpentry, location, research discipline, previous teaching experience, ability to commit time to teaching and mentorship, and all of the other factors in the application form. When will selections be made? We will start notifying people in a few weeks (i.e., the second half of August). In what time zones will classes take place? We haven't decided yet, but they will be aimed at different time zones so that we can accommodate the widest possible range of participants. Will you offer more open-enrollment classes in future? Yes. Will people have to re-apply to take part? No: we will keep all applications, and contact people as spots become available to see if they're still interested. Read More ›

Software Carpentry at Curtin
Matthias Liffers, Andrea Bedini / 2016-07-26
The first Software Carpentry workshop to be held at Curtin University (and the third so far in WA) started on Monday 18 July. We decided early to experiment with the timetable and spread the course over four half days. Monday was for the Unix Shell, Tuesday and Wednesday for Programming with Python, and Thursday for Version Control with Git. We offered 40 places, and the workshop was fully booked with a couple of weeks still to go. Most participants were from Curtin University, but there were also some attendees from CSIRO and other places. Matthias, Andrea, Philipp Bayer and Andrew Rohl took turns to instruct, with a large group of friendly and enthusiastic helpers (thank you David, Janice, Kevin, Rebecca, Rob, Stef & Vicky). More thanks go to Shiv and Rebecca, who staffed the software installation helpdesk on the Friday before. We find a software installation helpdesk to be an important part of a Software Carpentry workshop, helping people sort out any problems. However, we found that only a few people take the opportunity to come along to a helpdesk organised before the main workshop. In light of some of the problems we had – on the first day we found out nano was not working on some Windows installations, and on the second day we had some Python installation problems – we will move the helpdesk from a pre-workshop timeslot and make it part of the first day instead. The next Software Carpentry workshop held at Curtin will have an 'installation party' to kick off the workshop, helping us to figure out problems early on and ensuring the smooth running of the following lessons. Despite some minor hiccups with sticky notes that didn't want to stick and some misbehaving software, everything went smoothly. We look forward to seeing some SWC graduates at CU Hacky Hour next week! The workshop was a collaboration between the Curtin Institute for Computation and the Curtin University Library. It couldn't have taken place without volunteers from Curtin University and the University of Western Australia. Read More ›

Reopening Instructor Training
Greg Wilson / 2016-07-25
For the last ten months, the Software Carpentry Foundation has worked toward three goals for its instructor training program: make the content more relevant; increase the number of people able to deliver instructor training; and find a format that meets everyone's needs in a sustainable way. We have made a lot of progress on all three, and are therefore now able to offer instructor training once again to people who aren't affiliated with our partner organizations, but would like to teach Software Carpentry, Data Carpentry, or both (as the course is shared by both organizations). If you wish to apply to take part in one of the two open-enrollment classes we will offer this fall, please fill in the form at: https://amy.carpentries.org/workshops/request_training/ to tell us about yourself, what excites you about teaching, and how Software and Data Carpentry can help in your community. We will notify applicants as spaces become available. If you have any questions, please mail training@software-carpentry.org. If you would like to accelerate the process, check out our Partnership program. Organizational partners make ongoing commitments to supporting our organization and are prioritized for instructor training. If you need help making the case at your organization, feel free to contact us at partnerships@software-carpentry.org: we'd be happy to help. Please note that as a condition of taking this training, you must: abide by our code of conduct, which can be found at http://software-carpentry.org/conduct/ and http://datacarpentry.org/code-of-conduct/; agree to teach at a Software Carpentry or Data Carpentry workshop within 12 months of the course; and finish three short tasks after the course in order to complete certification. The tasks take a total of approximately 8-10 hours, and are described at https://carpentries.github.io/instructor-training/checkout/. For more information on Software and Data Carpentry instructor training, please see the course material at: https://carpentries.github.io/instructor-training Please also see this additional post, which answers some frequently-asked questions about this training. Read More ›

Showing Changes When Teaching
Robin Wilson / 2016-07-25
A key - but challenging - part of learning to program is moving from writing technically-correct code "that works" to writing high-quality code that is sensibly decomposed into functions, generically applicable and generally "good". Indeed, you could say that this is exactly what Software Carpentry is about - taking you from someone bodging together a few bits of wood in the shed to a skilled carpenter. As well as being challenging to learn, this is also challenging to teach: how should you show the progression from "working" to "good" code in a teaching context? I've been struggling with this recently as part of some small-group programming teaching I've been doing. Simply showing the "before" and "after" ends up bombarding the students with too many changes at once: they can't see how you get from one to the other. I want some way to show the development of code over time as things are gradually done to it (for example, moving this code into a separate function, adding an extra argument to that function to make it more generic, renaming these variables, and so on). Obviously when teaching face-to-face I can go through this interactively with the students - but some changes to real-world code are too large to do live - and students often seem to find these sorts of discussions a bit overwhelming, and want to refer back to the changes and reasoning later (or they may want to look at other examples I've given them). Therefore, I want some way to annotate these changes to give the explanation (to show why we're moving that bit of code into a separate function, but not some other bit of code), but to still show them in context. Exactly what code should be used for these examples is another discussion: I've used real-world code from other projects, code I've written specifically for demonstration, code I've written myself in the past, and sometimes code that the students themselves have written. So far, I've tried the following approaches for showing these changes with annotation: Making all of the changes to the code and providing a separate document with an ordered list of what I've changed and why. (Simple and low-tech, but it is often difficult for the students to visualise each change.) The same as above, but committing between each entry in the list. (Allows them to step through the git commits if they want, and to get back to how the code was after each individual change - but many of the students struggle to do this effectively in git, and it adds a huge technological barrier… particularly with Git's 'interesting' user interface.) The same as above, but using GitHub's line comments feature to put comments at specific locations in the code. (Allows annotations at specific locations in the code, but it is rather clunky to step through the full diff view of commits in order using GitHub's UI.) I suspect any solution will involve some sort of version control system used in some way (although I'm not sure that standard diffs are quite the best way to represent changes for this particular use-case), but possibly with a different interface on it. Is this a problem anyone else has faced in their teaching? Can you suggest any tools or approaches that might make this easier - for both the teacher and students?
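In the meantime, one low-tech sketch of the second approach: if each change in the teaching repository is its own commit, with the explanation written into the commit message, the walkthrough can be generated for the students rather than asking them to drive git themselves. The short R script below is a hypothetical illustration, not a polished tool; it assumes git is on the PATH and that it is run from inside the repository, and it prints each step's annotation followed by its diff, oldest commit first.

```r
# Sketch: turn a teaching repository's history into an annotated,
# step-by-step walkthrough document.
git <- function(...) system2("git", c(...), stdout = TRUE)

# Commit hashes on the current branch, oldest first.
hashes <- git("log", "--reverse", "--format=%H")

for (i in seq_along(hashes)) {
  annotation <- git("show", "-s", "--format=%B", hashes[i])  # the commit message
  patch      <- git("show", "--format=", hashes[i])          # just the diff
  cat(sprintf("=== Step %d: %s\n", i, paste(annotation, collapse = " ")))
  cat(patch, sep = "\n")
  cat("\n")
}
```

The output is a single document that students can scroll through and refer back to later, which sidesteps both the git-driving problem and the clunky diff navigation - though it inherits the limitation that standard diffs may not be the clearest representation of a change. Read More ›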

Genomics R Software Carpentry workshop at the University of Auckland, New Zealand
Dan Jones, Vicky Fan / 2016-07-25
A two day Software Carpentry workshop with R was held at the University of Auckland Winter Bootcamp on 11-12 July. After a brief battle with the projector in the room, Day 1 consisted of an eventful morning session on the Unix Shell, the spontaneous explosion of a glass door, followed by an introduction to programming with R. Dan Jones on Unix Shell. Dan's talk on Unix Shell shatters the automatic door in the corridor. Day 2 consisted of the Git session, which was extremely relevant to our later bioinformatics-specific workshops, since those are themselves delivered as Git repositories. After the Git session, we had an open Q&A session where all the attendees could ask questions about any of the topics that we covered. Days 1 and 2 made for a great build-up to the bioinformatics sessions that were run later in the week. As most bioinformatics-related software is optimised to run on the command line, the Software Carpentry sessions enabled researchers to build confidence with using a Unix terminal and R. The Genome Assembly, Annotation and Visualisation workshop started with Dan Jones' declaration, "I'm expecting everything go horribly wrong at setup" while setting up VirtualBox on the attendees' laptops. Thankfully Dan's prediction was completely incorrect. The workshop used a VirtualBox OVA file with Ubuntu 16.04 LTS, test data, and preinstalled bioinformatics programs. Pro tip: some Intel machines ship with virtualisation disabled in the BIOS, which completely blocks all virtualisation on the computer! Once the participants had their shiny new virtual machine set up, we went through the process of assembling and annotating a new eukaryotic genome from scratch. We made all the workshop materials available on GitHub. The associated virtual machine is available on request. Day 3 was a workshop on Transcriptomics, again using the virtual machine we had constructed. As before, this workshop was delivered as a Git repository, using the Git wiki as the workshop material. It's based on (and forked from) the excellent workshop produced by the Griffith Lab, but was modified to allow us to make it a 1-day workshop, to add handling of ERCC spike-in controls, and to simplify some of the code. Again, the materials are on GitHub. Our excellent Metabarcoding workshop was run on day 5, with the objective of taking raw sequence data from the machine and producing a table of OTUs with associated taxonomy. This workshop used a virtual machine and a set of premade scripts to work through the different steps required to take raw sequence data and transform it into a usable form for downstream analysis. We used QIIME and vsearch, two different sets of software for metabarcoding analysis, to do this. A big thanks to all the presenters and helpers who made this series of workshops run so smoothly: Dan Jones, Luke Boyle, Vicky Fan, Alex Stuckey and Nooriyah Lohani. Note: the transcriptomics tutorial is heavily modified from: Malachi Griffith*, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith*. 2015. Informatics for RNA-seq: A web resource for analysis on the cloud. PLoS Computational Biology 11(8):e1004393. *To whom correspondence should be addressed: E-mail: mgriffit[AT]genome.wustl.edu, ogriffit[AT]genome.wustl.edu Read More ›

A Tale of Two Workshops
Belinda Weaver / 2016-07-22
Brisbane and Toowoomba are 125 km apart in Queensland. Software Carpentry workshops were held in both cities a week apart (11-12 July and 18-19 July). I taught at both. The Brisbane R workshop was held at The University of Queensland. This was a tie-in workshop for the annual UQ Winter School in Mathematical and Computational Biology. Attendees are generally very keen to learn as they want to emulate the amazing computational work they have seen demonstrated during the week by the stellar speaker line-up at the Winter School. We had no trouble filling the 40 places, and no-one left the workshop. It was one of the best workshops I have organised - the buzz in the room was palpable and the feedback was overwhelmingly positive. It was also our first workshop where women attendees outnumbered men. There were four female instructors as well. We were very lucky with our helpers - we had some R experts there, and Othmar Korn from Stemformatics even wrote a script in response to a problem one of the attendees posted in the etherpad. We will probably do that again - call for specific problems to be posted, as attendees always want 'real life' solutions to consolidate what they have learned. The University of Southern Queensland hosted the Toowoomba Python workshop - their first ever Software Carpentry workshop. They have already requested a subsequent workshop on R. Again, the feedback was very positive. Newly minted instructor Francis Gacenga taught part of Git for the first time, while Leah Roberts taught Python for the first time, having taught her first session of Git at the Brisbane workshop a week before. Apart from Leah, the instructors for the Brisbane workshop were a mixture of experienced trainers - Areej Al-Sheikh, Paula Martinez, and me - with one newbie, Joshua Thia, who certified as an instructor this week, having trained in the same January cohort as Leah and Francis under the expert eye of Aleksandra Pawlik. At both workshops, we used a mixture of cloud - the DIT4C setup - and local laptops, which caused a bit of confusion, especially in downloading the data to the right place for the shell and Python exercises in Toowoomba. The DIT4C cloud option does simplify matters for people who have struggled to get the software installed, or who find they can't cut and paste easily from their Windows command line. But it is always difficult to cater for the different systems, so next time we will print out the different data set-up instructions for Mac, Windows and cloud and have those on tap. (Linux is never a problem.) Our other gotcha was the eduroam wireless we use for workshops. We had quite a few connection issues in Toowoomba, and my own wireless connection dropped out just as I tried to do a git push at the Brisbane workshop. The only way I could reconnect was to reboot the machine, which delayed things at a crucial point. I was wiser in Toowoomba, rebooting my laptop just before I had to teach the second part of Git. Git push still took three goes, as the repository I was pushing to had not yet granted the necessary permissions. But it all worked out in the end, and people said they enjoyed the session. To help people keep up, we had the lesson open on one of the projector screens in the room, while the instructor live-coded on another. This really helped people stay on track and not get lost. We could not do this in Brisbane as we had only one screen to work with, but where there are multiple screens available, this can work really well.
Toowoomba was the fifth city in Queensland where Software Carpentry has been taught since we ran our first-ever workshop here in 2014. We hope to clock up a sixth town with a workshop at the University of the Sunshine Coast later this year. Read More ›

Software Carpentry workshop at the University of Auckland - Winter bootcamp, New Zealand
Juliana Osorio-Jaramillo, Teererai Marange, Chen Wang, Kaitlin Logie / 2016-07-20
A two day Software Carpentry workshop with Python was held at the University of Auckland on 11-12 July as part of the Winter Bootcamp. For myself, it was the first time helping with Software Carpentry training. It was a great experience to assist as a collaborator, helping others resolve problems with software installation and hands-on exercises, but also learning from all the unexpected situations that can happen with different people around the room, with different laptops, configurations, and sets of skills. We happily managed to start training at 9:00 on 11 July. The first topic covered was the Unix Shell, presented by Sina Masoud-Ansari. We warmed up with some exercises useful for real-life research or work. In the afternoon the Python session showed its magic to attendees, with Prashant Gupta as the presenter. On 12 July we started the day with Cameron McLean's presentation on Git, or "the lifesaver" as Cam himself defined it ;). During the afternoon participants were asked which topic they would like to explore more deeply. Python won the survey, and as a consequence we enjoyed another great afternoon in its company. In addition to the SWC sessions, on Wednesday 13 July a session on Research Data Management was held by Cam, with a different audience and a more theoretical approach. In general the participants had some experience with the topic, which made for an engaged group, generating discussion and more knowledge to share among all. An invitation to hacky-hour on Thursdays was extended to the participants of all the workshops, and happily it was a total success! To finish, I should say that it was a very gratifying experience to be part of this event: helping, empowering, learning and motivating others to be part of this exciting world of software development in research environments, where world transformation could start! Some remarkable facts: Software Carpentry training was held in the context of the Winter Bootcamp at the University of Auckland. 75% of participants had no programming experience, or only basic skills, yet they moved smoothly through all the sessions. There were approximately 40 attendees over the three days. Some attendees were so excited about learning new skills that they decided to start online courses on websites such as Codecademy and Coursera. On Thursday 14 July, after the workshops, hacky-hour was packed; we needed to add three more tables than usual to make space for all the new participants. Helpers around the room (we were four, and sometimes five) were essential to maintaining the pace of the sessions. All problems with failing exercises and software were handled by the helpers, allowing the presenter to focus on the topic and manage the time better. The learners were very active, asking questions and excited to see the new knowledge in action :) Next time we are thinking of including an exercise that covers all the topics by solving a real-life problem, finishing the workshop with an example that participants can usefully consult later. Last but not least, we received some great feedback from participants, which inspires us to keep improving and running future sessions. Here is one of them: "...I found it extremely stimulating and very helpful as an introduction to effective methods and resources for coding in python. Have to say that I was impressed by the skill, humour, good nature and patience of all the eResearch team..."
We hope to start more exciting, empowering activities and workshops soon. :) Read More ›

Publishing Our Lessons, Version 2016.06
Greg Wilson / 2016-07-19
We are very pleased to announce the publication of Version 2016.06 of the Software Carpentry lessons. Thanks to a lot of hard work by their maintainers and Rémi Emonet (who has acted as release manager), we have: removed the need to build and commit HTML (everything except our R lessons is now pure Markdown); updated the appearance of the templates (including a top menu, a nicer color scheme, and previous/next links); and merged over 3100 pull requests from over 200 people. Our materials are far from perfect, but we're very proud of what our community has built. Please see the releases page for links to the archived release, and the main lessons page for links to the updated lessons. Publication Records Daisie Huang and Ivan Gonzalez (eds): "Software Carpentry: Version Control with Git." Version 2016.06, June 2016, https://github.com/swcarpentry/git-novice/tree/2016.06, 10.5281/zenodo.57467. Doug Latornell (ed): "Software Carpentry: Version Control with Mercurial." Version 2016.06, June 2016, https://github.com/swcarpentry/hg-novice/tree/2016.06, 10.5281/zenodo.57469. Christina Koch and Greg Wilson (eds): "Software Carpentry: Instructor Training." Version 2016.06, May 2016, https://github.com/swcarpentry/instructor-training/tree/2016.06, 10.5281/zenodo.57571. Greg Wilson (ed.): "Software Carpentry: Lesson Example." Version 2016.06, June 2016, https://github.com/swcarpentry/lesson-example/tree/2016.06, 10.5281/zenodo.58153. Mike Jackson (ed.): "Software Carpentry: Automation and Make." Version 2016.06, June 2016, https://github.com/swcarpentry/make-novice/tree/2016.06, 10.5281/zenodo.57473. Ashwin Srinath and Isabell Kiral-Kornek (eds): "Software Carpentry: Programming with MATLAB." Version 2016.06, June 2016, https://github.com/swcarpentry/matlab-novice-inflammation/tree/2016.06, 10.5281/zenodo.57573. Azalee Bostroem, Trevor Bekolay, and Valentina Staneva (eds): "Software Carpentry: Programming with Python." Version 2016.06, June 2016, https://github.com/swcarpentry/python-novice-inflammation/tree/2016.06, 10.5281/zenodo.57492. Thomas Wright and Naupaka Zimmerman (eds): "Software Carpentry: R for Reproducible Scientific Analysis." Version 2016.06, June 2016, https://github.com/swcarpentry/r-novice-gapminder/tree/2016.06, 10.5281/zenodo.57520. John Blischak, Daniel Chen, Harriet Dashnow, and Denis Haine (eds): "Software Carpentry: Programming with R." Version 2016.06, June 2016, https://github.com/swcarpentry/r-novice-inflammation/tree/2016.06, 10.5281/zenodo.57541. Gabriel Devenyi, Christina Koch, and Ashwin Srinath (eds): "Software Carpentry: The Unix Shell." Version 2016.06, June 2016, https://github.com/swcarpentry/shell-novice/tree/2016.06, 10.5281/zenodo.57544. Abigail Cabunoc and Sheldon McKay (eds): "Software Carpentry: Using Databases and SQL." Version 2016.06, June 2016, https://github.com/swcarpentry/sql-novice-survey/tree/2016.06, 10.5281/zenodo.57551. Greg Wilson (ed): "Software Carpentry: Workshop Template." Version 2016.06, June 2016, https://github.com/swcarpentry/workshop-template/tree/2016.06, 10.5281/zenodo.58156. 
Contributors Hakim Achterberg James Adams Joshua Adelman Aron Ahmadia Matthew Aiello-Lammens Joshua Ainsley Inigo Aldazabal Mensa Phillip Alderman Harriet Alexander James Allen Areej Alsheikh-Hussain Paula Andrea Alison Appling Jeffrey Arnold Sean Aubin Pete Bachant Sung Bae Daniel Baird Alex Bajcz Piotr Banaszkiewicz Pauline Barmby Diego Barneche Ewan Barr Greg Bass Radovan Bast Berenice Batut Rob Beagrie Erin Becker David Beitey Trevor Bekolay Evgenij Belikov Jason Bell Jared Berghold Mik Black Kai Blin John Blischak Simon Boardman Maxime Boissonneault Jessica Bonnie Madeleine Bonsma Jon Borrelli Azalee Bostroem Olga Botvinnik Andy Boughton Daina Bouquin Amy Boyle Ry4an Brase Rudi Brauning Erik Bray Matthew Brett Karl Broman Amy Brown Kyler Brown C. Titus Brown Eric Bruger Dana Brunson Orion Buske Abigail Cabunoc Mayes Gerard Capes Greg Caporaso Scott Chamberlain Jane Charlesworth Billy Charlton John Chase Kyriakos Chatzidimitriou Daniel Chen Jin Choi Garret Christensen Kathy Chung Richard Clare Liam Clark Sarah Clayton Peter Cock Ruth Collings Matthew Collins John D. Corless Marianne Corvellec Thomas Coudrat Steve Crouch Mike Croucher Remi Daigle Ryan Dale Harriet Dashnow Matt Davis Neal Davis Andrew Davison Harrison Dekker Raffaella Demichelis James A. Desjardins Gabriel A. Devenyi Catherine Devlin Matt Dickenson Deborah Digges Emily Dolson David Dotson Alastair Droop Laurent Duchesne Jonah Duckles Susan Duncan Stevan Earl Dirk Eddelbuettel Rémi Emonet K. Arthur Endsley Loïc Estève David Eyers Sean Farley Emmanouil Farsarakis Bennet Fauber Nicolas Fauchereau Noel Faux Filipe Fernandes Hugues Fontenelle Marianna Foos Talitha Ford Félix-Antoine Fortin Anne Fouilloux Auriel Fournier David Fredman Konrad Förstner Francis Gacenga Javier García-Algarra Stuart Geiger Noushin Ghaffari Heather Gibling Matthew Gidden Ivan Gonzalez Jan Gosmann John Gosset Alistair Grant Jeremy Gray Norman Gray Bastian Greshake Pip Griffin Marisa Guarinello Thomas Guignard Jessica Guo Jordi Gutiérrez Hermoso Jonathan Guyer Melissa Guzman Jamie Hadwin Ryan Hagenson Varda F. Hagh Denis Haine Mary Haley Sam Hames Chris Hamm Jessica B. Hamrick Nicholas Hannah Michael Hansen David J. Harris Rayna Harris Emelie Harstad Ian Hawke Fabian Held Donna Henderson Felix Henninger Martin Heroux Kate Hertweck James Hiebert Konrad Hinsen Johan Hjelm Xavier Ho Amy Hodge Toby Hodges Jeff Hollister Derek Howard Adina Howe Daisie Huang Fatma Imamoglu Liz Ing-Simmons Luiz Irber Damien Irving Yuandra Ismiraldi Michael Jackson Mike Jackson Christian Jacobs Elsie Jacobson Nick James Seb James Dorota Jarecka Michael Jennings Ben Jolly Luke W. Johnston Dan Jones David Jones Nick Jones Blake Joyce Zbigniew Jędrzejewski-Szmek Alix Keener Kristopher Keipert Tom Kelly David Ketcheson Jan T. Kim W. 
Trevor King Isabell Kiral-Kornek Justin Kitzes Sigrid Klerke Thomas Kluyver Christina Koch Alexander Konovalov Bernhard Konrad Alex Kotliarskyi Andrew Kubiak Avishek Kumar Mateusz Kuzak Kathleen Labrie Sherry Lake Benjamin Laken Hilmar Lapp Doug Latornell Mark Laufersweiler David LeBauer Kate Lee Joona Lehtomäki Michael Levy Jean-Christophe Leyder Peter Li Matthias Liffers Philip Lijnzaad Johnny Lin Gang Liu Tom Liversidge Andrew Lonsdale Catrina Loucks Julia Stewart Lowndes Eric Ma Keith Ma Andrew MacDonald Joshua Madin Mark Mandel Alexandre Manhaes Savio Camille Marini Carlos Martinez Kunal Marwaha Ben Marwick Sergey Mashchenko Fernando Mayer Dan Mazur Sue McClatchy Sheldon McKay Emily Jane McTavish Lauren Michael François Michonneau James Mickley Ryan Middleson Jackie Milhans Eric Milliman Bill Mills Amanda Miotto Nora Mitchell Jason K. Moore Kim Moir Tim Moore John R. Moreau Joaquin Moris Elise Morrison Sarah Mount Andreas Mueller Zakariyya Mughal VP Nagraj Joshua Nahum Hani Nakhoul Narayanan Fran Navarro Lex Nederbragt Ryan Neufeld Daiva Nielsen Matthias Nilsson Juan Nunez-Iglesias Adam Obeng Brenna O'Brien Aaron O'Leary Jeffrey Oliver Randy Olson Catherine Olsson Adam Orr Jeramia Ory Natalia Osiecka Nina Overgard Therkildsen Braden Owsley Kirill Palamartchouk Elizabeth Patitsas Aleksandra Pawlik Chris Pawsey John Pearson Frank Pennekamp Sam Penrose Fernando Perez Adam Perry Stefan Pfenninger Raissa Philibert Jon Pipitone Adrianna Pińska Timothée Poisot Pawel Pomorski Hossein Pourreza Timothy Povall Paul Preney Leighton Pritchard Andrey Prokopenko Diego Rabatone Oliveira Louis Ranjard Florian Rathgeber Joey Reid Timothy Rice Adam Richie-Halford Kristina Riemer Janet Riley David Rio Deiros Scott Ritchie Natalie Robinson Andrew Rohl Ariel Rokem Noam Ross Marjorie Roswell Halfdan Rydbeck Michael Sachs Mahdi Sadjadi Elliott Sales de Andrade Maneesha Sane Michael Sarahan Pat Schloss Sebastian Schmeier Hartmut Schmider Peter Schmiedeskamp Henry Senyondo Bertie Seyffert Genevieve Shattow Leigh Sheneman Jason Sherman Arron Shiffer Ardita Shkurti Beth Signal Raniere Silva Sarah Simpkin Gavin Simpson John Simpson Clare Sloggett Luc Small Arfon Smith Byron Smith Brendan Smithyman Nicola Soranzo Donald Speer Erik Spence Ashwin Srinath Karthik Srinivasan Joseph Stachelek Mark Stacy Daniel Standage Valentina Staneva Jim Stapleton Meg Staton Peter Steinbach Sarah Stevens Marcel Stimberg Brian Stucky Michael Sumner Sarah Supp Marc Sze Scott Talafuse Morgan Taschuk Cody Taylor Tracy Teal Bartosz Telenczuk Andy Teucher Florian Thoele Adam Thomas Ian Thomas Brian Thorne Tiffany Timbers Chris Tomlinson Giovanni Torres Danielle Traphagen Tim Tröndle Daniel Turek Stephen Turner Fiona Tweedie Drew Tyre Olav Vahtras Giulio Valentino Dalla Riva Roman Valls Guimera Thea Van Rossum Jay van Schyndel Edwin van der Helm Anelda van der Walt Ioan Vancea Steve Vandervalk Jill-Jênn Vie David Vollmer Philipp Von Bieberstein Jens von der Linden Andrew Walker Jordan Walker Alistair Walsh Josh Waterfall Ben Waugh Belinda Weaver Lukas Weber Derek Weitzel Daniel Wheeler Mark Wheelhouse Ethan White Tyson Whitehead Chandler Wilkerson Jason Williams Carol Willing Frank Willmore Greg Wilson Donny Winston Kara Woo Tom Wright Steven Wu Lynn Young Nick Young Lee Zamparo Qingpeng Zhang Naupaka Zimmerman Andrea Zonca Read More ›

Lesson Incubation
Greg Wilson / 2016-07-19
The Data Carpentry and Software Carpentry Steering Committees recently approved a process for supporting the incubation of new lessons. The goal is to provide a clear path for creating new material and getting it into the hands of people who can teach it, while taking into account the resources we have available and the need to maintain lessons as well as create them. If you would like to build something substantial for either of the Carpentries, we'd enjoy hearing from you. (Please also see the announcement on the Data Carpentry site.) Read More ›

Using RMarkdown with the new lesson template
François Michonneau / 2016-07-08
Our lesson template is getting a face-lift. Actually, it is a lot more than that: all the internal mechanics are also affected. What's new with the template? The lesson maintainers have developed a new template with features that have been repeatedly requested, such as an easy way to navigate among episodes within a lesson: there are now "previous" and "next" arrows in each episode. The new template also no longer requires pandoc. Currently, the lessons are written in Markdown and converted into HTML by our lesson maintainers using a wonderful piece of software called pandoc. Instead, the new template uses jekyll to take care of the conversion from Markdown into HTML. Lesson maintainers will still need to have jekyll installed to check that the website gets generated correctly, but the conversion from Markdown to HTML for our online lessons will be handled directly by jekyll on the GitHub servers. This means less work for our lesson maintainers and contributors, as they will only need to change the Markdown files. It also means that we will not have to put the generated HTML files in the repository, which removes another common source of error and frustration. What does it mean for the lessons written in RMarkdown? For the lessons covering R, we still need to keep both the RMarkdown files and the Markdown files in the repository. We write our lessons in RMarkdown to ensure that all the code included in the lessons works correctly. Because of the organization of the files with the new template, a few details had to be adjusted, but for the most part, writing lessons in RMarkdown with the new template should not be too different from the current template. Contributors to the lessons should edit the Rmd files, and the lesson maintainers will run make lesson-md to generate the corresponding Markdown files using knitr, before pushing these changes to the lesson repository. The Rmd files live in the ./_episodes_rmd folder, and their respective Markdown files live in ./_episodes. The Makefile takes care of calling knitr::knit() on each Rmd file in ./_episodes_rmd and writing the output in ./_episodes. It is possible to have Markdown files in ./_episodes that don't have counterparts in ./_episodes_rmd, as long as their names are different. The required preamble Each episode needs to start with a chunk that includes: source("../bin/chunk-options.R") knitr_fig_path("01-") The first line ensures that all the knitr options required to make the output compatible with the template are set correctly. The second line adds the episode number in front of each figure file. Here I used 01- but it should be adjusted to the correct episode number. We can't use the knitr function opts_chunk$set(fig.path="01-") here, as overwriting the global definition of this variable would place the figures in a folder that the template can't access. As currently configured, all figures generated by code included in the episodes will be prefixed with rmd-, so that figures generated by the first episode will all start with rmd-01-, making it easy to identify the origin of each figure in the ./fig folder. Data files If the code in one of your episodes relies on data files (or other files), they will need to be placed in a sub-folder (e.g., ./_episodes_rmd/data). This allows contributors to work interactively with each episode, and the code included in the episode looks like what we ask learners to set up; a minimal sketch of an episode follows.
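To make that concrete, here is a minimal sketch of the R chunks in an episode file under the new template (the episode number 02- and the data file name are hypothetical placeholders, not part of any real lesson):

```r
# Contents of the chunks in _episodes_rmd/02-exploring-data.Rmd (hypothetical).
# The required preamble chunk comes first:
source("../bin/chunk-options.R")  # knitr options that keep output template-compatible
knitr_fig_path("02-")             # figure files will be prefixed with rmd-02-

# A later chunk can then read data from the sub-folder, so the episode's
# code matches what learners type after following the setup instructions
# (the CSV name below is a placeholder):
surveys <- read.csv("data/surveys.csv")
summary(surveys)
```

Running make lesson-md would then knit this to a Markdown file in ./_episodes, ready for jekyll to render.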
The new template in the wild If you want to see examples of the new template in action, you can check out: the rendered lesson-example episode about using RMarkdown and its source, and the r-novice-gapminder lesson and its source, which I converted to the new template. Read More ›

Why attend a Software & Data Carpentry instructor training workshop?
Carrie Andrew / 2016-07-06
Note: the following post was written by Carrie Andrew, University of Oslo, after a request to write a testimonial about her participation in a Software/Data Carpentry instructor training workshop. The SWC/DC initiative is a cutting-edge program that promotes computer and data skills to those who need the greatest help, but are often the most put out to learn: beginners. It is taught by an ever-increasing, diverse assemblage of people across private and public, academic and non-academic, research institutions. The common denominator is that all have an interest in and motivation to promote computer and data skills within STEM organizations. Even more interesting: the program is designed to be self-destructing. Once educational and research organizations are globally saturated with keen, well-trained individuals who can stand in as the SWC/DC person(s) for their institutions, the initiative promises to end. When the need is met, it self-destructs, thus promoting change from the bottom up - a grass-roots initiative for our futures! SWC/DC are built on a volunteer network. This promotes a welcoming and positive atmosphere throughout the entire hierarchy, from teachers to students. People understand the need for computer and data skills. Meaning: anyone with even a novice skill set can contribute to teaching activities (from blogs, course content and rubrics, to helping and even being instructors)! The only stipulation, beyond interest and motivation, is that all who contribute to teaching must take an instructor training course. Wait. A course?! For the teachers?! Yeah… uhm… ….[sigh; glance away; tapping foot]….I'm kind of busy…. No, no, the SWC/DC instructor training course is highly worthwhile to take! At first it could seem like a potential barrier to building a volunteer network, who by definition are already volunteering time from their jobs for the initiative, and may have already taught. It's not a barrier, however. It's a bridge. Or a helping hand. Or a community that helps - so many helping hands. And, to remind you, they are all so nice and positive, because they are volunteers. Nice, positive, helpful hands. Isn't that what education should be? As you could learn in the workshop, they think so. There are many reasons to attend an SWC/DC training course. First, the instructional training is high quality, to the point that it would be worthwhile to attend irrespective of future contributions to SWC/DC (but we all hope you do contribute). It opens discussion on pedagogy, teaching types, technological advances, equality and stereotypes, and provides a wealth of reference material to continue the thoughts beyond the two-day workshop limits. Second, it is also a training course to become an instructor for the SWC/DC initiative. Want to contribute? Then attend the workshop! Third, it builds a network of computer and data people, from across institutions, and with as open and positive an atmosphere as possible. The instructor training course begins with a crash-course in pedagogy before gently corralling students towards the diving board of instructional experience: teaching, with video-recording exercises. These exercises are stressful for most, but are designed to help elucidate good points, point out blemishes, and recommend new methods — all before standing in front of the classroom. What better way to find out you pull your hair whenever you are not sure about an answer? Or that your voice rises two octaves? Do you know your 'tell'?
Just think: you could learn about this in a safe, open atmosphere (as with the instructor training course), or you can wait for your next job, which might be teaching undergrads who really don't want to take the class. And you have to design the course. A little guidance would be nice in that situation, and this instructor workshop extends beyond the SWC/DC goals by providing it. And if you already have teaching experience, bring it along, discuss it, and learn how to tackle those issues that have always bothered you! People seem constantly to leave the instructor workshops satisfied, open to discussion, and better equipped to converse and investigate teaching further. The final reason to take it: it's free. You could easily pay for the same education elsewhere, but with more competitive and less-nice people. Just make sure to help out those who helped you, and contribute to the initiative afterwards! Read More ›

Instructor Training Completion Times
Greg Wilson / 2016-07-05
How long does it take to complete instructor training once the class itself is done? Two dozen people who recently qualified told us this (all times in hours):

|        | Reading Lesson(s) | Writing Exercise | Discussion Session | Lesson Demo | Total |
|--------|------------------:|-----------------:|-------------------:|------------:|------:|
|        | 1.0  | 0.5 | 2.0 | 0.5  | 4.0  |
|        | 5.0  | 1.0 | 2.0 | 1.0  | 9.0  |
|        | 5.0  | 3.0 | 2.0 | 3.5  | 13.5 |
|        | 2.0  | 0.5 | 1.0 | 1.0  | 4.5  |
|        | 10.0 | 3.0 | 1.0 | 5.0  | 19.0 |
|        | 5.0  | 2.0 | 1.0 | 3.0  | 11.0 |
|        | 1.0  | 2.5 | 1.0 | 10.0 | 14.5 |
|        | 16.0 | 4.0 | 4.0 | 8.0  | 32.0 |
|        | 1.0  | 1.0 | 1.0 | 1.5  | 4.5  |
|        | 2.0  | 1.0 | 2.0 | 2.0  | 7.0  |
|        | 1.0  | 0.5 | 1.5 | 0.5  | 3.5  |
|        | 5.0  | 2.0 | 2.0 | 1.0  | 10.0 |
|        | 7.0  | 2.0 | 2.0 | 1.0  | 12.0 |
|        | 3.0  | 0.5 | 2.0 | 1.5  | 7.0  |
|        | 3.5  | 1.0 | 1.0 | 2.5  | 8.0  |
|        | 1.0  | 1.0 | 2.0 | 3.5  | 7.5  |
|        | 6.0  | 1.0 | 2.0 | 3.0  | 12.0 |
|        | 2.5  | 1.0 | 1.0 | 2.0  | 6.5  |
|        | 5.5  | 1.0 | 2.5 | 3.0  | 12.0 |
|        | 2.0  | 2.0 | 1.5 | 2.0  | 7.5  |
|        | 10.0 | 2.5 | 3.0 | 1.5  | 17.0 |
|        | 5.0  | 1.0 | 4.0 | 1.5  | 11.5 |
|        | 5.0  | 3.0 | 2.0 | 0.5  | 10.5 |
|        | 6.0  | 2.0 | 1.5 | 1.0  | 10.5 |
|        | 2.5  | 3.5 | 2.0 | 5.0  | 13.0 |
|        | 1.5  | 0.5 | 0.8 | 2.5  | 5.3  |
| Min    | 1.0  | 0.5 | 0.8 | 0.5  | 3.5  |
| Ave    | 4.4  | 1.7 | 1.8 | 2.6  | 10.5 |
| Median | 4.3  | 1.0 | 2.0 | 2.0  | 10.3 |
| Max    | 16.0 | 4.0 | 4.0 | 10.0 | 32.0 |

There are a few outliers (which may be due to different interpretations of what time should be assigned where), but overall these numbers are pretty consistent, and will be shared with future training course participants.
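For anyone who wants to poke at the numbers, the sketch below recomputes the summary rows in R from the responses above (rounding may differ from the table's figures by a decimal place or so):

```r
# Hours reported by recently qualified instructors, one value per person,
# transcribed from the table above.
reading    <- c(1, 5, 5, 2, 10, 5, 1, 16, 1, 2, 1, 5, 7,
                3, 3.5, 1, 6, 2.5, 5.5, 2, 10, 5, 5, 6, 2.5, 1.5)
writing    <- c(0.5, 1, 3, 0.5, 3, 2, 2.5, 4, 1, 1, 0.5, 2, 2,
                0.5, 1, 1, 1, 1, 1, 2, 2.5, 1, 3, 2, 3.5, 0.5)
discussion <- c(2, 2, 2, 1, 1, 1, 1, 4, 1, 2, 1.5, 2, 2,
                2, 1, 2, 2, 1, 2.5, 1.5, 3, 4, 2, 1.5, 2, 0.8)
demo       <- c(0.5, 1, 3.5, 1, 5, 3, 10, 8, 1.5, 2, 0.5, 1, 1,
                1.5, 2.5, 3.5, 3, 2, 3, 2, 1.5, 1.5, 0.5, 1, 5, 2.5)
total      <- reading + writing + discussion + demo

# Min / average / median / max for each activity, as in the summary rows.
sapply(list(reading = reading, writing = writing, discussion = discussion,
            demo = demo, total = total),
       function(x) round(c(min = min(x), ave = mean(x),
                           median = median(x), max = max(x)), 1))
```

Read More ›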

Software Carpentry workshop at the University of Otago, New Zealand
Mik Black, Tom Kelly, Murray Cadzow / 2016-07-04
A three day Software Carpentry workshop was held at the University of Otago from June 29 - July 1. The instructors for the course were Mik Black, Nick Burns, Murray Cadzow, David Eyers, Tanya Flynn and Tom Kelly, with Anthony Shaw and Riku Takei providing additional assistance as helpers. This was Tanya's and Nick's first time teaching at a SWC workshop, so it was great to add them to our Otago carpentry team. For the three day format, we reduced the number of contact hours per day so that we ran from 9am to 2:30pm - this gave our attendees with family commitments the ability to experience the whole workshop, and also meant that attendees with other work demands were still able to clock up some "productive hours" after the workshop finished each day. To better fit the three day format we also adjusted the order of our material a little - rather than teaching the Unix shell, R and Git as discrete modules, we split the Shell and Git lessons into two, and gave an intro to these topics on day 1, then spent all of day 2 on R, then returned to Shell and Git on day 3, finishing with R functions at the very end. The goal was to show how all of the topics could link together as part of a single workflow - we didn't quite get it right (next time we'll use a single example that runs through all the modules), but it has definitely given us "food for thought" for our next Software Carpentry offering. The feedback overall was very positive, and there was certainly an improvement over our previous workshops, in particular with regard to pace, delivery and dealing with technical issues. We had a surplus of helpers, so Tom became more involved in the "sharing" part, working on the training materials, including a Git lesson on branches and several pull requests to the Software Carpentry lessons repositories. It would be interesting to see whether others have had issues with the material where we did. It was valuable to have extra helpers available at times (with technical issues springing up across the room). Favourite moments of Software Carpentry: Sitting with a learner helping her to find the errors in her code, and hearing her exclaim (something like): "Yes, it works! Now I mustn't delete that. Wait, I'll back it up with Git!" We'll call that a victory for SWC. :) Scanning the Eventbrite bar codes with my iPhone. So fun! Having one of our learners adhere so closely to the "bring your own computer" requirement that she arrived with her 27 inch iMac desktop each day (admittedly she only had to carry it downstairs to the venue, but it demonstrated wonderful enthusiasm!) Having great gender ratios: 13 of our 19 learners were female, which is exciting given the traditional male skew that we see in many computational fields. At the end of our final session we promoted ResBaz 2017 (it's never too early to start spreading the word!), and our Otago Mozilla Study Group - many of our learners expressed interest in joining our regular informal training sessions, so hopefully our local community will continue to grow. Read More ›

Three workshops in Brazil
Raniere Silva / 2016-07-01
In Brazil, like in many other countries that we are starting to run workshops, there are many open questions that we need to answer such as how do we advertize our workshops? do we charge a registration fee? If yes, how much do we charge? how do we create a local community after the workshop? how do we get new partners after create a local community? Thanks to the work of many instructors like Anelda van der Walt, Belinda Weaver, Bill Mills, Lex Nederbragt, Selene L. Fernandez-Valverde, and Tiffany Timbers we are getting some hypothesis to the questions above across the world. In terms of Brazil, the three workshops that we ran in May added more data to help answering the questions. Workshop in Florianópolis This was the second workshop that we run in Florianópolis with more than one year between each. Our 20 seats sold out in a few days (we didn’t charge anything) but only four students attended all the sessions making this the workshop in Brazil with the smallest number of attendees. The low number of attendees can be due our workshop running at the same time as others activities of SciPy Latin America 2016 but I believe that the main reason was the missing of a local champion to motivate people to attend the workshop (in 2014 we had Diego Barneche Rosado). Workshop in Campinas This was the second workshop that we run in Campinas. The first one was an remote workshop last year. We had 40 seats available but we sold less than 20. At this workshop we charged R$100.00 (that at the time was something around US$25.00) per seat. A friend complained that the ticket was expensive but the workshops aren’t free of costs and if the host isn’t covering the costs we need to charge the attendees. Not having a full room helped a lot when we needed to find another one since the reserved room wasn’t available due an strike promoted by the Student Union. Workshop in São Paulo This was our forth workshop in São Paulo in the last three years making the place in Brazil with most workshops so far. All this four workshops had the support of The FLOSS Competence Center and I hope they will continue to support us. The 18 seats for this workshop sold out in a few days and we had a full room on all the sessions. This probably happend because our history of workshop lots of positive feedbacks. At previous events we had a waiting list filled to double the capacity, this time we did not because of less advertisement and the workshop was scheduled during a holiday. Learners feedback The feedback that we received on all three workshops were similar. The learners loved the friendly and welcoming learning environment that we provided, acquired and improved skills but also commented that the workshop could be longer than two days and be offer more times over the year. Pedagogical exchange In all three workshops we had a amazing team of instructors and helpers. Felipe Bocca, Diego Rabatone Oliveira, Filipe Pires Alvarenga Fernandes, Francisco Palm, Haydee Svab, Kally Chung, Monique Oliveira, Yuri Teixeira mentioned to learn something during the workshop that will help them when teaching the next one. We are seeking financial sponsors to promote the First Brazilian Software Carpentry workshop for Women in Science and Engineering to be delivery by Kally Chung, Haydee Svab and Monique Chung. We will be grateful for introductions to possible sponsors. Answers Coming back to the questions at the begin of this post, some possible answers: how do we advertize our workshops? 
Direct email to people on previous waiting lists works (and they say 'thanks' when attending the workshop), and asking local champions to help invite learners is 100% effective. Messages on social networks can get you more learners. Ask your local host to put up some flyers, since this is the best way to reach out.

Do we charge a registration fee? If yes, how much do we charge?
This is a case-by-case decision. Having a basket of biscuits and some coffee and tea outside the room right before the breaks facilitates new collaborations and local community building, so if you don't have a sponsor I suggest that you charge enough to cover the catering.

How do we create a local community after the workshop?
I don't have an answer for this one yet.

How do we find new partners after creating a local community?
I don't have an answer for this one yet either, and the lack of funding in the current year's budget makes this more challenging. But keep in mind that the Software Carpentry Foundation can provide you with letters of support for grant opportunities.

Acknowledgements
I'm grateful for the support provided by the Software Sustainability Institute, Centro de Informática e Automação do Estado de Santa Catarina S.A., Espaço de Apoio ao Ensino e Aprendizagem, and the FLOSS Competence Center for these workshops. Read More ›

1 - 30 June, 2016: Efficacy and Usefulness, Minutes, Discussions, Onboarding Documents, Teaching Undergraduates, and Library Carpentry Material
Bianca Peterson / 2016-06-30
Highlights
Have you previously attended a Software Carpentry workshop? Get involved with the follow-up study on "The Efficacy and Usefulness of Software Carpentry Training" by completing a survey. Feedback on your experience will be greatly appreciated! Minutes of the Steering Committee meeting are available - suggestions to modify the workflow of the minutes are welcome!

Discussions
Please add your comments to the topics in ongoing discussions. There is a discussion on the mailing list about "onboarding" documents for laboratories or small research groups. Also see Lab Carpentry. Mexican/Spanish-speaking instructors are needed for a Software Carpentry workshop later in the year. Please let us know if you're interested in teaching. Read about approaches and experiences in teaching Python to undergraduate geoscientists.

Other
Huge progress was made on updating the Library Carpentry material during the Mozilla Science Lab Global Sprint. Kally Chung wrote about their recent workshop with Raniere Silva in Brazil. Another successful Software Carpentry workshop was hosted in Palmerston North (New Zealand), including a session on HPC. Belinda Weaver recently ran a Library Carpentry workshop at the University of Queensland using the new materials. The call for submissions for the 4th Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4) is open. The third Software Carpentry workshop at the University of Canterbury, New Zealand, was run in June. 41 workshops were run over the past 30 days. For more information about past workshops, please visit our website.

Upcoming workshops:
June: University of Otago & NeSI, Federal Reserve Board
July: Pacific Northwest National Laboratory, University of Auckland and NeSI - Winter Bootcamp - SWC with Python, University of Auckland and NeSI - Winter Bootcamp - SWC with R, R workshop - The University of Queensland, SciPy 2016, University of Oxford, The Jackson Laboratory, University of Southern Queensland, University of Zurich, Curtin University, South Florida Water Management District, Philippines
August: Colorado State University
September: European Molecular Biology Laboratory
October: Aristotle University of Thessaloniki Read More ›

Software Carpentry workshop at the University of Canterbury, New Zealand
Richard Clare, Paul Gardner, Constantine Zakkaroff / 2016-06-30
A two-day Software Carpentry course was hosted at the University of Canterbury (UC) on the 22nd and 23rd of June. The instructors, all of whom are from UC and are certified Software Carpentry instructors, were Paul Gardner (School of Biological Sciences), Richard Clare (QuakeCoRE), and Constantine Zakkaroff (Accounting & Information Systems). They were aided by three helpers: Christopher Thomson, Viktor Polak and Michael Gauland. Organising the event was greatly assisted by Sung Bae of the New Zealand eScience Institute (NeSI). NeSI also sponsored the refreshments for the coffee breaks during the workshop. The three courses taught were the Unix shell (Gardner), Python (Clare) and version control with git (Zakkaroff); the fourth session was used for open discussion. We had a total of 30 attendees, which was somewhat disappointing given that 40 had signed up and the event was listed as 'sold out' for the week prior to the workshop. The feedback from those who did attend was very positive about both the lessons and the instructors. The room we used (Kirkwood KE04), which has two independent projectors and can seat 40, was ideal. This was the third workshop at the University of Canterbury, which also hosted the very first Software Carpentry workshop in New Zealand back in 2013. We are looking forward to training more researchers in NZ and growing the community across the whole country. Read More ›

Minutes of Steering Committee Meeting
Raniere Silva / 2016-06-23
The Steering Committee is pleased to announce that the minutes of its last meeting, held last week, are now available on GitHub. As Secretary, I'm sorry for the delay in releasing the minutes, which was due to some experiments with the workflow we use to write them. I'm now happy with the workflow we have in place, which assigns an ID to each of our motions, making it easy for us to refer to them. I still want to make a few improvements to the workflow to make the minutes more machine-readable. If you have criticisms or suggestions related to the minutes, please send them by email to me at raniere@rgaics.com. Read More ›

Workshop Satisfaction Survey
Belinda Weaver / 2016-06-22
Calling all Software Carpentry workshop attendees! I'd like 10 minutes of your time to complete a survey about the Software Carpentry workshop you attended. Why should you bother? The data will be useful to Software Carpentry in the future planning of workshops. While individual Software Carpentry workshops are assessed at the time of delivery, no long-term follow-up study has been done on the efficacy of the training delivered, nor on any impact the training might have had on attendees' work practices, further skills acquisition, or subsequent career paths. I am now proposing to do that follow-up for an MPhil project at The University of Queensland entitled "The Efficacy and Usefulness of Software Carpentry Training". The survey is the first phase of a two-phase study. All Software Carpentry workshop attendees are invited to complete it. It should take no more than 10 minutes to answer. It will capture information about workshops as well as demographic information (confidentiality is assured, and people don't have to identify themselves if they don't want to). Space is provided for attendees to write at length (if they want to) about their experiences, both good and bad. The second phase will involve in-depth interviews with people prepared to talk further about their experiences of Software Carpentry training. If that is you, I would love to hear from you. You can contact me directly at thesiscarpentry AT gmail.com. Thank you in advance!

About me
I am a certified Software Carpentry and Data Carpentry instructor, based in Brisbane, Australia. I was recently elected to the Software Carpentry Steering Committee, and have served on the Mentorship Committee. I am really hoping this research will provide useful insights into what Software Carpentry does well, and where it could improve. I tweet as cloudaus. Read More ›

Teaching Library Carpentry
Belinda Weaver / 2016-06-22
I ran a two-day 'Library Carpentry' workshop for librarians at The University of Queensland on 13-14 June. The class covered:
- jargon-busting and data structures
- writing and using regular expressions
- command line tools to find data in files
- version control
- OpenRefine for data cleanup
Eighteen people from six organisations attended, out of a pool of 26 who initially expressed interest. (In the small lab I was using, I only had room for 18.) Because of my wish to spread these skills far and wide, people were only welcome to attend if they agreed to teach the material to others within 12 months – to teach some of it at least. Despite some degree of trepidation among attendees, pretty much everyone signed up for that. Though I am a certified Software Carpentry instructor, this was my first time teaching the full Library Carpentry curriculum, so I was learning it myself as I went along. The material was developed by Dr James Baker, Owen Stephens and Daniel van Strien. The material was also updated by a team during the recent Mozilla Global Sprint - see blog post - so there are updated lesson repositories sitting on GitHub under data-lessons as well (which made preparing for the workshop tricky). Teaching someone else's material is always a challenge, and I made many blunders, but this helped people relax about their own mistakes. In the end, my lack of familiarity did not prevent my teaching at least some of the material effectively, which is worth noting. A little knowledge can be useful! I did warn attendees that the material would be challenging. If you have never used the command line before, it can seem an unforgiving place. I likened it to learning to drive, where nothing is familiar or intuitive, and you have to remember to do complicated things in the right sequence. Some people loved that section and could see uses for both grep and sed, which I covered in detail. People also enjoyed the jargon-busting and regular expressions challenges from the first session. The most popular session was definitely OpenRefine. This tool was the one most attendees said they would use after the workshop. Some of the suggested uses included:
- Tidying up EndNote libraries by identifying where data is missing
- Wrangling data for bibliometric benchmarking exercises
- Cleaning up and standardising free text fields from surveys
- Anonymising sensitive data
- Turning messy, unstructured data into categorised data
- Using regular expressions to locate specific words or concepts within messy data (see the sketch at the end of this post)
- Combining course feedback from different courses
The Git session was run only through the web interface. People created a blog using JekyllNow, and also created files for a repository we set up at the workshop. This gave them the opportunity to fork and branch repositories and do pull requests in a visual, beginner-friendly environment. I was very fortunate to have the assistance of Marco Fahmi at the workshop. He knew regex backwards and came up with lots of good ideas to encourage greater involvement and input by attendees, and to use the tools later. He also helped people who got stuck at different points, not least me! He dug me out of the many holes I got into because of my inexperience with the material. Would I do it differently next time? I would. I learned it is vitally important, up front, to establish how people will use the different tools post-workshop. While we did brainstorm ideas for this in the workshop, it would help to have more ready-made examples at the start.
Certainly the ideas for OpenRefine will be incorporated into the updated lessons, as will other examples from the workshop. But my key message really is: Just do it! Even if you are not an expert, you can teach this material, or learn it with others as a group. The full curriculum is available, including notes, handouts, challenges, scripts, quizzes and slides. Skills like these will only become widespread if people take responsibility for sharing them. Have a go. If you want to be part of developing and refining the material, or if you just want to be in touch with others in this area, join our chatroom. I hope I'll see you in there. Read More ›
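As a concrete illustration of the regular-expressions use case listed above, here is a minimal sketch in Python. The workshop itself taught this idea with grep and sed rather than Python, and the file name and search pattern below are purely illustrative assumptions.

```python
# A hypothetical example: use a regular expression to locate a concept
# ("open access", with any capitalisation or spacing) in messy free-text
# survey data. The file name and pattern are illustrative, not taken
# from the Library Carpentry lesson itself.
import re

pattern = re.compile(r"open\s+access", re.IGNORECASE)

with open("survey_responses.txt", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        if pattern.search(line):
            print(f"{line_number}: {line.strip()}")
```

The shell equivalent taught in the lesson would be a one-liner along the lines of `grep -i -n "open access" survey_responses.txt`.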

Ongoing Discussions
Greg Wilson / 2016-06-20
We've been using GitHub issues to manage discussions for the last eighteen months or so. It's working pretty well: when something on the main mailing list looks like it's going to run for a while, we push the conversation over to GitHub so that people who want to go into more detail can do so without flooding other people's mailboxes. I've been going through those discussion issues while tidying up our lessons for their next release, and found a few that are still relevant, but which never reached a resolution. If you have thoughts on any of these, please go ahead and add your comments - if your ideas don't get folded into the lessons this time, they will next time.
- Debug in notebook
- Managing data analysis pipelines
- Adding prompt with CSS
- Get rid of humandate
- How to motivate use of licenses
- Citing studies of open access
- Storing instructors' notes online
Read More ›

Teaching Python to undergraduate geoscientists: A summary of our approaches and experiences over the years
Christian T. Jacobs, Gerard J. Gorman, Huw E. Rees, Lorraine E. Craig / 2016-06-18
In 2010, a new course commenced at Imperial College London that aimed to teach undergraduate geoscience students how to program in Python. Over the course of five years (2010 to 2014 inclusive) our teaching methodology evolved significantly in response to student feedback and examination performance. The course lecturers (Gerard Gorman and myself, Christian Jacobs), along with several of our Teaching Assistants, also undertook Software Carpentry instructor training, which offered a fantastic pedagogical insight into how to effectively teach novices to program. Many of our teaching techniques were influenced by this training, which helped to greatly improve our course. In this blog post we summarise what we changed throughout the five years and why, in the hope that this will benefit other instructors currently offering, or planning to offer, an undergraduate computer programming course. To give a bit of context, each class normally comprised about 80-90 geoscience undergraduates, most of whom had no prior knowledge of computing. All teaching took place in a computer lab during a 3-hour time slot each week, for a total of 8 weeks. Initially, in 2010, we adopted a traditional 'talk and chalk' lecturing style, spending most of the 3 hours describing theoretical concepts with very little time for practical exercises. The formal, summative feedback from the students at the end of the course (rating both the lecturer and the content via an online form) indicated a high level of student satisfaction, most likely because the traditional lecturing style was what the students were used to. Surprisingly, the examination marks painted a very different picture: a low mean mark of 50.5% in the final exam indicated that the learning outcomes were poor, despite the positive feedback. We quickly realised that, since programming is a practical skill, just like learning to ride a bike or learning how to swim, more time must be allocated for practice. The main change in 2011 was therefore to introduce an additional 3-hour practical workshop. Significantly more positive comments were received from the students regarding the quality and quantity of support on offer, and the overall high score from the summative feedback was maintained. However, while the mean examination mark of 68.9% was a significant improvement over the previous year, the amount of extra time allocated was unsustainable for future years; this highlighted the common problem of trying to fit more computing into an already full curriculum (a point that was also noted by Greg Wilson in his SciPy 2014 talk). Furthermore, the trending issue in the students' comments was that the pace was too fast. In 2012 we moved away from traditional lecturing and instead introduced online YouTube videos to deliver the course content. This successfully addressed the issue of pace by allowing students to work at their own speed and watch the videos again if necessary. Yet in the summative feedback at the end of the course we realised that the students did not feel supported by this unfamiliar method of content delivery. The students gave the content and lecturer a considerably lower score that year, and largely negative comments were received along the lines of 'the lecturer is not lecturing us'. It also became clear from the mean mark of 60.3% that this approach offered no benefit to learning outcomes.
After taking Software Carpentry instructor training we became aware of the flipped classroom approach, which we implemented in 2013. This is a type of blended learning in which a brief lecture is given to establish context, followed by a much longer practical session in which the students tackle exercises within IPython Notebooks. The learning outcomes were significantly improved, with examination marks (74.5% mean) being greatly skewed towards the positive end of the scale. Positive summative feedback was received with respect to both support and pace. On the other hand, the majority of the negative feedback was about lecturing style, which once again did not match student expectations. In light of the success of blended learning we continued to use this approach in 2014, but justified the approach to students and emphasised its benefits throughout. Students felt reassured, resulting in only positive comments regarding lecturing style. We also implemented several other changes:
- Rather than having one long practical session, we split the workshop into 'bitesize chunks'. This mixed short lectures (10-15 minutes) with multiple in-class exercise sessions (30-40 minutes) so that students did not become exhausted.
- We introduced the use of 'sticky notes', a technique frequently used at Software Carpentry workshops. Students were given one red and one green sticky note at the start of class. Each student posts the green note on top of their computer monitor when they complete a particular exercise, which acts as a visual indicator of progress. The student posts the red note if they require assistance; this alerts the lecturer and TAs, and allows the student to continue working without keeping their hand raised, thereby improving productivity. Both the green and red notes were also used to provide positive and negative feedback at the end of each workshop.
Once again the exam marks were positively skewed in 2014, and most students did very well in meeting the learning outcomes. Overall, we feel that we have converged on an effective teaching methodology involving blended learning and formative student feedback. We hope to conduct a more formal, quasi-experimental pedagogical study in the future using a method of data collection planned from the outset, unlike the present study, which was more retrospective in nature. All of our teaching material is available on GitHub (https://github.com/ggorman/Introduction-to-programming-for-geoscientists) under a Creative Commons Attribution 3.0 Unported (CC-BY 3.0) licence. Feel free to adopt/tailor it to your needs! Further details, including additional student and course data, can be found in our paper that has been accepted for publication in the Journal of Geoscience Education: C. T. Jacobs, G. J. Gorman, H. E. Rees, L. E. Craig (In Press). Experiences with efficient methodologies for teaching computer programming to geoscientists. Journal of Geoscience Education. Pre-print: http://arxiv.org/abs/1505.05425 Read More ›

Software Carpentry and HPC class hits Palmerston North (New Zealand)
Ben Jolly, Hannes Calitz, Markus Mueller, Wolfgang Hayek / 2016-06-16
On the 7th and 8th of June 2016 we gave a Software Carpentry training course consisting of four blocks. We started with bash (presented by Markus, Landcare Research Hamilton) and Python (Ben, Landcare Research Palmerston North), went on to git (Hannes, Massey University), and finally had a session on using High Performance Computing (HPC) as offered by NeSI, the New Zealand eScience Infrastructure (Wolfgang, NIWA Wellington and NeSI). This last session is not one of the official SWC modules, but was tailor-made and rounded off the training very well, as it used the skills acquired during the preceding three sessions. It included hands-on training, working remotely on the NeSI Pan cluster in Auckland; all attendees managed to log on and submit jobs to the SLURM scheduler successfully. This was a very successful workshop with almost 40 attendees, organised and presented by a mixed team of new and more experienced instructors. The participants were very responsive and there was a lot of learning happening, for attendees and instructors alike. The feedback was mostly positive, and the negative feedback was very constructive. An aspect mentioned by several participants was that the number of attendees was probably too large, possibly with respect to the venue size - something we have to consider in the future. The logistical support with refreshments, received from Massey in the person of Colleen Blair, was behind the scenes, but timely and well prepared. So we had a hydrated group, and a lot of fun was had by all. The new friendships made will hopefully lead to more cooperation between colleagues. Hannes' favourite critique was: "It is impossible to listen and type as well!" – Note to self :) Many attendees were very positive about the tools and approaches presented, seeing applications for them in their current work or study. We received very good feedback about the number of instructors and helpers and their attitude towards teaching and helping participants; most smaller and larger problems that people had could be sorted out relatively quickly. A number also expressed interest in ongoing engagement through the likes of 'hacky-hour' type weekly meetups. Read More ›

Workshop at Unicamp
Kally Chung / 2016-06-13
A Software Carpentry workshop took place at Unicamp (the State University of Campinas, Brazil) on May 23rd and 24th. Raniere Costa was the host and lead instructor, while Felipe Bocca, Monique Oliveira, and I completed the instructors' team.
Day 1: Although we had issues with the first booked classroom due to the students' strike against budget cuts in public university education, Renato Santos gave the Unix shell lesson without problems. To resolve the problem with facilities, we found an available computer lab in which to continue the workshop, with the help of Professor Francisco de Assis Magalhães Gomes Neto, IMECC's director. After lunch, Felipe Bocca taught some R and then Raniere Costa taught RMarkdown.
Day 2: We had the pleasure of welcoming Jonah Duckles, executive director of the Software Carpentry Foundation. I gave the first lesson on Git, and after lunch Raniere Costa continued with the lesson. A final lesson on R (the last of the workshop) was given by Felipe Bocca. To finish, Jonah Duckles encouraged the students to continue the learning started in the workshop and also spoke about how to contribute and participate in the Software Carpentry community.
Conclusion: Despite the initial facility problem, we managed to settle in and continue the workshop without loss of quality. All of the instructors did their best to meet the students' needs and expectations. The feedback we received from them was positive, and most of them were interested in the Data Carpentry workshops. Read More ›

Updating Library Carpentry
Belinda Weaver / 2016-06-06
A global team worked to update the Library Carpentry curriculum and lesson material at this year's Mozilla Science Lab Global Sprint. The work kicked off in Brisbane, where Clinton Roy, Natasha Simons and I worked with Carmi Cronje in Sydney to start the ball rolling. Matthias Liffers came online in Perth two hours later. At 5.30 pm our time, we handed over to teams in South Africa, the Netherlands, and the UK, led by Anelda van der Walt, Mateusz Kuzak, and James Baker respectively. James and his colleagues Owen Stephens and Daniel van Strien were the original developers of the Library Carpentry material. During the sprint, James developed a new regex quiz, while Owen helped update and migrate the OpenRefine material. Jez Cope volunteered as a maintainer for the git lesson. Canadian and US teams began work as the sun moved around the globe. These included Juliane Schneider from UCSD working on OpenRefine, and Gail Clement from Caltech working on Author Carpentry. Laurel Narizny and Robert Doiel also signed on from Caltech. Cam Macdonell seemed to be up at all hours and was a key driver for getting the material migrated into gh-pages and the new lesson templates. The original four-module lesson (covering the shell, regular expressions, git, and OpenRefine) has now become seven, with a new SQL module (based on the Data Carpentry lesson) being added, along with others for persistent identifiers and computational thinking. Draft learning objectives were created for most of the main lessons, which are linked from here. Material for five modules was migrated to the new Data Carpentry lesson template. Library-based datasets were swapped in to make the lessons more relevant to librarians. During the sprint, all the Library Carpentry action was co-ordinated through a dedicated chat room and via daily evening Hangouts where the day's work was reported before being handed over to the incoming team, rather like a baton being passed in a relay race. We hope to continue the conversation through the chat room as we continue to develop the material. This was a really great experience, and I thank all the amazing people who dedicated their time to the project, and who have volunteered as maintainers into the future. There is still a lot of work to be done, but we made huge progress. I hope Anelda, Cam, James, Mateusz and some of the other participants will also chime in with their own stories and achievements; my take is only partial. Please also correct any errors I have made. Thanks again all - what a fantastic effort. Read More ›

9 - 31 May, 2016: The First Bimonthly Report, Instructor Data Analysis, R Instructor Training, Measuring the Right Stuff, RSE Conference, and a Bug Barbeque
Bianca Peterson, Anelda van der Walt / 2016-05-31
Highlights
The 2016 Steering Committee's first bimonthly report provides an overview of happenings in the community over the past two months. The joint partnership agreement with Data Carpentry is now available, together with a written version of the instructor training model, a call for subcommittees and task forces, and an official space on Facebook. Make sure to read the Code of Conduct in support of our welcoming, friendly, and diverse community.

Instructor training
The results of the first analysis of instructor training data are now available. Also see the discussion on the discuss mailing list. If you have a suggestion for further analysis, please share your thoughts! Applications for R instructor training in Cambridge, UK, are now open. Beth Duckles has recently published two reports on Software Carpentry instructors' experiences and views. In a more recent blog post she asks whether we are measuring the things that we actually care about.

Events
Are you a Research Software Engineer? Apply to attend the First Conference of Research Software Engineers. Reminder: the Bug BBQ is coming up on 13 June.

Other
The Brisbane Software Carpentry community has expanded over the past few years and invites all newcomers to upcoming events. For an interesting take on Software Carpentry workshops, see Christopher Lortie's Common Sense Review. Attendees can make or break a workshop: read about Paula Andrea Martinez's experience of the first Data Carpentry workshop in Darwin, Australia. Read Kathy Chung's post on what digital humanists do if you are curious about real-life digital humanities projects. Want to run a workshop for Women in Science and Engineering? These tips and tricks by Aleksandra Pawlik might help you! Still not sure what are good and not-so-good practices when teaching workshops? Watch these videos by Lex Nederbragt for demonstrations. David Andersen posted a great summary of how to apply what he learned in Google's Visiting Faculty program in academia. 25 workshops were run over the past 22 days. For more information about past workshops, please visit our website.

Upcoming workshops:
May: Central Queensland University (CQU), University of Puerto Rico Mayagüez
June: McGill University, University of Wisconsin - Madison, Federal Reserve Board, Berkeley Institute for Data Science, Great Plains Network, Université Bishop's, Massey University Albany & NeSI, Massey University, Palmerston North, University of Cincinnati, Elizabeth City State University, LANGEBIO-Cinvestav, Online, Online, University of Wisconsin - Madison, National Institutes of Health - FAES, Cornell University Statistical Consulting Unit (CSCU), University of Washington - Seattle, SRRC, USDA-ARS, New Orleans, LA, The University of Leeds, iHub, SIB @ University of Lausanne, University College London, University Library Basel, The University of Leeds, Materials Physics Center - University of the Basque Country, NERC / The University of Leeds, NERC / The University of Leeds
July: Pacific Northwest National Laboratory, R workshop - The University of Queensland, SciPy 2016, University of Southern Queensland, Philippines
August: Colorado State University
September: European Molecular Biology Laboratory
October: UC San Diego Read More ›

Further Analysis of Instructor Training Data
Greg Wilson / 2016-05-31
Following Erin Becker's analysis of our instructor training data, Byron Smith has done another analysis using survival statistics. His three key figures tell the story. Long story short: about half of instructors teach within 200 days of certifying, and about half of those teach again within 100 days. If anyone has similar stats from other volunteer teaching organizations, we'd be grateful for pointers. Read More ›

What Digital Humanists Also Do
Kathy Chung / 2016-05-30
Following up on an earlier post, here are three more user stories from digital humanists to help us figure out what they need and how we can help.

Gamma
Gamma is a senior scholar working on two projects. The first looks at communities of workers and architects in the Medieval and Renaissance eras, and relies on records of specific buildings, diaries of artisans, sketches, and building plans (some of which are very large: the width of a sidewalk, and the length of a city block). The end goal is to learn about the people who created the buildings, who they did the work with, the processes they used, what they were paid, and so on, and more broadly to understand the social network of people involved in creating the great buildings of the time. Gamma spends a lot of time in archives, copying and studying various original manuscripts (many of which have never been digitized). Since many of the buildings are still standing, they can also take photographs (which they have to do for mason's marks, which can't be represented in any existing font). There's a lot of textual description to keep track of, and a lot of linkages between items as well.

Delta
Delta studies metalwork objects used to store religious relics. The research questions center around the designs, which can be used to identify regions and creators. Who saw the objects? Who used them? What was their sphere of influence (i.e., how far did they travel)? What influence did they have on the design of other things (like buildings)? This work also involves storing and managing lots of images and the relationships between them. There's also a lot of map work: with 4000 objects in one collection and 5000 in another, finding out what was where, when, can help determine what might have influenced what. There are a few useful databases for this, but mostly Delta has documents (containing text, images, and citations) in various folders on a hard drive.

Epsilon
Epsilon started the interview by saying, "I am hopeless with technology," and like many people, has stuck to out-of-date versions of software rather than risk breaking anything. Their dissertation topic is the pictorial tradition for illustrating the late Medieval story "The Three Living and the Three Dead", which was recorded in a number of languages by different poets. Their goal was not a comprehensive catalog of every image, but to characterize how the image was understood in continental Europe, and when and why differences arose between northern and southern Europe. 13th and 14th Century images tend to come with a copy of the poem; by the 15th Century the images had taken on religious meanings of their own. A growing number of these images are available via Google search, and smaller libraries are also beginning to digitize their collections, but collecting images was still hard work, and costly (so much so that Epsilon eventually restricted the search to France, Germany, and the Low Countries). It was also hit and miss: institutions might commission more detailed descriptions of items in their collections, and even digital reproductions, but if no textual description from an archive or museum said "three living and three dead", those images might never turn up. At the same time, while it's easy enough now to take pictures with a phone, managing those pictures is difficult, since there's no easy way to add annotations directly on the phone. There is FADIS (the Fine Arts Digital Imaging System), but they can't use it for personal research on a personal computer. Read More ›

What Digital Humanists Do
Kathy Chung / 2016-05-25
I have started gathering user stories from humanists to give us a better sense of what they do and how (or whether) our kind of skills training might help. Two of these are summarized below; there's a lot to think about, but the most important observation is that DH really is different in important ways from what Software and Data Carpentry have been doing so far.

Alpha
Alpha is studying the meeting of cultures in a text which mixes the vernacular (Judaeo-Provençal) and Hebrew. The text's author was a physician, and says in the introduction that the vernacular was for women and children, while the Hebrew was for men. The text exists in only one hand-written manuscript from 1401, which is based on the book of Esther used in the Purim holiday. The original is at the Bodleian Library in Oxford, but is available on microfilm, and weighs in at about 22 folio pages. (Alpha heard that there was another copy in Italy, and obtained a digitized copy, but the original was in such bad shape that they can't even tell if it's the same manuscript.) There are also two 18th Century handwritten copies, and a much later 19th Century copy from the Balkans, all of which have variations from the original. The first task was to transcribe the document, which meant learning the quirks of the scribe's handwriting. (The good news is, those quirks can help to date the manuscript.) Once it was transcribed, the next step was to track down and record the sources of its Biblical quotations. Luckily, there's already a database of Biblical and other religious texts from different periods, so this can be done with keyword searches. Determining the significance of the choices of quotations is a harder task, but necessary for writing a scholarly edition of the work. This obviously relies on the Medievalist's expertise, but good bibliography management tools are essential. Step three was to translate the whole document for personal use, then do a better translation of the bits actually used in the Medievalist's dissertation. The end product was based on the 1401 version, but included variants in an appendix; those variants were also used to fill in gaps in the original. (The author of the original document wrote a colophon as an acrostic, which probably doesn't matter from a software skills point of view, but is still pretty cool.) Alpha's data management challenges mostly revolve around tracking different formats: microfilm, digitized images, PDF versions of a microfilm, and so on. Each page is a different image, so this has to be done page by page. The other major component is the critical analysis, which requires linking discussion of the literary, cultural, religious, and medical context of the work back to the original.

Beta
Beta has already used some computer tools in their MA research (described below), and tried to take a Software Carpentry workshop but was unable to schedule it. A friend introduced them to Markdown and Pandoc, and they use those to write all their work now. They also use Zotero. Their MA research revolved around the everyday use of vernacular medical manuscripts in late Medieval England. These contain a lot of text about everyday small-time medical practices, but not much about everyday life. Their starting point was a single manuscript made up of several booklets which are full of marginalia.
The first task was to transcribe all the marginalia along with important metadata (the folio, the marginalia's location, the color of the ink, the date, the hand it was written in, the beginning of the associated medical recipe, etc.). The result was 700 lines of marginalia data, which they ended up analyzing using a pivot table in Excel. The analysis revolved around classifying and grouping comments; 700 records is a manageable number, but obviously doing this with a computer is a lot faster. (It turns out, by the way, that the most common subject was headaches, but this was probably influenced by the content of the main text.) Beta is now working with several other people on a content management system for DH projects. This relies on the International Image Interoperability Framework (IIIF), which they explained as "like DOIs, but for images". It links metadata to images so that you can open any image with an IIIF-compliant viewer, then add more metadata, and the original image holder can decide whether or not to incorporate the new data. Their main goal is to help people produce authoritative scholarly editions of manuscripts. Lots of people will be involved in editing and annotating, and these editions will be used in many ways that the original creators cannot anticipate. One interesting challenge is how to handle variants: if there are 25 slightly different versions of a text, what do you print and how? And how much of the original appearance do you try to preserve? As noted in Alpha's story, the handwriting can be as important as the actual text, and in Beta's MA research, the physical location and ink color of comments mattered too. Read More ›
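For readers curious what Beta's Excel pivot-table step might look like in code, here is a minimal pandas sketch of the classify-and-group analysis described above. The file name and column names (subject, hand, folio) are assumptions based on the metadata Beta recorded, not artifacts from the actual project.

```python
# A hypothetical reconstruction of Beta's pivot-table analysis in pandas.
# Assumes a CSV with one row per annotation (~700 rows) and columns named
# "subject", "hand", and "folio"; these names are illustrative guesses.
import pandas as pd

marginalia = pd.read_csv("marginalia.csv")

# Cross-tabulate annotations by subject and the hand they were written in.
counts = pd.pivot_table(marginalia, index="subject", columns="hand",
                        values="folio", aggfunc="count", fill_value=0)

# The most frequent subjects overall (headaches topped Beta's list).
print(counts.sum(axis=1).sort_values(ascending=False).head())
```

The point is not the specific tool: the same grouping that took a pivot table in Excel becomes a few reproducible lines of code.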

Darwin Data Carpentry at Charles Darwin University
Paula Andrea Martinez / 2016-05-23
First Data Carpentry workshop in Darwin, Australia
The COMBINE Data Carpentry R workshop at Charles Darwin University was filled with joy, enthusiastic attendees, and a few extra degrees of warmth. This was a self-organised workshop led by COMBINE with the help of local organisers, mainly from the Menzies School of Health Research. After some emailing back and forth, three months before the workshop, we planned an R Data Carpentry workshop based on the answers to a custom-made survey (prepared by Steven). Two COMBINE instructors were then called for, so Westa and I volunteered for this workshop and started with the preparations. The workshop was capped at 30 people, with a nominal fee to cover catering ($50 AUD). People were well-disposed, and a good sign was that registration reached its limit 45 days before the workshop. Later on, there were two dropouts and we were able to add two people from the waiting list.

The workshop
Westa and I arrived in warm Darwin at around midnight and saw an amazing starry sky. It was going to be our first workshop together, coming from two different institutions (the University of Sydney and The University of Queensland); we had COMBINE in common and a few prep chats. On the first day, we arrived 45 minutes early at the venue and found Amar and Zuli organising morning tea and setting up the Menzies seminar room at Charles Darwin University. People started to arrive and everyone came in with friendly smiles. We had close to zero installation problems on day one; attendees actually did install all the software, and this must be highlighted, as most of them used Windows laptops. Our schedule was focused on teaching R in RStudio, with a bit of data management and version control. Among the 30 registered attendees there was a 50-50 ratio of females to males. About half of the attendees were from Menzies, the other half from other CDU schools. We had a group of mostly PhD candidates, but also masters and undergraduate students; one third were Drs (PhDs), with one Associate Professor and one Professor. All very smart people. Also, a bit less than one third of the attendees were already using R. It was definitely one of the best workshops I've taught at and participated in. Why was that? Well, because of the group. You guys made it work by asking questions, by being creative with the answers to challenges, and by sharing with your neighbours. In summary, by enthusiastically interacting, which is the essence of a Software/Data Carpentry workshop. Another thing I really appreciated with this group of people was the constant feedback. We had comments during the lessons, the breaks, and after the workshop. This only showed how involved everyone was and how much they were enjoying the new learning. The first day did not finish just because the workshop day did. Besides giving out voluntary exercises for the next day, we also invited everyone to the first COMBINE meetup in Darwin! We had some nibbles and refreshments from COMBINE and Sam/Beachfront. It ended up being a very bonding experience - I got to ask about their projects and about Darwin. I discovered that many people actually do not come from there, that there are crocs everywhere, and that we were in the best weather season of the year! At the end everyone was happy to say: "see you guys tomorrow!" The second day, we did some plotting and it was nice to hear all those "WOWs", "That's beautiful", "Ahhhhs" and so on, and so on.
From the feedback, I can say that everyone had a good time plotting and finished with a thirst for more! The last part of the workshop involved interacting with Git. We had some trouble connecting Git with RStudio, although everyone had it installed. Windows laptops were very troublesome - in the end, about 4 people were not able to link the two due to administrative restrictions on their laptops. Finally, we showed them how to use Git from the command line and from RStudio, and we managed to produce the first version of the R scripts they wrote during the workshop. We tried to lift motivation with a few individual examples, and shortly afterwards we wrapped up the workshop. I loved that crowd and I will be happy to return (wink); I also have unmet expectations of visiting Kakadu (wink, wink).

Acknowledgements
Funding came from the Australasian Genomic Technologies Association (AGTA) through a COMBINE workshop development scheme. Thanks to COMBINE's former and current presidents, Harriet Dashnow and Jane Hawkey, for setting this up, and to Westa Domanova as co-instructor in this workshop. Thanks to the Menzies School organiser and helper Amar Aziz (https://twitter.com/s_lump), to everyone who helped with organising locally - Steve, Jess, Jess, Linda, Erin, Zuli and Sam - and to all the awesome attendees: you made this a real success!!!

Highlights
This workshop was included as part of the Charles Darwin University Research Enhancement Program (REP), from which PhD students benefit - they need to attend a number of these workshops during their candidature. On the first day of the workshop, May 12, we held the first Darwin COMBINE meetup, to which all attendees were welcomed, plus anyone else interested in bioinformatics/computational biology. It was a great night, with a high chance of being repeated. I hope you enjoyed this experience as much as I did. Read More ›

First Analysis of Instructor Training Data
Greg Wilson / 2016-05-20
Following up on Wednesday's post about instructor training stats, Erin Becker (Data Carpentry's new Associate Director) has posted an analysis. I was very surprised to discover that fewer than 20% of the people trained over a year ago haven't taught yet: I had believed the number to be much higher. 51% of those trained in the last 12 months haven't taught yet, but that's less surprising, since in many cases there simply hasn't been time. Overall, we seem to be doing pretty well… Read More ›

Looking for a Model
Greg Wilson / 2016-05-18
Updated: this CSV file has information on who taught when. The three columns are the person's unique identifier, the date on which they first qualified, and the dates on which they taught. (If someone has taught multiple times, there is one record for each teaching event.) People who haven't taught at all are at the bottom, with empty values in the third column. Erin Becker's analysis of this data is posted on the Data Carpentry blog and discussed here. We rebooted instructor training in October 2015, and things have been going pretty well since then. If we average over all 23 new-style classes, it looks like two thirds of the people who take part actually qualify as instructors within four months of finishing the class:

Date       | Site(s)      | Days Since | Participants | Completed | Percentage | Cum. Participants | Cum. Completed | Cum. %age
2015-10-15 | online       | 170 | 48 | 30 | 62.5%  | 48  | 30  | 62.5%
2015-12-07 | Paris        | 162 | 7  | 7  | 100.0% | 55  | 37  | 67.2%
2015-12-07 | Potsdam      | 162 | 5  | 5  | 100.0% | 60  | 42  | 70.0%
2015-12-07 | Thessaloniki | 162 | 4  | 4  | 100.0% | 64  | 46  | 71.8%
2015-12-07 | Arlington    | 162 | 10 | 4  | 40.0%  | 74  | 50  | 67.5%
2015-12-07 | Vancouver    | 162 | 5  | 4  | 80.0%  | 79  | 54  | 68.3%
2015-12-07 | Wisconsin    | 162 | 7  | 5  | 71.4%  | 86  | 59  | 68.6%
2015-12-07 | Australia    | 162 | 3  | 2  | 66.6%  | 89  | 61  | 68.5%
2015-12-07 | Curitiba     | 162 | 3  | 3  | 100.0% | 92  | 64  | 69.5%
2015-12-07 | Toronto      | 162 | 14 | 12 | 85.7%  | 106 | 76  | 71.7%
2016-01-05 | Oklahoma     | 133 | 19 | 5  | 26.3%  | 125 | 81  | 64.8%
2016-01-13 | Lausanne     | 125 | 20 | 16 | 80.0%  | 145 | 97  | 66.9%
2016-01-18 | Brisbane     | 120 | 20 | 14 | 70.0%  | 165 | 111 | 67.2%
2016-01-21 | Melbourne    | 117 | 27 | 6  | 22.2%  | 192 | 117 | 60.9%
2016-01-21 | Florida      | 117 | 25 | 8  | 32.0%  | 217 | 125 | 57.6%
2016-01-28 | Auckland     | 111 | 20 | 7  | 35.0%  | 237 | 132 | 55.7%
2016-02-16 | Online       | 91  | 26 | 8  | 30.7%  | 263 | 140 | 53.2%
2016-02-22 | UC Davis     | 85  | 23 | 9  | 39.1%  | 286 | 149 | 52.1%
2016-03-09 | U Washington | 69  | 14 | 2  | 14.2%  | 300 | 151 | 50.3%
2016-04-13 | online       | 34  | 33 | 1  | 3.0%   | 333 | 152 | 45.6%
2016-04-17 | North West U | 31  | 23 | 0  | 0.0%   | 356 | 152 | 42.7%
2016-05-04 | Edinburgh    | 13  | 15 | 0  | 0.0%   | 371 | 152 | 40.9%
2016-05-11 | Toronto      | 6   | 27 | 0  | 0.0%   | 398 | 152 | 38.1%

One of our goals for this year is to lower the majority completion time from four months to three; another is to increase the throughput from two thirds to three quarters. What I'd really like, though, is some help figuring out what statistical model to use for the other important aspect of our training and mentoring: how many of the people we train go on to actually teach workshops, and how quickly. The data we have includes the following for each person:
- a unique personal identifier (we can easily anonymize individuals)
- the date(s) of the instructor training courses they took (someone may enroll, drop out, enroll again, and so on)
- the date(s) on which they were certified (they may have qualified for Software Carpentry and Data Carpentry at different times)
- the date on which they taught their first workshop (if any)
"Mean time to teach first workshop" isn't a good metric, since roughly 1/3 of the people we've trained haven't taught yet. Should we use an inverted half-life measure, i.e., how long until the odds of someone having taught hit 50%? Or would something else give us more insight? Whatever we choose needs to be robust in the face of a big spike in our data in January 2016, when we retroactively certified a big batch of Data Carpentry instructors. If you have suggestions, comments on this post would be very welcome. Read More ›
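To make the "inverted half-life" idea concrete, here is a minimal Python sketch that computes it with a hand-rolled Kaplan-Meier estimator over the CSV described above. The file name, the assumption that the file has no header row, and the censoring date are all illustrative guesses, not part of any actual analysis pipeline.

```python
# A sketch of the inverted half-life measure: the number of days after
# certification by which half of new instructors have taught. People who
# haven't taught yet are treated as censored observations, not ignored.
import pandas as pd

df = pd.read_csv("instructors.csv", names=["person", "certified", "taught"],
                 parse_dates=["certified", "taught"])

# One row per person: certification date and first teaching date (NaT if none).
people = df.groupby("person").agg(certified=("certified", "min"),
                                  taught=("taught", "min"))

cutoff = pd.Timestamp("2016-05-18")        # date of this post, as censoring point
event = people["taught"].notna()           # True if the person has taught
days = (people["taught"].fillna(cutoff) - people["certified"]).dt.days

# Kaplan-Meier: walk through observed times, shrinking the at-risk pool.
survival, at_risk, half_life = 1.0, len(days), None
for t in sorted(days.unique()):
    taught_now = ((days == t) & event).sum()     # first workshops on day t
    censored_now = ((days == t) & ~event).sum()  # not-yet-taught censored here
    survival *= 1 - taught_now / at_risk
    at_risk -= taught_now + censored_now
    if half_life is None and survival <= 0.5:
        half_life = t

print("Days until half of new instructors have taught:", half_life)
```

This is essentially the quantity that Byron Smith's survival analysis (see the "Further Analysis" post above) estimates more carefully; a library such as lifelines would also handle confidence intervals and the retroactive-certification spike more gracefully.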

A Common Sense Review of a Software Carpentry Workshop
Christopher Lortie / 2016-05-18
Re-posted with permission from the author's blog.

Rationale
This fall, I am teaching graduate-level biostatistics. I have not had the good fortune of teaching many graduate-level offerings, and I am really excited to do so. A team of top-notch big-data scientists is hosted at NCEAS, and they have recently formed a really exciting collaborative-learning collective entitled ecodatascience. I was also aware of the mission of Software Carpentry but had not reviewed the materials. The ecodatascience collective recently hosted a carpentry workshop, and I attended. As a parent, I use Common Sense Media as a tool to decide on appropriate content. As a tribute to that tool and the efforts of the ecodatascience instructors, here is a brief common-sense review.

ecodatascience Software Carpentry workshop, Spring 2016

What You Need to Know
You need to know that the materials, approach, and teaching provided through Software Carpentry are a perfect example of contemporary, pragmatic, practice-what-you-teach instruction. Basic coding skills, common tools, workflows, and the culture of open science were clearly communicated throughout the two days of instruction and discussion, and this is a clear 5/5 rating. Contemporary ecology should be collaborative, transparent, and reproducible. It is not always easy to embody this. The use of GitHub and RStudio conveyed a very clear signal of collaboration and documented workflows. All instructors were positive role models, and both men and women participated in direct instruction and facilitation on both days. This is also a perfect rating. Contemporary ecology is not about fixed scientific products, nor an elite, limited-diversity set of participants within the scientific process. This workshop was a refreshing look at how teaching and collaboration have changed. There were also no slide decks: instructors worked directly from RStudio, the GitHub Desktop app, the web, and gh-pages pushed to the browser. It worked perfectly. I think this would be an ideal approach to teaching biostatistics. Statistics are not the same as data wrangling or coding. However, data science (wrangling & manipulation, workflows, metadata, open data, & collaborative analysis tools) should be clearly explained and differentiated from statistical analyses in every statistics course, with at least primer-level instruction provided in data science. I have witnessed significant confusion from established, senior scientists about the difference between data science/management and statistics, and it is thus critical that we communicate to students the importance of, and relationship between, both now if we want to promote data literacy within society. There was no sex, drinking, or violence during the course :). Language was an appropriate mix of technical and colloquial, so I gave it a positive rating (I view 1 star as positive here: you want some colloquial language, but not too much, when teaching precise data science or statistics). Finally, I rated consumerism at 3/5, and I view this as an excellent rating. The instructors did not overstate the value of these open science tools – but they could have, and I wanted them to! It would be fantastic to encourage everyone to adopt these tools, but I recognize the challenges to making them work in all contexts, including teaching at the undergraduate or even graduate level in some scientific domains. Bottom line for me – no slide decks for the biostats course; I will use GitHub, push content out, and share the repo with students.
We will spend one third of the course on data science and how it connects to statistics, one third on connecting data to basic analyses and documented workflows, and the final third on several advanced statistical analyses that the graduate students identify as critical to their respective thesis research projects. I would strongly recommend that you attend a workshop modelled on the work of Software Carpentry and the ecodatascience collective. I think the best learning happens in these contexts. The more closely that advanced, smaller courses emulate the workshop model, the more likely it is that students will engage in active research similarly. I am also keen to start one of these collectives within my department, but I suspect that it is better led by more junior scientists. Net rating of the workshop is 5 stars. Age rating: 14+ (kind of a joke), but it is a proxy for the competency needed. This workshop model is best pitched to those who can follow and read instructions well and are comfortable with a little drift in being led through steps without a simplified slide deck. Read More ›

R Instructor Training Applications Open
Greg Wilson, Laurent Gatto / 2016-05-16
Thanks to generous sponsorship from the R Consortium, Software Carpentry is running a two-day R instructor training class in Cambridge, UK, on September 19-20, 2016. If you are active in the R and/or Software and Data Carpentry communities, and wish to take part in this training, please fill in this application form. We will select applicants, and notify everyone who applied, by June 30, 2016; those who are selected will be responsible for their own travel and accommodation. If you have any questions, please mail training@software-carpentry.org. Please note that as a condition of taking this training: You are required to abide by our code of conduct, which can be found at http://software-carpentry.org/conduct/. You must complete three short tasks after the course in order to complete certification. The tasks are described at https://carpentries.github.io/instructor-training/checkout/, and take a total of approximately 2 hours. You are expected to teach at a Software Carpentry or Data Carpentry workshop within 12 months of the course. For more information on Software and Data Carpentry instructor training, please see https://carpentries.github.io/instructor-training. Read More ›

Software Carpentry in Brisbane
Belinda Weaver / 2016-05-14
Brisbane Software Carpentry sputtered into life some time in 2014 when scattered local supporters finally met and began to form a group. Having heard about the initiative via Twitter, I managed to contact Nathan Watson-Haigh who got me on to the Aus/NZ mailing list. Then I met Philipp Bayer (now in Perth) and we started planning our first workshop at The University of Queensland. PyCon Australia was coming up in Brisbane and we were able to get Damien Irving from Melbourne and Tim McNamara from New Zealand to teach for us as they were in town for that. So we ran our first ever Software Carpentry workshop with Python at UQ in July 2014. Helpers included Nick Coghlan and Dan Callaghan from RedHat, and Kaitao Lai and Michal Lorenc from Dave Edwards’ bioinformatics group. In February 2015, five Brisbaneites – Amanda Miotto, Sam Hames, Areej Al-Sheikh, Mitch Stanton-Cook and Paula Andrea Martinez - went to Software Carpentry instructor training in Melbourne. Areej had been an attendee at the July 2014 workshop, along with Darya Vanichkina. Both were keen to train as instructors. Darya and I trained online with Greg Wilson during early 2015, so Brisbane was suddenly rich in instructors. Areej, Mitch, Paula, Sam and I ran a Python bootcamp in July 2015, and there was a second one in late September. Paula and Sam flew to Townsville in Queensland later that same week to run an inaugural R bootcamp there, and four attendees there registered interest in instructor training. We were lucky to have the services of Python wizard Caleb Hattingh at both the July and September workshops. Heidi Perrett and Kim Keogh from Griffith University helped at the September workshop, and liked it so much they decided to train as instructors. Aleksandra Pawlik ran instructor training for 20 people in Brisbane in January 2016, so our instructor pool has grown a little bigger, though not by 20. Many of the attendees were from other states (NSW, ACT) – and one, Selene Fernandez Valverde, has gone on to trailblaze Software Carpentry in Mexico. However, in addition to Heidi and Kim, we now have Leah Roberts, Nouri Ben Zakour and Anup Shah, though we will lose Nouri to a research job in Sydney soon (boo hoo). We have already said goodbye to Darya who has moved to Sydney for a job. Sam, Paula and I taught an R bootcamp at the Translational Research Institute late last year, and then ran concurrent Python and R bootcamps at the fabulously successful Brisbane Research Bazaar in February, where many of our new instructors got their first chance to teach. Amanda, Kim and I taught an R bootcamp in April for Queensland government staff in the Department of Science, IT and Innovation. A week later, the first ever Software Carpentry bootcamp was run at Griffith University. Our next scheduled workshop is the R bootcamp on 11-12 July, a tie-in for the UQ Winter School in Mathematical and Computational Biology. Areej Al-Sheikh taught Software Carpentry in Bali last year, and Mitch Stanton-Cook taught Software Carpentry in Nanning in China. Sam taught five workshops in his first year since certifying as an instructor, flying to Adelaide and other places to spread the word. Paula is off to Darwin next to run a Data Carpentry class, since she, Sam and I have all certified as Data Carpentry instructors as well. Tim Dettrick has been a great supporter for Brisbane Software Carpentry, letting us use his DIT4C cloud option for workshops. 
This gets around knotty installation problems, which really helps when many attendees bring Windows laptops on which they have no admin rights. Hacky Hours – informal get-togethers where people can brush up on skills, or follow up on questions after Software Carpentry workshops – are held weekly at both UQ and Griffith University. We hope to extend these to other universities soon, just as we plan to run Software Carpentry workshops at other Queensland universities. Workshops are a fertile recruiting ground for new instructors and helpers. Four people expressed interest in instructor training after the July 2015 workshop, while six wanted to train after September’s workshop. The Brisbane Software Carpentry community welcomes newcomers. Please get in touch if you want to come to a workshop, or find out about upcoming events. Or come to a Hacky Hour. Our next get-together will probably be a welcome for Steering Committee member Kate Hertweck, in town for a fortnight with Sugar Research Australia. Watch Twitter for details of her talk on 17 May. Read More ›

First bimonthly report from 2016 Steering Committee
Raniere Silva / 2016-05-11
On May 8th and 9th the Steering Committee had an in-person meeting at Cold Spring Harbor Laboratory to conclude the discussions of the first two months and make plans for the next ten months of activities. In the following weeks we will write more about the results of the in-person meeting on the blog, but for now we want to give you an overview. Partnerships Our Executive Director is doing an amazing job increasing the number of Software Carpentry partners. Last year, we were asked to provide a joint partnership agreement with Data Carpentry, and after months of work this agreement is now available. Instructor Training Last year we tried several models for Instructor Training; these are now more stable and are available here, thanks to the work of our Instructor Trainers. We will continue improving our Instructor Training programme and increasing our capacity to meet high demand. Subcommittees and Task Forces Last year we accomplished many things only because of our amazing community. This year we can help you, including financially, to shape Software Carpentry, and to that end we have opened a call for subcommittees and task forces. Communication Communication is our greatest weakness and we are going to work really hard to improve it. (This post is part of this goal.) One thing we decided during the in-person meeting was to provide an official space on Facebook, because in some countries and for some audiences this is the first place they will look for us. We will also create an Instagram account to display and share photographs from Software Carpentry workshops and activities. Diversity We will continue to support a diverse community and healthy spaces for communication. We have a Code of Conduct to help with that. Future Directions We will continue to work closely with other communities that are aligned with our vision and mission. Read More ›

First Conference of Research Software Engineers: Call for Participation
Simon Hettrick / 2016-05-10
The RSE Conference on September 15-16, 2016, in Manchester will be the first conference to focus exclusively on the issues that affect people who write and use software in research. We’re looking for submissions that will investigate and communicate ideas and expertise from the RSE community. This is not a standard academic conference! We welcome researchers, but we also want to hear from people who may not typically attend conferences. From running a workshop to sharing your ideas or simply attending, there are many ways in which you can participate. We want to hear from you about the new technologies and techniques that help you in your work. We want your opinions on what will make the conference even more useful. And, of course, we want you to attend! For more information, please see the full announcement. What is a Research Software Engineer? Are you employed to develop software for research? Are you spending more time developing software than conducting research? Are you employed as a postdoctoral researcher, even though you predominantly work on software development? Are you the person who ‘does computers’ in your research group? Are you sometimes not named on research papers despite playing a fundamental part in developing the software used to create them? Do you lack the metrics needed to progress your academic career, like papers and conference presentations, despite having made a significant contribution through software? If you answered ‘yes’ to many of these questions, you may be an RSE. To learn more, visit http://www.rse.ac.uk/. Read More ›

24 April - 4 May, 2016: Subcommittees and Task Forces, Partnerships, Instructor Training, A Vacancy, Lab Meeting, Bug Barbeque, Discuss, and New Videos and a Book
Bianca Peterson, Anelda van der Walt / 2016-05-09
Highlights The 2016 Steering Committee invites you to help organize and develop resources and activities to support our community members by proposing new initiatives in the form of subcommittees and task forces. Full instructions on the proposal process can be found here. The Steering Committee will gladly help! Also check out the subcommittee listing for information on existing and past subcommittees. Joint partnerships are now offered by Data Carpentry and Software Carpentry! Read the partnership information on how to become a partner and get in touch. Vacancy Data Carpentry is hiring a Deputy Director of Assessment. Visit the Data Carpentry jobs page for a full job description and application procedure. Events Tune in to the Software Carpentry Lab Meeting on May 10th to discuss what’s new and upcoming in the community. Check out the Lab Meeting Etherpad for the schedule and connection details. Join the worldwide Software Carpentry Bug Barbeque on June 13 to help fix bugs in Version 5.4 of the Software Carpentry lessons before publication. All contributors will receive credit for their hard work with a citable object. Conversations A recent conversation on the discuss mailing list prompted the creation of two new issues open for discussion on GitHub: How should we deal with high-volume discussions on the mailing list? Is the Code of Conduct saying what the community wants/needs it to say, and how should it be enforced? Videos and Books Greg Wilson published a video of “things not to do while teaching”. The material is great to use during instructor training to demonstrate to instructor trainees what we’re trying to avoid when teaching, and is based on suggestions made in this discussion on GitHub. Support Data Carpentry by buying the book How to Be a Modern Scientist written by Jeff Leek. The book includes guides for reviewing papers, reading papers, career planning, and giving talks. Iñigo Aldazabal Mensa (CSIC-UPV/EHU) provided a video in which he explains Software Carpentry in great detail at the 2016 HPC Knowledge Portal Conference - Software Carpentry: teaching computing skills to researchers. Other Did you recently participate in a Software Carpentry workshop and have questions that weren’t answered by the lessons? Do you teach workshops and hear questions from participants that can’t be addressed by the existing materials? Please add these questions to this post. What have we learned from our lesson discussion sessions so far? We welcome all new instructors to provide feedback about their experience of the sessions by commenting on the post. Aleksandra Pawlik recently blogged about her experience when she visited South Africa for the first face-to-face Software and Data Carpentry Instructor Training in Africa. Planning on running a Software Carpentry workshop with R? Read how three sticky notes and chocolates helped participants in the Software Carpentry workshop at Griffith University. Staff members from iDigBio gave their perspective on attending a Software Carpentry workshop. They encourage all researchers who’d like to work with the iDigBio datasets to participate in a workshop near them or request one at their own institution. 16 workshops were run over the past 16 days. For more information about past workshops, please visit our website.
Upcoming Workshops May: University of Toronto, Bancroft Building - University of Toronto, Charles Darwin University, University of British Columbia, University of Cambridge, CSDMS Annual Meeting, Royal Holloway - University of London, SciPy Latin America, UBC Koerner Library, University of British Columbia, National Institute of Standards and Technology, Oklahoma State University, Universidade Estadual de Campinas, McMaster University - Kenneth Taylor Hall (KTH) B121, Colorado Special Libraries Association @ CU Boulder, Lund University, National Institutes of Health, National Institute of Standards and Technology, South Florida Water Management District, University of Toronto, Centro de Competência em Software Livre June: University of Wisconsin - Madison, University of Puerto Rico Mayagüez, Université Bishop’s, University of Cincinnati, NeSI-Massey Albany Software Carpentry, LANGEBIO-Cinvestav, University of Wisconsin - Madison, SIB @ University of Lausanne, University Library Basel, Materials Physics Center - University of the Basque Country, NERC / The University of Leeds, NERC / The University of Leeds July: R workshop - The University of Queensland, Philippines October: UC San Diego Read More ›

Our Code of Conduct
Jonah Duckles, Tracy Teal / 2016-05-09
The amazing Software and Data Carpentry community of instructors and learners is the foundation of our organizations. We have more than 500 instructors from 30 countries and have had over 20,000 learners in our workshops. Software and Data Carpentry are community driven organizations. We value the involvement of everyone in this community - learners, instructors, hosts, developers, steering committee members, and staff. We are committed to creating a friendly and respectful place for learning, teaching and contributing. All participants in Software and Data Carpentry events or communications are expected to show respect and courtesy to others. Core to our organizations is creating a friendly and welcoming community. Therefore, we would like to reiterate that anyone participating in Software and Data Carpentry activities must comply with our Code of Conduct. This code of conduct applies to all spaces managed by Software and Data Carpentry, including, but not limited to, workshops, email lists, and online forums. We are so fortunate to have such a strong and supportive community of contributors, instructors, and learners and we are committed to supporting and maintaining that community! Read More ›

Software Carpentry with R at Griffith University
Amanda Miotto / 2016-05-04
We had 27 people register to attend, with 22 attending the first day. We offered these workshops free of charge, so we had been cautious about drop-out numbers. Two attendees had previously mentioned they were only coming to some sessions, and one dropped out the next day. We accepted everyone who registered and attempted the software install (our capacity was 30). The attendees were quite interactive and attentive and seemed to pick things up quite well - even by the end of the second day. We had one or two hurdles - mostly in the way of French keyboards, which, it turns out, are quite tricky to code on. Two students had firewall issues with eduroam, but they had come well prepared and downloaded all the programs and data, so it didn’t slow them down until the Git tutorial. We trialled a few new things this workshop: Firstly, we used three sticky notes instead of two - ‘I’m going okay’, ‘please go slower’, ‘help’. Afterwards we thought we should have used green, orange and red for these respectively (like traffic lights). Secondly, we bribed them with chocolates. We used Freddo frogs as prizes for those who got answers correct or asked really good questions. This was extremely popular. In the afternoons we were also quite generous with the chocolates, on the theory that a sugar kick was probably quite welcome. Thirdly, we ended up skipping the etherpad and just put all the links, including links to data, on the workshop page. I included all direct data links for our pages and how to unpack the zips for the data, and the direct link to the lessons for each section. I also included the link to the ResBaz cloud that we used as a backup. That way, everything was in the one place that was going to stay online for a while. Fourthly, I created a registration page via Google Docs that asked attendees what class they were attending, whether they were staff or students, whether they had eduroam, and what operating system they were using, then presented them with customised install instructions. These custom instructions were then emailed to them along with a confirmation, the location of the class, the pre-workshop survey and the GitHub information page. This gave us a final number of attendees, helped us gauge how people went with the initial install, and made the start of class a bit speedier. I’m more than happy for anyone to steal the code for this - it’s just done up in a Google form. As a follow-up, the attendees were emailed the post-workshop survey and links to Hacky Hours and the HPC/data storage services (as these seem to be high priorities for researchers who are learning to code). Some quick fun stats: 2 attendees were from UQ, the rest from Griffith Uni; 57% hadn’t attempted to code in R before the workshop; 71% hadn’t attempted to code in Bash before the workshop; 89% hadn’t attempted to code in Git before the workshop. Read More ›

Software Carpentry Bug BBQ
Bill Mills, Tiffany Timbers / 2016-05-04
Software Carpentry is having a Bug BBQ on June 13th Software Carpentry is aiming to ship a new version (5.4) of the Software Carpentry lessons by the end of June. To help get us over the finish line we are having a Bug BBQ on June 13th to squash as many bugs as we can before we publish the lessons. The June 13th Bug BBQ is also an opportunity for you to engage with our world-wide community. For more info about the event, read on and visit our Bug BBQ website. How can you participate? We’re asking you, members of the Software Carpentry community, to spend a few hours on June 13th wrapping up outstanding tasks to improve the lessons. Ahead of the event, the lesson maintainers will be creating milestones to identify all the issues and pull requests that need to be resolved as we wrap up version 5.4. In addition to the specific fixes laid out in the milestones, we also need help proofreading and bug-testing the lessons. Where will this be? Join in from where you are: no need to go anywhere - if you’d like to participate remotely, start by having a look at the milestones on the website to see what tasks are still open, and send a pull request with your ideas to the corresponding repo. If you’d like to get together with other people working on these lessons live, we have created this map of live sites that are being organized. And if there’s no site listed near you, organize one yourself and let us know you are doing that here so that we can add your site to the map! The Bug BBQ is going to be a great chance to get the community together, get our latest lessons over the finish line, and wrap up a product that gives you and all our contributors credit for your hard work with a citable object - we will be minting a DOI for this on publication. Read More ›

Save the Date: Software Carpentry Lab Meeting May 10
Bill Mills / 2016-05-03
Software Carpentry’s next Lab Meeting calls will be at 14:00 UTC and 23:00 UTC, May 10th; all are welcome to join in to discuss what’s new and upcoming in the Software Carpentry community. Connection details are on the Lab Meeting Etherpad. Our first feature for the month is the new Subcommittee and Task Force program, recently announced by the 2016 Steering Committee. Following on the success of our subcommittees in 2015, the Steering Committee has decided to open up the process of proposing and running projects and committees that advance Software Carpentry’s mission and enrich our community to all of our community members. More details on the proposal process can be found here; tune in to the Lab Meeting to hear more and to ask questions of the program designers. Also this month, we’re excited to announce Software Carpentry’s first ever Bug Barbeque, coming up worldwide on June 13. It’s almost time to publish Version 5.4 of all the Software Carpentry lessons listed here, and we need your help to polish up all the details. We’ll be getting together, or working remotely, worldwide on June 13 to get ready for publication; after version 5.4 is finished, we will be minting a DOI so that all contributors get a citable reference to add to their CVs. Watch this space for a blog post coming shortly, or see more details on the Bug BBQ Website. Check out the schedule on the etherpad and add your name if you will be attending. Feel free to add points of interest and goings-on you’d like to mention to the community under the non-verbal updates section there, too. We hope you’ll join us! Read More ›

New Joint Partnerships with Data Carpentry
Tracy Teal, Jonah Duckles / 2016-05-02
We’ve been hearing from organizations interested in building local capacity for training and in being able to run both Data and Software Carpentry workshops. We are excited to announce that Data Carpentry and Software Carpentry are now offering joint partnerships! These partnerships will give member organizations the benefits of running workshops from either the Software Carpentry or Data Carpentry community. At the Silver and above tiers, instructor training and capacity-building services will also be provided. Partnership Information There are four tiers of Partnerships: Bronze, Silver, Gold and Platinum. We wanted to provide opportunities for organizations that want to run multiple workshops but aren’t currently planning to train instructors (Bronze), and to help organizations build local capacity with instructor training, coordinated workshops and self-organised workshops (Silver, Gold). There is also a flexible tier for organizations that are advancing beyond capacity building to the sustainment and wide adoption of our methods across disciplines (Platinum). All Partnerships include some coordinated workshops, so that organizations are freely able (with a small travel budget) to bring in outside instructors to help mentor new instructors and continue to encourage cross-connections of instructors across organizational boundaries. Also, all Partner organizations can run as many self-organised workshops as they like. All currently in-place partnerships with the Software Carpentry Foundation will be grandfathered into a joint partnership consistent with their current contract until that partnership expires, at which time they can choose either a joint partnership or a standalone Software Carpentry or Data Carpentry partnership. If you are interested in a partnership or want more information, please get in touch! Read More ›

Buy This Book and Support Data Carpentry
Jeff Leek / 2016-04-29
Thanks to the efforts of Len Epp at Leanpub and Tracy Teal at Data Carpentry, 50% of the royalties from the book How to Be a Modern Scientist will be donated to Data Carpentry. The book is pay-what-you-want, with a suggested price of $10. I am very excited to help support the efforts of Data Carpentry, since I believe data science education is a fundamental need in the modern scientific era. About the book: The face of academia is changing. It is no longer sufficient to just publish or perish; we are now in an era where Twitter, Github, Figshare, and Alt Metrics are regular parts of the scientific workflow. Here I give high-level advice about which tools to use, how to use them, and what to look out for. This book is appropriate for scientists at all levels who want to stay on top of the current technological developments affecting modern scientific careers. The book is based in part on the author’s popular guides, including guides for Reviewing papers Reading papers Career planning Giving talks The book is probably most suited to graduate students and postdocs in the sciences, but may be of interest to others who want to adapt their scientific process to use modern tools. About the author: Jeff Leek is an Associate Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He is a co-founder and co-director of the Johns Hopkins Data Science Specialization on Coursera, which has enrolled over 3 million aspiring data scientists. His research has contributed to our understanding of the genomic basis of brain development, blunt force trauma, and cancer. He blogs at Simply Statistics and can be found on Twitter at @jtleek and @simplystats. Read More ›

Summarizing Our Lesson Discussion Sessions
Greg Wilson / 2016-04-29
For the first four months of this year, we ran hour-long lesson discussion sessions to give people going through instructor training a chance to ask questions of people who had taught our material several times. Trainees told us they were useful for getting information and as a way to meet more of the community. We have now decided to merge those sessions with our weekly workshop debriefing sessions in order to turn up the volume on both aspects, so this seems like a good time to summarize what we’ve learned so far ourselves. It was helpful to frame the session as a confidence-building exercise rather than a test. Equally, conveying enthusiasm and answering basic questions seemed really useful, which has us asking yet again if we should record some model workshops. Aleksandra Pawlik’s guidelines were very helpful, and have now been folded into the instructor training course’s guidelines. Organizing things through an Etherpad is easy but error-prone. We are responding to this by building support for the instructor training process into AMY. The breadth of questions asked was challenging. Some people had many very specific points about certain challenge questions or pieces of demo code, while others wanted more general information about teaching, and it was sometimes difficult to satisfy both sets of needs in a single session. (Ironically, many participants’ primary concern was how to handle workshops when learners have diverse backgrounds.) The intermittent demand meant these sessions often weren’t an efficient use of instructors’ time. In particular, people often hosted sessions with only one or two people, or tried to host a session only to have scheduling issues with the one attendee who needed a host. We’re addressing this by setting a regular weekly time, which in turn makes it feasible for us to have more than one host. Despite these problems, some attendees were very happy to have one-on-one attention where they felt comfortable asking questions they feared others might find too basic. It was good to know that these new instructors weren’t falling through the cracks. Some instructors wound up fielding questions about lessons they had never taught themselves, or even ones they had never seen taught. When this happened, they tempered their advice with comments along the lines of, “You’ll come to own the lesson after you’ve taught it a couple of times.” Some instructors found that people have a somewhat passive attitude to the material, i.e., they see themselves as a vehicle for delivering set material rather than as innovators and interpreters. Most importantly, some trainees had low awareness of what actually goes on in a workshop. For example, people asked questions like whether or not people would have their own laptops, whether they were supposed to ask the challenge problems in class, and so on. This was surprising, given that most trainees are now workshop alumni, but clearly signals that we need to spend more time covering nuts and bolts in the training course. Time zones make events difficult to schedule, while daylight saving time may be the most egregiously stupid idea our species has ever implemented (and I say this as a professional programmer). My thanks to Kate Hertweck, Bill Mills, Neal Davis, Naupaka Zimmerman, Sue McClatchy, Harriet Dashnow, April Wright, Karin Lagesen, and everyone else who helped make these sessions possible. Comments from new instructors who participated in these sessions would be very welcome. Read More ›

Data Carpentry is Hiring a Deputy Director of Assessment
Greg Wilson / 2016-04-24
Data Carpentry is hiring! Data Carpentry seeks to hire a full-time staff member to direct its assessment activities. This person will design, implement, monitor, analyse, and report on a comprehensive system of metrics to help the Data Carpentry project and its sibling organization, Software Carpentry, evaluate the impact and effectiveness of the training they offer, to both learners and instructors. As the Deputy Director of Assessment, you will have primary responsibility for developing methods and standards for the evaluation of all aspects of Data Carpentry’s training including relevance of curriculum, learning experience, long-term adoption of tools and skills and impact on productivity and reproducibility. You will also work with the Software Carpentry Foundation to build evaluation of the instructor development program, including effectiveness of instructor training and mentorship and the longer term impacts of instructor training on career development for instructors. You will also have the opportunity to collaborate with the training coordinators from related organizations to coordinate strategies and initiatives. For details, including a full job description and the application procedure, please see the Data Carpentry jobs page. Read More ›

Call for Software Carpentry Foundation Subcommittees and Task Forces
The 2016 Steering Committee / 2016-04-24
Purpose and Rationale Software Carpentry is a rapidly growing, volunteer-powered organization. One of the greatest things about Software Carpentry is the huge diversity of people who have come together to collaborate, teach, and share ideas and resources. Your contributions to the community have made Software Carpentry a huge success! To meet the needs of our growing community, the 2015 Steering Committee launched the first Subcommittee program to allow groups of members to organize and develop resources and activities that support our members. The 2016 Steering Committee would like to encourage and invite the community to propose new initiatives in the form of subcommittees (a standing group for ongoing activities) and task forces (an ad hoc group focused on a finite task). Read on to learn more about existing initiatives and for information about how to propose a new initiative that will shape our community. Current Subcommittees and Task Forces For information on existing and past subcommittees, please visit the subcommittee listing. How to Propose a New Subcommittee or Task Force Do you have an idea for an activity or service that would make Software Carpentry even more awesome for its instructors or students? Is there a project, tool or document that would benefit you and your community? We are very excited to hear and support your ideas! However, we do need to read and review all proposals. To propose a new subcommittee or task force, please read the information here. The key elements of a proposal that the Steering Committee will be looking for (and which we will help you with!) are: Specificity: do you have a specific goal, with a plan and timeline to achieve it? Nothing has to be cast in stone at this point, but the more specific your plan, the easier it will be for you to put it into action. Resources: have you anticipated what you’ll need, in terms of personnel, tools, external support and budget? Modest budgets will be available to support Subcommittee and Task Force activities, to be approved on a per-item basis by the Steering Committee; we will be looking for realistic (but non-binding) predictions of what you’ll need to succeed. Community Fit: is there a clear vision for how your project will advance Software Carpentry’s mission, or enrich our community? The Steering Committee will be coordinating all these groups to make sure we all work together effectively; understanding how your project fits into the bigger picture is an important first step. More details are in the full instructions; if you have any questions or anything is unclear, don’t hesitate to ask! The most important step is to let us know your ideas, so after reading the link above, open an issue and let us know what you’re thinking, even if you don’t yet know how to answer all the points above; the Steering Committee will happily help you craft a great proposal. Concluding Remarks The Software Carpentry community is full of tremendous people (that’s you!) doing great things; the Steering Committee hopes that our new, more open-ended Subcommittee and Task Force program will amplify your awesome work even further. We can’t wait to hear from you! Read More ›

Questions, Answers, and Lessons
Greg Wilson / 2016-04-24
While working on an outline of a new lesson on Python, I began thinking about the overall coherence of what we teach. In particular, I started to worry that we might be teaching some things because we teach them, i.e., that the curriculum might lose its connection to researchers' actual issues. One method for keeping things grounded in the other field I still occasionally work in (empirical software engineering) is called Goal, Question, Metric. As the name suggests, it defines three questions: what are you trying to achieve, what questions do you need answered in order to achieve it, and what metrics will you accept as answers to those questions. An educational equivalent is Question, Answer, Lesson: what questions do novices have, what answers do competent practitioners give them, and what lessons are needed to teach those answers. (The "do novices have" modifier is crucial: in order for our workshops to be appealing, they must answer the questions that novices actually have, not the ones we wish they would ask.) Here's what I've come up with so far:

Questions: How can I choose what tool to use? How can I get help/fix this? How can I get started? How can I work in a team? How can I make my software more useful? How can I get my software to do more? How can I make my work reproducible? How can I get the right answer? How can I understand the project I've inherited?

Answers: Automate tasks and analyses. Avoid duplication. Be welcoming. Choose the right visualization. Program defensively. Document intention not implementation. Use the experimental method. Modularize software. Normalize data. Be open by default. Organize projects consistently. Do pre-commit reviews. Publish software and data. Reduce, re-use, recycle. Create re-runnable tests. Search the web. Store raw data as it arrived. Tune programs. Understand data formats. Understand error messages. Understand how programs run. Use checklists and to-do lists. Use configuration files. Use more hardware. Use version control.

Lessons: Collaboration, Data Management, Make, Managing Software, Performance, Programming, Authoring and Publishing, Quality Assurance, Unix Shell, Version Control, Visualization.

But by themselves, these three lists aren't very useful. What really matters is the connections between them: which answers address which questions, and which lessons teach the ideas used in those answers? The obvious way to represent this is as a graph, since both relationships are many-to-many (a small sketch of building such a graph appears after this post). So far, though, I haven't produced anything better than this: (You can click on the image to see the full thing, or look here for the GraphViz source: run dot -Tsvg design-01.gv > design-01.svg to regenerate the SVG. Note that I've added a fourth column to the graph to show the half-day modules within each lesson, primarily to give a sense of how much time would be devoted to what.) Drawing up these lists has already helped me figure out what we might teach in a two-week Carpentry-style class (a long-standing dream of mine), but: I'm pretty sure these still aren't the questions novices actually have, and as presently drawn, the graph is unreadable. The first is more important right now than the second, so I would be grateful for feedback to go with what I've already received from Jackie Kazil, Noam Ross, Karen Cranston, and Andromeda Yelton. Please add comments to this post about which questions you'd add, delete, or change, and what you think the answers should be. Read More ›
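As a minimal sketch of the graph idea above - with only a few illustrative edges, not the actual design-01.gv mapping - the two many-to-many relationships could be encoded as Python dictionaries and turned into GraphViz source like this:

```python
# A minimal sketch: encode question -> answer -> lesson links as two
# many-to-many mappings and emit GraphViz "dot" source for them.
# The edges below are illustrative guesses drawn from the lists above.
question_to_answers = {
    "How can I make my work reproducible?":
        ["Use version control", "Automate tasks and analyses"],
    "How can I work in a team?":
        ["Use version control", "Do pre-commit reviews"],
}
answer_to_lessons = {
    "Use version control": ["Version Control"],
    "Automate tasks and analyses": ["Make", "Unix Shell"],
    "Do pre-commit reviews": ["Collaboration"],
}

lines = ["digraph design {", '    rankdir="LR";']
for question, answers in question_to_answers.items():
    for answer in answers:
        lines.append('    "{}" -> "{}";'.format(question, answer))
for answer, lessons in answer_to_lessons.items():
    for lesson in lessons:
        lines.append('    "{}" -> "{}";'.format(answer, lesson))
lines.append("}")

# Pipe the output through GraphViz to render it:
#     python design.py | dot -Tsvg > design.svg
print("\n".join(lines))
```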

Welcome to Google Summer of Code Students
Raniere Silva / 2016-04-23
Thanks to a lot of hard work from members of our community and of other open source scientific projects, we are happy to welcome the students and mentors who will participate in Google Summer of Code this year under the NumFOCUS umbrella.

Bhargav Srinivasa: Dynamic Topic Models for Gensim (mentors: Lev Konstantinovskiy, Radim Rehurek and Devasena Inupakutika)
Chris Medrela: Manage workflow for Software Carpentry and Data Carpentry instructor training (mentors: Greg Wilson and Piotr Banaszkiewicz)
Akash Goel: Upgrade to datapackage.json standard for EcoData Retriever (mentors: Henry Senyondo and Ethan White)
Patrick Kofod Mogensen: Improving the state of Optim.jl for JuliaOpt (mentor: Miles Lubin)
Prerit Garg: Result-aggregation server for the installation-test scripts for Software Carpentry and Data Carpentry (mentors: Piotr Banaszkiewicz and Raniere Silva)
Ramchandran Muthukumar: Presolve Routines for LP and SDP within Convex.jl for JuliaOpt (mentor: Madeleine Udell)
Hannah Aizenman: Categorical Axis for matplotlib (mentors: Michael Droettboom and Thomas Caswell)

This is the second year that NumFOCUS has participated in Google Summer of Code, and it has more than double the slots it had last year. Selecting the students was very difficult due to the high quality of the proposals, which are archived on GitHub. As last year, we suggested that prospective students use GitHub to communicate with possible mentors and use pull requests if they wanted feedback on their drafts. After the application deadline we talked with some students; they mentioned being afraid, at the beginning, to work on their proposals in the open, but they received valuable suggestions, which is a reasonable trade-off. In terms of the Software and Data Carpentry communities, Piotr Banaszkiewicz, our student last year, will be one of our mentors this year alongside Ethan White, Henry Senyondo, Greg Wilson and me. We also welcome the students and mentors from some of our friends: Python Software Foundation, Open Astronomy, R Project, Julia, GitHub, BioJS, CERN and Mozilla. Read More ›

Instructor Training in South Africa
Aleksandra Pawlik / 2016-04-21
Thanks to a lot of hard work from Anelda van der Walt and the support of North-West University, the University of Cape Town and Talarify, we were able to run the first face-to-face instructor training in South Africa. We trained 23 new instructors from South Africa, Namibia, Zimbabwe and Kenya. The event was a great success and we are looking forward to the expansion of Software and Data Carpentry workshops in South Africa and other African countries. Anelda has been developing Software and Data Carpentry in South Africa for a while now, and her efforts led to this first face-to-face Instructor Training. We started off on the evening of Sunday 17th April with a short ice-breaking session of lightning talks during which the participants talked about “The coolest thing about their job” for 2 minutes. Everyone had something interesting to say. What was even more impressive, nobody ran over time, which is not the usual case. On Monday we introduced the participants to the Mozilla Science Lab Study Groups. The goal of this session was to create foundations for collaborative, peer-to-peer environments in which researchers can share their knowledge and skills. All participants, split into groups of 4, came up with possible topics for their study groups. Then they discussed who they would like to see attending these sessions. (Photo: Wille du Plessis) After lunch on Monday we started on the Instructor Training curriculum. Since the participants had already had a chance to meet each other during the previous sessions, they felt very comfortable working on the interactive exercises, giving each other feedback and discussing the task outcomes. One of the most memorable moments was the conversation we had about motivation and demotivation in educational settings. Many attendees openly shared experiences from their school or study years. We also discussed students’ approaches to learning and the difficulties faced by teachers who need to deal with demotivated and discouraged learners. The organisation was absolutely perfect. We were hosted by North-West University, where we not only had a nice venue allowing us to run interactive sessions but also very comfortable accommodation at the Sports Village. We closed the workshop at lunchtime on Wednesday. During the last session, the participants practiced their demo teaching lessons. Then we discussed the next steps needed to become certified instructors. Everyone is very keen to become certified and run workshops at their home institutions. It would be fantastic to see the existing Software and Data Carpentry community provide mentorship and guidance to these newly trained instructors. Many of them will be the first ones advocating for this type of training in their institutions. It will require persistence, and they will need help from those of us who already have experience in organising and teaching at workshops. So if you can and want to help, please step in and fill in this form. Thank you! Read More ›

10 tips and tricks for instructing and teaching by means of live coding
Lex Nederbragt / 2016-04-20
One of the key teaching practices used at Software and Data Carpentry workshops is ‘live coding’: instructors don’t use slides, but work through the lesson material, typing in the code or instructions, with the workshop participants following along. Learning how to teach using live coding is best done in practice, with feedback from peers (this is why it is included in instructor training). Nonetheless, this post lists ten tips and tricks to help you get started. This text (re)uses, and expands on, elements from the Software and Data Carpentry instructor training materials. 1. Be seen and heard If you are physically able to stand up for a couple of hours, do it while you are teaching. When you sit down, you are hiding yourself behind others for those sitting in the back rows. Make sure to notify the workshop organisers of your wish to stand up, and ask them to arrange a high table/standing desk or lectern. Regardless of whether you are standing or sitting, make sure to move around as much as is reasonable. You can, for example, go to the screen to point something out, or draw something on the white/blackboard (see below). Moving around makes the teaching more lively, less monotonous. It draws the learners’ attention away from their screens, to you, which helps get the point you are making across. Even though you may have a good voice and know how to use it well, it may be a good idea to use a microphone, especially if the workshop room is equipped with one. Your voice will tire less, and you increase the chance that people with hearing difficulties will be able to follow the workshop. 2. Take it slow For every command you type, every word of code you write, every menu item or website button you click, say out loud what you are doing while you do it. Then point to the command and its output on the screen and go through it a second time. This not only slows you down, it allows learners who are following along to copy what you do, or to catch up, even when they are looking at their screen while doing it. If the output of your command or code makes what you just typed disappear from view, scroll back up so learners can see it again - this is especially needed for the Unix shell lesson. Other possibilities are to execute the same command a second time, or to copy and paste the last command(s) into the workshop Etherpad. 3. Mirror your learners’ environment as much as possible You may have set up your environment to your liking, with a very simple or rather fancy Unix prompt, colour schemes for your development environment, keyboard shortcuts, etc. Your learners usually won’t have any of this. Try to create an environment that mirrors what your learners have, and avoid using keyboard shortcuts. Some instructors create a separate ‘bare-bones’ user (login) account on their laptop, or a separate ‘teaching-only’ account on the service being taught (e.g. GitHub). 4. Use the screen - or screens - wisely Use a big font, and maximise the window. A black font on a white background works better than a light font on a dark background. When the bottom of the projector screen is at the same height as, or below, the heads of the learners, people in the back won’t be able to see the lower parts. Raise the bottom of your window(s) to compensate. If you can get a second screen, use it! It will usually require its own PC or laptop, so you may need to ask a helper to control it. You could use the second screen to show the Etherpad content, or the lesson material, or illustrations.
Pay attention to the lighting (not too dark, no lights directly on/above the presenter’s screen) and, if needed, reposition the tables so all learners can see the screen and helpers can easily reach all learners. 5. Use illustrations, or even better, draw them Most lesson material comes with illustrations, and these may help learners to understand the stages of the lesson and to organise the material. What can work really well is when you, as instructor, generate the illustrations on the white/blackboard as you progress through the material. This allows you to build up diagrams, making them increasingly complex in parallel with the material you are teaching. It helps learners understand the material, makes for a more lively workshop (you’ll have to move between your laptop and the blackboard) and draws the learners’ attention to you as well. 6. Avoid being disturbed Turn off any notifications you may use on your laptop, such as those from social media, email, etc. Seeing notifications flash by on the screen distracts you as well as the learners - and may even result in awkward situations when a message pops up that you’d rather not have others see. 7. Stick to the lesson material The core Software and Data Carpentry lessons are developed collaboratively by many instructors and tried and tested at many workshops. This means they are very streamlined - which is great when you start teaching them for the first time. It may be tempting to deviate from the material because you would like to show a neat trick, or demonstrate some alternative way of doing something. Don’t do this, since there is a fair chance you’ll run into something unexpected that you then have to explain. If you really want to use something outside of the material, try it out thoroughly before the workshop: run through the lesson as you would during the actual teaching and test the effect of your modification. Some instructors use printouts of the lesson material during teaching. Others use a second device (tablet or laptop) when teaching, on which they can view their notes and the Etherpad session. This seems to be more reliable than displaying one virtual desktop while flipping back and forth to another. 8. Leave no learner behind Give each learner two sticky notes of different colours, e.g., red and green. These can be held up for voting, but their real use is as status flags. If someone has completed an exercise and wants it checked, they put the green sticky note on their laptop; if they run into a problem and need help, they put up the red one. This is better than having people raise their hands because: it’s more discreet (which means they’re more likely to actually do it), they can keep working while their flag is raised, and the instructor can quickly see from the front of the room what state the class is in. Sometimes a red sticky note signals a technical problem that takes a bit more time to solve. To prevent this from slowing the whole class down too much, you could use the occasion to take the small break you had planned for a bit later, giving the helper(s) time to fix the problem. 9. Embrace mistakes No matter how well prepared you are, you will make mistakes. Typos are hard to avoid, you may overlook something from the lesson instructions, etc. This is OK! It allows learners to see instructors’ mistakes and how to diagnose and correct them. Some mistakes are actually an opportunity to point something out, or reflect back on something covered earlier.
Novices are going to spend most of their time making these and other mistakes, but how to deal with them is left out of most textbooks. “The typos are the pedagogy.” - Dana Brunson 10. Have fun Teaching is performance art and can be rather serious business. On the one hand, don’t let this scare you - it is much easier than performing Hamlet. You have an excellent script at your disposal, after all! On the other hand, it is OK to add an element of ‘play’, i.e. use humour and improvisation to liven up the workshop. How much you are able and willing to do this is really a matter of personality and taste - as well as experience. It becomes easier when you are more familiar with the material, allowing you to relax more. Choose your words and actions wisely, though. Remember that we want the learners to have a welcoming experience and a positive learning environment - a misplaced joke can ruin this in an instant. Start small: even just saying ‘that was fun’ after something worked well is a good start. Ask your co-instructors and helpers for feedback when you are unsure of the effect your behaviour has on the workshop. “Teaching is theater not cinema.” - Neal Davis (Thanks to Neal Davis, Rayna Harris and Greg Wilson for feedback on an earlier version of this post.) Read More ›

So You Want to Make a Screencast
Caleb Hattingh / 2016-04-13
I am the author of Learning Cython, a new series of videos teaching Cython. There are over 70 videos, making up around five and a half hours of content. If you’re involved in education, you may have considered making a screencast like these, and I’d like to give a short overview of what that entails. Publisher You may be happy to self-produce and self-host, and I would be too, except that I was approached by a publisher who made me an offer to produce the videos for distribution on their site. How this happened is interesting: in 2015, on a whim, I decided to submit a Cython talk proposal for PyCon AU. My talk got accepted, and a few weeks before the conference the publisher contacted me and made the offer. You don’t need a publisher to make some videos, but I saw several advantages: Expertise: the publisher is likely to have experience with producing content, and is likely to be able to assist with quality control and direction when required. As it turns out, my publisher had a ready repository of “author training” material that I could refer to for a precise idea of what was expected in terms of layout, flow, production quality, and so on. Marketing: the publisher already has a distribution platform in place for making the content available. Without a platform, I’d have to do that work myself. Credibility: if the publisher is well-known and respected, and in particular has a reputation for producing quality content, then my own credibility would be enhanced by working with that publisher. Deadlines: having someone such as a publisher holding you to account for meeting deadlines is extremely valuable for making sure your content really does see the light of day. Having accepted the offer, I was immediately asked to provide a table of contents (TOC) and a timeline. In hindsight, the most important thing you can do in the entire process is give your TOC a great deal of thought and planning. The titles for the first half of my videos were meticulously planned in the TOC, the second half to a lesser extent, and I found that I had to spend much more time making the second half. Your ability to plan lessons upfront, before you start with content generation, is probably the best predictor of your time-efficiency; at least this was the case for me. The deadlines were arranged in three stages: 25%, 50% and 100%, as a proportion of the number of videos included in the TOC. If you self-produce your own screencast videos without the aid of a publisher, it would be a great idea to impose deadlines on yourself. My productivity during the week before each deadline was greater than in any other period in the entire process. Equipment To do the actual recording, you’re going to need a few extra things. My publisher provided the screencast software Camtasia to record the videos. For my first video Camtasia was new to me, but by the time I completed the last one, I was an editing pro. The sooner you can get comfortable with your recording and editing software, the better. It is liberating to know that you can fix a small section of video, or change a small part of the audio, very easily. More on that later. My publisher offered to send me a microphone, but I already had some audio equipment with professional-grade microphones set up, so I used that. The publisher wanted to send me a headset microphone because with these, the distance between the mic and your mouth is always the same.
The concern with stand-mounted microphones, like mine, was that authors have a tendency to move their head around while speaking and recording, which changes the distance between the mic and their mouth, and therefore the volume. I only had to re-record one section of a single video because of this, and I got better at it over time. Still, I’d recommend that you just use a headset microphone. And make sure you get a noise-cancelling or noise-resistant mic! The best thing about my microphone is that it is a stage mic (Shure SM58), and therefore intended for use in noisy environments. This meant that I could record videos even in a noisy place like my home during school holidays! My basic setup at home is a Macbook Pro and a second, generic Samsung HD monitor. My publisher insisted on videos being 1280x720 (720p) resolution, and even that I record them that way. My dual-monitor setup made this easy, thankfully, and I set up my second Samsung LCD monitor at that resolution. It is also worth mentioning that I created a separate user account on my Macbook, exclusively for these videos. A separate account made it very easy to keep things isolated between my normal user account and the screencast account, and things like custom video resolution settings are included in that: I could log into my normal user account, and the second monitor goes to 1080p, but in my screencast user account the resolution changes back to 720p. If you had to make the change manually and forgot to set it before a video, a re-record would be required, so it’s a fairly important setting to have managed automatically. In addition, the desktop wallpaper was set to middle gray (in accordance with the publisher’s recommendations), and all desktop icons were removed in the “screencast” user account. The time display, as well as most of the dock and taskbar icons, was also disabled, primarily to reduce opportunities for distraction for the viewer. What do you show on the screen? The topic of my course is Cython, and so what the viewer is going to be expecting to see is tons of Cython; however, what do we specifically want to show? My text editor? Slides? This is where good planning makes a world of difference to your authoring experience. When I started, I wasn’t sure, but after a few videos I settled on a presentation pattern that worked very well for the rest: first show simple/clean/colorful slides, then show how to use the info from the slides. Slides, as a medium, allow you to hide irrelevant details, and this allows the viewer, and you, to focus on crucial points. I used this a great deal to explain complex topics. Video also allows the viewer to pause and rewind, which I hope contributes further to the elevated focus. Hopefully, when the viewer moves on from the slides to the “how to use” section, the emphasized details also carry over. Some videos did show me entering code into a text editor, and then running or compiling code in a terminal window, but the vast majority of teaching videos were made using the Jupyter Notebook, which is not only an excellent, all-round development tool for Python and a huge and growing list of other languages, but also has very good support for Cython. For teaching Cython, the Notebook interface conceals the C/C++ compilation and command-line steps, as well as the re-importing of the changed binary object, making it much, much easier to focus on the important information.
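As a minimal sketch of what that looks like in practice (the fib function below is an invented example, not one from the course): Cython ships an IPython extension, so one notebook cell loads it, and the next cell is compiled to C behind the scenes by the %%cython cell magic.

```python
# Cell 1: load the IPython extension that ships with Cython.
%load_ext Cython
```

```python
%%cython
# Cell 2: this cell is compiled to C behind the scenes; the binary
# extension module is built and imported automatically, so fib() is
# immediately callable from later cells.
def fib(int n):
    """Return the n-th Fibonacci number using C-typed loop variables."""
    cdef int a = 0, b = 1, i
    for i in range(n):
        a, b = b, a + b
    return a
```

Calling fib(30) in a later cell then runs at native speed, which makes for a satisfying before-and-after demo against a pure-Python version of the same function.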
If you’re going to be making screencasts for almost anything in the Python space, the Jupyter Notebook interface is hard to ignore. In addition to visual appeal, it also allows your viewers to run the code from your lessons in exactly the same way. So in terms of presenting new information, those videos were made up of slides (I used Keynote, a Powerpoint-like tool from Apple), the Jupyter Notebook, and some interaction with the Vim text editor and the Terminal (command-line) application. I heavily used clipboard copy/paste to move code from my text editor into the slides. My blog gives a brief description of the tricks I used to convert copied code into Rich Text Format to apply basic syntax highlighting in the slides. It is low-level, and possibly too esoteric, but the steps fit in well with all the other parts of my workflow. Once I had set up a keyboard shortcut to both copy and convert code text into rich text on my system clipboard, syntax highlighting in slides was never an issue. I found it very difficult to keep a shell-based interface clean in an on-screen sense: output from previous commands still shows above your prompt, which is likely to be distracting to the viewer. It also makes it difficult to make surgical edits in the video afterwards, since there is no seamless cut point if you have to redo commands. I artificially created these by running clear after explaining the output of each command. This makes it easy to replace a section of video with something else, or remove it entirely. Lesson Preparation My brief required videos of around five minutes in length, and no fewer than three nor more than ten. If you’ve done any prepared speaking, for instance at a conference, you will be well aware that five minutes is an extremely short amount of time in which to convey a self-contained chunk of information. With complex information it can be even more restricting. It required a great deal of preparation to make sure that each lesson could work inside such small time limits; cf. Blaise Pascal, Provincial Letters: “I would have written a shorter letter, but I did not have the time.” I was compelled to make short videos and therefore, I discovered, I would have to put in quite a lot of time! My initial plan was to sketch out a few high-level bullet points for each video, and then discuss them in a light and easy, “conversational”, unrehearsed-sounding tone. This plan was so laughably inadequate that I failed to complete even the first video recording this way. I eventually found that my most efficient method was to write out the entire script for a video, word for word, in the same way as one might expect a movie script to be written. During recording, I would read from the script, but with (hopefully!) liveliness and expression. If I practised the script a few times before the recording, it became that much easier to focus on the performance aspects, such as timing, pacing and emphasis. Indeed, merely practising the scripts helped to spot sections in need of correction or improvement. In the second half of the videos, I would even begin to script scroll actions, typed (on the keyboard) sections, where to put the mouse, and so on. I found that every detail captured in the script allowed me to think less about those details and focus on the pacing, the sound of my voice and so on. I don’t know how other presenters do it, but my script preparation was the most important (and time-saving) aspect of my preparation.
Whether I avoided sounding overly rehearsed, or worse, lifeless, is something that viewers will have to judge, but I could not have recorded the videos any other way. As far as the technical aspects of preparation go, I am very comfortable with the material, and even though I did have to research a few specific details that I had not yet had experience with, this was a minor headache. How Cython works was never a problem: the real headaches were all about how to explain pitfalls on different operating systems, or dealing with differences between Python versions, or how to create packages; basically, the same things that all beginners in Python find difficult. The most difficult balancing act was trying to decide what to mention, and what to leave out. I imagine this is the same for every lesson creator. By planning each video in great detail, it was a lot easier to maintain more of a “big picture” view for each lesson. Habits In my first few videos, my unthinking approach was similar to how one might approach giving a live talk. I would try very hard to speak correctly and clearly, and I would immediately be disappointed when I made a mistake, knowing that I would have to go back into the video afterwards and edit out or perhaps even re-record a section. This will sound very naive if you have experience in making screencasts, or recording in general, but I mention it for the benefit of other first-timers. For at least the first three videos I would sigh loudly every time I made a mistake! Once you’ve edited a few videos though, you immediately realise that there is zero cost to mistakes in recording, as long as you simply repeat without the error. If I stumbled in the middle of a sentence, I realised that I could simply restart the sentence and cut the error out later. Sometimes, in a particularly tricky sequence of words, I would get tongue-tied over and over again. You eventually learn to just stop speaking, compose yourself, take a few breaths, look out the window, check your notes, and then try again, all the while the video is still recording. The dead time doesn’t matter, and it takes less than a second to remove during video editing. All that matters is getting the best “take” that you can, but you can try over and over in a single recording, and then afterwards choose the best one during editing. Therefore, the first high-level habit is this: get used to making mistakes, and then keep speaking through the same section over and over until you get it right, because it is trivially easy to remove the errors and the multiple attempts during editing. While recording my last few videos, I was so used to it that upon making a mistake, I’d automatically wait a second (to make the cut easier during editing) and immediately repeat the sentence. This becomes quite automatic. I used this even to rephrase sentences, or toy with moving the emphasis to different parts of the same sentence, just to see what the effect might be like during editing when you get to pick the version you want to keep. Mouse manners In the author training videos supplied by my publisher, the management of the mouse cursor was strongly emphasized: it can be very distracting for viewers to see the mouse cursor jump from one position to another on the screen; a common occurrence if there are frequent edits and cut points. The advice given was to place something on your computer screen like a small piece of sticky tape, or similar, to use as a visual marker of where to place your mouse cursor when not in use.
The idea is that if cuts or edits are required, the mouse cursor will always be in the same position on the screen, and so edits can be made seamless. In my situation I have two screens: one on my MacBook, and an external display. I found it much more convenient to simply move my cursor off the presentation screen entirely between actions or when it was not in use. When using the Jupyter Notebook, scrolling becomes an important part of moving through the content because of the document-based design of the notebook interface. My guiding principle, promoted heavily by the author training videos, was to make every mouse movement and scroll action in whatever way would distract least from the content. I would even frequently announce that I was about to scroll down to the next cell before doing so. Sometimes, during editing, I would find that such announcements were unnecessary or implied by the surrounding speech, and I would edit them out. In other cases, the announcement seemed a useful cue to the viewer that the context was changing in the video, so it stayed. I also learned to scroll slowly, knowing that I could easily speed up the scrolling later in editing if required.

Time manipulation

While on the subject of speeding up and slowing down: many things become possible in editing. Initially, I was nervous about my typing speed, but again, it is so easy, trivial really, to speed up bits of typing that it becomes a non-issue. Even typing mistakes are easily dealt with in editing. It is slightly more annoying to cut a section of video than to apply a speed-up, so once I became experienced I concentrated on typing accurately rather than fast, and then during editing applied a 200-400% speed-up to typed sections. This sounds very rapid, but for a viewer, watching typing at normal speed can be painfully slow. In most situations I used the video editing tools to speed things up; however, during editing I also found instances where a few extra seconds lingering on the output of an especially complex command was likely to be useful to viewers. In these situations, I added some dead time to the video: basically, a stop-and-think moment that I had failed to provide during the original recording. The point is that time management and pacing can easily be manipulated after recording. It is up to you or your editing team to decide what to speed up and what to slow down. Once you realise this, it is quite liberating: it allows you to relax about those issues while recording. Just focus on your content and the key points you want to drive home. Take any pauses you need, and feel free to repeat sections as often as you like: it all comes together during editing.

Content

The topic of my course is Cython. Cython lets you write code that looks very similar to, but is not exactly, Python, and then compile that code into a native binary module. The primary use case of the tool is speeding up Python code, although what is really happening behind the scenes is that you’re generating C code and compiling that with a C compiler. The main attraction of Cython as a technology is that you can re-use your Python knowledge while optimizing for speed. As you might imagine, then, Cython is useful in a space where the person writing the code may not have had training in C/C++.
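To give non-users a flavour, here is a hedged sketch (a hypothetical example, not one of the case studies from the course) using Cython’s "pure Python" mode, in which ordinary annotations become C type declarations at compile time:

    # harmonic.py - hypothetical illustration of Cython's "pure Python"
    # mode; not taken from the course. The file runs unchanged as normal
    # Python, but when compiled by Cython the annotations below become
    # C variable declarations and the loop runs at native speed.
    import cython

    def harmonic_sum(n: cython.int) -> cython.double:
        # total and i become plain C variables in the compiled module,
        # so the loop avoids Python object overhead entirely.
        total: cython.double = 0.0
        i: cython.int
        for i in range(1, n + 1):
            total += 1.0 / i
        return total

    if __name__ == "__main__":
        print(harmonic_sum(10_000_000))

The same function written with Cython’s own cdef syntax in a .pyx file looks almost identical, which is exactly the re-use of Python knowledge that the tool trades on.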
From the point of view of developers with a background in computer science, an audience without that training can be hard to imagine, because training in, or at least exposure to, C/C++ is nearly mandatory in computer science programs in most countries. However, there is a class of people who sometimes need to write code but do not have formal training in computer science: scientists and (non-electrical!) engineers. And ironically, it is this group that also frequently needs code to do number crunching at speed, which is precisely what raw Python lacks. It is for this group that Cython is especially compelling: mostly the same, easy-to-understand Python syntax, but with the speed of native numerical code. This is interesting from the point of view of someone who wants to teach the topic. What do you focus on? Is your time better spent on the code generation aspects, i.e., explaining what your nearly-Python code gets converted into? Or is it better to focus on using Cython, approaching applications from the Python side of things? Do you prefer "how it works", or "how to use it", or some mixture of the two? For these questions, I relied on my experience working as a chemical engineer. Professionally, when using software tools I was generally focused on achieving a domain objective, that is, on solving some real-world problem at work. This is the approach I took in my videos. I decided to avoid discussing the internals of how C/C++ works, such as the linker, the assembler and the rest of the compiler toolchain, and instead to focus on the features that Cython provides from a Python perspective. I tried, perhaps too aggressively, to use examples and case studies that were not scientific in nature. The problem with scientific examples is that it is too easy to exclude people. Different scientists are not necessarily experts in each other’s fields, so using a bioinformatics case study risks alienating the neuroscience folks, and vice versa, while using math examples alienates everyone! So I tried to find case studies with the broadest appeal. In one case study, I showed how to speed up a large-scale personal tax calculation. In another, I showed how to load and process data from a history file of Soccer World Cup matches. In yet another I showed basic image processing by changing the RGB color of red chillies to blue. Whether this approach works for Cython remains to be seen, but my gamble is that using examples that appeal to, or are at least understandable by, the largest number of people will improve the odds that the lessons are conveyed successfully. I found Cython quite difficult to teach. Python itself is fairly easy to teach, but Cython, even conceptually, touches on many disparate issues that are not specifically related to Python. It requires a C compiler, which is installed differently on different operating systems. The compiler doesn’t even work the same on different OSs, making the OpenMP case study disconcertingly compiler-specific. Explaining the Cython type declarations threatens to pull in further discussion about whether a long means 32 or 64 bits, whether a char is a string, and whether those answers change between Python 2 and Python 3. These are just the most high-level issues. Once you begin to sink your teeth into the meat of the subject, the complexity only increases. How do you package your Cython module? Do you explain how Python’s distutils works? Oops, I meant setuptools? (The conventional recipe is only a few lines, as the sketch below shows, yet explaining everything it rests on is another matter.)
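A minimal packaging sketch, assuming the usual setuptools-plus-Cython arrangement; the file and module names here are illustrative, not from the course:

    # setup.py - minimal sketch of the conventional setuptools + Cython
    # build recipe; names are illustrative only.
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        name="harmonic-demo",
        ext_modules=cythonize("harmonic.py"),  # a .pyx file works the same way
    )

Running python setup.py build_ext --inplace then produces the compiled extension module; wheels, compiler installation and per-OS quirks are where the teaching time really disappears.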
And do you explain how wheels work? The list goes on and on! My goal eventually became to provide just enough information to allow the average developer to get something working in Cython with a significant speed-up, without necessarily understanding all the low-level machinery. A complete treatment was unrealistic: neither my time nor the viewer’s available time allows for it. I am proud of the set of videos I did manage to produce, though: I think the content strikes a good balance between high-level topics and just enough hinting at low-level topics to suggest further learning paths.

General Advice

It is best to get feedback on your videos as often as possible, especially in the beginning. The point is not whether your content is accurate; the main thing is to learn whether the visuals are clear and easy to understand, and whether your speaking is clear and well paced. My publisher pushed very hard for reviewing demos as well as my first few videos, so in that sense I was fortunate. If you plan to self-publish, I would strongly urge you to ask friends to critique your first few videos. Even before you spend any time on actual content, make a two-minute demo that you can get feedback on. It makes a world of difference to fix things early. The second piece of advice: the only thing that really matters is whether your content is useful to viewers. There is no point rushing to complete your videos, only to discover later that few people understood the material. The quality of instruction is more important than how quickly you finish your videos. If you must obsess about something, let it be the success of knowledge transfer. There were two videos in particular that took forever to complete. Each turned out to be, after editing, around 11 minutes, but my preparation time for each was over 20 hours. These dealt with wrapping C/C++ libraries with Cython, one for Mac and one for Windows. The extensive preparation time was spent on finding a way to convey the most important concepts without getting bogged down in the noise. I kept trying different C libraries from GitHub, and different ways of performing the command-line actions, until I found a sequence that was both understandable and useful to a viewer who might lack significant Python experience. I am especially interested in feedback about these two videos: many years of experience went into their production! The final piece of advice: if you’re preparing a screencast, aim to do a little every day. I failed to do this, which resulted in large backlogs and looming deadlines. I succeeded in meeting my deadlines, but the process would have been much smoother if I had spread out the work more. I hope to make more screencasts in the future, and I’d love to get feedback on Learning Cython to improve the next one! Read More ›

Carpentry week 2016 at the University of Oslo
Lex Nederbragt / 2016-04-12
Cross-posted from the author’s blog. From 14 to 18 March 2016 we organised the first Carpentry week at the University of Oslo. After a mini-seminar on Open Data Skills, there was a Software Carpentry workshop, two Data Carpentry workshops and a workshop on Reproducible Science, as well as a ‘beta’ Library Carpentry workshop. The Software and Data Carpentry effort at the University of Oslo, aka ‘Carpentry@UiO’, really started in 2012 when I invited Software Carpentry to give a workshop at the university. The then director, Greg Wilson, came himself and gave an inspirational workshop – recruiting Karin Lagesen and me to become workshop instructors in the process. Karin and I graduated from instructor training in the spring of 2013 and have since given several workshops in Oslo and elsewhere. In the fall of 2014 we partnered with the UiO Science Library (Realfagsbiblioteket) with the goal of giving regular workshops and recruiting more people as helpers and instructors. Before Carpentry Week, we had held four workshops, and ten helpers and instructors had become involved. In connection with the 4th birthday of the Science Library, we together came up with the plan to organise a mini-seminar on Open Data Skills (the video recording is here), followed by four days with five workshops and a total of 90 participants. We were very fortunate to have three international instructors come all the way to Oslo for these events: Tracy Teal, executive director of Data Carpentry; Leah Wasser, supervising scientist at The National Ecological Observatory Network (NEON); and Titus Brown, assistant professor at UC Davis. Tracy, Leah and Titus, thanks for coming! The workshops we offered were:

A Software Carpentry workshop teaching task automation with the Unix shell, collaborative code development through version control with Git and GitHub, and modular code design with Python, with 10 participants; taught by Halfdan Rydbeck, Hugues Fontenelle, Arvind Sundaram and Axel Rosén.

A Data Carpentry workshop for geosciences and others, focussing on spatio-temporal data, the use of the shell for data exploration, and data analysis and visualization using R, with 10 participants; taught by Leah Wasser, Michael Heeremans and Anne Fouilloux.

A Data Carpentry workshop for biosciences/genomics, teaching about metadata, use of the shell and cloud computing, and data analysis and visualisation using R, with 29 participants; taught by Tracy Teal, Carrie Andrew and Lex Nederbragt.

A ‘beta’ version of a Library Carpentry workshop teaching version control with Git and GitHub, tech jargon, working with plain-text formats using Sublime Text, APIs, regular expressions, use of the shell, and data cleaning using OpenRefine, with 28 participants; taught by Leon du Toit, Elin Stangeland, Live Kvale, Kyrre T. Låberg, Ahmed Abdi Mohammed, Mari Lundevall, Stian Lågstad and Dan Michael O. Heggø.

A one-day workshop teaching technologies such as make, Jupyter Notebooks, Docker, myBinder and RMarkdown/knitr for making computational analysis more reproducible, with 13 participants; taught by Titus Brown and Tracy Teal.

Carpentry Week gave a real push to the effort at UiO, generating a lot of attention and allowing us to recruit several new helpers and people interested in becoming instructors. At the mini-seminar, we proudly announced that the University of Oslo has joined the Software Carpentry Foundation as an affiliate.
Affiliate status enables us to strengthen the effort at UiO and help grow the Foundation, and gives us easier access to instructor training. For me personally, Carpentry Week was a fantastic experience. It was very satisfying to see so many undergraduate and graduate students, postdocs and other staff coming to the workshops – we are clearly addressing a need for this kind of skills training. This was the first time I experienced – and instructed at – a Data Carpentry workshop. As I suspected, many researchers I interact with need the kind of training these workshops offer – perhaps even more so than what Software Carpentry offers. I don’t have many opportunities to teach alongside experienced instructors from outside UiO, so watching both Tracy Teal and Titus Brown teach gave me an excellent opportunity to reflect (positively) on my own teaching. Finally, Software and Data Carpentry attract a fantastically open, warm and welcoming community of scientists and students, and we see the same happening at UiO. Instructors and helpers are a great bunch of people to work with! The Carpentry@UiO initiative wants to thank the Science Library, whose excellent organizational skills made putting together so many workshops a breeze. The Library provided the indispensable organizing effort that such events require, as well as coffee, tea and fruit during the workshops, and lunch for everyone participating each day. Live Rasmussen and coworkers, you are fantastic! Let’s do this again next year… Upcoming workshops at UiO: Software Carpentry and Data Carpentry. (Thanks also to the Carpentry@UiO instructors who gave feedback on earlier drafts of this post.) Read More ›

Installation Video Tutorials
Sarah Stevens / 2016-04-12
I recently created videos for installing the software needed for most Software Carpentry workshops. Links to these videos have now been added to the workshop lesson template, and are listed below.

Windows: Git Bash and SWC installer; Python; R/RStudio

Mac: Shell, Git and nano; Python; R/RStudio

So far we only have Windows and Mac videos. If anyone wants to make Linux versions, I’m happy to answer any questions; or if you want to make the recordings but not the voiceover, I’m willing to narrate.

Acknowledgments

The idea for installation videos was proposed by UW-Madison instructors and helpers during one of our local meetings. I recorded all the Windows videos, Christina Koch recorded all the Mac videos, and I did the narration for all of them. Read More ›

Changes on Mentoring Subcommittee
Raniere Silva / 2016-04-11
The first quarter of 2016 is almost over, and so far the Mentoring Subcommittee has been very quiet, although we have continued hosting the post-workshop debriefing sessions and the pre-workshop help sessions. We have been quiet because we were busy with the Steering Committee election and with reorganizing our members’ volunteer hours, since six of them (Belinda Weaver, Bill Mills, Karin Lagesen, Kate Hertweck, Rayna M Harris and I) were elected as Steering Committee members. In addition, some members had to temporarily reduce their volunteer hours for various reasons, but they are now back to help us. In the upcoming weeks you will hear more news from the Mentoring Subcommittee, but for now I want to ask you to welcome Christina Koch as the new chair of the Subcommittee. Christina will replace me in this position, and I can say that the Subcommittee couldn’t be in better hands. For those who don’t know Christina: she is one of our instructors, working at the University of Wisconsin-Madison; last year she was one of the maintainers of our Shell lesson, and she is now part of our Instructor Trainer team. Christina is interested in hearing your ideas for the mentoring committee. She may be asking for more feedback (and volunteers!) over the next few months, but for now she would appreciate receiving your thoughts via this form. Read More ›

Designing a New Novice Python Lesson
Greg Wilson / 2016-04-10
Last November, I volunteered to pull together a new full-day lesson on Python suitable for people with no previous programming experience. It has taken longer to come together than I expected, but that’s partly because I saw it as an opportunity to create a full-length example of the "backward" lesson design method we teach in the instructor training class. What we have so far is in this GitHub repository, which is rendered at this site. We would be very grateful for input: if you think crucial topics have been left out, please say so; if there’s material that could be removed or re-ordered, please say that as well. The most important part of this lesson right now is the design page, which describes the process used to date, the assumptions, the desired results, and the learning plan (with time estimates); please read this before diving into anything else. This is an experiment. In the past, the Carpentries have created new lessons by taking something written primarily by one person and polishing it, or by convening a hackathon to thrash out the "what" and "how". We would like to see whether creating a skeleton with an explicit design rationale, crowdsourcing its refinement, and then putting prose on its bones can be a useful complement to those methods. To keep us all pulling in the same direction while we figure that out, please:

Give feedback via issues or pull requests, not on this mailing list. (See this thread for an example of such a discussion.)

Check whether someone has already started an issue or pull request about a topic before creating a new one. (Feel free to edit the titles of existing issues and PRs to make them more descriptive if need be.)

Stick to one topic per issue or PR; if the discussion wanders onto a new subject, feel free to create a new issue and link to it. Please also feel free to replace one heterogeneous issue with several more focused ones.

Remember that nothing is free: everything takes time to teach, and the total time we have for the lesson and exercises is fixed, so if you want to add something, you must tell us how long you think it will take and what you would remove to make room. ("We’ll just add a bit of X to Y" is cheating…)

Do a better job than I have of relating exercises back to the gapminder data set.

Thanks to everyone who has contributed so far, and thanks in advance to everyone else for their help. Read More ›

Maintaining a Lesson
John Blischak / 2016-04-04
After two years as a Software Carpentry lesson maintainer of r-novice-inflammation, I am stepping down so that I can spend more time on my research (this thesis appears unwilling to write itself) and with my family. This therefore seems like a good time to summarize what’s happened and what I’ve learned, for the benefit of anyone planning to become a maintainer.

A Bit of Background

Software Carpentry really started growing its number of workshops and instructors in 2012. At the time, there were two main sets of lessons in place: the Version 4 SWC lessons written by Greg Wilson and others in 2010-11, and a set developed by The Hacker Within, a student group at the University of Wisconsin-Madison. (These were referred to as the THW lessons, and can be found in the old boot-camps repository.) Two problems arose as the number of workshops and instructors increased. First, because no official lesson set was enforced, a workshop could be taught using the Version 4 lessons, the THW lessons, and/or lessons that an instructor put together the week before. Second, because the workshops were reaching a wider audience, many attendees had never programmed before, but the lessons were written for people who had at least some previous experience. To address these problems, the curriculum was reorganized to create an official set of lessons to be taught in all workshops. As part of this, we aimed to create one set of lessons for complete beginners with no previous programming experience ("novice") and one for self-taught programmers ("intermediate"). Of these, the novice versions received much more attention and development than the intermediate ones. Both sets were stored in a single bc repository ("bc" was short for "bootcamp", which is what we used to call workshops), and the Python and SQL lessons were both written as Jupyter notebooks. By late 2014, storing all the lessons in one central repository had become unmanageable, so we split the bc repo into separate repositories for each lesson. At the same time, we adopted a lesson-template to ensure that all the lessons would have the same structure. This gave us the lessons as they are today.

Enter r-novice-inflammation

As SWC grew, there was lots of interest in hosting R-based workshops. While some instructors (myself included) experimented with custom R lessons, there was no official set of R lessons: the reorganization seemed like the perfect time to fix this. It also seemed like a good time to experiment with collaboratively developing lesson material (see for example these blog posts by Greg Wilson in 2011 and Justin Kitzes in 2014; this was also the time that the r-discuss list was created). Inspired by these ideas, we tried to make the lesson development process as similar to open source software development as possible. Our options for creating a new R lesson were to merge the already existing lesson sets, to translate the novice Python lesson to R, or to start from scratch. In our initial discussion, we decided it would be most straightforward to translate the novice Python lessons that used some fake "inflammation" data (now python-novice-inflammation). Six months later, we announced the completion of r-novice-inflammation, which was subsequently migrated to the lesson template. The strengths of r-novice-inflammation are that it focuses on language-agnostic programming principles and that it parallels the standard Python lesson.
Its main weakness is that many of our R instructors are enthusiastic about their favorite language, and would prefer a lesson focused on how R, in combination with packages like dplyr, can be used for data analysis. After much discussion, we decided that the best way to resolve this was to support a second R lesson, r-novice-gapminder, that focuses more on R specifics for data analysis and visualization.

Mechanics of the Lesson Template

The mechanics of our lessons have gone through almost as many changes as their content, and are set to go through more in the coming months. At present, the core of the lesson template resides in the lesson-template repo. This repo is purposely very minimal because it is repeatedly merged into the repos for specific lessons. For example, it does not contain a README file, because if it did, that file would create conflicts with the lesson’s own README file at each merge. An example of the lesson template is contained in lesson-example. This repo contains the documentation for writing lessons using the template, the most important piece of which is LAYOUT.md, which describes all the files a lesson should contain and how they should be structured. As is often the case, these instructions are an ideal, and not necessarily how the repositories are maintained in practice: for example, LAYOUT.md states that solutions to challenges are supposed to be contained in the file instructors.md, but this is inconsistently implemented across the SWC lesson repos. Here is how the template works in general. Content is written in Markdown and converted to HTML using pandoc. This process is automated using a Makefile, so that running the command make preview builds the site. All content, including generated HTML files and figures, is committed to the gh-pages branch of the repository. When pushed to GitHub, the HTML files are automatically served online. The template’s maintainers usually merge changes to the lesson template into the downstream repositories, but the lesson maintainers can also do this. For example, the commands below pull changes from lesson-template into r-novice-inflammation:

    git clone git@github.com:swcarpentry/r-novice-inflammation.git
    cd r-novice-inflammation/
    git remote add template git@github.com:swcarpentry/lesson-template.git
    git pull template gh-pages
    git push origin gh-pages

They’re All Special Cases

The R lessons are the only ones that still contain executable code: the Python and SQL lessons switched from the Jupyter Notebook to plain Markdown because the former’s JSON format made merges complicated. Support for R Markdown files is included in the lesson-template: the Makefile converts files with the Rmd file extension to md using knit from the knitr package. (We do not need to use render from the rmarkdown package because the pandoc formatting is already done by the template.) Furthermore, there is a file called chunk-options.R in the subdirectory tools/ that does two things: it sends all output figures to the directory fig/, and it formats the code and output chunks so that they conform to the lesson template (check out one of the Markdown files in r-novice-inflammation to see what this looks like). This is why all the Rmd files start with the chunk shown below to load this file. Also note that this chunk overrides the default fig.path: in addition to writing to fig, we add the basename of the file to make managing the figures much easier. Some of these details are documented in a section called Writing lessons with R Markdown.

    # Code at the beginning of each lesson. We use chunk option `include = FALSE`
    # to hide this from the rendered file.
    source("tools/chunk-options.R")
    opts_chunk$set(fig.path = "fig/01-starting-with-data-")

Current Challenges for the Lesson Template

If you become a maintainer, you will be added to the maintainers mailing list, which is where discussion and votes happen. One of your responsibilities will be to help decide on changes to the template. For example, we recently voted on a proposal for making it easier to find and use the setup instructions, and are currently voting on a proposal to manage lesson versioning. On the horizon, we need to decide where challenges and their solutions should go. When there were only a few challenges per topic, it made sense to embed them in the same file as the lesson content. As more exercises have been contributed by trainee instructors, this has grown unmanageable. We have discussed having separate files for extra challenges and their solutions, but have yet to make a decision (see this thread). We also need to revisit the question of using pandoc for turning Markdown into HTML, or switching to Jekyll (the tool that GitHub uses): see this thread and issues 279 and 280. A more big-picture challenge is the target audience of the SWC lessons. During the reorganization, the goal was to create separate lessons for novice and intermediate learners. However, only the novice versions of the lessons were developed and appear on the lessons page. Despite this, instructors wishing to teach intermediate learners have also developed and used these lessons in workshops. Thus we are back at the original situation, where we have one set of lessons and two different audiences. In recognition of this, we have started working on a new novice Python lesson (which will hopefully also serve as an example for the instructor training course of how to design a lesson systematically).

A Final Thought

Perhaps the toughest part of being a maintainer is monitoring all the different places where important discussions take place. I currently watch: r-novice-inflammation, r-discuss, maintainers, discuss (it is high volume, so I just skim it), lesson-template, and r-novice-gapminder. Each one of these has a useful role, but collectively they are simply too much. Going forward, I think the lesson maintainers’ biggest task is to find ways to make the flood of information more manageable, so that more people can play an effective part in curating and improving our content. Read More ›

AMY release v1.5.1
Piotr Banaszkiewicz / 2016-04-03
We’re very pleased to announce the newest (v1.5.1) release of AMY, our workshop management tool. This release is special because it contains more bugfixes than new features, and because we had a number of submissions from prospective Google Summer of Code 2016 (GSoC) students. GSoC students who submitted fixes or new features to AMY:

Shubham Singh added a "Notes" field to the instructor profile update form

Nikhil Verma found and fixed a "List duplicates" page error when no duplicates existed

Shubham Singh added the new tag "hackaton"

Chris Medrela found and fixed the 404 page for revisions that didn’t exist

Additionally, Nikhil Verma is working on getting certificates to generate from within AMY, but this will probably land in v1.5.2. Apart from the GSoC students, we’ve had a number of contributions from our very own Maneesha Sane, Greg Wilson and Piotr Banaszkiewicz. If you want to read more about the changes introduced in v1.5.1, here’s the changelog. Thanks to all new contributors! Read More ›

rOpenSci is Looking for a Community Manager
Greg Wilson / 2016-03-29
Thanks to their recent funding, rOpenSci is now looking for a community manager. Their mission is to expand access to scientific data and promote a culture of reproducible research and sustainable research software; to aid that, the community manager’s job will be to broaden the understanding and reach of rOpenSci within the research community. For details, please see the full announcement. Read More ›

An R-based Instructor Training Sponsored by the R Consortium
Laurent Gatto / 2016-03-28
The R Consortium Infrastructure Steering Committee awards have been officially announced, and we are happy to confirm that our proposal to organise an R-based instructor training was successful. We will organise an in-person instructor training that focuses on R. While the content of the training will not change as such, the code-related exercises and examples that participants work with will be focused on R, and will be made available to the R instructor community to help them improve their teaching. The full proposal is available here. We are now waiting for all the paperwork to be finalised. The next step is to decide where to organise the training; see this issue for the requirements and to share your suggestions. If you would like to participate, please register your interest here by sending a pull request. We thank the R Consortium for their support and look forward to organising our R-based instructor training. Read More ›

Hello, Spatio-temporal Data Carpentry
Leah Wasser / 2016-03-28
Data Carpentry have just announced the availability of an introductory lesson on working with geospatial data developed in conjunction with NEON (the National Ecological Observatory Network). For details, please see the full announcement on Data Carpentry’s blog. Read More ›

Announcing the Open Science Grid User School 2016
Christina Koch / 2016-03-28
If you could access hundreds or thousands of computers for your scholarly work, what could you do? How could it transform your work? What discoveries might you make? We are seeking applicants for the 2016 Open Science Grid (OSG) User School, which takes place 25–29 July at the beautiful University of Wisconsin in Madison. Participants will learn to use high throughput computing (HTC) to harness vast amounts of computing power for research, applicable to nearly any field of study (e.g., physics, chemistry, engineering, life sciences, earth sciences, agricultural and animal sciences, economics, social sciences, medicine, and more). Using lectures, discussions, roleplays, and lots of hands-on work with OSG experts in HTC, participants will learn how HTC systems work, how to run and manage many jobs and huge datasets to implement a full scientific computing workflow, and where to turn for help and more info. Successful applicants will receive financial support to attend the OSG School, covering all basic travel, hotel, and food costs. Ideal candidates are graduate students whose research involves or could involve large-scale computing: work that cannot be done on one laptop or a handful of computers. We also accept post-doctoral students, faculty, staff, and advanced undergraduates, so make a good case for yourself!

Important Dates

Application Period (OPEN NOW): 14 March – 15 April 2016
OSG User School: 25–29 July 2016

More Information and Applications

Website and brief application: http://www.opensciencegrid.org/UserSchool
Email: user-school (at) opensciencegrid (dot) org
Facebook / Twitter

Please forward this announcement to help us reach potential participants, and consider posting our flyer where appropriate. Link to flyer Read More ›

2015 Annual Report
Adina Howe / 2016-03-23
We’re pleased to announce the publication of the first SCF annual report, summarizing what we have done this year to further our goals of promoting reproducibility and reliability in all branches of science, and helping researchers be more productive. You can download the annual report here. Read More ›

Python Education Summit at PyCon 2016
Andrea Zonca / 2016-03-23
The Python Education Summit is a gathering of educators from all venues (schools, colleges, universities, community workshops, online courses, government) who share an interest and passion for teaching Python. It is an interesting opportunity for Software Carpentry instructors to get inspiration and tips from educators in completely different contexts.

Schedule

The Summit will take place at the Oregon Convention Center, the same site as PyCon, on Sunday, May 29th, the second day of the tutorials and the day before the main conference starts. It will run from 9am to 4:30pm.

Format

Many 30-minute talks; lightning talks with on-site registration; and a 30-minute unconference session at the end.

Registration and more information

You need to be registered for PyCon already. Remember to register on Eventbrite; space is limited! See also the post on the PyCon blog and the talk schedule. Read More ›

4 - 18 March, 2016: Instructor Trainee Mentoring, Debriefing vs Lesson Sessions, Version Control, Big Data in Biology Summer School, New Lessons
Anelda van der Walt, Bianca Peterson / 2016-03-18
Instructor Training

How can we help instructor trainees complete their training? Belinda Weaver is working hard to support new instructors in Australia. Do you have any other ideas? Several opportunities are now available for new and experienced instructors to join the conversations about workshops. Should we be combining various sessions to better integrate new and old instructors and give more options for participation? You can also read about the abovementioned discussions in the latest summary of instructor debriefing rounds 4 and 5.

Conversations

Over the last few weeks a very interesting discussion took place on our discuss mailing list about version control beyond git and svn. The conversation was prompted by an initial post by Arjun Raj. Arjun summarised his view of the discussion in a follow-up post titled From over-reproducibility to a reproducibility wish-list.

New

A new Data Carpentry Genomics lesson and a Defensive Programming with Python lesson were discussed during rounds 4 and 5 of instructor training debriefing. Please read the post and visit the repos for more information. There are also many other cool tools and tips for new and experienced instructors in this post.

Events

The 3rd Annual Big Data in Biology Summer School will take place at the University of Texas at Austin from 23-26 May 2016. Eleven courses will be scheduled, over four half-days each, and will include topics such as basic computational skills, genomics, and proteomics.

Other

The DataONE webinar with Greg Wilson, titled Research Computing Skills for Scientists: Lessons, Challenges, and Opportunities from Software Carpentry, is now available online. What is the difference between commercial and scientific software? Please share your thoughts. Byron Smith built on the Software Carpentry Make lessons to develop a Make tutorial for Titus Brown’s week-long Bioinformatics Workshop at UC Davis’s Bodega Marine Laboratory in February 2016. Aleksandra Pawlik recently spoke about Supporting Research Software Community Through Training at the eResearch New Zealand conference. The talk focused on the development of Software and Data Carpentry in the UK. 15 workshops were run over the past 15 days. For more information about past workshops, please visit our website.

Upcoming Workshops

March: University of Oklahoma, Brock University, University of Connecticut, Imperial College London, University of Florida Informatics Institute, University of Miami, UNIC Gif-sur-Yvette, University of Washington - Seattle
April: University of Texas at Arlington, UNC Chapel Hill, Online, University of California, Santa Barbara, North-West University
May: Bancroft Building, University of Toronto, CSDMS Annual Meeting, Online, Colorado Special Libraries Association @ CU Boulder, National Institutes of Health
June: University of Cincinnati
July: R workshop - The University of Queensland

Read More ›

Software and Data Carpentry Instructor Training Comes to Africa
Anelda van der Walt, Aleksandra Pawlik / 2016-03-18
North-West University eResearch, UCT eResearch, and Talarify are excited to announce that a Software & Data Carpentry instructor training event will take place in Potchefstroom, North-West Province, South Africa from 17 to 20 April 2016. Our lead trainer will be Aleksandra Pawlik, and several of the more experienced South African instructors will also join the workshop to work with the trainee instructors. In line with a previous post from Belinda Weaver about helping new instructors through the pipeline, this workshop will form part of a larger 12-month programme to help new instructors truly integrate into the community. The programme, an initiative of the three hosting organisations and currently under development, will include supporting instructor trainees to: complete the training after the workshop; run their first workshop at their home institution; and set up and run a user group or Mozilla Science study group to support participants from their workshop after the event. In 2017 we aim to bring the newly qualified instructors, as well as the two or three most active community members from their study groups, together again to share experiences and develop proposals for future initiatives. Our instructor training workshop will run over two and a half days. The last day will be used to introduce the concept of user groups and communities, and to expose participants to the Mozilla Science Lab Study Group Handbook and other useful resources that could help them set up and run these community events. We will also have feedback from Maia Lesosky, who started the Cape R User Group, and from members of the NWU Genomics Hacky Hour Study Group, to provide real-life anecdotes. To ensure that a transparent process is followed for the selection of candidates, we have developed a rubric which will be used to score applications against the requirements set out in the original advertisement. We also have an independent selection committee consisting of two international Software/Data Carpentry community members and four South African instructors. We hope that women and people of other under-represented genders will make up at least 50% of participants. For more information about the workshop please visit the NWU eResearch website. If you’d like to learn more about the extended 12-month programme, please contact Anelda van der Walt. Read More ›

New Maintainers
Greg Wilson / 2016-03-17
We are pleased to announce that Harriet Dashnow and Daniel Chen have agreed to take over maintenance of the R inflammation lesson. Our thanks to them for volunteering, and to John Blischak and Denis Haine for all their hard work over the past year and more. We are also grateful to Erik Bray, who has volunteered to take over maintenance of our Windows installer. As we have grown, we have come to depend on our community to do more than just teach (and yes, I just said “just”). If you would like to get involved in maintaining lessons, mentoring new instructors, or other non-teaching tasks, please give us a shout. Read More ›

2016 Post-Workshop Instructor Debriefings, Rounds 04 and 05
Rayna Harris, Sheldon McKay, Raniere Silva, Tiffany Timbers, Belinda Weaver / 2016-03-10
On February 23, Rayna Harris, Sheldon McKay, and Tiffany Timbers ran the 4th round of post-workshop instructor debriefing. On March 8, Rayna Harris, Raniere Silva, and Belinda Weaver ran the 5th round.

New lesson materials

Data Carpentry Genomics lesson. Greg Wilson recently sent out an email about updating the lesson maintainer roles for Software Carpentry to jump-start finalizing the Ecology and Genomics lessons. These aren’t final yet, but the Iowa State University instructors spent considerable time putting together a genomics lesson. You can view the Git repository with their lessons here. Defensive Programming with Python. On Day 2 of the University of Pennsylvania workshop, Byron adapted the Python materials to teach defensive programming, testing, and other tools that were applicable to the audience and could be incorporated immediately into the learners’ research programs. Rayna encouraged Byron to contribute to the new Python Novice Gapminder lesson.

What worked well

Amazon Web Services EC2. Data Carpentry workshops used Amazon instances for cloud computing. For instructions on using EC2 in your workshop, see this lesson. Split-screen Git. At our Brisbane workshop, Selene Fernandez Valverde used a split screen to simulate two different collaborators when pushing and pulling with Git, and it worked really well. It helped people see what was going on.

What could have gone better

Picking and choosing parts of the lesson. A common thing instructors struggle with is deciding exactly which pieces of a lesson to teach, given the time constraints and the skill level of the learners. For instance, Byron reflected that he might have spent less time on relative/fixed paths in favor of more time on scripting. As a newbie, Leah found that working off the script made it difficult to flow through the material, as she was unsure how much she could diverge from the lesson. We all struggle with this and have different solutions. April Wright gets a lot of useful comments, feedback, and suggestions when she tweets or blogs about her lesson, so good communication is key. A repository of lesson flow charts would also be nice, but I don’t think that resource exists yet. Windows and Git Bash. In the Brisbane Python lesson, we had problems with copy and paste in Git Bash on the Windows laptops. This made it hard to use wget and curl, as people had to type in the download link for the data manually and were unable to copy and paste other material. Installation problems. Some students had problems installing Python and accessing Jupyter notebooks from Opera and Internet Explorer. Git is not on the default Data Carpentry instances. This is not an issue per se, but the instructors did have to stop and retool during the Git lesson because of it.

ICYMI: Thoughts on Combining Debriefing Sessions with Instructor Checkout

Belinda Weaver wrote a nice post on helping instructor trainees finish their training. Then Greg Wilson suggested combining the lesson discussion with the debriefings. Kate Hertweck agreed, and suggested merging the lesson discussion with the pre-workshop help session. Instructor training is a valuable and critical part of our community, and we value your opinion on these matters. If you haven’t already, take a look at these posts and share your thoughts.

Thanks!

We are very grateful to the instructors who attended these debriefing sessions. By taking the time to reflect on their teaching experiences, they are helping to strengthen our community.
Jeff Stafford and Robert Colautti, 2016-02-17-queens
Pat Schloss, 2016-02-29-UMichigan
April Wright, 2016-02-22-ISU
Byron Smith, 2016-02-22-upenn
Leah Roberts, 2016-02-01-BrisbaneResBaz
Read More ›

Should We Combine Debriefing and Lesson Discussion?
Greg Wilson / 2016-03-08
Belinda Weaver recently posted a proposal for helping instructor trainees finish. While reading her post and writing out the current instructor training workflow for the benefit of GSoC students, I started wondering whether it might also make sense to combine the workshop debriefing sessions with the trainee mentoring sessions: a couple of regularly-scheduled slots each week (times chosen to accommodate all comers) would be easier to wrangle, and trainees would get to meet more existing instructors and hear about their in-class experiences. We would still need some sort of sign-up mechanism to keep numbers manageable, and we would need someone to report back on which trainees had asked well-informed questions. It would also mean less coupling between the discussion and the specific lesson(s) trainees were working on. But the first two aren’t new work (we’re doing them already), and the third might actually be a good thing, particularly if we follow up on another suggestion and have people attend a discussion before submitting their lesson change. Please add comments to this blog post to let us know what you think of the idea, and to Belinda’s to give her feedback as well. Read More ›

A Proposal for Helping Instructor Trainees Finish
Belinda Weaver / 2016-03-08
Instructor training is one way we grow our community worldwide. Yet many people who go through instructor training never go on to teach at a Software Carpentry workshop. Can that be fixed? I suggest that shepherding people through the final stages might help with completion rates. When the idea of running instructor training in Brisbane was originally mooted, I set up a survey, very early on, to record expressions of interest from people who might want to train. I tweeted it several times in the months leading up to the training. The survey captured name, email address, institution, discipline and how each applicant had heard about the survey. I wish now I had also asked ‘why do you want to train?’ or ‘why do you deserve a place over someone else?’, as that would have helped me sift through the candidates. Nearer the workshop time, I put the survey information into a spreadsheet and got my local Software Carpentry crew to help me whittle the more than 55 responses down to 20. We were mindful of a few things:

having a spread of disciplines

if possible, training people from a range of universities

if possible, training people from outside Brisbane, e.g. from regional universities, or from cities not offering instructor training in this round

training groups rather than individuals (to help foster activity later)

how likely the person was to actually teach.

This last was a bit tricky, but attendance at a workshop, or having helped at one, was evidence of at least some commitment to Software Carpentry. I think one big problem is that while people like the idea of Software Carpentry workshops, and like being part of that vibe, that community, that doesn’t necessarily translate into a willingness on their part to actually teach. So evidence of previous participation helps identify instructor trainees who have already put in time to make workshops possible – whether by helping organise them, helping out on the day, getting funding to run them, and so on. But it doesn’t hurt to do some hard questioning beforehand as to whether people can realistically see themselves teaching at workshops, and if so, how often. If someone is a PhD student in the writing-up phase, for example, it’s unlikely they will have much time to commit. By doing that kind of questioning, I was able to eliminate quite a few people from the pool of those who originally volunteered. The 20 people I ended up with were what I considered the best 20. To get a place, they had to complete the prerequisite activities by a deadline, or lose the spot - which is one good way to tell whether or not people are serious. If applicants aren’t prepared to complete tasks, or if they’re sloppy about deadlines, chances are they won’t teach workshops either. My 20 all did the prep, they turned up on the day, and they stuck it out, with no one falling by the wayside. After the workshop, we were lucky to have the big Research Bazaar (ResBaz) event coming up, at which we offered parallel Software Carpentry workshops in both R and Python. Several fledgling instructors got a chance to teach there while being assisted (and observed) by more experienced instructors. After ResBaz, I organised a Hangout practice session for seven of those recent instructor trainees. During that session, they all taught to the group and critiqued one another, face to face and via an etherpad.
I have since done a second practice teaching session with two other trainees (and walked them both through the pull request requirement), and am planning a third with more trainees from Townsville. I have also followed up twice via email with the rest, who attended from Sydney and Canberra. I think my follow-ups have helped people complete, and getting a lot of trainees teaching soon after the training ended focused them on the importance of getting through the final stages. I think having a local ‘shepherd’ can really help get people over the line. But the shepherd could be anywhere - it’s more about having someone specific to whom trainees feel accountable - someone who follows up with them, chases them up, and actually cares that they get the final bits done and qualify as instructors. Maybe if there is no-one local who could play that role, this could be a job for the mentoring committee to handle - it really is a mentoring task. The ideal would be to have the instructor trainer hand the class over to the mentor as part of the final session of the training, and for the mentor to check in with trainees regularly from then on – maybe by running teaching practice sessions, handholding, or talking them through the final tasks, and getting them over their nerves about actually getting up and teaching (the practice sessions help with that). I think there also needs to be some contribution pathway for people who’ve completed instructor training but have discovered - belatedly! - that teaching is not for them. In order not to waste their training, they could become organisers or cheerleaders. We could always use more of those! Read More ›

3rd Annual Big Data in Biology Summer School
Rayna Harris / 2016-03-08
The Center for Computational Biology and Bioinformatics at The University of Texas at Austin is hosting the 3rd Annual Big Data in Biology Summer School May 23–26, 2016. The 2016 Summer School offers eleven intensive courses that span general programming, high-throughput DNA and RNA sequencing analysis, proteomics, and computational modeling. These courses provide a unique hands-on opportunity to acquire valuable skills directly from experts in the field. Each course will meet for three hours a day for four days (either in the morning or in the afternoon) for a total of twelve hours. Click here for more details or to register!

Great introductory courses: Introduction to Core Next Generation Sequencing (NGS) Tools; Introduction to Proteomics; Introduction to Python; Introduction to RNA-seq.

Bioinformatics courses: Bash Beyond Basics; Genome Variant Analysis; Machine Learning Methods for Gene Expression Profiling Analysis; Medical Genomics; Metagenomic Analysis of Microbial Communities.

Computational modeling: Computational Modeling to Study Evolution in Action; Protein Modeling Using Rosetta.

New in 2016:

Bash Beyond Basics: This course will focus on being more productive in the Bash shell. We will learn about regular expressions, Unix utilities like cut/sort/join, awk, advanced piping, process substitution, string manipulation, and Bash scripting. Learn to love the command line and increase your productivity with rapid manipulation of bioinformatic data!

Metagenomic Analysis of Microbial Communities: This course surveys the Python software ecosystem and familiarizes participants with cutting-edge data science tools. Topics include interactive computing basics; data preprocessing and cleaning; exploratory data analysis and visualization; and machine learning and predictive modeling.

Clinical Genomics: This course will introduce a selection of genomics methodologies in a clinical and medical context. We will cover genomics data processing and interpretation, quantitative genetics, associations between variants and clinical outcomes, cancer genomics, and the ethics/regulatory considerations of developing medical genomics tools for clinicians. The course will have an optional lab component where participants will have the opportunity to explore datasets and learn basic genomics and clinical data analysis.

Computational Modeling to Study Evolution in Action: This course is about the study of evolution using computational model systems. We will use two different systems for digital evolution, Avida and "Markov Gate Networks", exploring many different possibilities of using computational systems for evolution research. Participants will gain a hands-on introduction to the Avida Digital Evolution Research Platform, a popular artificial life system for biological research, and to the Markov Gate Network modeling framework, used to study questions pertaining to neuro-evolution, behavior, and artificial intelligence.

Click here for more details or to register! Read More ›

Complexity vs. Subtlety
Greg Wilson / 2016-03-05
I gave a lightning talk on Software Carpentry at the OICR yesterday, and in discussion afterward, Jonathan Dursi made an observation that I’ve been thinking about since. He wondered whether the key difference between commercial software and scientific software is complexity versus subtlety. For example, the software that manages workplace insurance payouts for the province of Ontario is complex because it has to handle every regulatory change since the mid-1920s. None of its rules and exceptions are intellectually taxing, but by the time you turn them into a service, provide a dozen different interfaces for different business roles, and make the whole thing fault tolerant, the software is incredibly tangled. A lot of scientific software is relatively straightforward by comparison, so long as all you look at is the control flow. It’s the specific calculations that are hard: what differencing scheme or statistical test to use, what convergence criteria or significance measure to apply, and so on. And yes, there are a lot of fiendishly tricky algorithms in science, but they’re often hidden in libraries built and maintained by specialists who work and think like software engineers. All of this brings me back to the issue of testing. (I’ll pause a moment to let long-time readers groan, "Oh no, not this again.") A lot of tools and techniques for testing mainstream software are really about managing its complexity: some of the most useful books I know about making software right are good precisely because they make this explicit. Offhand, I can’t think of any good books about managing subtlety: about picking the right calculation to perform, rather than handling badly-formatted input data and corner cases in control flow. I suspect this is because subtlety is inherently domain-specific, which means many fewer people know enough to write about any particular bit. In response to an early draft of this post, Jonathan added, "This distinction is especially important in the early experimental stage of developing a tool: if something is successful enough and widely applicable enough that it becomes ‘hardened’ or ‘productized’ or the like, then the complexity naturally grows to be robust and to handle a wider range of cases." This is why I enjoy Software Carpentry so much: someone always has new insights. As always, we’d be grateful for yours. Read More ›

16 February - 3 March, 2016: New Steering Committee, Software Carpentry Value Proposition, Webinar, Vacancies, Community Building, Instructor Training, and Modern Scientific Authoring
Anelda van der Walt / 2016-03-03
##Highlights Our new steering committee has been elected. Congratulations to Rayna Harris, Kate Hertweck, Karin Lagesen, Bill Mills, Raniere Silva, Belinda Weaver, and Jason Williams! Looking for a good model to show potentially interested stakeholders what value Software Carpentry brings? Jonah Duckles, our executive director, shared his views. Let us know how we can improve on it. ##Webinars DataONE is hosting Greg Wilson for a webinar titled Research Computing Skills for Scientists: Lessons, Challenges, and Opportunities from Software Carpentry on 8 March 2016 at 9am Pacific Time. ##New Over the past three years the rOpenSci community have learnt some interesting lessons about community building and open software. Their paper describing these lessons learnt is now published and may provide some valuable insights to others on similar journeys. The new instructor training pipeline is starting to ramp up. Five people are now qualified to train instructors, while the next round of instructor trainer training is already underway. We’re currently in the lesson-building phase for Modern Scientific Authoring. Comments and contributions are very welcome. ##Vacancies The Bioinformatics Training Facility of the School of Biological Sciences, University of Cambridge is looking for a Training Impact Co-ordinator. ##Other Did you realise that collaborative tools such as the etherpad could be tremendously useful for learners with (even slight) disabilities? We recently published some statistics on gender representation in the Software Carpentry community in response to a new paper on gender bias in open source. The paper has since received strong critique, but our stats remain. The first six ELIXIR Software and Data Carpentry instructors recently received their instructor certification. Read more about ELIXIR Software and Data Carpentry activities. 18 workshops were run over the past 16 days. For more information about past workshops, please visit our website. Upcoming Workshops March: Boston College Libraries, National Networks of Libraries of Medicine, New England Region, Alaska Fisheries Science Center / National Marine Fisheries Service, Calcul Québec, Université Laval, UC Berkeley, Notre Dame, University of Connecticut, The Genome Analysis Centre (TGAC), EPSRC & MRC Centre for Doctoral Training in Regenerative Medicine, University of Manchester, Politechnika Krakowska, University of Miami, University of British Columbia, Brock University, University of Connecticut, University of Miami, UNIC Gif-sur-Yvette, University of Washington - Seattle April: Online, UNC Chapel Hill, University of California, Santa Barbara May: CSDMS Annual Meeting, Colorado Special Libraries Association @ CU Boulder, National Institutes of Health July: R workshop - The University of Queensland Read More ›

Communities: The Foundation of Impactful Workshops
Jonah Duckles / 2016-03-01
I spend a good deal of my time trying to communicate with member organizations about what it is that the Software Carpentry Foundation can do to help them meet their own goals. It is part of my job to showcase to them the return on investment that they’ll see in various areas. The three areas in which this is most apparent are impacts on skills transfer to learners who attend workshops, the lesson material that is publicly available and built by the community, and the capacity building that comes from mentoring instructors who are thinking about impactful instruction in short workshops. In order to better arrange my ideas I decided to draw a Venn diagram with circles for Lessons, Learners and Instructors. I’ve tried to use this diagram to build my own mental model for how what we do in terms of core activities can scale and grow. Through this process I’ve come to think of the structure in this diagram as podular (made of semi-autonomous pods), or a fractal element that repeats at various community scales such as university, research network, nation, or worldwide. At each of our partner organizations we may have lessons, instructors and learners with their own unique and local perspectives, working toward impactful workshops that are appropriate for their own community. Internationally, we are working toward spreading consensus lessons and an ethos of using open source tools for open and collaborative science to scientific communities via our workshops, while using evidence-based teaching methods. These workshops sit at the nexus of the diagram and showcase, in a focused event, what it is that we stand for. They reflect the impacts we would like to have on changing how science is thought about in the context of computation. The brands “Software Carpentry” and “Data Carpentry” reflect a particular set of opinionated lessons arranged to have specific impacts on learners. This is why we work so hard to make sure that a workshop called “Software Carpentry” or “Data Carpentry” is being taught using our methods (the instructor is “badged”) and with instructors who have studied the community-developed lessons. Locally, when we deliver a workshop, we’re working to bring together lessons, learners, and instructors that can deliver impactful workshops. To be prepared for this we strive to convince learners that spending two days with these lessons and our instructors will be impactful and helpful to how they work. When you put this all together, we’re not just developing training and delivering workshops: we as a community collectively own the lessons and are advancing, testing and refining the evidence-based teaching best practices that we share with others and reinforce through our instructor training. Outside of our core lessons and flagship workshops, you are strongly encouraged to duplicate this structure and apply it at smaller scales toward the specific needs of your own communities. This is how we grow and test new ideas and lessons. As you do this we want to know what you’ve learned, we want to hear your success stories, and we want to hear about your spectacular failures. Overall, we want the Carpentries together to be a community where the most broadly applicable lessons pertaining to the tools and best practices needed to do modern research can come to be curated and improved together.
At the same time, we have thriving global conversations about what gaps there are in our lessons and in our teaching methods, and how we can address those gaps and have more impact on the practice of research supported by computational tools. One area where this diagram showed me we could do better is supporting and helping our learners in their self-study. We know from our website analytics that browsing our lessons is one of the most popular activities among website visitors. We also know, through our instructor survey, that many instructors came to be a part of our community through self-study of the lessons online over the years. I would welcome ideas and efforts toward making our lessons and our community more supportive of learners who are interested in self-study. As it is, our lessons are mostly meant to be instructor notes, but if we could find ways to make them more useful for self-study, I think that would be fantastic. What are your thoughts and ideas on this diagram? Are there any ways you see to enhance or improve it? It has really helped me to organize a jumble of ideas I’ve been dancing around in conversations with partners over the past several months. Read More ›

Applications due March 1st: 2016 eScience Data Science for Social Good summer program
Ariel Rokem / 2016-02-26
The University of Washington eScience Institute, in collaboration with Urban@UW and Microsoft, is excited to announce the 2016 Data Science for Social Good (DSSG) summer program. The program brings together data and domain scientists to work on focused, collaborative projects that are designed to impact public policy for social benefit. Modeled after similar programs at the University of Chicago and Georgia Tech, with elements from our own Data Science Incubator, sixteen DSSG Student Fellows will be selected to work with academic researchers, data scientists, and public stakeholder groups on data-intensive research projects. Graduate students and advanced undergraduates are eligible for these paid positions. This year’s projects will focus on Urban Science, aiming to understand and extract valuable, actionable information out of data from urban environments across topic areas including public health, sustainable urban planning, crime prevention, education, transportation, and social justice. For more program details and application information visit: http://escience.washington.edu/get-involved/incubator-programs/data-science-for-social-good/ Read More ›

Bioinformatics Training Impact Coordinator
Greg Wilson / 2016-02-22
The Bioinformatics Training Facility of the School of Biological Sciences, University of Cambridge is looking for a Training Impact Co-ordinator. The post-holder will develop (building on existing work), implement, monitor, analyse and report on a comprehensive system of training metrics/key performance indicators across the portfolio of ELIXIR and EXCELERATE bioinformatics training activities. For more information, please see the full ad. Read More ›

Welcome to the 2016 Steering Committee!
Jonah Duckles / 2016-02-20
The results are in. Your 2016 steering committee, in alphabetical order by last name, is: Rayna Harris Kate Hertweck Karin Lagesen Bill Mills Raniere Silva Belinda Weaver Jason Williams Thank you to all of the candidates for standing for election. We look forward to an exciting year of the new committee’s contributions and leadership. For those interested, the raw results are available here. Read More ›

More of a Difference Than You Realize
Greg Wilson / 2016-02-19
We received this after an online instructor training workshop earlier this week, which reminded me that small differences for some people can be large ones for others: Thanks for a great workshop the last two days… I wanted to share a separate positive comment that I should have included on the Etherpad: I’m profoundly/severely hard-of-hearing in both ears, and depend quite a bit on lip-reading when listening to people. As such, I have great difficulty with online material if the audio is bad, the speaker is not well lit, or the speaker is simply not on video. This was the first online course I’ve taken where there were several sites participating, and there was the use of software (Etherpad) for collaborative interaction. I have to admit I was dubious at first at how well this would all work for me with you in one corner, the audience in another, and stuff happening on the etherpad. In the end, I think it was fantastic. Having everybody collaboratively take notes worked out really well, because then if I didn’t quite get something, I could wait and see if somebody else typed up the information or I could ask about it in the chat window. I did have trouble hearing the audio from some of the other sites, but it wasn’t critical. Thanks again for a great class. Read More ›

Building Software, Building Community: Lessons from the rOpenSci Project
Greg Wilson / 2016-02-17
Carl Boettiger, Scott Chamberlain, Ted Hart, and Karthik Ram have just published a paper titled “Building Software, Building Community: Lessons from the rOpenSci Project”, in which they describe what they’ve learned by growing the rOpenSci project. There are a lot of great ideas here that other groups could borrow, and of course you can keep up with their news by following their blog. Read More ›

2 - 16 February, 2016: Election Week, University Courses, New Lessons and a Shell Co-Maintainer, An Interview, and Teaching Strategies
Anelda van der Walt / 2016-02-16
##Highlights The 2016 Software Carpentry Foundation Steering Committee election is on this week. Questions regarding the elections can be addressed to election@software-carpentry.org. Daniel Chen has written up a fantastically informative interview-style post about the challenges and opportunities associated with running Software Carpentry lessons as a university course. The post is well worth the read, as there are many different views and ideas that might be relevant in your own context. ##Collaborate Great feedback about our collaboratively designed introductory Python lesson has shaped much of what the lesson will look like. How should we proceed with actual development of the lesson? Let us know! Alex Konovalov and his colleagues are now developing a Software Carpentry-style lesson on SageMath. If you’re interested in collaborating, please get in touch. ##New Ashwin Srinath, a Ph.D. candidate in Mechanical Engineering at Clemson University, is joining Gabriel Devenyi as co-maintainer of the Unix Shell lessons. A big thanks to Christina and all the best to Ashwin! A lesson on the computational algebra system GAP has been developed and taught by Alex Konovalov and his colleagues. A recent interview with Greg Wilson by Matthias Fromm and Konrad Förstner is now available as part of their Open Science Radio podcasts. ##Instructor training How well is Software Carpentry doing in conveying evidence-based teaching practices, specifically when measured against the six core teaching strategies published in 2007? ##Other Ian Hawke from the Centre for Doctoral Training in Next Generation Computational Modelling wrote about his experience in providing context through the use of authentic examples during a recent numerical methods workshop. The [second](/blog/2016/02/instructor-debriefing-round-02.html) and third instructor debriefing sessions for 2016 covered topics such as finding solutions for installation challenges, a collaborative blog-writing exercise for Git, wifi challenges, integration of SQL and Python lessons, and using real-world examples to encourage learning. We’re comparing stats from online versus in-person instructor training from the last few years. Do you have any suggestions for correlations we could be looking at? The Kellogg Biological Station at Michigan State University is running a two-week course on next generation sequencing data analysis for biologists on August 8-19, 2016. If you’d like to know how many workshops we’ve simultaneously run on one day across the world, you can now access the raw data or visualise our activity over the last few years. What do we know about usability and programming language design? Andreas Stefik and his colleagues have developed a two-pager summarising the knowledge that’s out there. 16 workshops were run over the past 15 days. For more information about past workshops, please visit our website. Upcoming Workshops February: University of Edinburgh (EPCC), National Institutes of Health, Online, University of Alberta, University College London, Queen’s University, Sir John A. MacDonald Hall Room 2, ACC Cyfronet AGH, University of Calgary, Iowa State University, University of Miami, University of British Columbia Okanagan, UC Davis, Tulane University, University of Pennsylvania, University of Illinois March: Calcul Québec, Université Laval, Alaska Fisheries Science Center / National Marine Fisheries Service, Boston College Libraries, National Networks of Libraries of Medicine, New England Region, Notre Dame, University of Connecticut, EPSRC & MRC Centre for Doctoral Training in Regenerative Medicine, University of Manchester, University of Miami, University of British Columbia, Brock University, University of Connecticut, UNIC Gif-sur-Yvette, University of Washington - Seattle April: Online May: CSDMS Annual Meeting Read More ›

Our New Instructor Pipeline
Greg Wilson / 2016-02-16
Last fall, we decided to reboot our instructor training course. We’ve tried a lot of things since then, and one of the biggest successes has been our new checkout procedure. In brief: After completing the instructor training course, the trainee picks one of the core lessons from each of the Carpentries she wants to teach and submits a new exercise for that lesson. She then takes part in an hour-long group discussion of that lesson led by an experienced instructor. She is expected to have gone through the lesson before this session so that she can ask lots of pointed questions during that hour. If the mentor leading the session feels that she is unprepared, she may be asked to do some more work and try again. She then does a demonstration lesson via screen sharing. In it, she is asked to teach a five-minute segment of her lesson chosen by the person running the session. Since she does not know in advance which five-minute segment she will be asked to teach, she must be ready to teach any part of the lesson. If the examiner feels that she needs to do more work, she will be given feedback and asked to try again, but if all goes well, she will get her badge. (Note that people don’t have to qualify separately for different topics: if you show that you’ve mastered one, we’ll trust that you’ll master others as needed.) A dozen experienced instructors have run discussion sessions so far, and feedback has been very positive—everyone (both leaders and trainees) has found the sessions really useful. What’s more, trainers other than myself are now running the final demonstration lessons and deciding whether people are ready to teach for us. By mid-year, our rule will be that trainees are always examined by someone other than the person who ran their training course, for the same reason that PhD committees usually include external examiners. It’s starting to look like a sustainable, scalable process, but there’s still lots of work to do: We need to do a better job of telling trainees and discussion leaders what’s expected of them. For example, trainees need to know that they’re responsible for mastering the whole of their chosen lesson, and that they need to check out separately for Data Carpentry and Software Carpentry. We need to automate scheduling and signup for discussion sessions and checkouts instead of using a pile of Etherpads and a flurry of emails. I expect we will do this via AMY, though I’m still leery of opening it up to hundreds of people. We need to make our expectations of trainees clearer. If someone signs up for a session, doesn’t show, then shows up late for another session they haven’t signed up for and asks the leader to wait while they read through the lesson, the discussion leader should know to hand them back to a trainer for a full and frank discussion of our reputation and their reliability. Equally, we need to cut infrequently taught material from our lessons so that trainees know what to focus on. For example, the material on building R packages is too advanced for most novice workshops; we should either move that material to a separate advanced lesson or mark it somehow. We need to set a time limit on completion, let everyone know what it is, and enforce it. Our current thought is to give trainees 90 days to wrap up, while being generous with waivers for extenuating circumstances. We need to follow up with the teams who took part in December 2015’s training and make sure they run the workshops that they promised to.
I’m sure lots of other things will come up, but we’re making progress. And it really is “we”: Steve Crouch, Christina Koch, Aleksandra Pawlik, and Tracy Teal are now certified instructor trainers, Ariel Rokem is in training to become one, and we are about to add six more: Neal Davis, Rayna Harris, Lex Nederbragt, Anelda van der Walt, Belinda Weaver, and Jason Williams. By August, we may finally have the capacity to help all the people who come to us wanting to help their colleagues. Read More ›

A Counterpoint to Collaborative Lesson Design
Greg Wilson / 2016-02-16
Discussion of our proposed lesson on modern scientific authoring is an instructive counterpoint to our previous post extolling the virtues of collaborative lesson development. The aim of the lesson is to show researchers how to write and publish in the early 21st Century—or more honestly, to persuade them to stop mailing each other copies of Microsoft Word files and start using something else. After a long-winded opening (which I’ll cut substantially), the current introduction summarizes the strengths and weaknesses of six options: WYSIWYG on the desktop (Microsoft Word). WYSIWYG on the web (Google Docs). Desktop typesetting (LaTeX). Web-based typesetting hybrids (Authorea, Overleaf). HTML. Markdown. It concludes that for the foreseeable future, many researchers will continue to prefer WYSIWYG tools rather than typesetting tools requiring compilation (such as LaTeX and Markdown). However, since typesetting tools are easier to integrate into reproducible workflows, the lesson will focus on LaTeX for manuscripts and Markdown for the web, even though most researchers are more familiar with WYSIWYG tools. The feedback to date has been interesting: Word’s built-in compare/merge tool can be launched from version control systems. New tools (like Authorea and Overleaf) are necessarily immature, and there’s no guarantee they’ll still be around in a couple of years. Many journals won’t accept either LaTeX or Markdown, so we should teach a Markdown-plus-Pandoc workflow. Setting up a build environment for a randomly-selected LaTeX document is exactly as challenging as setting one up for a randomly chosen piece of software. What about LyX? What about CommonMark? What about reStructuredText? What about RMarkdown? What about Microsoft OneDrive? What about org-mode? What about the Jupyter Notebook? There were a few comments about lesson content (mostly about workflows for reviewing changes), but compared to the feedback on the new introduction to Python, there was much more about technical issues like tools and formats and much less about pedagogy and big ideas. I suspect there are two reasons for that: We’ve been teaching Python for years, so instructors are more familiar with the pedagogical issues. There’s genuinely less agreement about tools for modern research writing. The next step is going to be to draw up an outline like this one laying out topics, exercises, and timings. We’ve found with the new Python lesson that this focuses the discussion, and we’re hoping that it will allow parallelization, i.e., that many people will be able to fill in different parts of the outline simultaneously once the overall structure has been agreed. One of the biggest challenges in doing this will be to make the lesson not depend on command-line skills, so that it’s accessible to people who are attending Data Carpentry workshops. That’s going to be hard, as both LaTeX and Pandoc are command-line tools. Whatever results, building lessons this way is a big step for us, and I’m eager to see how well it actually works. Read More ›
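The Markdown-plus-Pandoc workflow suggested in the feedback is, at its core, a single conversion command per output format. Here is a minimal sketch driving it from Python; the filenames are hypothetical, Pandoc must be installed, and PDF output additionally needs a LaTeX engine.

```python
import subprocess

# One Markdown manuscript, three journal-friendly output formats.
for fmt in ["docx", "pdf", "html"]:
    subprocess.run(["pandoc", "paper.md", "-o", "paper." + fmt], check=True)
```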

Checking the Balance
Greg Wilson / 2016-02-16
Added 2016-02-22: this strong critique of the Terrell et al. preprint mentioned in the opening paragraph of this post is worth a careful read. It's been a depressing couple of weeks. On top of yet more reports of universities turning a blind eye to sexual harassment for years, a new paper on gender bias in open source shows that, "...women's contributions tend to be accepted more often than men's. However, when a woman's gender is identifiable, they are rejected more often." This comes on top of earlier studies (like this one) showing that women are substantially under-represented in online forums like Stack Overflow, even when compared to computing as a whole. This prompted me to take another look at how Software Carpentry is doing. To start, here are the number and percentage of qualified Software Carpentry instructors broken down by gender:

Qualified Instructors by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 136 | 355 | 3 | 26 |
| %age | 26.2% | 68.3% | 0.6% | 5.0% |

Let's compare that to the number of people contributing to our core lesson repositories in January 2016 by gender. (I'd like to show numbers for a whole year, but my script won't fetch stuff from that far back.)

Repository Contributors by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 19 | 79 | - | 14 |
| %age | 17.0% | 70.5% | - | 12.5% |

17% female is better than average for GitHub and Stack Overflow, but still pretty poor. The proportion is even worse when we count the number of contributions rather than the number of contributors:

Repository Contributions by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 69 | 532 | - | 28 |
| %age | 11.0% | 84.6% | - | 4.5% |

This allows us to calculate contributions per person:

Repository Contributions per Person by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 3.6 | 6.7 | - | 2 |

But now let's compare this to the stats for Software Carpentry's core mission: delivering workshops. Our next table shows the number of people who taught workshops in 2015:

Workshop Instructors by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 89 | 193 | 121 | 3 |
| %age | 21.9% | 47.5% | 29.8% | 0.7% |

while this one shows the actual number of workshops taught (e.g., if I taught three times, I count as three points in the male column):

Workshops Taught by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 182 | 406 | 163 | 4 |
| %age | 24.1% | 53.8% | 21.6% | 0.5% |

and this one shows the average number of workshops taught by each instructor:

Workshops Taught per Person by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 2.0 | 2.1 | 1.3 | 1.3 |

Finally, here's the breakdown of contributors to our discussion mailing list for the three months Nov 2015 - Jan 2016:

Email Contributors by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 22 | 95 | - | - |
| %age | 18.8% | 81.2% | - | - |

of messages sent:

Email Messages by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 73 | 277 | - | - |
| %age | 20.9% | 79.1% | - | - |

and of messages per person:

Email Messages per Person by Gender

| | Female | Male | Other | Unknown |
| --- | --- | --- | --- | --- |
| Number | 3.3 | 2.9 | - | - |

I draw some comfort from the fact that our online balance isn't dramatically different from our in-person balance, and that both are much better than GitHub's or Stack Overflow's (though it would be hard for us to do worse). It's still clear, though, that women and other people who do not identify as male are under-represented both online and in person. What's worse is that as we grow, we're regressing to computing's unbalanced mean: over a third of our instructors were women in the summer of 2013. While I worry about the number of people who complete instructor training but never teach for us, I worry a lot more about that, and if we're going to try to fix something this year, that's what I'd like it to be. Read More ›
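All of the percentages above are simple count-over-total calculations. As a sanity check, this snippet reproduces the first table's figures from its raw counts:

```python
counts = {"Female": 136, "Male": 355, "Other": 3, "Unknown": 26}
total = sum(counts.values())
for gender, n in counts.items():
    # Prints 26.2%, 68.3%, 0.6%, and 5.0%, matching the table above.
    print(f"{gender}: {n} ({100 * n / total:.1f}%)")
```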

Designing Lessons Collaboratively
Greg Wilson / 2016-02-15
A few days ago, I asked for feedback on a new Python lesson aimed at people who’ve never programmed before. The outline had already received several rounds of feedback from a handful of people, but there were still lots of comments: As always, the choice of tools attracted a lot of discussion. Jeremy Metz opened by saying, “I worry that the use of a more abstract environment like the Jupyter Notebook might confuse and add an additional barrier to people wanting to ‘really’ use Python.” I agree that the Notebook imposes an extra cognitive load, since it’s so different from anything else novices are likely to have seen, but it has become the tool of choice for many scientists for good reasons: it’s stable, cross-platform, encourages reproducible practices, and has a great, supportive community. We’re also starved for alternatives: the audience for this class isn’t required to have seen the shell, so running scripts from the command line is out, and all of the Python IDEs we’ve tried have significant shortcomings for our audience. Time estimates are one of the places where community input matters most. In this issue, two experienced instructors discuss how far they think learners could get by lunch, while here, two others talk about whether lots of short exercises will be manageable in practice, and whether using the Notebook will help. The former was a useful reality check (which is my way of saying “I cut some material based on their feedback”) while we will address the latter by having most exercises be multiple choice questions, Parsons Problems, and filling in the blanks or tweaking existing code rather than writing things from scratch. Rayna Harris has also suggested that we use Socrative quizzes for real-time assessment. While I’m a bit nervous about becoming any more dependent on closed-source commercial sites than we already are, it’s a great tool, and we’ll definitely explore it. Potential problems are another place where having a community makes a big difference. This discussion reminded me that loading data is hard if you don’t know how to navigate the filesystem; we’ve addressed that by allocating 10 minutes for learners to read their first CSV data set, most of which we expect will be taken up with tech support. Testing is important and coverage of these practical aspects of programming is part of what distinguishes SWC from “pure programming” classes, but (a) will it be accessible, (b) will it be compelling, and (c) what should we take out to make room for it? We could show assert and focus on defensive programming rather than testing per se, and it’s less effort (no separate functions). See this thread for the discussion. Debugging is also important. I’m a big fan of interactive symbolic debuggers, but all the Notebook provides right now is pdb, which we are not showing to novices. Instead, we have 15 minutes of lesson and discussion on how to make sense of error messages (which will draw on this discussion as well as recycling this material) and 25 minutes on actual debugging. The latter episode is toward the end of the lesson, and I suspect that many workshops will drop it because they’ll run short of time. NumPy was the heart and soul of scientific Python for many years, but this lesson will only mention it in passing, devoting its attention to Pandas instead. 
It really deserves more air time—as Bartosz Telenczuk observes, “students leaving the course without basic familiarity of NumPy will not be able to understand ~60% (my rough guess) of scientific Python applications.” The problem once again is what to cut to make room… The biggest message for me in this wasn’t the specific feedback, though. It was the way that two dozen people who are familiar with our current content and teaching methods, and have first-hand experience delivering this material in the classroom, were willing and able to share what they knew. That doesn’t guarantee that the first draft of the lesson will be perfect, but it does improve the odds of it being good. The next step is to figure out how to go about writing the lesson. Should one or two people assemble a first draft for others to critique? Or should we start by crowd-sourcing the creation of the exercises (which I think will parallelize better)? And if we do the latter, should we ask instructor trainees who already speak Python to propose exercises as part of their training? Comments would be greatly appreciated. Read More ›
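The "show assert and focus on defensive programming" idea mentioned above is easy to picture around a learner's first CSV read. This is only a sketch of the approach under discussion; the filename and column names are hypothetical, not taken from the lesson.

```python
import pandas as pd

data = pd.read_csv("inflammation.csv")

# Defensive programming with assert: fail early, with a message,
# if the data is not what we expect.
assert not data.empty, "the file loaded but contains no rows"
assert "patient_id" in data.columns, "expected a patient_id column"
assert (data.drop(columns="patient_id") >= 0).all().all(), \
    "inflammation readings should never be negative"
```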

NGS Summer 2016: Analyzing Next-Generation Sequencing Data
Greg Wilson / 2016-02-14
A two-week residential course on next-generation sequencing is being offered at the Kellogg Biological Station at Michigan State University on August 8-19, 2016. The course’s directors are Prof. Matt MacManes (U. New Hampshire) and Prof. Meg Staton (U. Tennessee, Knoxville), and instructors will include Prof. Ian Dworkin (McMaster U.), Prof. Torsten Seemann (U. Melbourne), Shaun Jackman (PhD candidate, UBC) and others. For more information, or to register, please see http://bioinformatics.msu.edu/ngs-summer-course-2016. Note: if you are running a course that might be of interest to our community, please let us know. This intensive two-week summer course will introduce attendees with a strong biology background to the practice of analyzing short-read sequencing data from Illumina and other next-gen platforms (e.g., Nanopore, PacBio). The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on genome and transcriptome assembly, transcript quantitation, mapping, and other topics. No prior programming experience is required, although familiarity with some programming concepts is helpful, and bravery in the face of the unknown is necessary. Two years or more of graduate school in a biological science is strongly suggested. Faculty, postdocs, and research staff are more than welcome! Students will gain practical experience in: Python and bash shell scripting; cloud computing/Amazon EC2; basic software installation on UNIX; installing and running Trinity, BWA, Salmon, SPAdes, ABySS, Prokka, and other bioinformatics tools; and querying mappings and evaluating assemblies. Materials from previous courses are available at http://angus.readthedocs.org/ under a Creative Commons/full use+reuse license. Read More ›

Announcing New Unix Shell Maintainer
Christina Koch, Ashwin Srinath / 2016-02-12
After an application and selection process, Ashwin Srinath has been selected as a new co-maintainer of the Unix Shell lesson, joining Gabriel Devenyi and replacing outgoing maintainer Christina Koch. Introducing Ashwin Ashwin is a Ph.D. candidate in Mechanical Engineering at Clemson University, and a soon-to-be member of Clemson’s Cyberinfrastructure Technology Integration (CITI) research computing group. His research areas include computational fluid dynamics and high-performance computing. He spends most days programming and making mistakes, and some days teaching others how not to make them. Ashwin has formerly served as co-maintainer for the Software Carpentry MATLAB materials. I decided to volunteer to be maintainer of the shell lesson primarily because I feel my time is well spent contributing to the SWC organization and community - I’m indebted to them for making me a better programmer and a better person. I’m also excited to work with other maintainers and contributors, who I’ve learned a lot from - and will continue to do so. Christina’s Thanks… This post is about introducing Ashwin as a new maintainer, but I’d like to also thank everyone else who has contributed to the shell lesson over the past 1-2 years, either by submitting pull requests in response to issues, commenting on complex changes, or otherwise weighing in on the direction of the lesson. There’s a lot to keep track of in the Software Carpentry lessons, and that responsibility is lighter when shared among an active community. I appreciated everyone’s contributions - even a simple comment of “yes, I agree” or “no, I don’t agree” - more than you know! I’m looking forward to seeing where the lesson goes in the future with Ashwin and Gabriel at the helm. I’ve not been an active maintainer in the past few weeks, so for those of you with recent pull requests/issues to the shell lesson, sorry for the lack of movement (and thanks to those who have stepped up and commented)! We’ll hopefully work through all of that soon, and continue to improve the material that’s already there. Read More ›

2016 Post-Workshop Instructor Debriefing, Round 03
Rayna Harris, Raniere Silva / 2016-02-11
On Tuesday, February 9th, we ran the 3rd round of Post-Workshop Instructor Debriefing Sessions. Rayna and Raniere hosted the morning debriefing session. The evening session was cancelled due to low attendance. New lesson materials Blogging as a Collaborative Git Exercise. The SFU group created a new Git exercise in which pairs of learners wrote guest blog posts for each other’s sites. Then learners experienced conflict resolution in a controlled lesson where pairs edited the same line of the guest blog post. This was an engaging exercise that allowed learners to see successful collaboration and conflict resolution. Stay tuned for a dedicated blog post from Bruno Grande about the exercise. What worked well A 5-day Hybrid Data / Software Carpentry Workshop. The 2016-01-25-Utrecht group covered a lot of material in this 5-day workshop (Spreadsheets, OpenRefine, Python 3, Unix, SQL, Git, Make, and HPC)! While most of the materials were used as is, updates to the Data Carpentry Python lesson have already been pushed and merged. Integrating SQL and Python. The 2016-01-25-Utrecht group used a very large dataset to demo loading files into SQL and querying the database from Python (see the sketch after this post). Examples like this are very useful for groups who are interested in showing how to integrate tools from different programs. Domain-Specific Python Lessons. The 2016-02-06-uguelph workshop substituted the NumPy and Matplotlib sections with material taught by John Simpson. Mateusz referred us to Library Carpentry for lessons tailored for librarians and similar domain scientists. Practical Use Cases. When instructors take the time to provide real-world examples and practical applications, it helps learners better understand the power of these computational tools. What could have gone better Without access to campus wifi, the 2016-02-06-uguelph workshop had to use Eduroam for the internet, which was incompatible with Ubuntu laptops. What is the best way to ensure that the institution provides wifi? If you know, please share. Installation problems Git Bash doesn’t have Make, so that was an issue for the 2016-01-25-Utrecht workshop. Nano setup on Windows was problematic, so the 2016-02-02-SFU instructors had the learners use Notepad++. 3 people out of 32 hit a fatal error involving the terminal/Anaconda setup. The Mercurial install failed for everyone, but this was irrelevant since they were using Git. Thanks! We are grateful to the following instructors who attended this debriefing session. By taking the time to share their experiences and listen, they are truly making the Software and Data Carpentry community even more awesome! Mateusz Kuzak, 2016-01-25-Utrecht Bruno Grande, 2016-02-02-SFU Dhavide Aruliah, 2016-02-02-SFU Pawel Pomorski, 2016-02-06-uguelph Blake Joyce, 2016-01-30-UofArizonaIntroPython Read More ›
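The SQL-from-Python integration described above boils down to a few lines. This generic sketch uses the standard library’s sqlite3 module with a made-up table; the Utrecht demo used its own (much larger) dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
conn.execute("CREATE TABLE surveys (species TEXT, weight REAL)")
conn.executemany("INSERT INTO surveys VALUES (?, ?)",
                 [("DM", 40.0), ("DM", 48.0), ("PE", 21.0)])

# Query the database from Python, iterating over the result rows.
for species, avg_weight in conn.execute(
        "SELECT species, AVG(weight) FROM surveys GROUP BY species"):
    print(species, avg_weight)

conn.close()
```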

Context when teaching Numerical Methods
Ian Hawke / 2016-02-11
As a director of the Centre for Doctoral Training in Next Generation Computational Modelling I spend plenty of time thinking about, working with, and particularly teaching Numerical Methods. The general intro course I give is Maths-focussed, has around 100 participants, has about 50 contact hours over 12 weeks, and gets a range of interested students from Maths, Physics and Engineering. For the past two weeks I’ve been teaching Numerical Methods for the Centre for Doctoral Training in New and Sustainable Photovoltaics. With around a dozen participants, many of whom had minimal theory or coding background, and only 10 hours available, some changes in the teaching were required. Most of these followed Software Carpentry approaches: live coding, Jupyter notebooks, sticky notes, and a set of “authentic” examples to train underpinning principles. The course we ended up with covered most of the key Numerical Methods topics, but all used a motivating example from Photovoltaics or Solar Cells. The feedback showed that, despite my almost complete lack of knowledge of PV modelling, these examples did provide crucial context that the students found useful and relevant. In combination with live coding, we were able to build up some fairly detailed numerical methods, up to coupling PDEs and ODE boundary value problems, all whilst talking about the implications for Photovoltaic modelling. This very much put me in mind of the Software Carpentry Instructor Training course I attended in 2014. Much of the content was nearly identical to that on generic lecturer training courses I’ve been on. But the context, and authentic examples given in the Software Carpentry training, made all the difference in engaging me. Tailoring the material to the interests of the learners (as in Guzdial’s Learner Centered Design approach) in my case made a big difference. But I also matched the material to my strengths as a trainer, making the “authentic” examples ones I could motivate and be enthusiastic about. Of course, not all aspects of the course went well - unsurprisingly for a first attempt at delivery. The most consistent piece of feedback was that I typed too fast for people to keep up. This wasn’t a huge problem when live coding, which (almost) slowed me down enough. When I made mistakes, live debugging was a different problem. Minor typos were straightforward, with the students usually spotting them before me. More complex issues, especially where coding and numerical methods problems combined, I tended to find and fix automatically, sometimes without even fully vocalizing the logic behind the bug and its resolution. More practice at revelling in my own errors is clearly required. Read More ›

Open Science Radio Podcast
Greg Wilson / 2016-02-09
Matthias Fromm and Konrad Förstner recently interviewed Greg Wilson for their Open Science Radio podcast. You can listen to it here, or explore their other recordings. Read More ›

A New Lesson on GAP
Greg Wilson / 2016-02-09
We are pleased to announce that Alex Konovalov and his colleagues have created a Software Carpentry-style lesson on the computational algebra system GAP. The lesson is at http://alex-konovalov.github.io/gap-lesson/, and the repository for it is at https://github.com/alex-konovalov/gap-lesson. It was taught at a recent workshop, and feedback can be viewed here. We are now starting to develop a lesson on SageMath. We invite collaborators: please watch the repository if you’re interested in following along, and add a comment to this issue if you’re interested in contributing. Read More ›

Correlations
Greg Wilson / 2016-02-08
We've run instructor training both online and in person for several years, so it's time to look at how they compare. The raw data shows: the event's start date; its unique identifier (which we call a "slug"); whether it was online; how many people took part; how many completed training after this course (both as an absolute number and as a percentage); how many completed after taking a later course (in both forms); how many have never completed; and how many have taught at least once since taking the course. The plot below then shows completion rates and follow-through teaching rates (as percentages) versus cohort size, tagged by whether the training event was online or in person, for all events that took place at least one year ago. (I've used that cutoff to give participants a fair chance to teach after completing their training.) It also shows the absolute number of participants and the follow-through teaching percentage by date. There are lots of other ways to analyze this data---if you can find any interesting correlations, please post as comments below. Rayna Harris added this: Greg Wilson wrote two blog posts about active SWC workshops and instructor training. He made some pretty graphs and asked for other plots. So, here is my attempt to make some pretty correlation plots using the instructor training data. Plot Description and Interpretation A1 & A2. Not all learners, but most badged instructors, will teach. Just because we train a lot of new instructors doesn't mean that they all go on to teach a workshop. The number of learners is not a great predictor of how many will actually go on to teach a workshop (R^2 = 0.7021), but once they get their badge, they are very likely to teach (R^2 = 0.915). B1 & B2. With time, more people will teach a workshop. We just started the new year, so many newly badged instructors haven't had the chance to teach a workshop. When we account for year, there is a very strong correlation between the number of badged instructors and those who have taught a course. It looks like, over time, more and more are teaching. C1 & C2. Slightly more badged instructors from in-person training are teaching than those from online training. For 2014 and 2015, it looks like a few more badged instructors from the in-person training have gone on to teach a workshop compared to those who completed online training. Given that the attrition rate for online workshops appears to be greater (not shown), one could conclude that the in-person training is more effective at producing SWC instructors. How I made these. Check out my R script to see the linear model that I ran to get the R^2 values and the commands used to make the plots. Read More ›
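Rayna's analysis was done with an R script (linked above). For readers who prefer Python, the core of it, fitting a line and reading off R^2, looks roughly like this, with made-up numbers standing in for the real data:

```python
import numpy as np

badged = np.array([10, 25, 40, 60, 80])  # hypothetical instructors badged per cohort
taught = np.array([6, 17, 30, 43, 62])   # hypothetical count who went on to teach

slope, intercept = np.polyfit(badged, taught, 1)    # least-squares line
r_squared = np.corrcoef(badged, taught)[0, 1] ** 2  # squared correlation
print(f"taught ~ {slope:.2f} * badged + {intercept:.2f} (R^2 = {r_squared:.3f})")
```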

Come a Long Way, Got a Long Way to Go
Greg Wilson / 2016-02-07
Simon Oxenham recently wrote about a new report from the National Council on Teacher Quality that examined how well teacher training courses and textbooks convey evidence-based teaching practices. The sad answer is, hardly at all: The report finds that out of 48 texts used in teacher-training programs none accurately described fundamental evidence-based teaching strategies comprehensively. Only 15 percent had more than a single page devoted to evidence-based practices; the remainder contained either zero or only a few sentences on methods that have been backed up by the decades of scientific findings that exist in the field of educational psychology. In particular, textbooks didn’t include anything approaching adequate coverage of six core teaching strategies identified in this 2007 report as being the most effective techniques in all classrooms regardless of age or subject: Pairing graphics with words. All of us receive information through two primary pathways — auditory (for the spoken word) and visual (for the written word and graphic or pictorial representation). Student learning increases when teachers convey new material through both. Linking abstract concepts with concrete representations. Teachers should present tangible examples that illuminate overarching ideas and also explain how the example and big ideas connect. Posing probing questions. Asking students “why”, “how”, “what if”, and “how do you know” requires them to clarify and link their knowledge of key ideas. Repeatedly alternating solved and unsolved problems. Explanations accompanying solved problems help students comprehend underlying principles, taking them beyond the mechanics of problem solving. Distributing practice. Students should practice material several times after learning it, with each practice or review separated by weeks and even months. Assessing to boost retention. Beyond the value of formative assessment (to help a teacher decide what to teach) and summative assessment (to determine what students have learned), assessments that require students to recall material help information “stick”. This raises an uncomfortable question, though: how well does Software Carpentry measure up against these six criteria? I’ll give us a pass on #5 — our two-day workshops simply don’t allow for practice weeks or months later (though we hope learners will do this on their own). But what about graphics? There aren’t many diagrams in our lessons, and the ones we have usually aren’t put up on the screen when we teach. Linking concepts to representations? Probing questions? I think we still have a lot of work to do. But I also think that our lessons and teaching practices are better than they used to be. We’ll add these points to instructor training and follow up on them in mentoring sessions, and keep getting better, one small fix at a time. Read More ›

Software Carpentry as a University Course
Daniel Chen / 2016-02-05
The inaugural Software Carpentry and Data Carpentry Instructor and Helper Retreat is over! It was a long day, packed with tutorials, demos, and discussions. I (Daniel Chen) led a round table discussion with Tiffany Timbers and Jenny Bryan. You can watch the discussion and/or read the notes on etherpad. [T]: Anthony has been teaching the course to undergraduate students and I've been building off Katy and Anthony's experiences from teaching as well as my own from teaching Software Carpentry. Read More ›

2016 Post-Workshop Instructor Debriefing, Round 02
Rayna Harris, Christina Koch, Bill Mills, Raniere Silva / 2016-02-05
On January 26, we ran the 2nd round of Post-Workshop Instructor Debriefing Sessions. Rayna and Christina hosted the morning debriefings while Bill and Raniere hosted the evening debriefing. A topic of interest today was installations. Read on for details. Thoughts on the “check installation” scripts There is an ongoing discussion about whether or not the “check installation” scripts for workshop pages should be updated. So, we asked the attendees how often these were being used and how useful they were. (A minimal sketch of what such a script does appears after this post.) Some comments we received were: only a handful of students use them before coming to the workshop; instructors/helpers use them when diagnosing install problems on site; some cleaning of the script would be useful to remove checks for outdated and/or irrelevant tools. If you want to contribute your two cents, you can submit an issue here. Thoughts on dealing with installation problems We know that poor planning, sub-par wifi, lack of admin privileges, improper installs, power outages, and a bunch of other things can give rise to installation problems for a learner. A number of groups in Australia have used a cloud-based service for providing learners with homogeneous running environments; this solution has performed well a number of times, but is subject to server and network outages, as was the case in the Brisbane workshop discussed in the evening session. So, what do you do when this happens? One idea is to have a handful of USB sticks with all the files needed for quick transfer. This has its own issues but is a good plan B. Maneesha Sane and Kate Hertweck are working on a list of things that should be on such a stick. What worked well The Oklahoma group is developing a Git lesson that they are very happy with. Check it out here. The Boston group gave a 10-minute session on implicit bias and stereotype threat that went over really well. Multiple attendees said they were glad it was included. See this link for details of the session or check out these two blogs. Learners really love the Data Carpentry Spreadsheets Lesson. The lesson is still under development and has room for improvement, but it is helping people make the transition from using Excel to using the command line in their research. The City University of New York group found learners were much more motivated after taking time to explore a narrative explanation of realistic tool usage. Waterloo found that reaching out to students on a number of new mailing lists dramatically helped attendance. What could have gone better The University of Boston group needed to cap the number of University of Boston students admitted to allow seats for non-University of Boston attendees, but this was tricky to do within Eventbrite. The University of Washington group taught two concurrent workshops, with an R-centric group in one room and a Python-centric group in the other. Managing a single waitlist for both on Eventbrite was not smooth. Maneesha is looking into solutions for this. The University of Washington instructors taught Bash and Git in the mornings, and think it's best to teach these topics in the morning. But the learners didn't like having their 6-hour R or Python lesson split in two. What do y'all think? What's the best order for teaching? Thanks! We are grateful to the instructors who attended the debriefing sessions this round. By taking the time to share their experience and listen to the experiences of others, they are truly making the Software and Data Carpentry community even more awesome!
Morning attendees: Ariel Rokem Arthur Endsley Christina Koch John Moreau Keith Ma Mark Laufersweiler Matt Aiello-Lammens Sarah Clayton Sarah Stevens Evening attendees: Ivana Kajic Jennifer Shelton Pawel Pomorski Sean Aubin Read More ›
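For readers who have never looked at one, the heart of a “check installation” script is simply verifying that each required tool can be found. This stripped-down sketch shows only the core idea; the real workshop scripts also check versions and libraries.

```python
import shutil

required = ["bash", "git", "python", "nano"]
for tool in required:
    path = shutil.which(tool)  # None if the tool is not on the PATH
    print(f"{tool:8s} {'OK: ' + path if path else 'MISSING'}")
```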

Two Pages of Evidence
Greg Wilson / 2016-02-02
Andreas Stefik and his colleagues have written a two-page summary of what we actually know about usability and programming language design. There aren’t nearly as many results yet as we want or need, but what the people working in the field have shown is that we actually can answer these questions scientifically, and that answering them correctly actually does have an impact. Read More ›

Active Workshops
Greg Wilson / 2016-02-02
A few days ago, we were asked what was the greatest number of simultaneous workshops we’d ever run. I didn’t know, but it was an easy enough question to answer from our records: I’m sure better visualizations are possible (if you want to create one, the raw data is here), and we should find some way to show that the burst starting in April 2014 is actually a couple of workshops that were spread out over several weeks, but I think the trend line is pretty cool. Read More ›

17 January - 1 February, 2016: SCF Election Candidates, Lessons Learned, Instructor Survey Results, and an Intermediate R Lesson
Anelda van der Walt / 2016-02-01
##Highlights: Steering Committee Election The 2016 Software Carpentry Foundation Steering Committee election dates are set for February 15-19. Remember to join our Community Lab Meeting on 9 February to meet the candidates. Read more about the candidates for the 2016 elections by following the links below: Dhavide Aruliah Jonathan Guyer Rayna Harris Kate Hertweck Karin Lagesen Cam Macdonell Lauren Michael Bill Mills Giacomo Peru Raniere Silva Anelda van der Walt Leanne Wake Belinda Weaver Jason Williams ##Published You can read the latest revision of Software Carpentry: Lessons Learned on F1000 now. The 2015 instructor survey results are available as a PDF report. The report contains rich suggestions for improvement and great feedback from instructors about the value gained from involvement with Software and Data Carpentry. Please let us know if you have any comments. A short paper titled A Quick Introduction to Version Control with Git and GitHub has been published in PLOS Computational Biology. It includes topics such as “What not to version control” and “Managing large files” as well as a tutorial on basic Git and GitHub usage. Scott Ritchie shared lessons aimed at intermediate R users, covering tools such as reshape2, plyr, foreach loops, and R Markdown. ##Vacancies The Software Sustainability Institute is hiring a Communications Officer. Applications close Thursday 18th February 2016. ##Other Several non-Software Carpentry workshops are being taught by our instructors and might be of interest to the community. Want to add your course to the list? Please let us know. Do you know any good cartoons that can enhance our lessons? The Software Sustainability Institute participated in several Software and Data Carpentry-related events towards the end of 2015 and has a lot in store for 2016. Aleksandra Pawlik has summarised their recent activities and plans for this year. 18 workshops were run over the past 16 days. For more information about past workshops, please visit our website. Upcoming Workshops February: QUT - Research Bazaar (Python), QUT - Research Bazaar (R), ResBaz 2016 R Course at the University of Sydney, University of Oslo, Simon Fraser University, NESI - ResBaz Auckland, NESI - ResBaz Otago, USGS Flagstaff Science Campus, Oak Ridge National Laboratory, New York Academy of Sciences - R, New York Academy of Sciences - Python, University of Guelph, University of Illinois, University of Alberta, University of Calgary, Online, Queen’s University, Sir John A. MacDonald Hall Room 2, University of British Columbia Okanagan, UC Davis, Tulane University, University of Pennsylvania, University of Illinois March: Notre Dame, University of British Columbia April: Online May: CSDMS Annual Meeting Read More ›

SSI is hiring a Communications Officer!
Aleksandra Pawlik / 2016-02-01
The Software Sustainability Institute (SSI) is recruiting a Communications Officer to communicate the activities of the Institute, and raise its profile in the research community and the general public. SSI has been Software Carpentry’s long-standing partner and coordinates our workshops in the UK. They are looking for an enthusiastic individual with a track record in dissemination and outreach. Ideally, they’ll have had experience of working with researchers and/or writing technical articles, as well as an understanding of what it takes to nurture a great social media presence and successfully promote events. For more details and to apply, please see the advert on the University of Edinburgh website (search for vacancy reference ID 035330). Note that the closing date is Thursday 18th February 2016 at 5pm GMT. Read More ›

Elsewhere on the Web
Greg Wilson / 2016-01-29
Our instructors teach a lot more than just Software Carpentry. For example, Christie Bahlai has started teaching a course on “Open Science and Reproducible Research”. You can follow the course blog or check out the materials in the course repository—contributions from the community are very welcome. The course already includes lots of interesting ideas, and there’s more to come. Elsewhere, Tiffany Timbers is organizing several half-day workshops at Simon Fraser University as part of Titus Brown’s Data Intensive Biology training program. These workshops will be hands on and supported by several TAs in addition to the instructor. All are welcome to participate, regardless of discipline or training level—please feel free to distribute these events widely. And if you cannot attend in person because of geography, please note that several of them are being streamed on YouTube: Regular Expressions and Python taught by Tiffany Timbers on Feb 17, 2016. Amazon Web Services taught remotely by Titus Brown (UC Davis) on Mar 7, 2016. R Markdown taught remotely by Marian Schmidt (U. Michigan) on May 11, 2016. Open Science taught by Bruno Grande (SFU) on June 13, 2016. If you are teaching something out in the open that our audience might be interested in, please send us a link—we’d enjoy hearing more about it, and we’re sure our readers would too. Read More ›

A New Version of 'Lessons Learned'
Greg Wilson / 2016-01-28
A new version of “Software Carpentry: Lessons Learned” is now available on F1000. We think it’s an interesting complement to the instructor survey, and we hope you enjoy them both. Read More ›

Instructor Survey Report
Jonah Duckles / 2016-01-27
Back in the fall we surveyed our then 450-strong instructor community for feedback on what being an instructor has done for them. To recap, we asked the following questions: How did you get interested in doing Software Carpentry/Data Carpentry? How do you use the skills you teach in Software Carpentry/Data Carpentry in your daily work? What is the benefit (personally and/or professionally) to you in being involved with the Software Carpentry/Data Carpentry Community? What examples do you have of how the Software Carpentry/Data Carpentry Community have benefited others (students, labs, university groups)? What suggestions do you have for improvement? There is a ton of great information in this report that can be useful for building momentum for Software and Data Carpentry instructor training and workshops at your own institution. It is clear that we have a very active and driven pool of instructors. This is a large part of what made me so excited and passionate about the community when I first became an instructor. I hope that we can learn from each other and broaden the impacts that this community is having around the world! I hope you take some time to read through what your peers have had to say about the rewards of being an instructor. Please do take action and help us to put some of the great suggestions in the report into our various processes. The resulting report is available here (PDF). Please let us know what you think in the comments or by contacting me directly. Read More ›

Our Introduction to Git Has Been Published
John Blischak, Emily Davenport, Greg Wilson / 2016-01-21
Our short paper “A Quick Introduction to Version Control with Git and GitHub” has been published in PLOS Computational Biology. It is freely reusable under a Creative Commons Attribution license, and we hope you and your colleagues find it useful. Read More ›

Meet the 2016 Election Candidates
Jonah Duckles / 2016-01-21
The candidates for the 2016 election are set. The election will take place during the week of February 15th-19th. Look for electionbuddy ballot information in your email if you’re a member. Meet the candidates at the February 9th Community Lab Meeting Calls (14:00 & 23:00 UTC). Call coordinates will be posted to the etherpad when they’re finalized. In alphabetical order by last name, the candidates are: Dhavide Aruliah Jonathan Guyer Rayna Harris Kate Hertweck Karin Lagesen Cam Macdonell Lauren Michael Bill Mills Giacomo Peru Raniere Silva Leanne Wake Anelda van der Walt Belinda Weaver Jason Williams Please take some time to read a bit about the candidates and their background so that you can make an informed decision when your ballot arrives in your inbox next month. Read More ›

Presenting Materials for Intermediate UseRs
Scott Ritchie / 2016-01-18
We’ve been teaching R workshops for over a year now at the University of Melbourne, and one thing we’ve noticed is the disparity in skill level among workshop attendees. Researchers generally fall into one of two categories: Absolute novices: those who have heard of R, but have never touched a programming language before. Regular users: those who are using R in their research, possibly on a regular basis. They can modify scripts, and have a general understanding of the language basics, but want to extend their knowledge. Those who fall into the first category fit the Software Carpentry attendee archetype. The novice materials work well for them. The latter group typically find the novice materials too basic, quickly become bored, and tune out. However, like the absolute novices, they have typically never been formally introduced to programming concepts: many have never written their own function, and do not understand for loops. Early last year we received a request to run a workshop for a group of quantitative ecologists at the University of Melbourne. The organiser, Saras Windecker, had attended a novice workshop previously and found the material too basic, but had appreciated the best practices and programming concepts. Together we sat down and came up with a rough outline of the extension material she thought her group would find useful, and taught a two-day workshop themed “effectively working with data”. A write-up of the workshop can be found here. Since then, we’ve typed up our notes from the workshop into the Software Carpentry lesson format, and can now present you with intermediate materials for regular useRs. Just like the novice materials, the lesson spends a lot of time covering the staple programming concepts taught by Software Carpentry: functions, control flow, looping, code organisation, and best practices. At a faster pace, we also expose attendees to more advanced concepts and R-specific material that gets missed out of the novice lessons: The apply family of functions. Effective data manipulation using data.table and reshape2. How to solve “split-apply-combine” problems with data.table and plyr (the general pattern is sketched just after this post). Solving embarrassingly parallel problems with parallel foreach loops. Reproducible documents with R Markdown. We hope that the community finds these materials useful and look forward to hearing about intermediate and advanced R workshops in the future! Read More ›
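The “split-apply-combine” idiom in the list above is worth seeing in miniature. The lesson itself teaches it with data.table and plyr in R; purely as an illustration for readers who have not met the idea before, the sketch below shows the same three-step pattern in plain Python (the crop data is invented, not from the lesson).

```python
# Split-apply-combine, illustrated with Python's standard library:
# split records into groups, apply a summary function to each group,
# and combine the per-group results into one structure.
from itertools import groupby
from statistics import mean

# Invented example data: (crop, yield) measurements.
measurements = [
    ("wheat", 4.0), ("maize", 6.5), ("wheat", 3.0), ("maize", 5.5),
]

# Split: order the records by crop (groupby expects sorted input),
# then walk through the groups.
ordered = sorted(measurements, key=lambda record: record[0])
summary = {}
for crop, rows in groupby(ordered, key=lambda record: record[0]):
    # Apply: summarise each group; Combine: gather into one dict.
    summary[crop] = mean(value for _, value in rows)

print(summary)  # {'maize': 6.0, 'wheat': 3.5}
```

In R, data.table and plyr collapse these three steps into a single grouped expression; that compactness is what the intermediate lesson demonstrates.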

What Are Your Favorite Cartoons?
Greg Wilson / 2016-01-18
As we’re updating our lessons this year, I’d like to add a few more cartoons that relate directly to what we teach. Two of my favorites are this description of how most scientists manage revisions and this summary of how many people react to unwelcome evidence. What are yours, and what lessons would you add them to? Please add links in the comments on this page (and please make sure whatever you link to has enough information for us to contact the artists and confirm we have permission to re-use their work). Read More ›

6 January - 16 January, 2016: Election Candidates, SCF Strategic Plan, New Book, Pre-Workshop Help Sessions, AMY Version 1.3, Mistakes, and Recorded Lessons
Anelda van der Walt / 2016-01-16
Highlights: Steering Committee Election The 2016 Software Carpentry Foundation Steering Committee election dates are set for February 15-19. Please make sure your name is on the members list to cast your vote. Read more about the candidates for the 2016 elections by following the links below: Belinda Weaver Cam Macdonell Leanne Wake Kate Hertweck Dhavide Aruliah Jason Williams Karin Lagesen Rayna Harris Lauren Michael Raniere Silva Bill Mills Giacomo Peru Anelda van der Walt Jonathan Guyer The Software Carpentry Foundation strategic plan as developed by the current steering committee is available. New: Mark Guzdial’s new book, Learner-Centered Design of Computing Education: Research on Computing for Everyone, is highly recommended. The mentoring subcommittee has launched brand-new fortnightly pre-workshop help sessions starting on 20 January. Sign up if you’re an instructor looking for assistance with your upcoming workshop. Contribute: AMY version 1.3 has been released. Learn about what’s planned for version 1.4 and let us know if you’d like to contribute. Do you have ideas for collecting empirical data on the most common mistakes made by our learners when they use Git, Python, the shell, and R? Links to recorded lessons are now archived on the Lessons page of the website. If you have any recordings and would like to share, please get in touch. Other: What strategies is Software Carpentry employing to change STEM education? Read the paper by Maura Borrego and Charles Henderson, and let us know what you think. The last year has seen a buzz of Software Carpentry activity in South Africa: six workshops, one remote instructor training session, and lots of lessons learnt. Welcoming and diverse open communities are created through the conscious effort of their members. Sarah Sharp describes five levels of cultural change that can be attained. Where would you rate us? The mentoring subcommittee ran the first post-workshop instructor debriefing session of the year. Learn more about 3-day workshops and provide your suggestions for incorporating our “best practices” guidelines into the lessons. Titus Brown and colleagues will be running several online workshops in Q1 and Q2 from UC Davis. Get in touch with him if you’d like to present or participate. Our community has made great progress in contributing to and re-using our lessons since SciPy 2014. You can now publish data-related articles in the CODATA Data Science Journal. It’s open access, online, and peer-reviewed, and is edited by Sarah Callaghan. 12 workshops were run over the past 11 days. For more information about past workshops, please visit our website. Upcoming Workshops January: BEACON @ MSU, Natural History Museum, University of British Columbia, NERC / University of Leeds, Natural History Museum, Wang Center - Lecture Hall 2, USDA-ARS, Centers for Disease Control, University of Illinois, University of Waterloo, CUNY Digital Research Bootcamp, University of Texas at Arlington February: University of Sydney, QUT - Research Bazaar (Python), QUT - Research Bazaar (R), ResBaz 2016 R Course at the University of Sydney, Simon Fraser University, NESI - ResBaz Auckland, USGS Flagstaff Science Campus, Oak Ridge National Laboratory, University of Illinois, University of Alberta, University of Calgary, University of British Columbia Okanagan, Tulane University, University of Pennsylvania, University of Illinois March: Notre Dame Read More ›

2016 Election: Jonathan Guyer
Jonathan Guyer / 2016-01-15
Background My training is in Materials Science and Engineering and I have done both computational and experimental research in fields like semiconductor crystal growth, electrochemical interfaces, and additive manufacturing. I co-authored the FiPy partial differential equation Python framework to support my simulations and to enable others to use these methods. I have spent my professional career at the US National Institute of Standards and Technology, where I have led the Mechanical Performance Group for the past two years. I learned of Software Carpentry when Greg Wilson gave a keynote talk at SciPy 2014. I’d been invited the following week to teach at a summer school about the thermodynamics of phase transformations in materials and the use of FiPy. I told Greg afterwards that I wished I’d had enough time to adjust my course materials and approach to reflect some of the ideas about effective teaching that I’d learned from his keynote and Lorena Barba’s. While there wasn’t time for that, Greg encouraged me to sign up for the next round of instructor training. For the first few weeks, I continued to view this as a way to become a better teacher in my own discipline, but didn’t see myself as a Software Carpentry instructor, per se. With time, though, I realized that SWC covers exactly the skills we struggle to impart to our summer interns every year. Further, I had to admit that many of my colleagues (and I!) did not use computers as effectively as we could, even though many of us are quite adept at scientific computing. In the year since finishing instructor training, I helped at a workshop at the US National Institutes of Health and was then asked to teach a subsequent course there with Fan Yang and Adina Howe. Since then, I’ve organized and led two workshops at NIST and have people asking for more. I am focused now on building a cadre of instructors at NIST to sustain the effort. Plans I will support the Foundation in any way the membership thinks I can be helpful, but I am particularly interested in ways to foster continuing engagement with our audience. We have helped (considerably) more than 10,000 people use computers more effectively in their research, which is wonderful. On the other hand, I currently see 332 formal Foundation Members. Somewhere in the wide span between those figures is a group of people who are actively using what we teach, but who will never become badged instructors. I would like to look for ways that we can bolster and encourage those people to continue building on the bootcamp skills, using them in their day-to-day work, and staying engaged in Software Carpentry. One approach I’m thinking about is some form of refresher training or hands-on workshops where we help SWC graduates put what they’ve learned to practice on their own research. I’m already starting to develop this concept where I work, but I see a much broader potential to ensure that Software Carpentry isn’t just a one-time thing, but an ongoing resource. You can email me at guyer@nist.gov or jab a fork at me on GitHub. Read More ›

2016 Election: Anelda van der Walt
Anelda van der Walt / 2016-01-15
Who am I? I have always been involved with, and am passionate about, multidisciplinary, multi-institutional projects. Over the last six years most of my work has focused on collaboratively providing support and training to researchers at various institutions across South Africa. Initially the support focused specifically on next-generation sequencing, as I was employed by two national *omics platforms, but more recently I’ve participated in local eResearch initiatives. Through eResearch I’ve been fortunate to collaborate with researchers from a broad array of disciplines including life scientists, medical doctors, engineers, linguists, historians, and more. While I may not be the most computationally advanced person in this community, I have a knack for creating connections between those who need help and those who can help. I specifically love to work with people who are blissfully unaware of the power of technology and who, upon exposure, blossom into the greatest advocates and users with huge impact. My exposure to researchers at all career stages from a wide array of institutions highlighted the dire state of computational adoption as well as the challenges associated with trying to adopt better practices. To help address this need I created a company, Talarify, at the end of 2014. Initially it focused on building sustainable computational capacity amongst NGS researchers in South Africa - specifically within research groups and not only individuals. In 2015 Talarify expanded its focus and joined forces with North-West University (NWU) to aid in the development of the NWU eResearch Initiative together with IT, Libraries, and the Research Support Office. My formal education includes an MSc in Bioinformatics from the South African National Bioinformatics Institute and undergraduate and honours degrees in Genetics. Software Carpentry and me Software Carpentry has been on my radar for several years, but in November 2014 I was involved in running a local workshop for the first time. I’ve subsequently organised or co-organised 5 more workshops and one remote instructor training (thanks to Greg’s support and the enthusiastic help of our great local instructors and hosts). For more info about South African Software Carpentry activities please read our recent blog post. I’ve also been summarising Software Carpentry activities in blog format since January 2015 and briefly helped out with the Twitter account. I’ve submitted (small) pull requests to the Shell lesson, participated in the 2015 Instructor Retreat in London, and have been a massive local advocate for Software and Data Carpentry in South Africa across all organisations I engage with. What can I offer? It’s easy to see how South Africa and I can personally benefit from our continued and increased involvement in Software Carpentry, but what can I bring to you? Through my work with NWU I am involved with national initiatives such as the Institute for Data Intensive Astronomy and the African Research Cloud, which are great avenues for expanding the reach of Software Carpentry into South Africa and possibly even further into Africa. We’ve already had interest from researchers in Namibia and Mozambique in running workshops in their countries. I would love to help significantly expand the instructor base on our continent and could offer mentoring and advice to others who want to build Software Carpentry capacity in poorly resourced or isolated environments. I also serve on the Silicon Cape Initiative subcommittee for Women in Technology.
There are untapped opportunities here to help build communities to support our workshop participants, and to partner in offering workshops specifically for women and other underrepresented groups. The Software Carpentry Community have provided me with inspiration, solutions, answers, and resources to empower others with, and I’d be honoured to help drive this organisation and its community forward into 2016. You can find me at @aneldavdw or anelda-AT-talarify-DOT-co-DOT-za. Read More ›

18 Months of Progress
Greg Wilson / 2016-01-15
My talk at SciPy in 2014 quoted one of our instructors as saying: The most frustrating things for me are 1) people are more willing to submit new material…than to improve existing material or review another person’s PR and 2) instructors teaching bootcamps using other lesson material. This experiment in collaborative lesson development is doomed to fail if 1) people don’t make iterative changes to lesson material and 2) the end product isn’t used in live bootcamps. 18 months later, I’m very pleased by how much better we’re doing. Almost all of our instructors are now using our lessons rather than legacy material of their own. Just as importantly, dozens of people have submitted pull requests in the last couple of months alone. Some of those are exercises required to complete instructor training, but others range from minor bug fixes to major refactorings. I’m not sure why this has finally happened: have we finally reached some sort of critical mass, or is it a result of there being specific maintainers for particular lessons? Whatever the cause, it’s great to see. Read More ›

2016 Election: Giacomo Peru
Giacomo Peru / 2016-01-15
Hello, everyone. Many thanks for taking a minute or two to read this post about my candidature for our Software Carpentry Steering Committee. As you’ll see, I was reared a long way from programming, yet this opportunity for closer involvement with our work and its future brings together much of what I have been doing all these years. Here’s why… Background My educational background is diverse: long years of Classics (Ancient Greek, Latin) concluded with an MA in 2005, followed by European Studies and Local Development. No formal training in programming, therefore, but in my opinion the core disciplines of Classics are excellent preparation for programmers. Workwise, previous experience in European project management has informed my current role as Project Officer at the Software Sustainability Institute, where over the last two years I have carried out the administration around Software Carpentry workshops in the UK, besides other things like administering the Institute’s Fellowship Programme and events (for more background, see http://software.ac.uk/). Previous involvement Since the Institute has in the last few years been the main coordinator of Software Carpentry workshops across the UK, I have become the de facto facilitator of these workshops, co-ordinating the various elements of the process which leads to their staging: dealing with hosts, recruiting instructors, interfacing between the different parties involved, and keeping records. I have collaborated closely with Aleksandra Pawlik, Greg Wilson and many others in the community. Even though Software Carpentry offers a fairly straightforward pattern by which to put together a workshop, each workshop has in my experience been a unique case of blending together different components. Since the structure of Software Carpentry as an organisation has, in the last couple of years, gone through fundamental changes, I have participated very directly in this transition from the old format, through a pretty fluid and unstructured phase, to the current, still in-progress, structure. During these changes my primary focus has always been to coordinate the participants and to realize Software Carpentry workshops in my region, thereby fostering the development of the Software Carpentry model across our scientific community. I have learned a lot, made some mistakes, and am now ready to harvest the fruits of that experience. Future commitment What I like most about Software Carpentry is its openness and its effectiveness. By openness I mean the fact that anyone willing to contribute in any form is given the possibility to do so with minimum restrictions. By effectiveness I mean that Software Carpentry is making a real contribution to modern science, increasing its trustworthiness, accessibility and openness. More recently I have been trying to gain a better understanding of what happens ‘inside’ a workshop, by attending in person and by establishing closer contact with hosts and instructors, in order to deliver training that better and better meets the expectations and needs of the audience. I believe that my main contribution to the Committee would be my understanding and experience of the mechanisms of Software Carpentry as an organisation, with a view to making it more agile, fit-for-purpose, and ready to sustain the success and growth it is enjoying. More precisely, I would be interested in helping to define Software Carpentry’s organisation and processes, and in helping it take root in the European countries where it is not yet well known. Read More ›

2016 Election: Bill Mills
Bill Mills / 2016-01-14
Meet Bill I’m Bill Mills, a scientific software developer based in Vancouver, Canada, currently spearheading web development for the GRIFFIN collaboration at TRIUMF. In two years with SWC, I’ve taught seven workshops in addition to five live Instructor Trainings, across five countries. Over the past year I have sat on SWC’s Mentorship Committee; my most visible work there was as co-organizer of our first Instructor and Helper Retreat. On the Steering Committee I’d like you to elect me to SWC’s 2016 Steering Committee so I can continue and amplify the work I began on the Mentorship Committee: to ensure SWC puts the needs of its instructors, helpers, and students first. A few ideas: Engaging the SWC Community SWC is great because of you. I’d like to give back to you by making sure we focus on helping you achieve the goals that brought you to us in the first place. At present, many people who complete Instructor Training never teach a workshop. Some don’t have the time, but believe in our mission and want to advocate for us; others would love to teach, but need help assembling a workshop. Everyone who does Instructor Training is a valuable member of this community; we need to support those ambassadors and aspiring instructors by building a stronger pipeline from Instructor Training to advocacy and teaching. Our Financial Future The Software Carpentry Foundation is working hard to hammer out how we can achieve financial sustainability. I’m very optimistic about the results so far, but I believe it’s only part of the solution; for genuine financial stability, Software Carpentry needs to redouble its commitment to its partners, and make itself attractive to large philanthropic grants. Our partnership model must be core to our strategy, since it bases our finances on larger institutional grants, and decouples them from individual workshops. In order for this to be sustainable, we need to make sure our partners are satisfied; redoubling our attention to our partners’ needs strengthens that model, and helps keep Software Carpentry accessible to everyone. In addition to our partners, Software Carpentry needs to attract large philanthropic grants; the challenge we’ve faced in the past is demonstrating that Software Carpentry actually works. It’s very hard to measure the effect of a two-day workshop on our students, but I believe there’s another way: we need to begin arguing on the grounds of the merit of the instructor, helper, and supporter community we have created worldwide. I have watched many helpers and instructors go from their first introductions to research computing ideas to being their strongest advocates in their communities; Software Carpentry has been brilliantly successful in empowering leaders like you. This capacity building is very fundable, and is key to enriching our financial strategy. Summary I am eternally enthusiastic about Software Carpentry because of its community; this project is an incubator for some of the strongest advocates of reproducible research computing in the world. I’d like a seat on the Steering Committee to make sure you always remain front-and-center in our strategy, and to ensure that Software Carpentry gives back to you every bit as much as you give back to us. Read More ›

2016 Election: Raniere Silva
Raniere Silva / 2016-01-14
My name is Raniere Gaia Costa da Silva and I’m standing for re-election to the Software Carpentry Steering Committee because working on the Steering Committee in 2015 was fun and a great way to meet some of the amazing members of the Software Carpentry community. Background I have a B.Sc. in Applied Mathematics from the University of Campinas, Brazil, and for the last year I have worked as a freelance software developer, mostly using Python and JavaScript. This year I will move to the UK to work at the University of Manchester and collaborate with the Software Sustainability Institute. Previous Involvement I discovered Software Carpentry in 2013 (I wish I had found it earlier) and contacted Greg asking to be in the next round of instructor training, which happened in the fall of 2013, because I was trying to run Software Carpentry-like workshops in Brazil. In 2014 I sent some pull requests to the lessons (mostly to Shell, Git and Python), annoyed Greg because we use Python 2 and not Python 3, helped over-engineer our crazy lesson template, and delivered the first half dozen Software Carpentry workshops in Brazil. Last year, because of Steering Committee activities, I significantly dropped the number of pull requests I sent to the lessons (something I missed very much). On the Steering Committee my biggest contribution was leading the Mentoring Subcommittee, which hosts the post-workshop debriefing sessions and ran the helper and instructor retreat that happened last fall. If you want to read more about what I was involved in during 2015, check this blog post. Plans for 2016 In 2016, independent of the result of the Steering Committee election, I want to focus on assessing what our students achieve after attending our workshops, because this is an important piece of Software Carpentry’s sustainability and expansion. I will not leave the Mentoring Subcommittee, because it is one of the ways to get data for assessments, but I hope to pass its leadership to one of our current members. If elected, I want to be secretary of the 2016 Steering Committee because, although we did a good job with the minutes of the meetings, I believe we could do a lot better. A few extra things that I want to do, but don’t plan to spend many hours on, are: run our first workshop in Portugal (or another Portuguese-speaking country), help ensure that workshops in Latin America continue to be offered, and translate the lessons into Portuguese (or other languages). More Sometimes I write on my blog about books I have read, cities I have visited, things I have tried to hack, and Software Carpentry workshops I have taught. Some of my projects are on GitHub. Most of them are old and abandoned. =( If you have any questions, please send me an email or tweet. You can also call me on IRC (raniere at Freenode) or Slack. Read More ›

2016 Election: Lauren Michael
Lauren Michael / 2016-01-14
About Me B.Sc. Biology, Chemistry - Truman State University M.Sc. Biophysics - University of Wisconsin-Madison 3 years instructing with SWC; certified Fall 2014 9 SWC workshops organized/helped; 8 as an instructor 1 DC workshop organized/helped Relevant Volunteer Work Social Media Manager for Midwest Ultimate, 2013-2014 Science Editor and Writer for the Daily Cardinal, 2011-2012 Currently Research Computing Facilitator at UW-Madison’s large-scale computing center, the Center for High Throughput Computing, where I lead user interaction efforts and spend much of my time working directly with researchers to: consult on computational research design provide issue support develop and deliver learning materials match researchers to each other and to additional resources Beyond my direct work with researchers, I: partake in design decisions for computational infrastructure contribute to strategic initiatives to improve IT-related services for researchers advocate for researchers and their IT-related needs to campus administrators and other stakeholders Previous Contributions Shortly after I started my current position, three years ago, I began working with long-time SWC (and pre-SWC) advocates to establish an ongoing schedule of SWC workshops at UW-Madison. To this end, I continue to manage local workshop logistics and coordinate our community of 15+ instructors and helpers. Prior to switching to current SWC curricula in Aug 2015, I led the development of experimental curricula with support from SWC leadership. I otherwise work alongside Christina Koch, who serves as a Shell curriculum maintainer and instructor trainer. Steering Committee Enthusiasm I am passionate about enabling researchers to most effectively leverage computational tools and technology, in part as a result of my own struggles as a scientist. For this reason, I not only firmly believe in the mission of SWC, but care deeply about ensuring the organization’s future. Based upon experiences and expertise I have gained in my professional, volunteer, and specific SWC activities, I believe I can help to continue the success of the organization’s activities and its strong community of contributors. I have a demonstrated interest in motivating participation within communities, appealing to relevant stakeholders, and managing the execution of a range of training efforts. As SWC embarks on what I believe will be a period of significant growth, I would be honored to contribute to strategic decisions that secure the future of SWC’s ability to empower researchers. I am impressed with recent efforts to improve training and support for instructors, and look forward to any opportunity to share my own ideas, including encouragement of peer training from SWC’s most effective instructors and rewarding significant contributions to curricula and documentation. In order to make stronger arguments to existing and potential partners, I believe SWC can invest in strategies to leverage support from attendees and from leaders in research. These strategies will be essential to securing the organization’s financial sustainability, enabling SWC to scale for the increasing demand for workshops and materials. Furthermore, I look forward to more formally representing SWC, in part, to manage perceptions of the brand by emerging stakeholders. Because an organization like SWC is only as strong as its community, let me know your thoughts! lmichael-AT-wisc.edu Read More ›

2016 Election: Rayna Harris
Rayna Harris / 2016-01-14
Background I am pursuing a PhD in Cell and Molecular Biology at The University of Texas at Austin. My thesis research in Hans Hofmann’s lab focuses on understanding transcriptional responses to spatial learning with single-neuron resolution. Since 2012, I’ve been deeply involved with enhancing graduate student education through 1) the Neural Systems and Behavior (NS&B) course at the Marine Biological Laboratories and 2) the Center for Computational Biology and Bioinformatics (CCBB) at UT Austin. At NS&B, I teach molecular approaches to neuroscience and supervise student research projects. With the CCBB, I organize workshops and symposia to promote sharing of ideas and expertise across departmental boundaries. It’s been awesome to see theses, manuscripts, tweets, and blogs acknowledge these programs for advancing students’ professional development. Before that, I got a B.S. in Biochemistry, taught undergraduate Organic Chemistry labs, and scuba dove for INBio in Costa Rica. Software Carpentry Involvement I was first exposed to Software Carpentry methods during an Intro to Python course via April Wright’s use of pink and green stickies. I thought this was fantastic! April suggested that I attend the Instructor Training Workshop that Greg Wilson and Titus Brown spearheaded in January 2015. My PI enthusiastically supported this, and I secured BEACON travel funding. The workshop was so inspiring and informative that I wrote a blog post entitled Effective Teaching Tips from a Train-the-Trainers Workshop. I have co-taught workshops at UT Arlington and New Mexico State University. I co-organized the Austin-based Instructor/Helper Retreat with Nichole Bennett to strengthen the teaching community in Austin. I am on the Mentoring Subcommittee and Assessment Subcommittee. I’ve co-hosted debriefing sessions with Sheldon McKay, Kate Hertweck, Raniere Silva, and Christina Koch. I worked with Jason Williams to improve the utility of the new post-workshop assessment. Vision for Steering Committee Participation The phrase “integration across levels” is the most used phrase among my colleagues from the Hofmann Lab and NS&B. It refers to examining the evolutionary, physiological, genomic, genetic, neural, and environmental mechanisms that contribute to variation in animal behavior. As a member of the steering committee, my vision would be to promote integration across organizational levels. Our community is growing rapidly, so my overarching goal is to ensure that each level of the organization is aware of and acting on the progress of the others. Specifically, in 2016 I would focus on: integrating data from the mentoring and debriefing sessions with the assessment surveys to understand the degree of workshop effectiveness; discussing that information with lesson maintainers, who can decide whether lessons need revision; integrating the above with instructor-trainers and instructor-mentors to improve lesson delivery and ultimately student success; and streamlining these processes so that new trainees can easily be incorporated into these leadership roles. Thank you for considering me for the Steering Committee. Software Carpentry has contributed vastly to my growth as an educator and scientist, and I look forward to contributing back to this excellent community in 2016 and beyond! Read More ›

2016 Election: Karin Lagesen
Karin Lagesen / 2016-01-14
For the past year, I have served as the Software Carpentry Steering Committee’s secretary. My involvement with the SCF started in 2012, when I attended a workshop in Oslo, Norway. I signed on as an instructor in 2013, and have taught 9 workshops since then. In addition to serving on the Steering Committee, I am also a member of the mentoring committee, where I have focused on ways of giving our instructors hands-on experience with the teaching material, with the goal of making it easier for new instructors to get started. I am also currently training to become an instructor trainer. I have a PhD in bioinformatics from the University of Oslo, and am currently employed at the Norwegian Veterinary Institute and the University of Oslo. My background is in both computer science and molecular biology. Since I have formal training in both fields, I am frequently the one to translate the biological problem into a computational one. I have often been called upon to teach people with little to no training in computer science how to do their bioinformatics analyses. This means introducing them to Unix, to command-line work and to basic programming. Working in such multi-disciplinary situations has made me very aware of how hard it can be to move into a field far removed from your core area of expertise. This makes the values and skills that Software Carpentry teaches particularly important to me. If re-elected, I will focus on maintaining and building on the high quality of our training, both when it comes to instructor training and to workshops. We are currently integrating new instructor trainers into the project. Having other people than Greg Wilson train instructors is an important transition for Software Carpentry, and it is vital that we manage this properly. To this end, I will work on building team cohesiveness among those who do instructor training, to help ensure that our training is consistent and that we’re all pulling in the same direction. I will also continue to work on improving the transition from instructor training to teaching that first workshop. The mentoring committee has made important advances in this area, and I aim to continue in that direction. I will also work on finding ways to upskill our existing instructor pool. The quality of our workshops depends heavily on the quality of our instructors, and it is therefore very important to ensure that they are adequately trained and supported. Feel free to contact me on Twitter (@karinlag) or by email (karin.lagesen@gmail.com). I occasionally blog at blog.karinlag.no. Read More ›

2016 Election: Jason Williams
Jason Williams / 2016-01-14
With benthic sensuous pleasure I offer up myself as candidate for the 2016 Steering Committee. There are those (mom, the U.S. Department of Justice) who have labeled my apoplectic fits a sign of ‘weakness’, but I have found them to be a source of extraordinary strength. With this strength I pledge to bring Software Carpentry to places no one imagined it could (or should) go. As other candidates hail HYDRA, I’m proud to be the first and only candidate to come out in favor of all the things you believe in, and strongly (though perhaps not uncategorically) opposed to the things that displease you. We have seen many successes this year, and I am humbled to have had some small role. Through the efforts of the assessment committee we are now collecting data that will allow informed decisions about the curriculum and give instructors valuable feedback on teaching. I will bring to the role a penchant for being in the right place at the right time. A quick FOIA search of arraignment depositions reveals as much in a pattern of compliments: “It is [comforting] that on multiple occasions the [Mr. Williams] was found [lending a helping hand].” “always in close proximity to the scene of the [giving out free ice cream].” “quia timet; this court finds it absurd to discount [his] seemingly peripheral involvement as mere coincidence. [Clearly], he was aiding and [helping] the [helpless puppies].” I promise to be there for you - just when you need me most, exactly when you need me most. As Assistant Director for External Collaboration at Cold Spring Harbor Laboratory’s DNA Learning Center and Education, Outreach and Training Lead for the National Science Foundation’s life science cyberinfrastructure CyVerse (formerly iPlant Collaborative), I am in close contact with the bioinformatics community and funding agencies (NSF/NIH). I’ve taken every occasion to publicize, advocate, and generate opportunities for Software and Data Carpentry. Going after more grant funding will help us cultivate a more diverse set of learners. We also need to do more to get instructors who are women, from developing countries, and from groups underrepresented/underserved in the sciences. I now serve on an NSF-funded research-collaboration network for integrating bioinformatics into the undergraduate life science curriculum. I am champing at the bit to pursue these and other great opportunities with SWC over the next year. I predict my momentary ‘instabilities’ will only accelerate progress. As the committee’s Treasurer I’ve watched proudly as we have become financially stable. We are well on our way to the catbird seat; money will no longer be a question of ¯\_(ツ)_/¯ but a matter of 😏. There is still much work to do in getting better organized with NumFocus and working with our Executive Director to court and cultivate partners and affiliates. I was honored to host the Steering Committee for an in-person meeting at Cold Spring Harbor. All of their trailblazing work deserves special recognition. The committee has not, however, been without its flaws - which I am compelled to reveal in a way that goes beyond what’s captured in meeting minutes. Oftentimes, when I thought consensus was within reach, instead of a vote to pass a motion, we were told to “Finish it on the Astral Plane.” These calls for astral battle would usually come at a time when dissenters either lacked the spirit-energy to project themselves to the proper plane or were simply given improper coordinates.
The gossamer-veiled claims that parade these exercises as legitimate parliamentary procedure need to be called out, and I’ll put a stop to it. I pledge that astral combat as a tool for decision making will be replaced with fully corporeal dance-offs. Finally, I just want to remind those who voted for me last year just how fun it was. As I ask for your vote, feel free to tweet a selfie (@JasonWilliamsNY) as you select my name so that everyone can be a part of the joy of ceding decision making authority over to a benevolent mother-father figure – it just may be the most liberating experience of the year. Read More ›

2016 Election: Dhavide Aruliah
Dhavide Aruliah / 2016-01-14
My name is Dhavide Aruliah. I am standing for election to the Steering Committee of Software Carpentry for 2016. Who am I? By training, I am an applied mathematician and computer scientist. From 2004 to 2009, I was a tenure-track assistant professor at UOIT (the University of Ontario Institute of Technology in Oshawa, ON, Canada). From 2009 to 2015, I was an associate professor with tenure at UOIT. In mid-2015, I left my academic career to join Continuum Analytics. What connection do I have with Software Carpentry? In 2011, I led a Software Carpentry graduate reading course (with Greg Wilson’s blessing) at UOIT. Later, in 2012, when Software Carpentry converged on the bootcamp model of delivery, I shadowed Greg teaching a few bootcamps and eventually started leading and organizing bootcamps myself. All said, I have volunteered at over a dozen Software Carpentry bootcamps in various roles. Software Carpentry has been a great source of inspiration for me through a significant transition in my professional life. I am genuinely humbled by the vibrant and dynamic young people (not-so-young, in my case) the Software Carpentry community unites. What do I want to do for Software Carpentry? I want Software Carpentry to help serve as a bridge into industry for young academic scientists. There are far more PhD graduates than available academic faculty jobs; simple arithmetic dictates that most graduate-degree holders will have to find work in the private sector. Unfortunately, the disconnect between academia and industry is daunting to cross for many graduate students and postdocs. To my mind, Software Carpentry has been enormously successful in connecting and nurturing talented scientists from diverse intellectual backgrounds. There is tremendous potential for this network to engage industrial partners to everyone’s benefit. My hope is that my past academic career and my present industrial one will provide a useful perspective for the Steering Committee and the larger Software Carpentry community. What relevant experience can I bring to the Software Carpentry Steering Committee? For several years, I held elected posts for the Canadian Applied & Industrial Mathematics Society (CAIMS). From 2007 through 2010, I was a Member-At-Large on the CAIMS Board and, from 2010 to 2014, I was the CAIMS Treasurer for two successive terms. As an academic at UOIT, I helped develop a number of graduate and undergraduate programs and served as Program Director for two different programs. As a Program Director, I enjoyed getting to know the students well and advocating on their behalf to the university. I also mentored numerous undergraduate and graduate students while at UOIT. Mentoring students is the part of my academic career I miss the most. Software Carpentry is, above everything else, a community that supports budding scientists, so I welcome the opportunity to scratch this particular itch of mine through the Steering Committee. Read More ›

2016 Election: Kate Hertweck
Kate Hertweck / 2016-01-13
Hello fellow educators and coding enthusiasts! I’m terribly excited to offer myself as a candidate for the Software Carpentry Steering Committee. See below for answers to a few questions you might have about whether I’d be a good fit. Who are you? I’m an assistant professor at the University of Texas at Tyler in a small but diverse Biology Department. My position is officially described as bioinformaticist, but I specialize in comparative genomics and collaborate on many different types of biological data analysis. I teach graduate and undergraduate classes in bioinformatics, genomics, and plant taxonomy. Don’t I know you from somewhere? Perhaps! I was trained as an instructor in fall 2014, taught three workshops in 2015, and attended Data Carpentry’s Genomics Hackathon last spring. If you checked out the Instructor Retreat last fall, you may also recognize me from the session on assessing student performance using Socrative. Most of my involvement over the last year has been through serving on the mentoring subcommittee. If you’ve attended a debriefing discussion after teaching a workshop, chances are good that I was one of your hosts and/or helped write a blog post summarizing those sessions. I’m also coordinating pre-workshop help sessions for instructors preparing to teach. How can you help Software Carpentry? My work on the mentoring subcommittee has given me a deep appreciation for the particular challenges faced by both our instructors and workshop attendees. There are four main areas I’d like to target as a member of the Steering Committee this upcoming year: Instructor preparation: Many instructors report similar difficulties in debriefing sessions. While there are many resources to help plan for workshops, I would like to help instructors sift through this multitude of information so their lessons go as smoothly as possible. This includes streamlining the information available and fielding questions during pre-workshop help sessions. Teaching for HPC: Software Carpentry skills are especially important in my subdiscipline, as they are essential for analysis using high-performance computing resources. I’m interested in developing lessons that will help entry-level coders use compute clusters. Assessing student adoption of skills: Assessment is definitely a hot topic right now! In addition to basic metrics of student learning from our workshops, I’m keenly interested in assessing which skills learners adopt into their scientific workflows, and how frequently. Moreover, what is the best recommendation we can give for helping students continue learning on their own? Attend another workshop? Join a coding working group? Community and inclusivity: A large part of what I appreciate about Software Carpentry is the sense of community and willingness to embrace diversity and inclusivity. As a member of the Steering Committee, I would keep these values at the forefront of my mind as we develop policies, especially in relation to audiences who may otherwise feel isolated (culturally, geographically, or otherwise). Why do you want to serve on the steering committee? Software Carpentry brings rays of sunshine into my work life on a weekly basis. No joke! This group has been essential for my career development, both as a scientist and educator. I want to contribute more to this group.
Given that the pedagogical methods of Software Carpentry dovetail nicely with many of the semester-long courses I teach, service on the Steering Committee could be an important source of career development for me, so it really is a win-win situation! How can I learn more? I’m easily stalk-able on the intertubes! You can find me on Twitter as @k8hert and on GitHub, and I have both a (sorely neglected) blog and a research/teaching website. I’m generally happy to talk to other folks passionate about these same sorts of topics, so feel free to drop me a line! Read More ›

Pre-workshop help sessions for 2016
Kate Hertweck / 2016-01-13
Are you preparing to teach a workshop in a few months? Have you been trained as an instructor but are hesitant to sign up to teach? Has it been a while since you taught, and are you interested in learning what’s new in a lesson? Are you thinking about teaching a new (or just new-to-you) lesson? The mentoring subcommittee is pleased to announce the institution of regularly scheduled help sessions for folks who are interested in feedback on their workshop plans. The first hangouts of 2016 are planned for Wednesday, January 20 at 10:00 and 19:00 EST and are open to the entire community. You can view possible topics for discussion and sign up to attend on the etherpads for the morning and evening sessions. These discussions are planned for every two weeks and will be included on the community calendar. Please consider coming to join us, whether to share your own experiences, troubleshoot lessons, or receive feedback about what you have planned for an upcoming workshop. We are hoping these sessions will be an informal, friendly place to help instructors streamline the process of workshop implementation. Read More ›

A New Book from Mark Guzdial
Greg Wilson / 2016-01-13
Regular readers will know that I am a huge fan of Mark Guzdial, a professor at Georgia Tech whose group does world-class work on computing education, and who blogs about it regularly and incisively. Mark has just released a new book; you can preview the contents if you want, or just head straight to Amazon and order a copy. In it, Mark asks what it means to talk about teaching everyone to program, and whether we should have the same goals for a mass audience as we do for professional software developers. If not (and he makes the “not” case pretty convincingly), then how do we design computing education that works for graphic designers, high school teachers, and everyone else? His answers are based on both his own experience and his comprehensive knowledge of CS Ed research, and the book is a solid, readable, and purposeful introduction to the latter. It’s definitely going to be on the reading list for future instructor training classes… Read More ›

Archiving Videos
Greg Wilson / 2016-01-13
People have been asking for videos of our lessons more and more frequently over the past few months, so we have begun archiving links on the lessons page. If you have more, please send them our way (either as a pull request against the lessons page in our website’s GitHub repo or by email). Read More ›

2016 Election: Leanne Wake
Leanne Wake / 2016-01-12
Academic background You can find me in the Department of Geography at the University of Northumbria at Newcastle, UK. In 2010 I obtained my PhD in Geophysics from Durham University, UK. I would describe myself, broadly, as an Earth system modeller. I try to understand the workings of a complex world through the 0s and 1s of code, but I have been known to get my hands dirty with fieldwork! Relevant experience Having been exposed to a wide variety of code instruction techniques (from “you’re on your own, mate” to “Paired Programming”), I became motivated to find out what the right method was to introduce someone to code. I became involved in Software Carpentry (SWC) via a Fellowship in 2014 with the Software Sustainability Institute. My main interest is software education - see here. I qualified as a SWC instructor in 2015 and subsequently co-led a workshop at St Andrews in the summer of 2015. How I will contribute to the growth of Software Carpentry Global Expansion: Involvement with the Software Sustainability Institute and SWC has convinced me that SWC should be made available to as wide a user-base as possible. As a geographer, I was drawn to this section of the SWC website, and saw that SWC’s global coverage could be improved. Imagine the scenario: Institute to Funder: ‘I’d like some money to host a Software Carpentry workshop, please’ Funder: ‘What percentage increase in coding ability can we expect from this investment?’ Institute: ‘Ummmmm….’ Metrics and Quality Assurance (QA) are a part of everyday life but also huge bugbears for most in academia. However, I believe they can be used as agents for change for SWC. As part of my tenure on SWC’s steering committee, I’d like to work with interested parties both inside and outside SWC towards the development of an ‘influence metric’ that SWC can use to show the positive impact of its teaching. Let’s turn those post-it notes into points! Course Content: Parallelisation: making your code more efficient. You have built a car - how do you turn it into a Ferrari? How many folks do you know who have a fancy multi-core machine, yet only use a single core (see the sketch just after this post)? SWC is contributing towards reproducible science; how about efficient science? Organisational Duties/Miscellaneous: I am willing to accept any admin duties to ensure fluidity and expansion of the organisation, e.g.: Maintaining a live instructor database indicating temporal availability of instructors. Recruitment: Workshop and participant numbers are increasing; instructor numbers are tailing off. Anything else we should know? I own a lightsaber. Finally… if you are not sufficiently convinced of my enthusiasm and nerdiness and don’t vote for me, I hope that any members who have taken the time to read this and who find value in these ideas will take them forward. Thanks, Leanne Read More ›
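The single-core point above is easy to demonstrate. Below is a minimal, hypothetical sketch (my own illustration, not Leanne's course material) of the kind of example such a parallelisation lesson might open with, using only Python's standard library to spread independent work across all available cores.

```python
# Using every core instead of one: the same embarrassingly parallel
# work run serially, then again with a multiprocessing.Pool.
from multiprocessing import Pool, cpu_count

def simulate(seed):
    """Stand-in for an expensive, independent model run."""
    total = 0
    for i in range(10 ** 6):
        total += (seed * i) % 7
    return total

if __name__ == "__main__":
    seeds = range(16)
    # Serial version: one core does all the work.
    serial = [simulate(s) for s in seeds]
    # Parallel version: the pool fans the runs out across all cores.
    with Pool(processes=cpu_count()) as pool:
        parallel = pool.map(simulate, seeds)
    assert serial == parallel  # same answers, computed faster
```

On a typical four-core machine the pooled version should finish several times faster than the serial loop, which is exactly the car-to-Ferrari upgrade described above.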

A Strategic Plan for the Software Carpentry Foundation
Katy Huff / 2016-01-12
The Software Carpentry Foundation Steering Committee held a strategic planning meeting in August. In that two-day, intensive meeting, we focused on developing a long-term strategic plan for the Software Carpentry Foundation. The process emphasized identification of our stakeholders and mission. It also used the results of our community survey to identify our strengths and weaknesses as well as the opportunities and threats that face the organization. The resulting Strategic Plan was developed over the course of the in-person meeting and summarized into a document. With vital feedback from the Advisory Council, whose role is to offer advice and guidance to SCF on strategic matters, we arrived at the document that now resides here. One highlight is our brief Mission Statement: We aim to teach skills that promote reproducibility and reliability in research. To accomplish this, we focus on educating and supporting instructors, developing curricula and running workshops. Based on Strengths, Weaknesses, Opportunities, and Threats identified by the community, as well as a lengthy assessment of our stakeholders and mission, the Steering Committee arrived at the following major strategic issues that will be our primary focus for the near term. Some of the nearest-term goals have already moved forward since the development of the plan (such as the new website design, which contributes to reporting and transparency). Reporting and Transparency Documentation of Procedures Instructor Pipeline Management Coordination with Data Carpentry More information about the detailed meaning of these issues can be found in the document itself. Rather than a top-down declaration of our goals as a foundation, we intended this document to capture what we understood to be the ideas and missions of you, our community. As you look over this document, we hope you won’t hesitate to provide feedback that will help SCF clarify its mission, nurture its strengths, and reach its aspirations. If you have any questions or suggestions, we hope you’ll get in touch through comments on this post. Even better, if you’d like to be instrumental in the annual revision of this strategic plan, please consider running for the 2016 Steering Committee. The deadline is this Friday. Read More ›

Online Workshops from UC Davis
C. Titus Brown / 2016-01-12
We have been experimenting with half-day workshops on specialized topics that are broadcast over the web. They have been going very well, so we are planning to do more in the first half of 2016, with about half of them broadcast this way. The tentative schedule is below; all workshops will run 9:15am-12:15pm, Pacific Time. If you would like to present something yourself in one of the open slots, please file an issue or leave a comment on this post — we think this could be a really good way to field-test new material that Data Carpentry and Software Carpentry instructors could use for more advanced audiences. 2016-01-20: Camille Scott, pydoit - local+remote 2016-01-27: Raniere Silva, advanced git - local+remote 2016-02-17: Tiffany Timbers, regular expressions and Python - local+remote 2016-02-19: Ariel Rokem, scipy.optimize - local+remote 2016-02-29: Adelaide Rhodes, sphinx + webhooks + bitbucket - local+remote 2016-03-07: Titus Brown, Amazon Web Services - local+remote 2016-03-28: Titus Brown, Short-read trimming and quality eval - local+remote 2016-04-06: Daniel Chen, intro git (SWC lesson on git) - local+remote 2016-04-08: Daniel Chen, advanced git (branching and merging, etc.) - local+remote 2016-04-11: open 2016-04-13: Titus Brown, TBD 2016-04-18: Titus Brown, TBD 2016-04-27: Meeta Mistry, TBD 2016-05-11: Marian Schmidt, Rmarkdown - local+remote 2016-05-13: Ted Hart, TBD - local+remote 2016-05-18: Heer group on VEGA - local+remote 2016-06-06: Titus Brown, TBD 2016-06-10: Titus Brown, TBD 2016-06-13: open 2016-06-15: open 2016-06-22: Titus Brown, TBD 2016-06-29: Titus Brown, TBD Read More ›

2016 Post-Workshop Instructor Debriefing, Round 01
Rayna Harris, Kate Hertweck / 2016-01-12
On January 11, 2016 we ran the first round of post-workshop instructor debriefings of the calendar year! The mentoring committee ran 22 debriefing sessions in 2015, and we are excited about continuing to host and improve these sessions this year. Interestingly, all instructors present had taught 3-day format Software Carpentry workshops. Read on to find out more about the changes made, what worked well, and what didn't.

Worked well

Both of these Nordic workshops were spearheaded by newly minted instructors. Greg Wilson had approached them in the past about becoming instructors and hosting workshops, and this effort has come to fruition. Both workshops were well attended and received, and everyone is looking forward to continued learning, coding, and networking. The Stockholm group created an exercise/challenge that combined testing and pull requests: students were given a broken Python function, and after fixing the errors, they submitted their changes via pull requests (a sketch of this kind of exercise appears at the end of this post). View the exercise here. The Helsinki group incorporated afternoon work sessions where the students could apply the newly-learned tools to their own data or get more help with the morning's topics.

What could have gone better

The Helsinki group gave a lecture using the Software Carpentry Best Practices in Scientific Computing slideshow. The material is excellent, but the lecture-style delivery was a stark contrast to the hands-on style of teaching the day before. We wonder if there isn't a way to add the "rules" to various lessons where appropriate. This way, instructors could weave these best practices into the workshop without needing a separate block of time to cover the slides independently. Thoughts?

Other comments

Roman Valls Guimera, an instructor with SciLifeLab, wrote about their workshop. You can read it and look at some great photos here.

Thanks

We are grateful to the instructors who attended debriefing sessions this round: Radovan Bast, SciLifeLab Stockholm; Olav Vahtras, SciLifeLab Stockholm; Joona Lehtomäki, University of Helsinki. Read More ›
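As promised above, here is a minimal sketch of what a broken-function-plus-test exercise of the Stockholm kind might look like. The function name and the specific bug are invented for illustration; the real exercise is the one linked in the post.

```python
# test_stats.py -- illustrative only; the actual Stockholm exercise
# is the one linked above. Learners fix the deliberate bug and
# submit the fix as a pull request. Run the test with: pytest

def mean(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / (len(values) - 1)  # deliberate off-by-one bug


def test_mean():
    assert mean([1, 2, 3]) == 2.0  # fails until the bug above is fixed
```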

What the Data Says About Novice Programming Mistakes
Greg Wilson / 2016-01-09
I recently had a chance to catch up with this paper from 2014: Neil C. C. Brown and Amjad Altadmri: “Investigating Novice Programming Mistakes: Educator Beliefs vs Student Data”. ICER’14, http://dx.doi.org/10.1145/2632320.2632343. Its abstract says: Educators often form opinions on which programming mistakes novices make most often – for example, in Java: “they always confuse equality with assignment”, or “they always call methods with the wrong types”. These opinions are generally based solely on personal experience. We report a study to determine if programming educators form a consensus about which Java programming mistakes are the most common. We used the Blackbox data set to check whether the educators’ opinions matched data from over 100,000 students – and checked whether this agreement was mediated by educators’ experience. We found that educators formed only a weak consensus about which mistakes are most frequent, that their rankings bore only a moderate correspondence to the students in the Blackbox data, and that educators’ experience had no effect on this level of agreement. These results raise questions about claims educators make regarding which errors students are most likely to commit. There’s lots to admire in both the data they collected and the analyses they did, but the biggest takeaway is that even very experienced teachers only agree very weakly about what errors students make most often, and that their agreement with the data is no stronger. It would be wonderful to have such rich, grounded insight into where people are actually stumbling with Git, Python, R, and the shell. Read More ›

Good Communities (Kinds Of)
Greg Wilson / 2016-01-09
Back in October, Sarah Sharp posted a really useful article titled “What makes a good community?”. In it, she divided online tech communities into six levels (numbered from zero, of course) according to how welcoming and supportive they are. I would put Software Carpentry and Data Carpentry in Level 1, and say that we’re starting to do meaningful work toward Level 2. I would appreciate feedback of two kinds: If you agree, what should we do in 2016 to get to Level 2? If you don’t, where are we falling short of Level 1 (or Level 0)? Read More ›

Change Strategies in STEM Education
Greg Wilson / 2016-01-09
I recently had a chance to read: Maura Borrego and Charles Henderson: "Increasing the Use of Evidence-Based Teaching in STEM Higher Education: A Comparison of Eight Change Strategies". *Journal of Engineering Education*, 103(2), DOI 10.1002/jee.20040. The abstract says:

Background: Prior efforts have built a knowledge base of effective undergraduate STEM pedagogies, yet rates of implementation remain low. Theories from higher education, management, communication, and other fields can inform change efforts but remain largely inaccessible to STEM education leaders, who are just beginning to view change as a scholarly endeavor informed by the research literature.

Purpose: This article describes the goals, assumptions, and underlying logic of selected change strategies with potential relevance to STEM higher education settings for a target audience of change agents, leaders, and researchers.

Scope/Method: This review is organized according to the Four Categories of Change Strategies model developed by Henderson, Beach, and Finkelstein (2011). We describe eight strategies of potential practical relevance to STEM education change efforts (two from each category). For each change strategy, we present a summary with key references, discuss their applicability to STEM higher education, provide a STEM education example, and discuss implications for change efforts and research.

Conclusions: Change agents are guided, often implicitly, by a single change strategy. These eight strategies will expand the repertoire of change agents by helping them consider change from a greater diversity of perspectives. Change agents can use these descriptions to design more robust change efforts. Improvements in the knowledge and theory base underlying change strategies will occur when change agents situate their writing about change initiatives using shared models, such as the one presented in this article, to make their underlying assumptions about change more explicit.

The most valuable part of the paper for me is its discussion of different approaches people have taken to making change happen. The authors break this down by *aspect of system to be changed* (rows) and *intended outcome* (columns):

| Aspect of System to be Changed | Prescribed Outcome | Emergent Outcome |
| --- | --- | --- |
| Individuals | I. Disseminating: Curriculum & Pedagogy. Change Agent Role: tell/teach individuals about new teaching conceptions and/or practices and encourage their use. (*Diffusion*, *Implementation*) | II. Developing: Reflective Teachers. Change Agent Role: encourage/support individuals to develop new teaching conceptions and/or practices. (*Scholarly Teaching*, *Faculty Learning Communities*) |
| Environments and Structures | III. Enacting: Policy. Change Agent Role: enact new environmental features that require/encourage new teaching conceptions and/or practices. (*Quality Assurance*, *Organizational Development*) | IV. Developing: Shared Vision. Change Agent Role: empower/support stakeholders to collectively develop new environmental features that encourage new teaching conceptions and/or practices. (*Learning Organizations*, *Complexity Leadership*) |

The authors then discuss the underlying logic of the eight italicized approaches in detail:

- Diffusion: STEM undergraduate instruction will be changed by altering the behavior of a large number of individual instructors. The greatest influences for changing instructor behavior lie in optimizing characteristics of the innovation and exploiting the characteristics of individuals and their networks.
- Implementation: STEM undergraduate instruction will be changed by developing research-based instructional "best practices" and training instructors to use them. Instructors must use these practices with fidelity to the established standard.
- Scholarly Teaching: STEM undergraduate instruction will be changed when more individual faculty members treat their teaching as a scholarly activity.
- Faculty Learning Communities: STEM undergraduate instruction will be changed by groups of instructors who support and sustain each other's interest, learning, and reflection on their teaching.
- Quality Assurance: STEM undergraduate instruction will be changed by requiring institutions (colleges, schools, departments, and degree programs) to collect evidence demonstrating their success in undergraduate instruction. What gets measured is what gets improved.
- Organizational Development: STEM undergraduate instruction will be changed by administrators with strong vision who can develop structures and motivate faculty to adopt improved instructional practices.
- Learning Organizations: Innovation in higher education STEM instruction will occur through informal communities of practice within formal organizations in which individuals develop new organizational knowledge through sharing implicit knowledge about their teaching. Leaders cultivate conditions for both formal and informal communities to form and thrive.
- Complexity Leadership: STEM undergraduate instruction is governed by a complex system. Innovation will occur through the collective action of self-organizing groups within the system. This collective action can be stimulated, but not controlled.

The seven dense pages of references at the end were intimidating - I'm still very new to this field - but the categories laid out above have made me think hard about what strategies we're using. I have some opinions, but I'd really like to hear what our instructors and learners think. Are we trying to change the world by diffusion? Are we in the complexity leadership business? Or are we doing something else entirely? Feedback via comments on this post would be very welcome. Read More ›

2016 Election: Cam Macdonell
Cam Macdonell / 2016-01-08
Background & SWC Involvement I am an Assistant Professor in the Department of Computer Science at MacEwan University in Edmonton, Alberta, Canada. My involvement with Software Carpentry began in 2009 as a helper. I helped at a handful of Edmonton bootcamps before completing instructor training in 2013. I have been an instructor for 9 bootcamps, 6 for Software Carpentry and 3 for Data Carpentry. I love my job because I love teaching. Software Carpentry is a great organization to work with in that I enjoy the audience, the perks of travelling and meeting other instructors, and the lessons I’ve learned about effective teaching that have improved my own lectures. I hope to join the Steering Committee to contribute back to this great organization. Joining the Steering Committee While I am happy to pitch in where needed, given my experiences I believe I can contribute in three specific ways: expanding the Software Carpentry learner audience, guiding lesson development, and improving post-workshop assessment and support. Over my years of involvement I have thoroughly enjoyed teaching bootcamps and meeting learners from diverse backgrounds. Several of the bootcamps I have taught have been targeted towards librarians. For these bootcamps, I developed some new lessons utilizing librarian data and formats. As well, the starting point of presumed knowledge had to be lower. I have also taught a few Digital Humanities bootcamps and I believe that Software Carpentry can expand the lesson materials for this audience as well. In joining the Steering Committee, I would look forward to increasing the delivery of bootcamps to these and other audiences. Collaborating with the British Library’s Library Carpentry initiative and Data Science Training for Librarians (DST4L) is something I would hope to be involved in to avoid duplicated effort. Lesson materials and instruction are the “bread & butter” of Software Carpentry. There is a quote about software that is very true - “Software is never complete, only abandoned”. I believe this mindset applies to lesson materials too. Our teaching materials will never really be “done”, they must regularly be revisited and improved. In joining the steering committee, I would take pride in ensuring our materials and instruction remain at a level of excellence to ensure the continued success of Software Carpentry. What happens when learners leave a workshop? This is a question I’m very interested in. Unfortunately I think the answer for many learners is that they fail to apply the lessons they’ve been taught because there are hurdles such as: relating lessons to their own data and processes a lack of retention of lessons taught bad tools that are entrenched in their work environment Exploring how to support learners post-workshop is very important. I think developing and evaluating some combination of online resources, mentoring relationships and other follow-up will help more learners transform the way they work. In addition to helping learners, increased post-workshop follow-up will allow Software Carpentry to produce evidence-based results showing the effectiveness of its teaching method. Read More ›

Announcing the Data Science Journal
Hugh Shanahan / 2016-01-06
The Data Science Journal is, as its title suggests, a journal dedicated to the advancement of data science. The first thing that's good about it is that you won't get random emails about it with poor grammar and wild claims about its impact factor that begin with DEAR ESTEEMED RESEARCHER…. Even though it's about data science, it's not obsessed with building ever better recommender algorithms for Netflix or mining Twitter feeds. Its focus is very much on data science's application in the policies, practices, and management of open data. It tries to take as wide a definition as possible when considering the subject: data can be originally digital or converted from other sources, and every research discipline is considered. The journal will have digital humanities papers rubbing shoulders with bioinformatics papers and social science papers. It's also a journal that is interested in applications, so papers that are descriptions of data systems are welcome. Naturally, it's entirely electronic and open access. The journal has been in existence since 2002 but has recently been relaunched by CODATA and moved to the Ubiquity Press platform with the excellent Sarah Callaghan (@sorcha_ni) as editor (full disclosure: I am on the board). There is a call for papers for the journal, which is discussed in detail here. If you are interested, have a look at its web site to find out more about the types of articles they are interested in receiving. Read More ›

A Year Of Software Carpentry in South Africa
Anelda van der Walt, Maia Lesosky, Adrianna Pińska / 2016-01-06
After the Software Carpentry workshop run jointly with the eResearch Africa conference in November 2014, we were looking for opportunities to run more workshops in South Africa in 2015. In May we ran our first remotely instructed workshop (with thanks to Laurent Gatto and Software Carpentry for the remote option), focusing only on R to counter possible time loss due to technical glitches. Then it was on to remote instructor training together with two groups on other continents (one in the USA and another in the UK) in June. Six new instructors were trained, four of whom have gone on to teach one or more subsequent workshops. In July we joined forces with UCT eResearch to host a third local workshop, joined by an experienced SWC instructor, Matt Lammens from the US, who happened to be visiting. Another workshop was offered in conjunction with the SAEON Graduate Student Network Indibano in September. The next one was co-organised by Adrianna Pińska from the Cape Python User Group in October in Johannesburg. In November 2015 we ran the sixth Software Carpentry workshop in 12 months at North-West University. In all, six new instructors were trained, approximately 28 helpers recruited, and more than 200 learners from over 18 local organisations participated in the 6 workshops between November 2014 and November 2015. There have been a number of successes, but of course also some large, and small, bumps in the road.

Who participated?

Our participants represented undergraduates, postgraduates, postdoctoral research fellows, early career researchers, established researchers and even National Research Foundation rated researchers. Participants also included professional staff in academic IT departments and libraries, and even folks from industry.

What have we learnt?

Having a community to connect to before and after workshops is essential for sustained learning and actual use of the skills learnt during the workshop. A Cape R User group was started in 2015, which is great for local (SA is big) assistance and community, but doesn't help all that much with other regions. We're still thinking of a solution for building a community to support git/GitHub novices, as there are currently only the GitHub Guides on YouTube and no physical community to join. For Python in Cape Town there is the Cape Town Python User Group, who have also been involved in running and supporting Software Carpentry workshops locally. Establishing links with user groups and communities in other regions of South Africa who can support workshop participants beyond the two days is a high priority for 2016. The workshops and content themselves are still often a big jump for students who come in struggling with their computers, with navigating an ecosystem more complicated than a word processor and internet browser, and with the conceptual leap to programming and algorithms. This is a problem we haven't really solved yet, but the pre-workshop help/install sessions go a long way, and we have some ideas. Small changes can make big differences - things like moving to the three-colour sticky system (blue = working, red = struggling, green = done), making sure the instructor shows an IDE on default settings (there was massive confusion at the last workshop due to moved-around RStudio panels), and pre-defining and redefining the core terms over and over again. The pool of instructors is still far too small, and the time demand (especially for travel) is significant, which is probably the biggest bottleneck to running large numbers of courses. The instructors themselves, though, have been growing as a team (because they keep working together), and that has been a major positive, as the discussion in and around tea and lunch usually centres on solving problems, improving outcomes or just brainstorming different ways to get the main messages across to the diverse audience.

What next?

More workshops in 2016, a local instructor training event, and moving out of SA into other parts of Africa (e.g. Kenya, Namibia and Mozambique all have rumours of workshops). Once the local instructor capacity is in place, we'd like to try the four-day, half-day workshop format. For the workshops themselves, some ideas that have come up are to provide printed cheatsheets with basic commands, and some way to get systematic followup and support for participants (perhaps even Twitter?).

Finances

Except for the first workshop in November 2014, we have had no major sources of funding. The bulk of the costs has been covered by a minimal registration fee of ZAR500 (~GBP 24 / EUR 35 / USD 35). Registration fees are paid into cost centres of host institutions to lower the administrative burden on researchers and students. Fees can be paid via internal fund transfer from research cost centres, or via electronic funds transfer (EFT) should participants be paying from their own pockets. Budgets have been supplemented by small donations and sponsorships from hosting institutions and private organisations. Not accounting for the time spent by organisers and instructors, the bulk of the budget goes towards catering (when we have local trainers available). We've tried to provide participants with three tea breaks daily, as well as lunch. This has proved to be quite successful (when the coffee isn't terrible) in stimulating much-needed conversations outside of the classroom. Other costs include:

- Travel and accommodation costs for instructors
- Flash drives to have local copies of data and GitHub repositories in case of power outages and to save on bandwidth
- Name tags
- Sticky notes
- Renting of audio/visual equipment
- Dinner for instructors/helpers
- A social event at the end of the first day to allow extra time for networking in a more relaxed setting (we've done this at four out of six workshops and it has been received quite positively)

Accounting for the time spent by "someone" to organise the workshop is essential.

Thank you!

Special thanks to Greg Wilson, Jonah Duckles, James Hetherington and Aleksandra Pawlik, who helped us to run the first workshop a year ago and who have subsequently been incredibly supportive in many, many ways. The debriefing meetings orchestrated by the mentoring subcommittee have been invaluable for learning from others, and we appreciate the time they make available to run these. We'd also like to thank our South African hosts, the eResearch Africa conference organisers, Stellenbosch University Department of Genetics, University of Cape Town (UCT) eResearch, the South African Environmental Observation Network (SAEON), PyConZA, and North-West University for the opportunity to teach and learn with their students and researchers. The workshops would not have been possible without the enthusiastic participation of researchers, students and technical staff as helpers and/or learners. Thanks for your feedback and for hanging in there even when we were having technical challenges or when the explanations just didn't make sense. Lastly, thanks to our instructors, who have time and again volunteered to teach another workshop and who've contributed to the growth in so many ways!

South African organisation abbreviations: UCT: University of Cape Town; SU: Stellenbosch University; NWU: North-West University; SAEON: South African Environmental Observation Network; UWC: University of the Western Cape; UFS: University of the Free State; CSIR: Council for Scientific and Industrial Research; IL: Ithemba Labs; SKA: Square Kilometre Array; AIMS: African Institute for Mathematical Sciences; GSH: Grootte Schuur Hospital; SAWS: South African Weather Services; UP: University of Pretoria; WCDA: Western Cape Department of Agriculture; WITS: University of the Witwatersrand. Read More ›

AMY Version 1.3
Piotr Banaszkiewicz / 2016-01-05
In the past month we've seen two releases of AMY: v1.2.1 and v1.3. This blog post (originally published on my personal blog) contains joint release notes for both of them.

Bug fixes:

- a wrong URL used in the event validation or import/update features is now indicated (and we won't receive erroneous notifications about it)
- some pages now properly throw a 404 (previously: a 500)
- spaces are stripped from Person and ProfileUpdateRequest fields (names, emails)
- location inputs on the event details page are disabled if the online country was preselected

New features:

- we now use a custom-built jQuery-UI (so that we no longer have conflicts with Bootstrap's tooltip module)
- Greg updated the script used to send instructors "Hey, update your info" mails (it's getting removed later on)
- it's possible to add memberships per host
- new badge: DC instructor
- new logic for dealing with two instructor badges
- a timeline of TO-DO items
- basic models (e.g. lessons, tags, academic levels, etc.) are now manageable from Django's admin interface (see the sketch below for what that involves)
- the all-persons view now allows filtering by the type of workshop a person taught at
- the blurred production database was removed in favor of a generated fake database
- the mailing script was turned into a better Django management command
- bulk upload now shows generated usernames and suggests people with matching names
- a preview of each event on the SWC website is shown
- API: filter events by tags

No longer with us:

- some unused scripts (test-command-line-upload.sh) and commands (parse-instructor-info.py), removed by Greg
- notifications about an invalid HTTP Host header
- other removed scripts and commands

January and February don't seem busy for me, so I hope to have more done on AMY in the coming months. I also want to thank Prof. Ethan White for his support of my work through December, and for extending his support for the next two months. Interested in helping develop AMY? See what's scheduled for v1.4. Read More ›
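For readers who haven't worked with Django: making a model manageable from the admin interface is usually a short registration in an admin.py file. Here is a minimal sketch; the Lesson model name is taken from the feature list above, but the exact fields and module layout in AMY may differ.

```python
# admin.py -- a minimal sketch of what "manageable from Django's
# admin interface" involves. The model and field names here are
# illustrative; AMY's real admin configuration may differ.
from django.contrib import admin

from .models import Lesson


@admin.register(Lesson)  # the one-liner admin.site.register(Lesson) also works
class LessonAdmin(admin.ModelAdmin):
    list_display = ("name",)   # columns shown in the admin change list
    search_fields = ("name",)  # enables the admin search box
```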

17 December, 2015 - 5 January, 2016: Steering Committee Election, New Website, Updated Assessment Forms, Mentoring Meetings, Instructor Training, and First Lab Meeting for 2016
Anelda van der Walt / 2016-01-04
#### Highlights

- 15 January: Deadline for standing for the 2016 Steering Committee elections
- New pre- and post-workshop assessments have been developed to better understand, among other things, our impact. Please take a moment to see what is assessed and let us know if you have other suggestions.
- The new-look website has been launched. Updated procedures for blog contribution and creation of workshop sites are available at https://github.com/swcarpentry/website.

#### Upcoming events

- The mentoring subcommittee is already hard at work with their first meeting and debriefing sessions. Please join them if you've recently taught or would like to learn more about their plans for the year.
- 12 January: First lab meeting for 2016

#### Instructor Training

New procedures have been developed for qualifying as a Software and Data Carpentry instructor. Feedback from the three varieties of instructor training that were put to the test in 2015 is now available. Billy Charlton from the Puget Sound Regional Council used the online Software Carpentry materials to prepare for teaching his first Software Carpentry lesson this past December. If you've been on the instructor training waiting list for long, his blog gives good pointers for getting going.

#### Contribute

Want to help trainee instructors get their instructor badge? Let us know if you have taught two or more workshops and have two hours to spare.

#### Other

- The University of Washington has been running an undergraduate programming course focussed on real-world data analysis since 2012. There's a lot to learn from their experience.
- Can comparing our lessons with peer-reviewed lab protocols help to improve usability?
- Do you want to help find a good word to describe the "practice of hardening software"? See some suggestions or give your own as a comment on the blog post.

#### Upcoming Workshops

January: University of Nebraska - Lincoln, NERC / University of Bristol, University of Washington - Seattle, Department of Physics, Arizona State University, Western University, National Bureau of Economic Research, The University of Huddersfield, University of Dundee, The University of Lausanne, Boston College Libraries, National Networks of Libraries of Medicine, New England Region, UW-Madison, Berkeley Institute for Data Science, University of British Columbia, The University of Queensland, NERC / University of Leeds, Wang Center - Lecture Hall 2, USDA-ARS, Centers for Disease Control, University of Illinois, University of Auckland

February: USGS Flagstaff Science Campus, University of Illinois, University of British Columbia Okanagan, University of Illinois Read More ›

A Data Programming CS1 Course at the University of Washington
Greg Wilson / 2016-01-03
People who are interested in integrating Data Carpentry and Software Carpentry ideas into the undergraduate curriculum may enjoy reading Anderson et al.'s paper "A Data Programming CS1 Course" (SIGCSE'15, http://dx.doi.org/10.1145/2676723.2677309). From the abstract: This paper reports on our experience teaching introductory programming by means of real-world data analysis. We have found that students can be motivated to learn programming and computer science concepts in order to analyze DNA, predict the outcome of elections, detect fraudulent data, suggest friends in a social network, determine the authorship of documents, and more. The approach is more than just a collection of "nifty assignments"; rather, it affects the choice of topics and pedagogy. This paper describes how our approach has been used at four diverse colleges and universities to teach CS majors and non-majors alike. It outlines the types of assignments, which are based on problems from science, engineering, business, and the humanities. Finally, it offers advice for anyone trying to integrate the approach into their own institution. The first version of the course was run in the summer of 2012, and the discussion of how it has been adapted to different contexts since then is particularly interesting. Some schools mandated extra in-class programming to get students over their fear of writing the wrong code; others used online self-paced resources, and all provided students with considerable starter code for the first assignments, and less as the course went on. One site also had to make adjustments for the realities of its learners: At Evergreen [State College], the difficulty of the material was challenging to students, some of whom worked multiple jobs or had to support families. These students could not devote their full attention to learning as much as younger students at traditional universities. To adapt the original data programming course, we only selected four of the assignments and subdivided each of those in half to create eight mini-assignments. Turn-in dates were flexible, and students were allowed to time-box their efforts (for example, 11 hours of outside time per week) to attempt as many problems as possible within that time. The extent to which students were able to complete assignments provided valuable data to adjust the difficulty of assignments for this demographic in the future. If you come across similar experience reports, or have some of your own, we would welcome posts on this blog. Read More ›

Lessons as Lab Protocols
Greg Wilson / 2016-01-03
A roundabout chain of references led me to Abbott et al.'s "Programs for People: What We Can Learn from Lab Protocols" (presented at VL/HCC 2015), which looks at how lab protocols are similar to and different from programs. On the one hand, a lab protocol describes the steps to be followed to (for example) prepare a particular kind of sample for analysis. On the other hand, "human plans are more like descriptions or predictions than prescriptions of what actions a person will take given the most likely sequence of future actions; they are resources for anticipating likely future events. In programming language terms, perhaps, they are more declarative than imperative in that people draw on knowledge of the whole plan rather than blindly following each step." Later, they say that, "…actors following a protocol have their own 'trajectories', that is, their own goals and priorities; they stray from the protocol to various extents, in various ways, for a variety of reasons." And later still, they make a comparison to a particular piece of software many readers will be familiar with: Some existing software tools play a similar role to protocols as tailorable process descriptions. For example the Bioconductor project uses literate R programs, called 'workflows' and 'vignettes', to describe how various packages can be marshalled to perform tasks. In these programs the data file and task are merely illustrative examples; the program does not differ semantically from any other R program, but it is meant pragmatically to be used primarily as a resource for copy/paste creation of a new program to do a similar task. The authors' main contribution is to "…analyze peer-reviewed protocols published in Cold Spring Harbor Protocols. Unlike personal or internal protocols, these represent programs written for people unknown to the author(s) of the program." Each step is classified according to kind (physical, cognitive, measuring, etc.), precision (instruction, goal-directed, task-directed, etc.), and features (advice, wiggle room, reference, etc.). One of their findings is that: A protocol describes an idealized course of action, but the person executing it will frequently deviate from this course by modifying, reordering, adding, or even skipping steps. By way of analogy, consider a program or protocol as describing a path. In a computer program, the path specified is very narrow, like a tightrope, which the computer follows precisely. In a human protocol, the path is much wider, and the person following it may wander freely from side to side, stop to smell the flowers, and so on. This was the point where the light came on for me. Everything they are saying about lab protocols can be said equally well for lessons. A lesson is one idealized path through particular material; everyone who actually delivers it will, as Abbott et al. say, wander from side to side and stop to smell the flowers. Well-written lab protocols provide for this by explicitly including advice, constraints, expected outcomes, optionality, wiggle room, and contingencies to help users broaden the protocol, narrow it, or find their way back to the main path if they have wandered away: in short, they provide the kind of advice that we have been trying to accumulate in our instructors' guides. I'd be very interested in hearing what other people think of this analogy, and whether there are ways we could use it to make our lessons more useful. As always, comments on this post are very welcome. Read More ›

Discussion Sessions
Raniere Silva / 2016-01-02
At the end of last year we announced a new checkout procedure for instructor training. This new procedure includes "an hour-long group discussion led by an experienced instructor", and we are looking for instructors interested in leading these sessions. Read More ›

Welcome to 2016
Raniere Silva / 2016-01-01
The Mentoring Subcommittee hopes that you and your family had an incredible Christmas and an amazing New Year. We are slowly coming back from the holidays, but we already have a busy agenda for January, so here is a heads-up on a few important dates:

- Mentoring Subcommittee Meeting on January 11th. This will be our first meeting of the year, and we will discuss our plans for 2016. If you want to join us you will find information at http://pad.software-carpentry.org/scf-mentoring. You can use the etherpad for suggestions or send them by email to mentoring@lists.software-carpentry.org.
- Lab Meeting on January 12th. Jonah will provide details of Software Carpentry activities for the next few months.
- Post-Workshop Debriefing Session on January 12th. If you taught a workshop in December or early January we would love to have your feedback, and you are welcome even if you didn't.
- Deadline to stand for the Steering Committee elections on January 15th. If you want to shape the future of Software Carpentry, this is for you. I can't say it will be an easy job, but I can say it will be rewarding to work together with amazing people from our community.

For future activities please check this calendar. Happy New Year! We look forward to having some fun with you during workshops this year. Read More ›

Plans for 2016
Greg Wilson / 2015-12-28
Twelve months is a long time. A year ago I was wrapping up my second month without an income and had serious doubts about whether Software Carpentry was going to be viable. Today, our new Executive Director and Program Coordinator are in place, our workshop administration tool is ticking over nicely, we're about to elect our second Steering Committee, we've published our lessons, we have a growing number of partners and affiliates, our reboot of instructor training has taught us a lot, and those are just the things I've bookmarked. Read More ›

New Words Needed
Greg Wilson / 2015-12-26
Titus Brown recently asked (on behalf of a friend) what people think of the term and practice of "hardening" software, by which he meant making research software more robust, more easily usable, and possibly scalable. Several people responded that "hardening" is usually used to mean "make more secure", and that what Titus's correspondent was really asking about was something we don't have a simple word for: making software ready for production use. As Morgan Taschuk wrote, this involves eliminating hard-coded paths, adding useful error messages, incorporating command-line flags to turn features on and off, and many other things that aren't really necessary if the program is only ever used by its author on her own machine and data, but are essential to institutional scale-up, i.e., to making the software usable by other people, in other places, on other problems. Read More ›
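To make the distinction concrete, here is a minimal sketch of the kind of change Morgan describes: replacing a hard-coded path with a command-line flag and a useful error message. The script and option names are illustrative only.

```python
# A minimal sketch of "making software ready for production use":
# a command-line flag instead of a hard-coded path, and a useful
# error message instead of a raw traceback. Names are illustrative.
import argparse
import sys


def main():
    parser = argparse.ArgumentParser(description="Summarize a data file.")
    # A flag makes the script usable on other machines and other data.
    parser.add_argument("--input", required=True, help="path to input file")
    args = parser.parse_args()

    try:
        with open(args.input) as handle:
            print(sum(1 for _ in handle), "lines read")
    except FileNotFoundError:
        sys.exit("error: input file not found: " + args.input)


if __name__ == "__main__":
    main()
```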

Assessment Update - 2015
Jason Williams / 2015-12-22
About a year ago, I hoped to help answer the question, "Is our students learning?" We are now a bit closer to an answer, and I wanted to update the community on what we have accomplished this year. First, I want to thank and acknowledge the assessment subcommittee and Blake Joyce, who helped to define tasks and possibilities and provided significant feedback. I especially want to thank Daniel Chen, Jeramia Ory, and Rayna Harris, who put in a lot of hours moving things along. Read More ›

Pushing Ahead in Puget Sound
Greg Wilson / 2015-12-20
As regular readers will know, our instructor training classes are heavily over-subscribed: even after the classes we ran this fall, we still have over 400 people on our waiting list, some of whom have been there for months. One of these is Billy Charlton, who works for the Puget Sound Regional Council. Keen to sharpen his team's technical skills, and unable to wait any longer for a training course, he taught himself as much as he could about what and how we teach, then went ahead and ran a workshop using our materials. His report on his experiences is inspiring, as is the feedback he received from participants, and a great way to end a busy year. Read More ›

Three Flavors of Instructor Training
Stephen Crouch, Christina Koch, Karin Lagesen, Aleksandra Pawlik, Fiona Tweedie, Greg Wilson / 2015-12-18
It's been a busy few months for instructor training: along with a two-day class at the University of Manchester, we also wrapped up a pilot of a new kind of multi-week course and a two-day version of the same material. We have also been training four new trainers, and have put the first complete draft of the training materials online. Here's some of what we think we've learned so far. Read More ›

Instructor Training Checkout Procedure
Greg Wilson / 2015-12-18
After a lot of discussion, we have come up with a new procedure for completing instructor training for Data Carpentry and Software Carpentry. Its goals are to ensure that people are familiar with the lesson material while introducing them to our community. The steps are: Read More ›

November 11 - December 16, 2015: 2016 Steering Committee Election, Projects, Instructor Training and More, Lessons, Workshop Feedback, and Data Carpentry is Hiring.
Anelda van der Walt / 2015-12-16
Highlights

The Software Carpentry Foundation will hold its annual Steering Committee election in February 2016. Active Software Carpentry community members are invited both to stand for and to vote in the elections. The first candidate for the Steering Committee elections is Belinda Weaver. Learn more about her history with Software Carpentry and see what she plans to do for the community in 2016. Want to contribute to Software Carpentry but don't know how? There's a fantastic array of projects, including development of new lessons, fixing up old ones, and much more. Jump in, or contact us if you're interested in participating.

Other News

The past month saw a huge amount of Software Carpentry activity, including:

- Instructor training events and feedback
- Many Software Carpentry workshops were run, and several people provided great write-ups about them. Write-ups are also available for a number of other workshops from which the Software Carpentry community could learn a great deal.
- A few exciting developments in Data Carpentry
- New lessons were developed and ideas to improve existing lessons were discussed
- Several other posts published by community members

Read More ›

More on Educational Engineering
Warren Code / 2015-12-16
In my experience, the term "educational engineers" doesn't seem to have caught on to describe any particular activity, but I can provide some examples of people bringing ideas from the research to inform teaching. There's a range depending on what counts as "building". Read More ›

2016 Election: Belinda Weaver
Belinda Weaver / 2015-12-15
I co-organise, run and teach at Software Carpentry workshops in Queensland, Australia, and frequently tweet about them and other Software Carpentry activities as @cloudaus. I first heard of Software Carpentry on Twitter in 2013 and thought it was a great initiative. Through the A/NZ mailing list, I made contact with Australian Software Carpentry trainers, and was able to organise the inaugural Brisbane bootcamp in July 2014, flying in instructors from Melbourne and Auckland. Read More ›

Educational Engineering
Greg Wilson / 2015-12-15
One of the participants in this week's instructor training course mailed me to say, "[We] were discussing some of the ideas we were talking about in educational research and I feel like we are missing 'educational engineers'. Does this discipline exist? I feel like we don't really have the people to take the research that's been done and build something with it." The answer is that we actually do have educational engineers: at the K-12 level, many of the people who write textbooks and other learning materials have lots of training in pedagogy, and the curriculum they create goes through the same sort of careful review that plans for a new dam go through. But that's usually not true in higher education: most of the people who write textbooks at the post-secondary level are domain experts with little or no training in pedagogy. And even in K-12, people with no background in education at all frequently overrule experts on both content and method. Read More ›

Teaching For Loops
Greg Wilson / 2015-12-10
A few days ago, Karin Lagesen asked people what metaphors they use when teaching for loops. It kicked off an entertaining thread. When we talk about pedagogical content knowledge (PCK) in instructor training, this is exactly the kind of thing we mean: Read More ›
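For anyone who hasn't taught this lesson, the construct the thread's metaphors try to explain is the very first loop a novice meets, something like the snippet below. This particular example is mine, not from the thread.

```python
# The kind of first for loop the metaphors describe: do one thing
# for each item in a collection. (Illustrative example, not taken
# from the thread itself.)
for fruit in ["apple", "banana", "cherry"]:
    print(fruit, "has", len(fruit), "letters")
```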

Community Calendar Available
Raniere Silva / 2015-12-10
During this year we hosted many events for the community, including lab meetings, sub-committee meetings, and de/briefing sessions. Unfortunately we failed to let everyone know about these events, so to try to solve this problem next year we are launching a community calendar. If you want to add the community calendar to your own agenda, use https://calendar.google.com/calendar/ical/oseuuoht0tvjbokgg3noh8c47g%40group.calendar.google.com/public/basic.ics. Read More ›
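Most people will simply subscribe to that URL in their calendar application, but the feed is plain iCalendar data, so it can also be read programmatically. A minimal sketch, assuming the third-party requests and icalendar packages are installed:

```python
# A minimal sketch of reading the community calendar feed in Python.
# Requires the third-party packages "requests" and "icalendar";
# most people will subscribe in a calendar app instead.
import requests
from icalendar import Calendar

ICS_URL = ("https://calendar.google.com/calendar/ical/"
           "oseuuoht0tvjbokgg3noh8c47g%40group.calendar.google.com"
           "/public/basic.ics")

feed = Calendar.from_ical(requests.get(ICS_URL).content)
for event in feed.walk("VEVENT"):
    print(event.get("DTSTART").dt, "-", event.get("SUMMARY"))
```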

Software and Data Carpentry Workshop in Stockholm
Roman Valls Guimera / 2015-12-09
On April 14, three Software Carpentry instructors from Stockholm (Radovan Bast, Olav Vahtras and I) got an email from Greg Wilson: Subject: Any interest in putting together a workshop for Stockholm this summer? Read More ›

Introducing the Research Bazaar
Damien Irving / 2015-12-09
One of the lessons Software Carpentry has learned over the years is that there's no substitute for a face-to-face learning experience. The unscripted interactions that occur between instructors, helpers and participants at our workshops are as important, if not more important, than the formal teaching syllabus itself. Of course, we aren't the only educators in the digital research space to whom this lesson applies. The social aspect of in-person learning experiences is equally important to the various organisations, eResearch departments and libraries around the world tasked with "up-skilling" the next generation of digital researchers. Read More ›

Feedback on Practicum Proposal
Karin Lagesen / 2015-12-09
We have wanted for some time to give newly-trained instructors more experience with workshop materials and our teaching methods before sending them into the field. The mentoring committee and other community members have now created a proposal for doing this, and would like to test it out early in 2016 with participants in the two training sessions that ran this fall. We have therefore created a short questionnaire to find out whether our plan is feasible and takes us in the right direction. Read More ›

Software Carpentry and Data Carpentry on Podcast.__init__
Maneesha Sane / 2015-12-08
A couple of weeks ago I was interviewed by the Podcast.__init__ team to talk about Software Carpentry and Data Carpentry. The request for an interview actually came in a couple of months ago and I asked Greg Wilson about doing it, telling him I was a big fan of the podcast. He said he'd be happy to have me do it myself, giving me the chance to star in one of my favorite shows. I decided to do it, in pursuit of my goal of promoting Software Carpentry and Data Carpentry (and maybe more public exposure for myself). Read More ›

Call for Candidates for the 2016 Steering Committee
Jonah Duckles / 2015-12-08
From February 15-19, Software Carpentry will hold its annual election for the Steering Committee of The Software Carpentry Foundation. The inaugural Steering Committee has been an incredible team of dedicated community members working to position Software Carpentry so that it has a strong foundation and continued impact in scientific communities around the world. Please consider giving back to the community by standing for election! Read More ›

Bank of America Merrill Lynch to sponsor the first Workshop for Women in Science and Engineering in the UK
Aleksandra Pawlik / 2015-12-08
We are very happy to announce that the first UK Software Carpentry Workshop for Women in Science and Engineering has received generous support from Bank of America Merrill Lynch. The event will take place on 14 and 15 December at the University of Manchester. Read More ›

Announcing Instructor Training Materials
Greg Wilson / 2015-12-07
We are pleased to announce that the lessons for our instructor training course are now available online and in this GitHub repository. They have evolved a lot in the three and a half years since we started running this class, and more improvements would be welcome: please file issues or send pull requests to fix, extend, or otherwise improve the material. Read More ›

Launching Pre Workshop Help Session
Raniere Silva / 2015-12-04
Organizing a workshop isn't easy, and sometimes things outside your control go wrong at the last minute, like very slow internet access in the room. After months of discussing what could be done to help instructors and hosts organize their workshops and be prepared for show time, the Mentoring Subcommittee decided to test some help sessions. Read More ›

Intel to sponsor the first Workshop for Women in Science and Engineering in the UK
Aleksandra Pawlik / 2015-12-03
We are very happy to announce that the first UK Software Carpentry Workshop for Women in Science and Engineering has received generous support from Intel. The event will take place on 14 and 15 December at the University of Manchester. Read More ›

First Workshop in Venezuela
Francisco Palm / 2015-12-03
Last November Francisco Palm taught the first ever Software Carpentry workshop in Venezuela, the second country in Latin America to enter our list of previous workshops. We are very excited about our expansion across Latin America and we hope to add more countries to that list next year.

Learners from the first workshop in Venezuela.

Francisco allowed us to share some of his words about the workshop: Read More ›

Software Carpentry workshop at EITN in Paris
Bartosz Teleńczuk / 2015-12-02
The Python workshop on November 19-20th was the first course organised at the European Institute of Neuroscience (EITN) in Paris. The course offered a practical introduction to Python programming and scientific software development for PhD students and post-docs. The programme followed closely the syllabus recommended by Software Carpentry. Read More ›

Data Science for Social Good: an Experiment in Data Science Training
Ariel Rokem, Micaela Parker, Sarah Stone, Anissa Tanweer / 2015-12-01
Data science faces many challenges in the traditional academic setting. At the same time, many research fields are becoming increasingly dependent on data science tools and techniques. A key element in tackling these challenges is the education of a new generation of researchers who are fluent both in their research domain and in data science methodologies. In this post, we discuss an immersive approach to training in data science, the University of Washington eScience Institute's inaugural Data Science for Social Good (DSSG) program. Read More ›

Software Sustainability Institute Funding
Greg Wilson / 2015-12-01
The Software Sustainability Institute, a partner organization which is committed to cultivating world-class research through software, has received £3.5M in funding to continue its valuable support for the UK's research software community. Two new funders, the Biotechnology and Biological Sciences Research Council (BBSRC) and Economic and Social Research Council (ESRC), have joined forces with the Institute's original funder, the Engineering and Physical Sciences Research Council (EPSRC), to continue to invest in research that is underpinned by software until at least 2019. For the full announcement, please see this post on the SSI site. Congratulations! Read More ›

December Instructor Training - Announcing Selected Groups
Raniere Silva / 2015-11-30
Thank you to all the teams that sent us applications for instructor training. Selecting the teams was a very difficult task: the applications were excellent, we had only a small amount of time to discuss them all, and we need to improve our capacity to support the demand we see for instructor training. The mentoring subcommittee has selected the teams whose cover letters confirmed that they had booked a room for the instructor training:

- Shrum Science Centre Physics, Vancouver, Canada
- The University of Texas at Arlington, Dallas, US
- James Cook University eResearch Centre, Brisbane, Australia
- University of Wisconsin–Madison, Madison, US
- Universidade Federal do Paraná, Curitiba, Brazil
- Aristotle University of Thessaloniki, Thessaloniki, Greece
- University of Toronto, Toronto, Canada
- Max Planck Institute, Berlin, Germany
- Institute Unite de Neurosciences, Information et Complexite, Gif-sur-Yvette, France

We have already notified all lead groups of the status of their application, but if you haven't heard anything please contact us. If you have any questions or comments about the selection process, please contact me or Jonah Duckles, Software Carpentry's Executive Director.

Statistics

We had applications from 44 teams and could accommodate 12 teams in total, who will be trained by Christina Koch, Fiona Tweedie, Greg Wilson and Steve Crouch. More than half of the applications came from the US. Canada is the country with the highest number of accepted teams because we received applications from three very small teams in close proximity that we merged into one. In terms of individual applicants, the teams comprised more than 200 people, and we accepted 49 into the instructor training in December. The chart below shows the gender balance of the December instructor training.

Source code and data

The source code for the plots is available here as a Jupyter Notebook, and the CSV files used in it are available here and here. Both CSV files are a copy of the applications that we received, with names removed because of privacy concerns. Note: we forgot to ask applicants their gender, so the numbers in the CSV are just a guess based on their names. Read More ›

Associate Director Position with Data Carpentry
Tracy Teal / 2015-11-30
Data Carpentry seeks a full-time Associate Director to lead the organization's community engagement and education activities, to cultivate a healthy, supportive community, and to provide mentorship and training to current and future instructors. The Associate Director is one of the two key roles providing leadership to Data Carpentry's core efforts and is expected to shape the organization's operational functioning, influence training, and contribute to strategic planning. Data Carpentry is a not-for-profit organization developing and teaching workshops on fundamental data skills needed to conduct research. Its mission is to provide researchers with high-quality, domain-specific training covering the full lifecycle of data-driven research. Data Carpentry lessons are intentionally domain specific, and span the life, physical, and social sciences. Data Carpentry workshops create an environment friendly to learners who have little to no prior computational experience, and are designed to empower researchers to apply the skills learned to data-driven discovery in their own research. The Associate Director position is initially funded for 2 years through a grant from the Gordon and Betty Moore Foundation. They will be hired and paid as a contractor of NumFOCUS, Data Carpentry's fiscal sponsor. Review of applications will begin December 18th, 2015, and the position will remain open until filled. Read More ›

December Instructor Training Selection Debrief
Jonah Duckles / 2015-11-25
Many of you who applied for instructor training in December are undoubtedly receiving disappointing news. We had 250 applicants on 45 teams from a wide range of locations. I'd like to talk through a few things about this process and how it played out as I think it will help the community understand the situation we were in with instructor training demand and our ability to meet demand. Read More ›

2015 Software + Data Carpentry Instructor and Helper Retreat
Tiffany Timbers / 2015-11-24
Software and Data Carpentry recently held their first ever Instructor + Helper Retreat. The aim of this event was to bring instructors and helpers together, remotely and at local sites around the world, for a day of sharing skills, trying out new lessons and ideas, and discussing all things instructor- and helper-related. The Retreat attempted to meet these goals via 17 local meetups in North America, Europe and Australia, 14 sessions broadcast globally via Google Hangouts on Air, and many people participating remotely by watching and asking questions at the global broadcast sessions. In addition to bringing the community together, the Retreat has also generated additional resources on many topics of interest to Software and Data Carpentry instructors and helpers; the global broadcast sessions are archived on YouTube (pending session leaders leaving them up) and can be watched at any time. The links to these videos are archived on the Retreat website. Other notable moments from the Retreat, as captured via Twitter, can be accessed here. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 22
Kate Hertweck, Christina Koch / 2015-11-24
There were relatively few attendees at the 22nd round of debriefing for instructors on November 10, but we thought we'd include a few quick notes about two specific topics: using Software and Data Carpentry workshops to reach out to potential users of high-performance computing (HPC) resources, and exploring new lesson materials related to databases.

High Performance Computing

University of Manitoba hosted a Software Carpentry workshop at the end of October in conjunction with Compute Canada. In addition to the normal half-day lessons in Unix shell, Git, and intro to Python, this workshop featured an introduction to using WestGrid/Compute Canada resources on the afternoon of the second day. The instructor for this workshop, Hossein Pourreza, said that this last lesson module was used as a capstone to help reinforce and integrate material from the previous lessons, and appeared well-received by the students. Given that there are few training opportunities for using the cluster, this appears to be a great way to introduce new potential users to the basic tools they'll need to get started with larger-scale analysis. This recent workshop is a great example of tailoring the lesson materials to meet the needs of a particular audience. There are some resources currently available outlining essential skills for remote computing. For example, Data Carpentry has a genomics lesson involving cloud computing under development.

Databases

Continuing the theme of piloting new material, a recent workshop at the Pacific Northwest National Laboratory included a short lesson on MongoDB, added at the end of the official Software Carpentry SQL material. The motivation for including MongoDB (an example of a NOSQL database) was the instructor's own use of it in his work, and its growing influence in scientific computing (a sketch of what a MongoDB query looks like appears at the end of this post). It was hard to draw conclusions about the value of introducing NOSQL, mostly because there isn't enough time in a 2-day workshop to do git, shell, Python *and* databases, especially with the lessons as written. If instructors want to include databases (SQL or otherwise) in their workshops, they should be aware that the Python lesson will probably have to be significantly shortened to fit into half a day. This could be a good strategy for workshops with a specific audience, where the participants are not all novice programmers. Alternatively, if a workshop is being taught with local instructors, databases could be a follow-on day or half day after the first two days. Question for the community: do other instructors use NOSQL databases in their daily work? In what circumstances is it a useful tool or skill?

Thanks

We are grateful to the instructors who attended debriefing sessions this round: Hossein Pourreza, Donny Winston. Read More ›
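For readers who only know the SQL taught in our workshops, here is a rough sketch of what an equivalent MongoDB query looks like from Python. The database, collection, and field names below are invented for illustration; they are not from the PNNL lesson.

```python
# A minimal sketch of a MongoDB query via pymongo, for comparison
# with the SQL taught in the workshop. Database, collection, and
# field names are invented; assumes a MongoDB server on localhost.
from pymongo import MongoClient

client = MongoClient()
surveys = client.workshop_db.surveys

# Roughly equivalent to: SELECT * FROM surveys WHERE species = 'DM';
for record in surveys.find({"species": "DM"}):
    print(record)
```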

A New Lesson on Testing
Greg Wilson / 2015-11-22
Katy Huff has just posted a new lesson on testing and continuous integration with Python drawn in part from the book that she and Anthony Scopatz recently published. There's a lot of useful material in here, all of it tested in the classroom—please check it out. Read More ›
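To give a flavour of what such a lesson covers (this sketch is my own illustration, not an excerpt from Katy's material), here is a small function together with tests that pytest can discover and run; on a continuous integration service, tests like these run automatically on every push.

```python
import pytest

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean of an empty sequence is undefined")
    return sum(values) / float(len(values))

def test_mean_of_known_values():
    assert mean([1, 2, 3]) == 2

def test_mean_of_empty_sequence_raises():
    # Good tests also pin down how the code should fail.
    with pytest.raises(ValueError):
        mean([])
```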

The Morea Framework
Greg Wilson / 2015-11-20
I first met Philip Johnson, a professor of Computer Science at the University of Hawaii, through shared interests in empirical software engineering research and Google Summer of Code. He has recently been developing the Morea Framework for creating structured course websites using GitHub and Jekyll. "Morea" stands for "Modules, Outcomes, Readings, Experiences, and Assessments", which are the five main elements the framework supports. As you can see from the project gallery, it's much more structured than our lessons. It also requires more tooling—Morea Framework sites are built using custom Jekyll plugins, and the source relies much more heavily on include files than our template—and it's geared very strongly toward traditional semester-long courses. I'm really impressed with the thought that's gone into Morea, and would enjoy hearing what you think. Read More ›

Applications for December Instructor Training Are Now Closed
Greg Wilson / 2015-11-20
Applications to take part in December's two-day instructor training class are now closed. We received more than three dozen applications from four continents, collectively involving over 400 people, and will let applicants know early next week whether they have been selected. Groups that we can't include in this round will be given priority to take part in the new year. Read More ›

Test-Driven Data Analysis
Greg Wilson / 2015-11-19
My former colleague Nick Radcliffe has started posting a series of articles on test-driven data analysis, in which he explores systematic ways of checking whether one-off data analyses are correct. It'll be really interesting to see how the series unfolds, and comments on the posts would be very welcome. Read More ›
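To make the idea concrete, here is my own sketch of the general approach (not an example from Nick's series): after each step of a one-off analysis, assert properties the output must have regardless of the particular data.

```python
import pandas as pd

def clean(df):
    """Hypothetical analysis step: drop rows with impossible weights."""
    return df[df["weight_kg"] >= 0]

def check_clean(df_in, df_out):
    """Assert properties the cleaned data must satisfy."""
    assert len(df_out) <= len(df_in)             # filtering never adds rows
    assert (df_out["weight_kg"] >= 0).all()      # the constraint actually holds
    assert not df_out["id"].duplicated().any()   # ids stay unique

raw = pd.DataFrame({"id": [1, 2, 3], "weight_kg": [70.2, -1.0, 55.5]})
cleaned = clean(raw)
check_clean(raw, cleaned)
```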

rOpenSci Announces $2.9M Award from the Helmsley Charitable Trust
Karthik Ram / 2015-11-19
rOpenSci, whose mission is to develop and maintain sustainable software tools that allow researchers to access, visualize, document, and publish open data on the Web, is pleased to announce that it has been awarded a grant of nearly $2.9 million over three years from The Leona M. and Harry B. Helmsley Charitable Trust. The grant, which was awarded through the Trust's Biomedical Research Infrastructure Program, will be used to expand rOpenSci's mission of developing tools and community around open data and reproducible research practices. Read More ›

Python Lesson Rewrite
Matt Davis / 2015-11-15
Recent analysis and introspection regarding Software Carpentry's Python lesson has led us to conclude that the lesson would benefit from a complete rewrite. The subcommittee on lesson development met twice to plan the new lessons (yes, that's plural). This post will describe our plans for the new lessons. We're going to write two new Python lessons: one intended for students who are completely new to programming (the "novice" lesson), and another for students with some programming experience in any language (the "intermediate" lesson). Data Carpentry also targets new programmers, so we'll be developing the new novice lesson in collaboration with DC, and both DC and SWC will use the lesson. We decided to use Gapminder data for both the novice and intermediate lessons in order to have a consistent experience and approachable, meaningful data (and because some lessons are already using it). Read More ›
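To show why Gapminder data suits a first lesson, here is the flavour of an opening exercise. This is a hypothetical sketch: the file name and column names below are the conventional Gapminder ones, not necessarily what the new lessons will use.

```python
import pandas as pd

# Load one table of country-level indicators (hypothetical file name).
data = pd.read_csv("gapminder.csv")

# A first, meaningful question: how does life expectancy vary by continent?
print(data.groupby("continent")["lifeExp"].mean())
```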

A Practical Computing Course
Steve Haddock / 2015-11-15
While it was not a SWC course, this summer Casey Dunn and I taught a 12-day Practical Computing summer class at Friday Harbor Labs. It was a great group of students—mostly somewhat beginner level—and we covered regular expressions, the shell, Python, R + ggplot, Git, graphics, and a bit of electronics. The centerpiece (which the students really got into) was a personal project that was applicable to their own interests and which they solved with some combination of the tools we covered. Many of the students went from zero experience to having a script that actually gave them insight into their real-life research back home. Read More ›

Miscellaneous Projects
Greg Wilson / 2015-11-15
This post is a bit of a link fest, but after talking about how to contribute at yesterday's instructor retreat, I thought it might be useful to post a few additions to our projects page. Read More ›

CourseSource: A(nother) New Hope
Greg Wilson / 2015-11-15
I came across CourseSource a few weeks ago, and I'm pretty excited: CourseSource is an open-access journal of peer-reviewed teaching resources for undergraduate biological sciences. We publish articles that are organized around courses in biological disciplines and aligned with learning goals established by professional societies representing those disciplines. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 21
Kate Hertweck, Tiffany Timbers / 2015-11-13
The mentoring subcommittee held the 21st round of debriefing for instructors on October 27. We had some great discussions with a few instructors about recent workshops and their experiences with implementing a module on imposter syndrome and methods for integrating lessons throughout the workshop. Read More ›

Data Carpentry Instructor Certification for Software Carpentry Instructors
Tracy Teal / 2015-11-12
Re-posted from the Data Carpentry blog. Data Carpentry has been growing over the last year, and now with the addition of Maneesha Sane as Program Coordinator, we have the chance to run even more workshops in more domains. To date, we have used some great instructors from the Software Carpentry community as teachers; what we would like to do now is put that on a more regular footing. We want to update the instructor records to see who’s interested and qualified to teach Data Carpentry, so we’ve created a Data Carpentry Instructor FastTrack certification. If you have already taught a Data Carpentry workshop, you’re registered as a Data Carpentry instructor and don’t need to go through the FastTrack certification process. If you are currently a Software Carpentry instructor and would like to teach core data wrangling skills to people who are new to computing, this program is for you! Read More ›

October 27 - November 10, 2015: Maneesha Sane, Retreat Activities, Instructor Training, Code Review Revisited, and a WiSE Workshop.
Anelda van der Walt / 2015-11-10
Highlights
Meet Maneesha Sane, the Software Carpentry (and now Data Carpentry) course coordinator. Our instructor/helper/community retreat is planned for 14 November with participating sites around the world. Line-ups include roundtable discussions, open house sessions, Worldwide Library Hour, and more. Join us virtually or live for the whole day or only the sessions of interest.
Conversations
Do you have any experience with code review? David Pérez-Suárez described his experience over ten years and asks some interesting questions. Read More ›

R Foundation Announces Code of Conduct Policy
Kara Woo / 2015-11-08
The R Foundation recently announced that all conferences it supports must have a code of conduct. They encourage other R meetings to adopt codes of conduct as well, stating that: A code of conduct serves two important purposes. Firstly, it sends a clear message to those outside the community that an R conference is a professional and comfortable working environment for all participants. Secondly, it provides a mechanism for reporting and monitoring any incidents of harassment that may occur. Read More ›

Clarification about December Instructor Training
Raniere Silva / 2015-11-06
Since we announced the call for applications for December's Instructor Training, we have received many questions. We are now amending the announcement to clarify some things that have come up. Read More ›

Teaching Bimodal Workshops with a Large Range
Daniel Chen / 2015-11-05
Teaching a technical class always has its challenges; teaching a technical workshop covering three or more topics over the span of two days is what makes us Software Carpentry instructors. A common observation among instructors is that the class demographics are bimodal or multimodal: some students attend the workshop with little to no experience, some have been using the tools we teach but want a more formal course, and others know some of the tools but use the workshop to learn Git or Python, for example. How do we as instructors deal with workshops where the range of student skills is extremely wide? This post is a case study of my most recent workshop and a call for help and discussion about what we should do to make our workshops enjoyable for everyone. In many ways it is a complement to Peter and Cam's post Pulling In Those Left Behind. Read More ›

Introducing Maneesha
Maneesha Sane / 2015-11-02
This is a long overdue post introducing myself to all of you. I've met many of you (at least virtually!) over the past few months but I wanted to formally introduce myself to the Software Carpentry and Data Carpentry community. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 20
Kate Hertweck, Raniere Silva, Rayna Harris / 2015-11-01
On October 13th, the mentoring subcommittee held the 20th round of debriefing for instructors. In this round, we had a great discussion about text editors, workshop organization, and challenges for learners. Adrianna Pińska wrote a great post about the workshop in Johannesburg that you should check out. Read More ›

Pulling In Those Left Behind
Peter Steinbach, Cam Macdonell / 2015-10-29
A common challenge that arises before or during a workshop is that participants' prior expertise in programming (or, broadly speaking, their ability to use computers for science) follows some distribution. At best, this distribution peaks at the instructor's expectations. Usually, it is quite wide, so a considerable portion of the participants lack the necessary background for the workshop level, learn at a slower rate, or are simply too shy to ask questions. I recently taught a follow-up workshop to the Software Carpentry (SWC) Novice material and struggled to keep the pace of teaching at a level where all learners would come along. Given the feedback on the SWC mailing list (see the original post), this problem occurs quite often. This blog post is therefore a summary of the discussion initiated among fellow SWC instructors on how to pull back in those learners who fall behind, and how to pace and design a course so that as few learners as possible fall behind. Read More ›

October 19 - 26, 2015: 500 Workshops and 16,000 Participants, Retreat, Debriefing Sessions, Digital Data Storage, and A Science Competition.
Anelda van der Walt / 2015-10-28
Highlights
The Software Carpentry team has now run more than 500 workshops for 16,000 participants! Remember to join our global instructor/helper retreat on 14 November. If you care about training, join a site near you or dial in via Google Hangouts.
Dates
Please take note of the planned dates for upcoming debriefing sessions for instructors.
Publications
Ten Simple Rules for Digital Data Storage: a new article on the importance of considering data storage and metadata tagging that started as a discussion on the Software Carpentry mailing list.
Opportunities
Are you building products or developing services to advance Open Science? Read more about the global science competition launched by the Wellcome Trust and NIH. Read More ›

Site Planning for the Instructor and Helper Retreat
Bill Mills / 2015-10-28
With less than three weeks to go before Software Carpentry and Data Carpentry's first ever Instructor and Helper Retreat on November 14, many of the lead organizers for sites around the world got together this week to discuss plans and ideas for the event. Our goal for the retreat is to create opportunities for instructors, helpers, and the community to get together to practice teaching, break ground on new lessons, and start discussions. Here are just a few of the ideas that came up, and how you can get involved. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 19
Raniere Silva / 2015-10-28
On September 29th the mentoring subcommittee held the 19th round of Instructor Debriefing and had some interesting discussions about our lessons and about opportunities for learners after the workshop. Read More ›

Code Review - a Needed Habit in Science
David Pérez-Suárez / 2015-10-28
More than a year ago, Marian Petre and Greg Wilson wrote an article about reviewing code, but I was not aware of it until last week when Greg talked about their findings in a workshop in London (though it had been mentioned here before). I have been reviewing code for years, but I had not realised I was doing so until GitHub made it easy. Reading other people's code feels the same as proofreading a Spanish text (or any other text), where you have to pay attention to the orthography, the grammar, and whether it says what it's meant to say (or does what it's meant to do). Read More ›

Software Carpentry for Women in Science and Engineering UK
Aleksandra Pawlik / 2015-10-28
The Software Sustainability Institute in collaboration with ARCHER and Women in HPC is organising the first Software Carpentry workshop for Women in Science and Engineering (WiSE) in the UK. The event will take place at the University of Manchester on 14-15 December 2015. Read More ›

Visualizing Repository Activity
Greg Wilson / 2015-10-27
I am updating the lessons learned paper, and would like to include histograms showing how many people have contributed how often to our lessons. More specifically, I have 9 data sets (one for each lesson), ranging in size from 5 to 16 records, in which each record shows a number of commits and how many people have committed that often. For example, the data for our SQL lesson is: Read More ›
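As an illustration of the shape of these data, each record pairs a commit count with the number of people who committed that many times, which plots naturally as a bar chart. The numbers below are invented, since the real records appear in the post itself.

```python
import matplotlib.pyplot as plt

# Invented records: (number of commits, number of people with that many).
records = [(1, 20), (2, 8), (3, 5), (5, 2), (40, 1)]
commits, people = zip(*records)

plt.bar(range(len(records)), people)
plt.xticks(range(len(records)), commits)
plt.xlabel("Number of commits")
plt.ylabel("Number of contributors")
plt.title("Contributors per commit count (illustrative data)")
plt.show()
```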

Call for Applications to December Instructor Training
Raniere Silva / 2015-10-26
This blog post has been amended with the information from this other post, and updated with the answers we have received so far. As we announced previously, we will run another round of the free/open Instructor Training in December. People who are interested in participating in this round can now apply to take part. Read More ›

Recent Statistics
Jonah Duckles / 2015-10-25
At some point in the last few weeks, we ran our 500th workshop for our 16,000th learner: Read More ›

Debriefing Sessions and Winter Recess
Raniere Silva / 2015-10-24
We are getting close to the beginning of 2016, and although there will be some workshops between Christmas and New Year, the Mentoring Subcommittee will take a winter recess from debriefing sessions. There will, however, be some debriefing sessions before then: Read More ›

Ten Simple Rules for Digital Data Storage
Greg Wilson / 2015-10-23
Edmund Hart, Pauline Barmby, David LeBauer, François Michonneau, Sarah Mount, Timothée Poisot, Kara Woo, Naupaka Zimmerman, and Jeff Hollister have just posted a pre-print on PeerJ titled Ten Simple Rules for Digital Data Storage. The paper is a distributed collaborative effort spawned from a thread on the Software Carpentry instructors mailing list and further carried out on GitHub. There are a lot of good ideas in it, many of which we should fold back into our lessons, and we hope it will spark more collaborations in our community. Read More ›

Programming Historian Live
James Baker / 2015-10-21
Originally posted at http://cradledincaricature.com/2015/10/21/programming-historian-live/. On 19 October curious historians descended on the British Library for Programming Historian Live. The Programming Historian is a suite of open access, peer reviewed lessons that provide practical instruction to historians thinking about using data, code, and software in their research. It is co-edited by 2013 Software Sustainability Institute Fellow Adam Crymble and it does an amazing job of bringing the methods and motivations of the (small but growing) Digital History community to the wider historical profession. This "Live" spin-off, funded by my 2015 Software Sustainability Institute Fellowship, was designed to take into account the fact that whilst some of us learn just fine through self-directed tutorials, others need the mental space, in-person support, and peer pressure of seminar-style learning. Read More ›

Open Science Prize
Greg Wilson / 2015-10-21
The Wellcome Trust and National Institutes of Health have launched a global science competition for new products or services to advance open science. Up to six teams stand to win US$80,000 each to develop their ideas into a prototype or to advance an existing early stage prototype. The prototype judged to have the greatest potential to further open science will receive $230,000. For more information, see the announcements on the Wellcome site and from the NIH, or visit the prize website. Read More ›

Inserting Software Carpentry Graduates into Coding Communities
Damien Irving / 2015-10-19
One of the issues I'd like to see Software Carpentry tackle is what happens to learners after they've attended a workshop. Read More ›

October 5 - 18, 2015: Jonah Duckles, Instructor/Helper Retreat Still Growing, Peer Reviewed Lessons, Data Management, and AMY 1.0 Released.
Anelda van der Walt / 2015-10-18
Highlights
Meet our new executive director, Jonah Duckles. 14 sites worldwide have registered for the first ever Software Carpentry and Data Carpentry instructor and helper retreat on 14 November. If you care about training, join a site near you or dial in via Google Hangouts.
Lessons
The obvious next step for the Software Carpentry lessons: peer review.
Contribute
Rayna Harris suggested we look at how Software Carpentry and Data Carpentry can play a role in teaching or disseminating information about data management plans. Please let us know if you are aware of good resources or would like to discuss how we can contribute as a community. AMY 1.0 was released. Django web programmers are welcome to join the development team. Read More ›

Journals as Repositories
Greg Wilson / 2015-10-17
I had a really good conversation yesterday with Cath Brooksbank and Sarah Morgan, who do training at EMBL-EBI in Hinxton. During the conversation we touched on CourseSource, a peer-reviewed journal in which people can publish undergraduate biology lessons—not studies of the lesson's effectiveness, but the lessons themselves. This is a brilliant idea, and thinking about it has made me realize why I've never been excited about online lesson repositories. We already have repositories for the things academics do: they're called journals. And we have portals (or aggregators, or whatever you want to call them): they are things like PubMed. What we don't have is people putting things into the system in the first place. Growing a separate parallel system to do those things for lessons hasn't worked: as far as I can tell, most of what's uploaded to lesson repositories just sits there. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 18
Raniere Silva, Sheldon McKay / 2015-10-15
On September 15th, the mentoring subcommittee held the 18th round of debriefing for instructors. In this round, we had a great discussion about whether or not to teach branching during the Git lesson, and about the Python packages used in examples. Read More ›

Feedback from a Software Carpentry workshop at PyConZA 2015 in Johannesburg
Adrianna Pińska / 2015-10-14
On the 3rd and 4th of October we ran a Software Carpentry workshop associated with PyConZA 2015, a conference for Python developers in South Africa. The workshop ran concurrently with the post-conference sprints, in a neighbouring room in the same venue. Because we expected little overlap between the attendees of the conference (who are mostly existing Python programmers) and the target audience of the workshop, we organized the workshop as a separate event with separate tickets. The turnout was very low, which may have been caused by our delay in advertising the workshop through academic research channels, and by our limited academic contacts in the Gauteng area. Although an instructor from the University of the Witwatersrand was initially involved in the organisation of the workshop, the timing made it difficult for him to participate. In future, I think that we should seek more up-front interest from local academic institutions. We have considered other possible causes of the low turnout, such as an inconvenient time, location, price or payment method -- but we received no specific feedback regarding any of these. We ended up with four attendees who were present for most of the time, one attendee who was present for part of the second day, one helper and two instructors. Most of the attendees had heard of the workshop through the conference. The turnout was more or less in line with what I would have expected for a tutorial session attached to the conference. Despite the low turnout, I thought that the workshop went quite well. The atmosphere was a lot more informal than it would have been with a larger audience, and we took more questions during the course of the lessons. Because there were so few learners, we were able to give them more individual attention and assistance, and towards the end of the last day we spent some time looking at specific problems that they wanted to solve in their work. We also had more freedom to shift the schedule around. Our original plan was to have the Bash lesson on Saturday morning with the first half of the Python lesson in the afternoon, and the second half of the Python lesson on Sunday morning followed by Git. We ended up doing Bash for 3/4 of the first day, which allowed us to go as far as scripting, although unfortunately we had to omit the find and grep component. We spent the rest of the day introducing Python. On the second day, we started with a brief Git primer, so that during the rest of the Python lesson we could demonstrate how to version control the evolving snippet of code used to plot the data in the exercises. We then continued with Python, and managed to cover most of the practical material, although we had to abbreviate some of the later chapters. For the last quarter of the day we covered the more advanced material in the Git lesson, and for the last half an hour we assisted the two remaining attendees with specific tasks: writing a script to process CSV data, and setting up a GitHub repository. I am not entirely happy with the default Git lesson materials, which is why this lesson ended up with the most modifications.
I understand the logic of introducing local repositories before discussing remotes, but I feel that this obscures the most common and expected use case of version control (a remote backup of changes) and overcomplicates setup (it is much cleaner and simpler to create a repository on GitHub and clone it to a local machine than to create a local repository and push it to GitHub, and this is the approach we took in our morning quick-start introduction). Although we still ended up having to skip some parts of the lessons because we were short on time, I'm much happier with the amount of material that we covered, compared to what I managed during the last workshop where I was an instructor -- I am very likely to extend the Python lesson to two sessions in future. I also enjoyed the environment of the smaller workshop, and would find it valuable to run a variant like this again, but perhaps by design rather than by accident.
Feedback received from attendees in the post-course survey:
The good:
"Instructors were good. Food was good. Setting was good."
"Excellent coverage of the basics. Small workshop allowed a lot of attention to individuals."
"The material covered were really helpful and the assistance provided was outstanding. The supervisors were outstanding with their communication and teaching the material. They were also very friendly which made the learning experience all the more enjoyable. Furthermore the supervisors have given their time to assist us which I am all the more appreciative! Thank you!"
The bad:
"Sometimes the instructors over elaborated on some issues."
"It would have been nice if there was air conditioning as the room temperature became relatively unbearable, which made concentrating an effort." Read More ›

Assessing Assessment
Rayna Harris / 2015-10-10
Jason Williams and I met with two consultants at the University of Texas at the end of September to get feedback on Software Carpentry's post-workshop survey. They gave us detailed suggestions for improving six of the questions, and felt the rest were OK as they were. The feedback is given below; even without the whole questionnaire (which we will post shortly), we hope it's helpful. Read More ›

AMY 1.0 Released
Piotr Banaszkiewicz / 2015-10-10
We are very pleased to announce the release of Version 1.0 of AMY, our online workshop administration tool. A complete list of features is included in this blog post, and this milestone describes what we're planning to do for Version 1.1. If you enjoy web programming in Django, you'd be welcome to join in. Read More ›

A Summary of Debriefing Feedback on Our Python Lesson
Alistair Walsh / 2015-10-08
Last month, we discussed results from a survey of how our instructors are teaching Python. We now have a summary of the feedback we've received in our bi-weekly debriefing meetings. The recurring themes are:
- A greater choice of exercises and multiple choice questions would allow instructors to select domain-specific examples and cater to varying levels of learner.
- Some instructors added an explanation of the Jupyter Notebook or Spyder IDE environments.
- Some instructors added an explanation of basic Python data types before presenting the lessons.
- There is too much material to fit into a workshop, and some sections seem rushed.
- Instructors would like a better explanation of the advantages of the Anaconda distribution at the start of workshops, and resources for post-workshop learning.
- Comments on presenters' style were positive for funny and entertaining examples and negative for highly mathematical examples. Read More ›

Data Management Plans: A Role for Software and Data Carpentry
Rayna Harris / 2015-10-07
I spent the better part of the last three weeks working on an NSF-IOS Doctoral Dissertation Improvement Grant (DDIG) proposal. Pretty much daily, I consulted this list of publicly available grant proposals in the biological sciences to look at other people's proposals. It's an awesome resource if you want to see how people write their project description, but there are no links to example data management plans, facilities, summaries, etc. Where does one go for examples of or advice on these supplementary documents? At least part of the answer is "here". The last page of NSF's information about Data Management Plan requirements, updated on October 1, urges readers to check out Data Carpentry and Software Carpentry for resources and training. This is a huge shout-out (see these tweets), so how can SWC and Data Carpentry do more? Read More ›

A Workshop in Brisbane
Belinda Weaver / 2015-10-07
A sea of green stickies: that really sums up the 28-29 September Brisbane Software Carpentry bootcamp—it ran extremely smoothly. We had 40 people signed up (with a waitlist of six) but lost one at the last minute to acute appendicitis—ouch—and another to project deadlines. The remaining 38 comprised post docs (7), Master's candidates (2), PhD candidates (19), research technical support (4), and four people from industry (one from an NGO and three from a virtual lab). We also had one undergrad (a first for us) and one very bright high school student. Read More ›

More About Jonah Duckles
Jonah Duckles / 2015-10-06
I'm thrilled to be stepping into the role of Executive Director of the Software Carpentry Foundation. I come to you as an experienced member of the instructional community, excited to apply my administrative, grant-writing, and professional background to help the Software Carpentry Foundation become sustainable and reach exciting new goals. Read More ›

September 20 - October 4, 2015: A New Executive Director, Instructor and Helper Retreat, Data Visualisation Lesson, Teaching, and Lesson Citations
Anelda van der Walt / 2015-10-04
Highlights
Please welcome our new executive director, Jonah Duckles! Please join us on 14 November for the first international multi-site instructor and helper retreat. The blog post explains how you can get involved or host a site.
Lessons
Read about the new lesson on Data Visualisation with D3 and the pros of interactive data visualisation in Isabell Kiral-Kornek and Robert Kerr's post.
Contribute
We're actively revising our teaching practice and lesson design. The latest feedback from people who've taught the Python lessons is summarised, and there's another opportunity for you to add your thoughts via a short survey or longer discussion. We'd love to hear from you! We're still talking about the citation format for our lessons. You can follow the latest discussion and contribute your suggestions at the blog post. Read More ›

A Case for Online Data Visualization
Isabell Kiral-Kornek, Robert Kerr / 2015-10-02
This article originally appeared in The Research Bazaar. Thorough data analysis is only one part of good research. Equally important is communicating the outcome well and accessibly. And visible research is accessible research. Our main motivations for publishing our research results are: making them openly accessible to the public, informing fellow researchers about new outcomes that will help them in their research, and strengthening our professional profiles. Read More ›

Please Welcome Our New Executive Director
Greg Wilson / 2015-10-01
We are very pleased to announce that Jonah Duckles has accepted the position of Executive Director of the Software Carpentry Foundation, and will start on Monday, October 5, 2015. Jonah was most recently the Director of Informatics and Innovation at the University of Oklahoma where he partnered with researchers to improve their computational workflows while developing maker spaces for the campus. He holds a BS in Physics and an MS in Forestry and Natural Resources, both from Purdue, and has been a very active contributor to Software Carpentry for several years. Read More ›

Citation Format
Greg Wilson / 2015-09-30
Earlier this month, we published our lessons by giving them DOIs through Zenodo. As we said in an earlier post, though, we've been struggling to figure out (a) how to cite them in text and (b) how to express their metadata in standard bibliographic formats to produce those human-readable citations. Read More ›

Thinking About Teaching
Greg Wilson / 2015-09-28
A little over a year ago, we blogged about jugyokenkyu, or "lesson study", a bucket of practices that Japanese teachers use to hone their craft, from observing each other at work to discussing the lesson afterward to studying curriculum materials with colleagues. Getting the Software Carpentry Foundation off the ground almost immediately pushed that aside, but now that the SCF is up and running, it's time to return to the subject. Discussion of how teaching practices are transferred is part of that; so are two other developments this week. Read More ›

Announcing the 2015 Instructor and Helper Retreat
Bill Mills / 2015-09-22
Update: to stay current, please visit the Etherpad at https://etherpad.wikimedia.org/p/swc-instructor-helper-retreat-2015. The Mentorship Committee is very excited to announce the first ever Software & Data Carpentry Instructors & Helpers Retreat, happening worldwide on November 14. We're inviting all Software and Data Carpentry instructors and helpers to get together at sites around the world for a day of sharing skills, trying out new lessons and ideas and discussing all things instructor and helper related. Read More ›

September 6 - 19, 2015: A Retreat, New Mentors, Instructor Training Update, Preparing Researchers, Interactive Exercises, and a Student's Experience.
Anelda van der Walt / 2015-09-19
Highlights
The first Software Carpentry and Data Carpentry instructor and helper retreat will be held on 14 November 2015! Please get involved by hosting a local session or attending one. Remote participation will also be possible. We have seven new members on the Mentoring Subcommittee. Thanks to Rayna Harris, Christina Koch, Sue McClatchy, Mariela Perignon, Phil Rosenfield, Michael Sarahan, and Belinda Weaver for joining us in this important task. Instructor training has been revised. Read the blog to learn about the new plans for 2015 and beyond. Jazib Askari, a remarkable sixth-form student at Altrincham Grammar School for Girls, wrote about her Software Carpentry work experience.
Recommended
Christina Koch mentioned some fun interactive exercises for explaining concepts related to automation and version control in her latest blog.
Contribute
"What would you teach to prepare researchers better to do good science given the changing technological landscape?" An interesting post by Naupaka Zimmerman about being wrong, provenance, and structured data. What are your ideas?
Events
Join the Software Credit Workshop on 19 October 2015 in London to participate in the conversation about career advancement for research software developers. Read More ›

Software Credit Workshop in London, 19 October 2015
Shoaib Sufi / 2015-09-18
Securing credit for research software is the subject of the Software Credit Workshop, taking place at the Natural History Museum, London, UK on the 19th October 2015. Explore what contributions software can and should make to career advancement. Discuss ways in which you see software tools and applications supporting the current needs of researchers and software developers seeking credit for their involvement in research software. Identify and propose ideas for improving the way software's contributions to better research are recognised, and how this should support appropriate reputational credit for research software enablers. Join other funders, publishers, software developers, researchers, leaders, citation experts and altmetrics visionaries to make your thoughts heard and shape the conversation. Find out more and register at www.software.ac.uk/software-credit. Read More ›

Software Engineering Practices in Science
Greg Wilson / 2015-09-16
Dustin Heaton and Jeffrey Carver have just published a paper titled Claims About the Use of Software Engineering Practices in Science: A Systematic Literature Review: Read More ›

Teaching to the Workflow
Naupaka Zimmerman / 2015-09-15
"Teaching to the test" has a deservedly bad reputation, but what about "teaching to the workflow"? A group of us came together at the NCEAS Open Science Codefest last year and put together a paper on open science in ecology. In it, we sketch three examples of possible open science workflows (Figure 2 in the paper). In response, I was asked what Software Carpentry should teach to prepare people for working in those ways. My top three things are (in order): Read More ›

Rebooting Instructor Training
Greg Wilson / 2015-09-14
Instructor training has been key to Software Carpentry's growth, but it was clear by the time our last online class finished in April that we couldn't and shouldn't keep doing it the way we had been: couldn't because of the time it ate up, and shouldn't because it wasn't doing enough to prepare people for actually teaching. After five months of thinking, talking, and revising, we finally have a plan for rebooting the course. We are going to run it twice between now and the end of the year: once in the usual multi-week format, and once more compressed into two full days. We are also going to introduce some preparatory requirements, mentoring for new instructors, and mini-lessons to help ease people into organizing and running workshops. Finally, we will start training a few more people to run instructor training so that we can afford to give trainees more personal attention. The full proposal is included below, and we will be contacting people who have applied for instructor training over the next few days to invite them to take part. I apologize right now for not being able to offer everyone a place right away, but if this reboot goes well—and I think it will—we should be able to start clearing the rest of our backlog soon. Read More ›

Not Quite Lesson Material
Christina Koch / 2015-09-14
Originally posted on the author's blog. Lesson plans are a funny thing. The Software Carpentry lessons are an odd balance of a lesson to be read (like a textbook) and a lesson plan for instructors, where they bring the text to life via speaking, live coding, and doing exercises. This text format doesn't lend itself well to a few activities I've used in Software Carpentry workshops. So I'm going to throw them up here for other instructors to look at - feel free to add your own activities in the comments, or submit another blog post! Read More ›

How Teaching Knowledge Is Transferred
Greg Wilson / 2015-09-13
I hesitate to say so, but I believe that we pedagogues tend to exaggerate greatly the amount of change in educational practice that results from reading what other people say should be done... — Stephen Corey, 1951
Being back in Edinburgh thirty years on has occasioned much reflection upon lessons learned. This has prompted a re-reading of some papers by Prof. Sally Fincher, whose research group at the University of Kent studies the teaching and learning of computer science. In particular, I have been looking at what they have discovered about how educators share teaching practices. I hope these excerpts and reflections are of interest. (Note: section titles link to papers.) Read More ›

Workshop at the University of Arizona
Uwe Hilgert / 2015-09-13
Software Carpentry Workshop
Who: Practicing and aspiring research scientists
When: October 3 and 4, 2015 (9am - 5pm on both days)
Where: UA Integrated Learning Center
How: Register at http://iplantsc.eventbrite.com Read More ›

Reporting on a Commercial Workshop
Joshua Ryan Smith / 2015-09-09
On August 22, Marty McGuire (@schmarty) and I (@joshua_r_smith) taught a for-profit workshop based on Software Carpentry materials, particularly the novice Python lesson. This post is a debrief of that workshop. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 17
Rayna Harris, Kate Hertweck / 2015-09-09
The mentorship committee held their latest round of debriefing for instructors of recent workshops on Tuesday, September 1. Then, on Friday, September 4, the mentorship committee met for its regular bi-weekly meeting. Some comments from the committee meeting have made their way into this debriefing post. Read More ›

August 27 - September 5, 2015: Lessons Published, ReScience, SSI Fellowships Open, Interview, and Mailman Threads as GitHub Comments.
Anelda van der Walt / 2015-09-05
Highlights
Our lessons are now published! Please find important citation information in the blog post.
New
You can now publish computational studies that replicate previous findings in ReScience. The Art of Data Science by Roger D. Peng and Elizabeth Matsui has been published. Greg Wilson's recent CS Education Zoo interview about Software Carpentry is available via YouTube.
Contribute
Do you know of a tool that can convert Mailman mailing list threads into comments on a GitHub issue?
Opportunities
Applications are open for the Software Sustainability Institute's Fellowship programme. Apply before 1 October 2015. Read More ›

Our Lessons Have Now Been Published
Greg Wilson / 2015-09-05
It's been a long time coming, but we have finally published Version 5.3 of our core lessons. Please cite them as: Read More ›

Announcing ReScience
Konrad Hinsen / 2015-09-03
It's our great pleasure to announce the creation of "ReScience", a peer-reviewed journal that targets computational research and encourages the explicit replication of already-published research, promoting new open-source implementations in order to ensure that the original research is reproducible. Read More ›

SSI Fellowship Applications Open
Shoaib Sufi / 2015-09-02
The Fellowship Programme run by the Software Sustainability Institute funds researchers in exchange for their expertise and advice. The main goals of the Programme are encouraging best practice and gathering information about research software from all disciplines, and encouraging Fellows to develop their interests in the area of software sustainability (especially in their areas of work). The Programme also supports capacity building and policy development initiatives. Read More ›

Running a Code Retreat
Terri Yu / 2015-09-02
I recently attended a scholarship retreat run by Google with about 40 other students. We spent one day doing a "code retreat" and I wanted to share my experience and what I learned about programming. Read More ›

Better Teaching Practices
Greg Wilson / 2015-09-01
It wasn't part of our original plan, but over time Software Carpentry has come to be about better teaching as much as it is about better computing. In aid of that, I would like to offer the following: Read More ›

Three Graphs I Would Like to See
Greg Wilson / 2015-08-31
I spent part of the weekend chatting with a friend in Cambridge who used to be science editor at The Independent and now edits Scientific Computing World. During those conversations, I realized that there are three graphs I'd really like to see: Read More ›

GSoC 2015 Finished
Raniere Silva / 2015-08-30
This year's edition of Google Summer of Code (GSoC) has now finished. We are very happy with the outcome of the work of the three students who worked under the NumFOCUS umbrella, which we helped to coordinate. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 16 (morning)
Kate Hertweck / 2015-08-28
The mentorship committee held their latest round of debriefing for instructors of recent workshops on Tuesday, August 18. This post highlights the themes discussed in both the morning and evening sessions. Our participants spanned multiple levels of experience, from new instructors to experienced instructors preparing to teach again soon. The most difficult aspects of their experiences are described below, as well as things that worked well and may help ameliorate those problems. Read More ›

August 18 - 26, 2015: Instructors' Retreat, Applying Discounts, Adding Lessons, Be a Mentor, Undergraduate Training, and Improving RStudio.
Anelda van der Walt / 2015-08-26
Highlights
Do you want to meet other Software Carpentry instructors and share your training experience? Let us know when you'll be available for our first Virtual Instructors' Retreat. We've already heard back from 60 people! How do you know if a fee waiver or discount will apply to your Software Carpentry workshop? Read our new guidelines. Mike Jackson wrote about how easy and rewarding it is to add lessons to the Software Carpentry repertoire. It's inspirational to see the power of openness and collaboration.
Contribute
Join our mentoring subcommittee — it's a great opportunity to simultaneously learn and give back. Share your ideas for a Maths undergraduate Python curriculum with Ian Hawke (and us). RStudio wants to hear from you about how to improve their software for training purposes. An interesting discussion is already under way. Read More ›

Fee Waivers and Discounted Fees
Raniere Silva / 2015-08-24
A couple of weeks ago the Steering Committee met in person and one of the topics discussed was fee waivers and discounted fees for workshops and instructor training. Continue reading for our conclusions. Read More ›

Virtual Instructors Retreat
Tiffany Timbers / 2015-08-24
The Mentoring sub-committee is planning a Virtual Instructors Retreat this fall. The goal of the event is to build community between instructors, as well as give us all a chance to practice our teaching and give and receive feedback from each other. This will be a day-long event where we meet in person with our local community, where possible, and remotely where meeting in person is not possible. World-wide, all groups will also interact with each other via internet conferencing for specific portions of the event. We want as many of our instructors as possible to be able to participate, so please help us to do this by filling out this Doodle poll. Also, we are interested in your ideas about such an event. Please feel free to mail any ideas, comments or questions to the Mentoring sub-committee. Read More ›

What Is ORCID?
Will Fyson / 2015-08-22
Part of being a successful researcher lies in the ability to stand out from your peers, which can be done by making, and being acknowledged for, valuable and original contributions. Acknowledgement for one discovery can then act as a springboard, allowing your peers to identify your other scholarly contributions, to spot potential for future collaboration, or to take it as proof of your research skills when applying for further funding. In short, making your work and accomplishments known is crucial to success in academia. Yet whilst so many functions of the academic process hang on the concept of citations, and as such on the ability to identify the researchers behind a piece of work, the actual means of identifying a researcher is not without its problems. For example, how do we identify the discoveries and related work of a specific "John Smith" after coming across one of the author's particularly informative publications? How do we keep up to date with a researcher's publications if they change their name? How do we keep track of a successful researcher who works across a number of institutions over the course of their career, or who engages in work across a range of disciplines? Read More ›

Join the Mentoring Subcommittee
Raniere Silva / 2015-08-22
The Mentoring Subcommittee is seeking new members to help with the debriefing sessions and future activities. If you are interested in joining us, please send an email to mentoring@lists.software-carpentry.org. Read More ›

Feedback on Math with Python for Undergraduates
Greg Wilson / 2015-08-21
Ian Hawke (who put together these notebooks on testing numerical code) is now putting together some Jupyter Notebooks to teach Python to first-year undergraduates in mathematics. He would be grateful for feedback, and we'd be grateful if you could give him some: we'll learn a lot about what we should teach from seeing what you think he should. Read More ›

Improving RStudio as a Teaching Tool
Noam Ross / 2015-08-20
On Twitter, the RStudio support team requested suggestions for how to make RStudio better as a teaching tool. So I've started an issue on their support site for instructors to chime in with ideas. Go ahead and let them know what would make teaching with RStudio easier and better! Read More ›

Experiences Adding a Lesson on Make
Mike Jackson / 2015-08-20
In June I added a lesson on Automation and Make. In this blog post, I describe how the lesson evolved, my experiences in porting it into the Software Carpentry lesson template, and the community's response... Read More ›

Stickers
Greg Wilson / 2015-08-18
Want to dress up your laptop? Software Carpentry stickers are now available from Sticker Mule. All proceeds will go to the Software Carpentry Foundation. Read More ›

August 3 - 17, 2015: Data Carpentry Funded, Citations, Improving Our Lessons, and Lab Data Management.
Anelda van der Walt / 2015-08-17
Highlights
Fantastic news from our sibling organisation, Data Carpentry, who received USD 750,000 in funding from the Gordon and Betty Moore Foundation.
Contribute
Do you have suggestions for the citation format of our lessons? Should all contributors be named? Please participate in this discussion to help us move forward with making our lessons citable. How can we improve the next version of our Python 3 lesson? Contribute to the discussion before September, when the content of the next release will be decided upon. Meanwhile, a few posts focussing on preparing for and teaching various lessons have appeared over the past months. You can help improve what and how we teach by sharing your experience in terms of content, format, and preparation for teaching at a Software Carpentry workshop. Have you designed and built a data management system for your lab or project? Please share your experience to help us rethink how we teach databases and their integration with other tools. Read More ›

Science Track at PyCon UK 2015
Sarah Mount / 2015-08-17
Today, researchers in the sciences, humanities and arts all use code as an everyday part of their work. Often such code is written using the popular Python programming language. Thanks to generous funding from the Software Sustainability Institute, PyCon UK will have a track for scientists and other researchers who want to improve their coding skills, learn from colleagues, and discover new ways in which Python and its community can support their work. Read More ›

Prepping for the Python Lesson
Greg Wilson / 2015-08-16
Inspired in part by Byron Smith's post about trimming our standard Python lesson, Christina Koch has written a post of her own about preparing to teach that lesson. She organizes her discussion around the motivating question that opens the lesson: "We have to accomplish a task (reading in data, analyzing and plotting it) by writing a program. How can we be smart about it?" It's a good read, as it shows how an experienced instructor thinks about (re-)designing teaching material. I hope her conclusions will feed into our discussion of how to revise the lesson (which we're going to decide in September). Read More ›

Teaching in Bali
Areej Alsheikh / 2015-08-14
In July 2015, I was fortunate to receive an invitation to the first international workshop of the Diversity of the Indo-Pacific Network (DIPnet), which explains the "Bali" part, in order to teach R and the Unix command line during the first two days. The weeklong workshop was co-hosted by the Hawai'i Institute of Marine Biology and the Indonesia Biodiversity Research Center (IBRC), and led by Dr. Eric Crandall from CSU Monterey Bay. The theme of the workshop was "Molecular Ecology and Bioinformatics in Developing Countries", and it featured lectures and labs developed by the invited participants in their areas of expertise. A two-day Software Carpentry workshop was very much needed to begin the week and prepare all participants for the computational sessions later in the week. Read More ›

Checking What We Teach
Greg Wilson / 2015-08-14
Back in May, Jonathan Klaasen wrote a post about setting up a lab data management system. After re-reading it, I think it's a good reality check on what we teach about databases: from what I can tell, we cover most of the information management needs that Jonathan touches on. I also think it could be a great motivating example for a lesson on databases, and on how to combine them with other tools like shell scripts. If you've ever built or used something like what Jonathan describes, I'd be grateful for comments describing where your setup is the same and where it's different. Read More ›
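As a sketch of the kind of glue such a lesson could teach (the database file, table, and column names here are hypothetical, not Jonathan's actual system), a few lines of Python can pull records out of a lab database for other tools to consume:

```python
import sqlite3

# Query a hypothetical lab database and print results for downstream scripts.
conn = sqlite3.connect("lab.db")
cur = conn.execute(
    "SELECT sample_id, collected FROM samples WHERE collected > ?",
    ("2015-01-01",),
)
for sample_id, collected in cur.fetchall():
    print(sample_id, collected)
conn.close()
```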

Trimming the Python Lesson
Greg Wilson / 2015-08-13
We are close to releasing a new version of our standard Python lesson that uses Python 3 instead of Python 2. As soon as that's done, we're going to warm up this discussion about fixing that lesson. One new contribution to that is this post by Byron Smith about how he trimmed the existing lesson. If you have other experiences or comments, please add them to this GitHub issue so that we can make decisions in September about what to include in Version 5.4 in November. Read More ›

Data Carpentry Receives Grant from the Moore Foundation
Greg Wilson / 2015-08-13
Reposted from the Data Carpentry blog. We are extremely pleased to announce that Data Carpentry has received $750,000 in funding from the Gordon and Betty Moore Foundation. Read More ›

Publishing, Metadata, and Being Ahead of the Curve
Greg Wilson / 2015-08-12
As described in earlier posts, we are publishing our Version 5.3 lessons through Zenodo to make it easier for people to cite them. We're getting closer, but there are still a few bumps in front of us, and we would like our readers to help us figure out what to do next. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 15
Kate Hertweck, Tiffany Timbers / 2015-08-07
The most recent installment of instructor debriefings by the mentoring subcommittee was held on August 4 to discuss recently completed workshops. We were joined by new instructors as well as a number of very experienced instructors (some of whom also maintain lesson repos), all of whom taught recently or are preparing to teach workshops. We highlight below a few of the main points from our discussions, including interesting new ideas, things that worked well, and things that were difficult. Read More ›

A Workshop for Undergraduates at UC Berkeley
Kunal Marwaha / 2015-08-06
We ran a workshop for undergraduates at UC Berkeley on July 9-10, 2015. This was a quickly planned workshop from concept to completion (~19 days). I was excited to run a workshop geared towards undergraduates, especially those involved in research projects in the summertime. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 14
Sheldon McKay / 2015-08-04
On July 21 the mentorship team ran the 14th round of instructor debriefing sessions and received feedback from workshops at the NIH, UC Berkeley, and UC Davis. Read More ›

July 23 - August 2, 2015: Online Survey Still Open, Recorded Lessons Available, and Another WiSE Workshop Coming Up.
Anelda van der Walt / 2015-08-02
Highlights
Complete the online SWOT survey before 5 August to help steer Software Carpentry in the right direction over the next few years. Software Carpentry lessons from SciPy 2015 were recorded and are available on YouTube.
Opportunities
We're developing a template for presenting solutions to challenges in a sensible way. Please share your ideas for the template with us.
Events
Another workshop for women in science and engineering will be held at UC Davis on August 17-18, 2015. Read More ›

SciPy 2015 Workshop Videos
Matt Davis / 2015-07-29
Software Carpentry was pleased to present a full two-day workshop during the recent SciPy Conference tutorials. The entire conference was recorded, including all sessions of our workshop: Shell, Python, Git, and Scientific Python. The course materials are accessible via the workshop webpage and GitHub repo. The course would not have been possible without the help of several Software Carpentry members: Azalee Bostroem, Matt Davis, Jess Hamrick, Ted Hart, Katy Huff, Thomas Kluyver, Jens Nielsen, April Wright, and Elizabeth Seiver. Many thanks to these talented folks and to the SciPy organizers for inviting us! Read More ›

Solution for the Challenges
Raniere Silva / 2015-07-28
One of the enhancements we have in mind for the next release of our lessons is to provide solutions for the challenges. Having the solutions will help instructors prepare lessons for workshops more quickly, and will help learners reading the lessons on their own. Before we start adding solutions, it would be great to have a template for how they should be written. If you have suggestions, please add them to this issue on GitHub. Read More ›

WiSE Workshop at UC Davis Aug 17-18
Greg Wilson / 2015-07-23
We are pleased to announce that the latest in our series of workshops for women in science and engineering will be held at UC Davis on August 17-18, 2015. This workshop is aimed at those in science, engineering, medicine, or related fields who identify as women, female, or on the non-binary spectrum. Our instructors are experienced with creating safe spaces for those on the trans* spectrum, and one openly identifies as a trans woman; ensuring that we maintain an inclusive environment is a top priority for this workshop. For more information, or to register, please see the workshop website. Read More ›

July 07 - 22, 2015: Hiring an Executive Director, Strategic Planning, SWC-inspired Book, Open Research Repository, AMY version 0.6, and New Team Members.
Anelda van der Walt / 2015-07-23
Highlights
Software Carpentry is entering the next phase and will be hiring a new Executive Director. Please apply if you are passionate about the mission of the Software Carpentry Foundation! Do you want to help steer Software Carpentry in the right direction over the next five years? Help us by completing a short anonymous SWOT survey.
Changes
Two new coordinators, Maneesha Sane and Katarzyna Zaczek, have joined the Software Carpentry team, and we're saying goodbye to Amy Brown. We've changed the workshop administration fees. Please read the post for details.
Resources
"Effective Computation in Physics", written by two Software Carpentry instructors, Anthony Scopatz and Katy Huff, is now available. An Open Research Glossary generated through a crowd-sourcing effort with a number of named contributors is now available. AMY version 0.6 has been released with a long list of new features and some fixes. Read More ›

A Pair of Workshops
Greg Wilson / 2015-07-23
Do you know your options for software licensing? Have you heard of new funders' requirements for software sharing? This workshop in Cambridge (UK) on Monday, September 14, is your chance to get expert advice on these and other questions about software licensing. Speakers include Neil Chue Hong, the Director of the Software Sustainability Institute, and Shoaib Sufi, the SSI's Community Leader. The event is open to everyone and you are welcome to bring your questions to the workshop. There will be lots of opportunities to discuss your queries at the sessions and during the dedicated networking lunch. For more information, please see the workshop's website. There is also a workshop on Monday, October 19, at the Natural History Museum in London on getting credit for software. This workshop will explore what contribution software can and should make to academic reputational credit; i.e., how the production of software tools and applications can contribute to career advancement in academic research, both for researchers who build software as part of their research and for developers who build tools that support research. Read More ›

Changes to Workshop Administration Fees
Greg Wilson / 2015-07-20
After discussion with our Advisory Council and Data Carpentry, we have agreed to make some changes to the administration fee we charge for workshops that we help organize in order to reflect the real cost of our staff's time and our overheads, and to reflect the value of the training. Please see the workshop request page for the updated pricing information. Read More ›

Welcome Maneesha and Katarzyna
Greg Wilson / 2015-07-20
We are very pleased to announce that Maneesha Sane (pronounced "sah-nay") will be joining us in August as our new Program Coordinator. Having coordinated and managed public events and mentoring programs for several organizations in the Philadelphia area, Maneesha retrained as a software developer. She has been managing events and matching mentors to learners for over a decade, and we are looking forward to working with her. Read More ›

Top 10 Myths about Teaching CS
Greg Wilson / 2015-07-18
Mark Guzdial (whose blog has been a frequent inspiration) recently wrote an article titled Top 10 Myths about Teaching Computer Science: 1. The lack of women in Computer Science is just like all the other STEM fields. 2. To get more women in CS, we need more female CS faculty. 3. A good CS teacher is a good lecturer. 4. Clickers and the like are an add-on for a good teacher. 5. Student evaluations are the best way to evaluate teaching. 6. Good teachers personalize education for students' learning styles. 7. High schools just can't teach CS well, so they shouldn't do it at all. 8. The real problem is to get more CS curriculum out into the hands of teachers. 9. All I need to do to be a good CS teacher is model good software development practice, because my job is to produce excellent software engineers. 10. Some people are just born to program. Read More ›

Help Software Carpentry's Strategic Planning
Adina Howe / 2015-07-18
The SCF Steering Committee is undertaking a strategic planning process that will help us identify the SCF's working priorities for short- and long-term success, based on input from you. Your input is crucial—we are beginning the process by asking you to identify key Strengths, Weaknesses, Opportunities, and Threats (SWOT) facing SCF over the next five years via this anonymous survey. We will review the survey results, summarize key issues, and develop a summary and plan that addresses the issues raised. To help, please fill in the SWOT Survey. Note that the survey asks for three answers for each of strengths, weaknesses, opportunities, and threats, but you may give more or fewer as you see fit. Please fill out the survey only once, and by August 5, 2015. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 13
Raniere Silva, Sheldon McKay, Tiffany Timbers / 2015-07-18
Last week the mentorship team ran the 13th round of instructor debriefing sessions and received feedback from the workshops at Brigham Young University (check Belinda's post about it), Johns Hopkins University, Notre Dame University, Pennsylvania State University, the Jackson Laboratory, the University of Queensland, the University of Hawaii at Manoa, and the University of Toronto - Women in Science and Engineering (check Pauline's post). Read More ›

AMY Version 0.6
Piotr Banaszkiewicz / 2015-07-18
Yesterday was the release day for AMY v0.6. Here's what's new and what's changed in this release. Read More ›

The Open Research Glossary
Ross Mounce / 2015-07-16
It's been knocking around for a while now, but this week saw a big new release of the Open Research Glossary, a crowdsourced glossary of terms, acronyms, tools and concepts in and around open science. There's a lot of jargon in this area and it's often a barrier to understanding for the uninitiated. For example: 'green', 'gold', 'diamond' and 'hybrid' open access are shorthand terms which aren't always fully understood. The alphabet soup of acronyms is even worse! Take SHERPA/RoMEO for instance: it's a brilliant resource for checking author self-archiving rights relative to publisher-imposed restrictions and embargoes, but the name 'SHERPA/RoMEO' doesn't exactly make that clear. This glossary aims to elucidate all the need-to-know terms in open scholarship. Read More ›

Teaching with Jupyter
Jessica Hamrick / 2015-07-14
We had a great birds of a feather session at the SciPy conference last week and decided to create a mailing list specifically for instructors who are interested in using the Jupyter Notebook for teaching. The aim of the mailing list is to provide a place for instructors to share materials, strategies, advice, etc. on teaching with the notebook and the logistics that are involved with that. If you're interested in joining the mailing list as well, you can add yourself at https://groups.google.com/forum/#!forum/jupyter-education Read More ›

Software-Carpentry-Inspired Book: Released and On Sale!
Katy Huff / 2015-07-14
We're excited to announce the official release of "Effective Computation In Physics". This book was written by two Software Carpentry instructors, Anthony Scopatz and myself, Katy Huff. We were enormously inspired by the vision and work of the Software Carpentry community and expanded on that vision in the book to create a "field guide to research in Python." While examples and more advanced content are presented in the context of research in the physical sciences, the majority of the book will be useful to all researchers doing scientific computation. In book form, we were able to dive in and expand on best practices more deeply and more extensively than is possible in a workshop. We're extremely proud to have created "SWC in a book" as instructor Daniel Chen recently described it. You (or your students and colleagues) can even get a 50% discount if you grab it before July 17th using the code: WKPYDP. Read More ›

What I Learned in Brisbane
Belinda Weaver / 2015-07-07
Not a single negative was recorded about the people either teaching or helping out at the Brisbane Software Carpentry bootcamp last Thursday and Friday (2-3 July). There were a lot of positive comments though: "The workshop is great with passionate instructors"; "All the helpers in the room are very helpful and fun"; "Excellent support from helpers"; "Instructors readily available to help especially when falling behind or [we] need help understanding codes". Read More ›

Congratulations to Project Jupyter
Greg Wilson / 2015-07-07
This is wonderful news: the Helmsley Charitable Trust, the Alfred P. Sloan Foundation, and the Gordon and Betty Moore Foundation have just pledged $6M over three years to Project Jupyter (formerly known as the IPython Notebook). Fernando Perez of the University of California, Berkeley and Lawrence Berkeley National Laboratory, and Brian Granger of California Polytechnic State University, San Luis Obispo, will lead the project. Congratulations to them, their team, and their community—this is well deserved, and will help scientists all over the world in ways we cannot yet even imagine. Read More ›

June 29 - July 06, 2015: Research Software Engineers, Not Changing Lesson Build Tools, and Moving to Python3.
Anelda van der Walt / 2015-07-06
Highlights Do you know what a Research Software Engineer is or why we desperately need to recognise the role RSEs are playing in research? Read about the history of RSEs and a fellowship programme available for RSEs in the UK. Version 5.4 of our lessons will be released at the end of November, rather than mid-August, and lesson build tools will remain unchanged until then. Most importantly we'll be changing the Python lessons to run on Python 3. Contribute Are you looking for other ways to contribute to Software Carpentry? Visit our Projects page to see an exciting list of opportunities for you to get involved. Read More ›

Our Next Big Step
Greg Wilson / 2015-07-06
With Software Carpentry's rapid growth over the past couple of years, the combined responsibilities of being the Executive Director and running the instructor training program have become more than a single person can manage. And after five years of working to grow Software Carpentry into the world-wide community it has become, I'd like to spend more time with my family. The Software Carpentry Foundation is therefore hiring a new Executive Director. I will transition to running instructor training so that the new ED can devote themselves to building relations with partners, overseeing the development of our curriculum, being Software Carpentry's spokesperson, and working with the Steering Committee to set our future direction. Our new hire will initially be co-Executive Director, and will job-share with me during a brief transition period, after which they will become the new ED. This is the next logical step in Software Carpentry's evolution, and one that we have been working toward for more than a year. As with the election of the Steering Committee in January, it's a sign that Software Carpentry is here to stay, and nothing could make me prouder. Read More ›

Hiring a New Executive Director for Software Carpentry
Greg Wilson / 2015-07-06
The Software Carpentry Foundation seeks to hire a new Executive Director to build relations with partners, oversee the development of our curriculum, be Software Carpentry's spokesperson, and work with the Steering Committee to set our future direction. The successful candidate will initially be co-Executive Director, and will job-share with the current Executive Director during a brief transition period, after which they will become the new ED. To apply, please send email to team@carpentries.org by July 31, 2015 with "Co-Executive Director" in the subject line, and include: A brief resume or CV (approximately two pages). A brief statement (approximately two pages) of what you would hope to accomplish in your first year as Executive Director. Please also include a paragraph about any work you may have done with Software Carpentry in the past and another about your experience working with other volunteer organizations. We will begin interviews immediately after July 31, and hope to have someone in place no later than the end of August. Read More ›

Pushing Back
Greg Wilson / 2015-07-01
A week ago, we posted a proposal to use Jekyll to build our lessons rather than Pandoc. The immediate reaction was almost uniformly positive, but in the days since, people have pushed back on two fronts: Read More ›

What is a Research Software Engineer?
Greg Wilson / 2015-06-29
By now, many people in the UK (well, many of the sort who read this blog) will have heard the term Research Software Engineer, but what exactly is an RSE, and what effect will the creation of this title have? To understand, we need to go back to the Software Sustainability Institute's Collaborations Workshop in early 2012 (summarized in these blog posts and others). Those discussions led to this position paper at Digital Research 2012, whose authors argued that: Read More ›

June 17-28, 2015: A Lesson on Make, AMY 0.4 Released, Opportunities to Contribute, Practical Tips for Running Workshops, and Appointing a Program Coordinator.
Anelda van der Walt / 2015-06-28
Vacancies Software Carpentry is hiring a Program Coordinator. Please see the blog post for more information. Highlights A new lesson on Make has been added to the Software Carpentry repertoire. AMY 0.4 was released. New features are listed in the blog post. Contribute Would you like to contribute by recycling previously submitted assessment exercises and adding the best ones to our lessons? It can now easily be done by visiting the newly created repo of submitted MCQs and exercises. Learner assessment has been receiving a lot of attention over the past months. A first draft of the new feedback survey is now available. We're looking forward to your comments/contributions. Should we change our lesson templates to make use of Jekyll rather than Pandoc? Please contribute to the discussion to help us make an informed decision. Greg Wilson summarised lessons learned from teaching instructors. There is again an opportunity for you to contribute to the direction instructor training takes in future. Let us hear from you! Useful Tips Splitting the terminal window allows the instructor to display recent commands while continuing with the lesson at the same time. Read the post by Raniere Silva to see how it's done. Read More ›

Training Lessons
Greg Wilson / 2015-06-26
I wrote about our experiments with the format of instructor training back in May. At that time, we had run the class as: a multi-week online class, an in-person two- or three-day class, and a mixed mode in which the trainees were physically together for two days while the trainer came in via teleconference. We have since tried the mixed mode twice with the trainees at three different sites (three universities in Arizona for one run, and universities in Cape Town, Sheffield, and Ann Arbor for the other). We've also gathered a lot of feedback on what people want from instructor training and what its prerequisites should be. Here's what we've learned. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 12
Kate Hertweck / 2015-06-26
The mentoring subcommittee hosted instructor debriefings on 23 June 2015 to discuss recently completed workshops. We are delighted that so many new instructors are joining us at these sessions as a way to prepare for upcoming workshops, and welcome anyone else interested to attend as well. Below we highlight a few discussion points from our sessions, including issues with lesson pacing and Python installation, as well as tips on using the etherpad and GitHub organizations. A more in-depth synopsis of a recent workshop can be found in this fantastic post on Raniere Silva's blog. Read More ›

Workshop at CERN
Raniere Silva / 2015-06-25
At the beginning of June Rémi Emonet, Kwasi Kwakwa, and Chelsea Chisholm ran a workshop at CERN. Rémi has just posted a review. It went well, and there are a lot of good ideas in his write-up—from using a whiteboard for diagrams to the IPython Notebook's successor (Jupyter) and some semi-improvised intermediate material. Read More ›

Using Jekyll for Lessons
Greg Wilson / 2015-06-24
A recurring complaint about our lesson template is that it requires authors to commit generated HTML files to their repositories as well as their Markdown source files. This is necessary because we use Pandoc to convert Markdown to HTML, but GitHub will only run Jekyll. There were a bunch of reasons for using Pandoc instead of Jekyll, but it is now clear that the simplicity of only committing Markdown—i.e., of using GitHub pages the way they're meant to be used—is more important. We have therefore created a prototype of a Jekyll-based template (which is rendered here). The most important changes are: Read More ›
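To make the trade-off concrete, here is a rough sketch of the kind of Pandoc build step the old setup required (illustrative only, with hypothetical file names; not the template's actual build script):

```bash
# Illustrative sketch (hypothetical paths, not the template's actual
# build script): convert each Markdown file to HTML locally with Pandoc.
# Because GitHub Pages only runs Jekyll, HTML produced this way had to
# be committed alongside the Markdown sources.
for src in *.md; do
    pandoc --from markdown --to html5 --output "${src%.md}.html" "$src"
done
git add *.html   # generated files end up in version control
```

With Jekyll, GitHub renders the Markdown itself on every push, so only the source files need to be committed.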

Assessing Our Learners Part I
Daniel Chen / 2015-06-23
Three weeks ago, Jason Williams, Jeramia Ory, and Daniel Chen met at the New York Public Library to work out an initial survey to assess our learners. Greg Wilson and Katerena Kuksenok joined virtually to provide feedback. The goal was to take the comments from the various initial GitHub issues and create a draft of an assessment survey for everyone to provide input. Our first draft is up, so please provide feedback at https://github.com/swcarpentry/assessment/issues/6. Read More ›

Another Good Workshop in Brazil
Greg Wilson / 2015-06-23
The indefatigable Raniere Silva has just posted a description of a workshop at the University of Ceará that he and Dani Ushizima just finished teaching. It went well, and there are a lot of good ideas in his write-up — please check it out. Read More ›

Program Coordinator Position Available
Greg Wilson / 2015-06-22
Software Carpentry has grown and grown again since our re-launch in 2010. We are now helping thousands of scientists every year, and while many of our partners and instructors are now organizing workshops on their own, a lot of details still need to be sorted out to keep the whole show on the road. We therefore wish to hire a Program Coordinator to manage our day-to-day operations. This paid position will initially be part-time, but we expect that it will convert to full-time after a probationary period if funding allows. The successful candidate does not need to be either a programmer or a scientist, but must be well-organized, and can be located anywhere with reliable Internet access. The full description is included below; to apply, please email team@carpentries.org with "Program Coordinator position" in the subject line and a resume (either attached as PDF, or a link to something online). And please help us spread the word: we're a fun bunch to work with, and this would be a chance for someone to help a lot of scientists get more done in less time, and with less pain. Finally, I'd like to take this opportunity to thank Arliss Collins for all her hard work over the past year and a half. She is moving on to other duties now that our relationship with the Mozilla Science Lab has ended, but we couldn't have gotten through the past eighteen months without her. I'd also like to thank Amy Brown, who has come back to keep things going while we search for someone permanent, and welcome Kasia Zaczek, who is about to start handling workshops for us in Europe on behalf of Cyfronet in the same way that Giacomo Peru and Aleksandra Pawlik have been handling them in the UK on behalf of the SSI. Here's hoping that one day, somewhere, we can all get together for a group photo... Read More ›

Splitting the Shell Window
Greg Wilson / 2015-06-21
Raniere Silva has written a short post about a trick he found (via Kate Hertweck) for splitting the terminal window when teaching the shell so that recent commands stay visible at the top. It's a clever idea; we would welcome feedback from other instructors who have tried it or similar things. And if you have tricks of your own that you'd like to share, please let us know—we'd be happy to feature them here. Read More ›
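For instructors who want to experiment, here is one possible way to get a similar effect with tmux (a sketch only; it may differ from the trick described in Raniere's post):

```bash
# A sketch of one way to keep recent commands visible while teaching,
# using tmux; this may differ from the trick described in the post.
tmux new-session -d -s teaching     # start a detached session
tmux split-window -v -t teaching    # split it into top and bottom panes
tmux attach -t teaching
# In the pane where you teach, log each command as it runs (bash):
#   export PROMPT_COMMAND='history 1 >> ~/commands.log'
# In the other pane, follow the log as it grows:
#   tail -f ~/commands.log
```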

Research-Based Course Design
Greg Wilson / 2015-06-21
I've written before about the breadth and depth of Juha Sorva's work on computing education. His latest contribution is a paper co-authored with Otto Seppälä titled "Research-Based Design of the First Weeks of CS1". In it, they tie the specifics of the new intro programming class at Aalto University directly back to CS education research. More specifically (from their abstract): Read More ›

Recycling Training Course Material
Greg Wilson / 2015-06-21
Over the past two years, more than 200 people have written multiple choice questions and other assessment exercises as part of Software Carpentry instructor training. Many of these are very good, some are excellent, and they all deserve a second look to see whether they should be integrated into our lessons. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 11
Tiffany Timbers, Kate Hertweck / 2015-06-19
We held our 11th round of instructor debriefing last week, where we discussed the Software Carpentry workshops held at the University of Oslo, University of Connecticut, University of Campinas, University of the Basque Country, Lawrence Berkeley Lab, Berkeley Institute for Data Science, Murdoch Childrens Research Institute, and Oklahoma State University, as well as a Data Carpentry workshop at the National Data Integrity Conference at Colorado State University. At this debriefing we also had 3 new instructors join us (who have yet to teach a workshop but will be teaching one in the near future) to gain insight into what works (and what doesn't) at our workshops out in the wild. Read More ›

Amy Version 0.4
Piotr Banaszkiewicz / 2015-06-19
Today's the deadline for AMY v0.4. It contains a bunch of usability fixes, so all our admins should be happy :-) Read More ›

Why I Am Not Excited About Julia
Greg Wilson / 2015-06-18
If you hang out in scientific programming circles, you've probably heard of Julia by now. If you don't, or you haven't, it is: Read More ›

Software Development Practices in Academia
Greg Wilson / 2015-06-18
Derek Groen, Xiaohu Guo, James Grogan, Ulf Schiller, and James Osborne have just submitted a paper to arXiv.org titled "Software development practices in academia: a case study comparison". From the abstract: Academic software development practices often differ from those of commercial development settings, yet only limited research has been conducted on assessing software development practises in academia. Here we present a case study of software development practices in four open-source scientific codes over a period of nine years, characterizing the evolution of their respective development teams, their scientific productivity, and the adoption (or discontinuation) of specific software engineering practises as the team size changes. We show that the transient nature of the development team results in the adoption of different development strategies. We relate measures of publication output to accumulated numbers of developers and find that for the projects considered the time-scale for returns on expended development effort is approximately three years. We discuss the implications of our findings for evaluating the performance of research software development, and in general any computationally oriented scientific project. Read More ›

Adding a Lesson on Make
Greg Wilson / 2015-06-18
We are very pleased to announce the addition of a lesson on automation and Make, which was created by the SSI's Mike Jackson and Steve Crouch. The repository contains everything you need to teach it, and pull requests are very welcome. Read More ›

Get More Done in Less Time
Greg Wilson / 2015-06-17
Over the past year, Alexandra Simperler has interviewed participants in Software Carpentry workshops to find out what impact we've actually had on their work. Her results are now available on arXiv.org: The aim of this study was to investigate if participants of Software Carpentry (SC) get more done in less time. We asked 32 questions to assess 24 former participants to analyse if SC gave them the computing skills to accomplish this. Our research shows that time was already saved during the workshop as it could shorten the learning process of new skills. A majority of participants were able to use these new skills straight away and thus could speed up their day to day work. Like Jory Schossau's study, Alexandra's work shows that workshop participants believe we're making their lives better. Read More ›

June 10-16, 2015: Software Carpentry is Saving Time, Lessons Version 5.4, Greg's Time, Our Project List, and a Lesson on Reproducible Research.
Anelda van der Walt / 2015-06-16
Highlights "Software Carpentry: Get More Done in Less Time" is the latest publication about Software Carpentry and the impact it's having on research. It reports research done by Alexandra Simperler and is available at arXiv. Version 5.4 of our lessons is currently in the pipeline and due for release in the middle of August. Please join the discussions and participate in addressing new and existing issues to help us reach our next milestone. Over the last few months a lot has improved in the way Software Carpentry operates. Greg Wilson revisited some changes and the impact on how his time is spent six months after election of the Steering Committee. Do you want to know how you can help to move Software Carpentry forward? Take a look at our project list or get in touch. There are loads of opportunities for people with a wide variety of skills. Resources Titus Brown posted a lesson on reproducible research on YouTube. Read More ›

Updating the Project List
Greg Wilson / 2015-06-15
Updating my description of where my time goes made me realize that our list of things we need help with had fallen out of date. The highlights are below; please see the projects page for details, and get in touch if you'd like to help. Read More ›

A Lesson on Reproducible Computational Analysis
Greg Wilson / 2015-06-15
Titus Brown has recorded a two-hour lesson on reproducible computational analysis and posted it to YouTube. Many thanks to the folks at ICER for making it available. Read More ›

Where the Time Goes (Version 2)
Greg Wilson / 2015-06-14
Last November, I wrote a post about where my time was going. A lot has changed since then, including my workload, so here's an update: Read More ›

Routinely Unique
Greg Wilson / 2015-06-14
Back in April, Jeffrey Chang wrote an article for Nature in which he pleaded for those who analyze bioinformatics data to be recognized as creative collaborators in need of career paths. In it, he observed: To give greater support to researchers, our centre set out to develop a series of standardized services. We documented the projects that we took on over 18 months. Forty-six of them required 151 data-analysis tasks. No project was identical, and we were surprised at how common one-off requests were. There were a few routine procedures that many people wanted, such as finding genes expressed in a disease. But 79% of techniques applied to fewer than 20% of the projects. In other words, most researchers came to the bioinformatics core seeking customized analysis, not a standardized package. Read More ›

Running a Remote Workshop in South Africa
Laurent Gatto, Anelda van der Walt, David Merand / 2015-06-13
Last month we ran a workshop at Stellenbosch University in South Africa. The workshop instructors were Laurent Gatto (joining remotely from the UK) and two local instructors, David Merand and Anelda van der Walt. We had four helpers in the room and 32 participants. Read More ›

Warming Up for Version 5.4
Greg Wilson / 2015-06-12
It's time to start thinking about what should be in Version 5.4 of our lessons (which we plan to release in the middle of August to get us through to the end of the year). We have opened a discussion ticket on GitHub for each of the core lessons: the Unix shell, Git, Mercurial, SQL, Python, R, and MATLAB. Please add your thoughts there. Read More ›

Why We Can't Have Nice Things
Greg Wilson / 2015-06-11
In the beginning, there were tables: rows upon rows, with columns separated by commas or tabs or something more exotic. They were elegant but limited, so programmers said, "Let there be XML!" And lo, there came a great wailing and gnashing of teeth, for who among us can truly comprehend external (parsed) general entity declarations and the encoding thereof? Read More ›

Teaching at NIH
Fan Yang / 2015-06-11
Reposted from Fan's blog. I just finished a two-day Software Carpentry workshop held at the National Cancer Institute (NCI), Rockville, MD. The course ran from June 9th to June 10th and was taught by Jonathan Guyer, Adina Howe, and me. I was in charge of the Unix shell and version control (Git) sessions (also many thanks to Dilip Banerjee and Lynn Young for the help!). Personally, I think the course went really well, but there are definitely things I could adjust and improve in the future. Below is the rundown. Read More ›

Call for Chapter Proposals: Software Engineering for Science
Jeffrey Carver / 2015-06-10
With the continuing increase in the importance and prevalence of software developed in support of science, there is a need to gather a set of best practices and case studies to serve as a standard reference book. We are producing a peer-reviewed, edited book to address this need. The book will have three sections, each devoted to one of three topics that address these needs. The outline below enumerates those sections along with examples of the types of chapters that would fit within them. We solicit proposals from interested authors; chapter proposals should fit into one of the following book sections. Read More ›

May 29 - June 9, 2015: New Lesson About Data, SWC at SciPy 2015, Updating our Lesson Templates, and Amy 0.3.
Anelda van der Walt / 2015-06-09
Highlights Our latest addition to the Software Carpentry curriculum focuses on using Python to work with data on the web. We've created an ambitious list of proposed improvements to our lesson templates and aim to have it implemented by September. Please add your comments or help us improve the templates. Amy 0.3 has been released and includes several improvements that will make administrators' lives easier. Events One of the aims of running a Software Carpentry workshop at SciPy 2015 is to provide Python novices with the basics ahead of the conference. There are still spaces available for the workshop. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 10
Sheldon McKay / 2015-06-09
The mentorship team ran the 10th round of instructor debriefing session on May 26. Thanks to David Dotson, Daniel Chen, Sahar Rahmani, Christina Koch, David Merand and Laurent Gatto for feedback on their workshops. Rémi Emonet and Fan Yang, instructors at upcoming workshops, also attended. Read More ›

Updating the Lesson Template
Greg Wilson / 2015-06-07
A couple of weeks ago, we asked our lesson maintainers what changes they would like to see in our lesson template based on their experiences getting Version 5.3 ready for publication. Their comments are summarized below; it's an ambitious list, but I think we can do most or all of this between now and September. If there are other things you think we should change in the way we structure lessons (rather than in particular lessons themselves), please add comments, and if you'd like to dive into one of these, please check out the current lesson-template issues and create a pull request against the lesson-template repository. Read More ›

An Update on Publishing Our Lessons
Greg Wilson / 2015-06-07
Just a quick update on publishing our lessons: we prepared the release on schedule, uploaded it to Zenodo, and then discovered that we couldn't actually specify editors through their web interface, only authors, even though "editor" is an allowed value in the metadata. We also couldn't specify "lesson" as a type: our choices were "publication" (which is pretty generic) or "presentation" (which isn't quite right). Figshare and COS wouldn't let us do this either, but Zenodo is now working on it. They hope to release it by the end of June, at which point we'll wrap this up. Read More ›

Teaching at Monsanto
Will Trimble, Asela Wijeratne / 2015-06-07
As we discussed in April, the Steering Committee decided to run a small number of workshops for companies this year in order to see how our material would work for them, and whether they'd be willing to pay a higher administration fee that we could then use to underwrite workshops for people who might not otherwise be able to host one. The first of these workshops was held at Monsanto, and seems to have gone well: experience reports from two of the instructors are included below, and we hope to repeat the experiment for other companies in the coming months. (If you know a firm that would like our help, please do introduce us.) Read More ›

A Remote Workshop at the University of Campinas
Greg Wilson / 2015-06-07
Raniere Silva has posted a summary of a workshop at the University of Campinas at which Jennifer Shelton, Maneesha Sane, and Natalie Robinson taught over the web. 23 learners from 3 countries enjoyed lessons and pizza, and we learned a few more things about how to teach remotely. Read More ›

Amy Version 0.3
Greg Wilson / 2015-06-07
Version 0.3 of Amy, the web application we're building to manage workshops, has just been released. Among the improvements are many UI upgrades, autocompletion, password and permissions management, an easier way to award badges, and many others. It's already making our admins' lives better, and there are lots more improvements in the works. For details, please see Piotr Banaszkiewicz's recent blog post. Read More ›

Software Carpentry at SciPy 2015
Matt Davis / 2015-06-05
This year at SciPy 2015 Software Carpentry is excited to be running a full two-day workshop during the tutorials that precede the main conference. We'll be covering our standard topics of shell, introductory Python, and version control with Git. We'll also be teaching a unit introducing the basic libraries of scientific Python. Attendees need not have any experience with the topics we cover. We have our usual goals of helping scientists do more in less time with less pain, but at SciPy we also have a bonus goal of easing Python novices into the conference. A technical conference can present a steep learning curve to people unfamiliar with the field and we hope that a Software Carpentry workshop beforehand will give attendees a boost before wading into the conference deluge by exposing them to basic Python concepts and fundamental scientific Python libraries. There is still plenty of room in the Software Carpentry tutorial, so if this seems useful to you please register with SciPy. And if you know anyone who would benefit from a workshop and would be interested in the conference please share this with them. Hope to see you in Austin! Read More ›

Teaching Biocomputing at UT
Greg Wilson / 2015-06-03
Becca Tarvin has just posted an article about her experiences teaching biocomputing at the University of Texas. Like last month's report on teaching geoscientists, it's a welcome chance to compare what Software Carpentry does with what's possible in more conventional academic settings. As Becca says in summary: From student surveys we note that students benefit greatly from our online resources, including cheatsheets, markdown lessons, and example code, and they like live-coding more than PowerPoint presentations. Although it's difficult to keep attendance up over the semester, the students that attend/use the online resources greatly improve their coding abilities. We are working to improve the accessibility of our course to all levels of programmers by offering additional resources including Open Coding Hour and by creating an online forum. Read More ›

Workshop at OU Libraries
John D. Corless / 2015-06-01
During 19-20 May 2015, I taught a Software Carpentry workshop at the University of Oklahoma (OU) at their fantastic Bizzell Memorial Library with Jonah Duckles, Logan Cox, and Jeremiah Lant. Jonah and I taught in one room, and Logan and Jeremiah in the other (about 30 students in each room). Tuesday morning was the Shell with Jonah, Tuesday PM and Wednesday AM were Python with me (except the command line section, which Jonah covered), and Wednesday PM was Git with Jonah. This was my first time teaching the workshop so I decided to write up my observations and student feedback. Read More ›

Working With Data on the Web
Greg Wilson / 2015-06-01
We have just added a new short lesson called Working With Data on the Web to our repertoire. If you would like to help us improve it, please fork it on GitHub and send us comments or pull requests. Read More ›

May 19 - 28, 2015: New Learner Assessments, Remote Instructor Training, Coding for Librarians, and Evolution of a Geoscience Computing Course.
Anelda van der Walt / 2015-05-28
Highlights Software Carpentry instructors are encouraged to help develop new assessments for learners by suggesting questions that should be asked of learners. A two-day remote instructor course will be trialed across three sites on June 8-9. Watch this space for feedback about the experience. Resources Andromeda Yelton has produced a wonderful resource for librarians who code (or want to code) in the form of six articles titled "Coding for Librarians: Learning by Example". Christian Jacobs, Gerard Gorman, and Lorraine Craig's paper on the evolution of their geoscience computing course and the influence of Software Carpentry is available from arxiv.org. Read More ›

A Few Articles on Education
Greg Wilson / 2015-05-25
Over the past year, I've come to realize that Software Carpentry will only work if knowledge flows in several directions. Scientists need to learn about software development, but software developers need to learn about science, too. In particular, they need to learn that it's possible to study software and programming scientifically, which is what motivated yesterday's post about my favorite papers from ICSE 2015. And both groups need to learn about evidence-based teaching practices and the politics that education is embedded in (because without an understanding of the latter, no change is possible). While I don't have a snapshot like ICSE to offer, here are a few recent articles I've found illuminating: Read More ›

ICSE 2015
Greg Wilson / 2015-05-24
Back when I was still trying to do science myself, my field of study was software engineering. The International Conference on Software Engineering is the big gathering for researchers in that area, and this year's has just wrapped up. Thanks to this Gist from Mike Hoye, I was able to browse the papers presented at ICSE and co-located workshops (like him, I'm outside the Great Paywall of Academia), and I've included titles and abstracts below from the ones I think readers of this blog might enjoy. They're only a fraction of what was presented, and I freely admit the sample is biased toward the things I understand and find interesting, but I hope they'll convince you that people are doing solid empirical studies in software engineering, and producing insights that we can and should act on. Note: just over half of these papers (13 of 24) had an easily-findable version online. I'm not going to do the experiment, but I confidently predict that those 13 will be more widely read, and more influential, than the other 11. Read More ›

Coding for Librarians
Greg Wilson / 2015-05-22
Andromeda Yelton (who has featured in this blog before) has written a set of six articles for Library Technology Reports titled Coding for Librarians: Learning by Example. From the introduction: [This] draws from more than fifty interviews with librarians who have written code in the course of their work. Its goal is to help novice and intermediate programmers understand how programs work, how they can be useful in libraries, and how to learn more. Three chapters discuss use cases for code in libraries. These include data import, export, and cleanup; expanded reporting capability; and patron-facing services such as improvements to catalog and LibGuide usability. Most of the programs discussed are short—under a hundred lines—so that implementing or modifying them is within the reach of relatively novice programmers. Where possible, links to the code itself are provided. Several scripts are explained in depth. Additional chapters focus on nontechnical aspects of library code. One chapter outlines political situations that have been faced by librarians who code and the solutions they have employed. Another chapter shares interviewees' advice on specific resources and strategies for learning to code. Read More ›

Plan to Assess Our Learners
Daniel Chen / 2015-05-21
The assessment subcommittee seeks to assess the effectiveness of the activities of the Software Carpentry Foundation (SCF). It met a few weeks ago and drafted an action plan on how to move forward to create a series of assessments for our learners. This action plan will be the basis for how further assessment tools will be developed. Read More ›

Experiences with Geoscientists
Greg Wilson / 2015-05-21
Christian Jacobs, Gerard Gorman, and Lorraine Craig have written a paper titled "Experiences with efficient methodologies for teaching computer programming to geoscientists" that describes how their intro to computing course has changed over the last few years. It includes discussion of ideas they've borrowed from Software Carpentry, and some data on how students have responded. It's a good read, and we'd welcome more experience reports of this kind. Read More ›

Online Instructor Training Revisited
Greg Wilson / 2015-05-19
We have now run instructor training in three formats: an in-person two- or three-day class, a multi-week online class, and a hybrid version in which the trainees are co-located, but the trainer comes in via the web. We've also gathered a lot of feedback on what people want from instructor training and what its prerequisites should be. Based on all of that, we're going to try to combine the best features of everything we've done so far. Read More ›

May 12 - 18, 2015: 79 New Instructors and Instructor Debriefing Round 9.
Anelda van der Walt / 2015-05-18
Highlights 79 new instructors have qualified since the last announcement made in March! Instructor Tips Changing your prompt in the terminal when teaching the Shell lessons could provide more screen real estate. "export PS1='$ '" will change your current terminal only. Read More ›
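For instance, a minimal sketch (the save-and-restore steps are a convenience beyond the tip itself):

```bash
# Save the current prompt so it can be restored afterwards (a small
# convenience beyond the tip itself).
OLD_PS1="$PS1"
# Shrink the prompt to a bare '$ ' for more screen real estate;
# this affects the current terminal session only.
export PS1='$ '
# ...teach the lesson...
# Restore the original prompt when done.
export PS1="$OLD_PS1"
```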

New Members of the Team
Greg Wilson / 2015-05-16
It's been several months since we last welcomed new instructors to the team. A lot of people have finished training since then, so please say hello to: Read More ›

May 6 - 11, 2015: Lesson Prep for Publication, Capturing Instructors' Commands, and Instructor Debriefing.
Anelda van der Walt / 2015-05-13
Highlights 15 people in 5 countries are preparing the Software Carpentry lessons for publication. Instructor Tips Do you find that participants fall behind during Software Carpentry workshops because they can't follow commands entered into the terminal? You can use output redirection and Dropbox to create a live document for participants to follow, as in the sketch below. Read More ›
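One possible recipe, as a sketch (the folder and file names are hypothetical, and the original tip may describe a different setup):

```bash
# Illustrative sketch (hypothetical paths): append every command the
# instructor runs to a file in a shared Dropbox folder, so learners
# can follow along in the synced copy.
LOG=~/Dropbox/workshop/commands.txt
# In bash, PROMPT_COMMAND runs just before each prompt is printed;
# use it to append the most recent history entry to the shared log.
export PROMPT_COMMAND="history 1 >> $LOG"
```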

2015 Post-Workshop Instructor Debriefing, Round 9
Raniere Silva, Kate Hertweck / 2015-05-13
This week the mentorship team ran the 9th round of instructor debriefing sessions. Thanks to Andrew MacDonald, Doug Latornell, Evan Morien, Ewan Barr, Isabell Kiral-Kornek, Jackie Milhans, Kara Woo, Karl Broman and Tiffany Timbers for the great feedback on the workshops at Northwestern University, Simon Fraser University, Swinburne University of Technology, University of Melbourne and Washington State University. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 8
Kate Hertweck, Rayna Harris / 2015-05-08
The mentorship team met last week for a discussion with instructors who recently taught, including workshops at the National Center for Atmospheric Research (NCAR) and University of Texas at Arlington (the latter of which was taught by both authors of this post). Three important issues emerged during our discussion: recording the instructor's shell code, using example scripts to model increasing complexity in coding, and preparing instructors/helpers with answers to challenges. Read More ›

April 28 - May 5, 2015: GSoC Projects, Nightly Rebuilds, Katy Huff, and a PhD Starter Kit.
Anelda van der Walt / 2015-05-05
Highlights The Google Summer of Code students have been selected. Well done to Piotr Banaszkiewicz, Ian Henriksen, and Amit Jamadagni! Nightly rebuilds of the Software Carpentry lessons version 5.3 are now available. People Meet Katy Huff - chair of the Steering Committee and a member of the finance subcommittee. Resources Achintya Rao wrote a fascinating PhD Starter Kit listing some best practices and useful tools postgraduate students could find useful on their journey towards graduation. Events Learn about Research in the Cloud in Feltham this July. Read More ›

Research in the Cloud in London
Greg Wilson / 2015-05-02
Mark Stillwell and others are running a Research in the Cloud workshop in Feltham (near London) on July 15-17. Along with the standard Software Carpentry curriculum, they'll teach modules on cloud computing, including deployment, configuration, and management of virtual machines. Please see his blog post for details, or the workshop web site to register. Read More ›

Achintya Rao's PhD Starter Kit
Greg Wilson / 2015-05-02
Achintya Rao started a PhD last January, and in response to a request for advice from a friend, wrote a PhD Starter Kit that lists useful tools and practices. Most of them involve software of one kind or another, and it's interesting to compare the list to what we teach: if nothing else, it tells me that we really do need to figure out what to teach people about publishing science in the 21st Century. Read More ›

GSoC Projects for 2015
Raniere Silva / 2015-05-01
We're very pleased to announce that three students will be working on Google Summer of Code (GSoC) projects under the NumFOCUS umbrella that we are helping to coordinate. Read More ›

Getting to Know: Katy Huff
Amy Brown / 2015-04-30
This is the second in a series of posts about our contributors. We're posting these so our community can get to know each other better. If you'd like to be profiled, or you'd like to nominate another member, send an email to communications@lists.software-carpentry.org. This profile is of Katy Huff, a member of our steering committee, who is also on the finance subcommittee. We already know a little about Katy from her election nomination post; now, read about her history with Software Carpentry and her surprising backup career plan. Read More ›

April 21 - 27, 2015: The People Behind Software Carpentry, Debating Scientific Software, Learning Objects, and Ally Skills Workshops.
Anelda van der Walt / 2015-04-27
Highlights Get to know the people behind Software Carpentry. First up: Matt Davis. Conversations A lively conversation about scientific software has been taking place on Titus Brown's blog, culminating (for now) in this post on popping the open source/open science bubble. Recommendations We'd like to encourage our instructors and lesson contributors to read about the paradox of learning objects as discussed by David Wiley. Watch the video of the recent Ally Skills workshop offered by the Ada Initiative at PyCon 2015. Read More ›

Getting to Know: Matt Davis
Amy Brown / 2015-04-27
The subject of the first of our "Getting to Know" series of contributor profiles is Matt Davis, a long-time Software Carpentry team member. Matt is the vice-chair of our Steering Committee, and the Software Carpentry Foundation liaison for the Lesson Organization and Development subcommittee. Read More ›

Van Lindberg's Keynote: Say Thanks
Greg Wilson / 2015-04-25
Van Lindberg, the chair of the Python Software Foundation, gave a really insightful keynote at PyCon 2015 last week. In a nutshell: The PSF's greatest challenge is that it's short of time: it has one full-time and three part-time employees. What can you do to help? Say thanks to people for what they're doing. Raise up and mentor others. Persevere when trying something new. (Bonus points if you help someone out who doesn't look like you.) How can a new organization grow up to be the PSF? Don't rush it. (In particular, don't put too much process in place too soon.) Default to openness. Build a culture of service. There's lots more, and you should watch the whole thing, but I think these are good guidelines for Software Carpentry (and most other things, too). Read More ›

Ada Initiative's Ally Skills Workshop
Greg Wilson / 2015-04-25
The Ada Initiative ran their Ally Skills workshop at PyCon 2015, and by all accounts it was useful and thought-provoking. They don't do an online version, but you can watch this video of a workshop at the Wikimedia Foundation. Recommended. Read More ›

The Paradox of Learning Objects
Greg Wilson / 2015-04-22
Warren Code recently forwarded this post by David Wiley, a serial innovator in open education and educational reform. In it, he recapitulates the history of "learning objects" and the paradox at the core of the idea of remixing and reusing teaching material. Since Software Carpentry is (sort of) trying to do exactly that, I think everyone who's currently teaching for us or helping us meet our first publication deadline should look it over. Read More ›

April 13 - 20, 2015: A DOI for Software Carpentry Lessons, Good Enough Scientific Computing Practices, Code Reviews, and Library Carpentry
Anelda van der Walt / 2015-04-21
Highlights Publication of our lessons to obtain DOIs is planned for May 14th. Please help us tidy lessons up or get in touch if you have any experience in publishing lessons built in GitHub. Conversations Let's write a follow-up to our article "Best Practices in Scientific Computing" called "Good Enough Practices in Scientific Computing". Should or could Software Carpentry teach participants how to do code reviews? What are your thoughts? Events Library Carpentry is a new program piloted by James Baker. The first Library Carpentry event will be hosted in London in November 2015. Read More ›

Learning in Both Directions
Greg Wilson / 2015-04-21
We have spent a lot of time thinking about how to assess the impact that Software Carpentry is having. We've done some small studies and collected a few testimonials, but it's been small potatoes compared to the 5000 people we taught last year alone. After some back and forth with a colleague whose work I have admired for years, though, I realize that I've been trying to do this the wrong way. My training as an engineer taught me that only controlled, quantitative experiments were "real" science—that as Ernest Rutherford said, it's either physics or stamp collecting. I now understand that there are other rigorous ways to generate actionable insights, some of which are better suited to our needs than something like randomized control trials. More than that, I finally understand what one of my first teachers told me: Teaching only works well when the teacher is also learning. Read More ›

AAS Reflections
Azalee Bostroem / 2015-04-18
We just finished* a workshop at the American Astronomical Society. I was lucky to recruit 3 instructors (in addition to myself) - Matt Davis, Erik Bray, and Phil Rosenfield. We also had Pauline Barmby volunteer as a helper (and on the fly instructor). While I wrote this blog post, you will see comments inserted by the instructors. Read More ›

Publishing Our Lessons
Greg Wilson / 2015-04-17
Digital Object Identifiers (DOIs) are one of the building blocks of academic bibliography systems. It's now possible to get a DOI for a GitHub repository (or more accurately, for the state of a GitHub repository at a particular point in time). We are going to use this to publish a citable version of our core lessons. Read More ›

Invitation to Millions of Compute Hours: Announcing the Open Science Grid User School
Christina Koch / 2015-04-17
If you could access thousands or even millions of hours of computing, how would it transform your research? What discoveries might you make? Each year the NSF-funded Open Science Grid (OSG) selects a group of 25-30 students to attend the OSG User School, a week-long dive into high-throughput computing approaches, technologies, and skills, within a larger context of computational research design that students can take into their future careers as researchers. Students across the country from nearly any research discipline are invited to apply, and selected applicants will obtain direct access to the OSG beyond the duration of the school. Read More ›

Library Carpentry
Greg Wilson / 2015-04-17
We wrote about the digital skills classes at the British Library last October. We were therefore very pleased to see that James Baker, a Software Sustainability Institute Fellow, is piloting a new program called Library Carpentry. The first run will take place in November 2015 at the Centre for Information Science at City University London; the program will consist of four three-hour sessions, each for 40-50 participants. The announcement has more details, including a call for participants and another for volunteers. Please check them out, and lend a hand if you can. Read More ›

Close Enough Redux
Greg Wilson / 2015-04-17
Back in October, we explained why we don't teach testing in Software Carpentry workshops. In response, Ian Hawke has put together a really nice series of articles about how he would test a small numerical program. It's great content, and it also shows yet again how Jupyter (formerly the IPython Notebook) is changing the way scientists create and share ideas. Read More ›

Korean Translation of Software Carpentry - version 5.2
Victor (Kwangchun) Lee / 2015-04-16
Based on Software Carpentry version 5.2, we have translated all the lessons and built them as pdf, html, mobi, epub, and azw3. It took almost six months: the first half was spent mostly on the lessons themselves, and the second half on the remainder. While translating and building the various ebook formats, we found that Software Carpentry's original ebook production software did not seem to have been designed with Chinese, Japanese, and Korean (CJK) characters in mind. We are also wondering how to generate the various ebook formats from the lessons currently under development. Read More ›

Quality Is Free - Getting There Isn't
Greg Wilson / 2015-04-15
Worried about the rising tide of retractions, Nature Biotechnology recently announced that, "Its peer reviewers will now be asked to assess the availability of documentation and algorithms used in computational analyses, not just the description of the work. The journal is also exploring whether peer reviewers can test complex code..." That's a welcome step in theory, but I worry about how it will play out in practice. Scientists already complain about how much time they spend reviewing papers: reviewing code as well will take even more time, particularly if: Read More ›

2015 Post-Workshop Instructor Debriefing, Round 7
Kate Hertweck / 2015-04-14
The mentorship team held our latest round of post-workshop debriefing sessions for instructors who taught recently. Instructors from workshops at Clemson University, University of Miami, University of Melbourne, University of Oklahoma Libraries, Harvard School of Public Health, and Weill Cornell Medical College joined us for our discussions, as well as a few new instructors who will be teaching in coming weeks. Here's a recap of common themes and highlights: Read More ›

April 6 - 13, 2015: The Steering Committee, Workshops for Companies, and a Discussion of WiSE Events
Anelda van der Walt / 2015-04-13
Highlights A summary of the steering committee's activities has been provided by Karin Lagesen. Her post includes links to board meeting minutes and an overview of the newly-formed subcommittees. Software Carpentry will be running five workshops for companies. Please get in touch if you are aware of companies that may be interested in participating in this pilot program. Conversations What are your feelings about women-only events to address the gender gap in STEM? Read about the experiences and observations from instructors during a recent WiSE workshop. Read More ›

The Future Then and Now
Greg Wilson / 2015-04-13
Jon Udell's Internet Groupware for Scientific Collaboration taught me how to think about the web. He started work on an update a couple of months ago, and it has now been published by PLOS. What strikes me upon re-reading the first is how far we've come; upon reading the second, is how far we still have to go to make all of this normal. As William Gibson said, "The future is already here—it's just not very evenly distributed." Read More ›

How to Send a Pull Request to the Lesson Template
Raniere Silva / 2015-04-13
At the end of last year we split our lessons into one Git repository per topic that we teach. To ensure that all lessons have the same look, we use a single template for our lessons. If you want to create a new lesson from scratch, we already have a nice step-by-step guide, but what if you want to contribute to the template itself? In this post I will explain how I wrote my pull request for our template; a generic sketch of the workflow follows below. Read More ›
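For readers new to this workflow, a generic GitHub pull-request cycle looks roughly like this (a sketch that assumes you have forked swcarpentry/lesson-template; it is not the exact sequence from the post):

```bash
# Generic sketch of a GitHub pull-request workflow (not the exact steps
# from the post). Assumes you have forked swcarpentry/lesson-template
# to YOUR-USERNAME on GitHub.
git clone https://github.com/YOUR-USERNAME/lesson-template.git
cd lesson-template
git checkout -b improve-template     # do the work on a feature branch
# ...edit files...
git add -A
git commit -m "Describe the change to the template"
git push origin improve-template
# Then open a pull request against swcarpentry/lesson-template from
# the GitHub web interface.
```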

Good Enough Practices in Scientific Computing
Greg Wilson / 2015-04-13
April Wright recently wrote a blog post about the reproducibility of a paper she recently submitted. In it, she said: Read More ›

Reflections Following a Women in Science Workshop
Nancy Soontiens, Karina Ramos Musalem, Tiffany Timbers, Daisie Huang / 2015-04-12
The number of women pursuing careers in STEM fields has increased over the last few decades, yet there are still major gender gaps in the areas of mathematics, computer science, engineering and physics (Hill, Corbett, and St Rose, 2010). In the US, women make up close to 50% of the work force, yet the percentage of STEM jobs occupied by women is only 25% (Beede, Julian, Langdon, McKittrick, Khan, and Doms, 2011). Part of this gap may be linked to biases and stereotypes that suggest "boys are better than girls" at math. These stereotypes can affect a girl's performance on tests and her perceived skill at math, ultimately influencing her decision to pursue a career and education in fields that require a foundation in mathematics and computing (Shapiro and Williams, 2011; Gunderson, Ramirez, Levine, and Beilock, 2012). How does this relate to Software Carpentry? Software Carpentry's mission is "to teach researchers basic lab skills for scientific computing." Given that female participants may experience added anxiety related to gender biases when learning computing skills, Software Carpentry instructors should be aware of these stereotypes and how they might affect their learners. Further, workshops with at least one female instructor can counteract these stereotypes by showing that women are skillful in computing tasks. But how much intervention is necessary, and could extra support for women actually be a detriment? Following a Data and Software Carpentry workshop for Women in Science at the University of British Columbia, we have thought about the advantages and disadvantages of holding a workshop for women only. We have posed the following question: "Is a women-only Software Carpentry workshop helpful or harmful for increasing the number of women in STEM?" A few points in each category are given below. Read More ›

The Steering Committee has Landed!
Karin Lagesen / 2015-04-12
The Steering Committee has now had its first few meetings. Minutes from them are up; please have a look if you are wondering what we are up to! Read More ›

A Project Inception Deck for Research Coding
Greg Wilson / 2015-04-11
I've never seen the point of comparing programmers to ninjas or samurai, but the people who do so often have good ideas. One that I particularly like is the Agile Inception Deck from Jonathan Rasmusson's The Agile Samurai, which sets out a ten-step process for making sure that everyone involved in a new project is actually trying to build the same thing. The ten steps are: Read More ›

Workshops for Companies
Matt Davis / 2015-04-09
At our March 12, 2015 meeting, the Software Carpentry Foundation Steering Committee discussed whether and under what terms to provide workshops to for-profit corporations. There were no objections to the idea of doing workshops for corporations, but it is something new to Software Carpentry. We've decided to run a pilot program encompassing five corporate workshops so we can learn more about working with corporations. The pilot program will allow us to gauge corporate interest and collect feedback from instructors, coordinators, and others about how things go. Read More ›

March 31 - April 6, 2015: A Lab Meeting, a LinkedIn Group, Two New Capstones, and Ideas for Instructors
Anelda van der Walt / 2015-04-06
Highlights A summary of last week's lab meeting is now available. Please take a moment to read through the highlights and notes. Software Carpentry and Data Carpentry instructors are invited to join our newly-created LinkedIn group. Resources Do you need to create a brand new lesson repository? The new process is outlined with an example - please try it out and send us your comments. Damien Irving created a capstone example specifically for oceanographers. A biomedical engineering MATLAB capstone was made available by Isa Kiko. Technical Challenges Git on Mac OS X 10.8: Instructors can find or propose solutions to the "Lazy symbol" error when installing git on old Macs (Mac OS X 10.8). Read More ›

April 2015 Lab Meeting
Greg Wilson / 2015-04-03
We held our second lab meeting of 2015 on April 1st, and had near-record turnout. Notes from the Etherpad are included below; the highlights are: We hope to have a first-quarter financial report by the end of this month. The good news is that there's lots of interest in partnerships and affiliations; the bad is that we're not collecting admin fees from nearly as many workshops as we need to. We will charge for-profit organizations four times as much for organizing workshops as we charge universities and other non-profits; the extra money will be used to underwrite workshops for places that otherwise might not be able to afford to host them. We have created a LinkedIn group for Software Carpentry and Data Carpentry instructors — if you're a LinkedIn user, you're welcome to join. Leigh Sheneman and Lynne Williams have volunteered to moderate the group. Peter van Heusden and Gabriel Devenyi have volunteered to help with system administration — our thanks to Jon Pipitone and David Rio for all their help over the past couple of years. Noam Ross is putting together a lesson on how to get unstuck, and Christina Koch is managing work on some extra Unix shell material. We hope to have the current lessons tidied up by the end of April so that we can give them DOIs, and thereby make it easier for everyone who has contributed to them to get proper credit for their work. Read More ›

2015 Post-Workshop Instructor Debriefing, Round 6
Tiffany Timbers / 2015-04-01
We held our sixth round of post-workshop debriefing last week. We discussed the Software Carpentry workshops held at the University of Arkansas, Utah State University, and the University of Waterloo, as well as the first Software Carpentry workshop ever held in Korea, at the Korea Radio Promotion Association (and yes, we now have lessons translated into Korean)! We were also joined by a Data Carpentry instructor who taught at a recent workshop in Espoo, Finland. Read More ›

March 23-30, 2015: A Lab Meeting, a Dataset, NGS Course at MSU, and Postdoc Positions at BIDS
Anelda van der Walt / 2015-03-30
Highlights Please remember to join us for the next online lab meeting on April 1st at 10:00 and 19:00 Eastern time. Resources If you are looking for a teaching-ready dataset, take a look at Ethan White's simplified version of the Portal Project database, now available in csv, json, and sqlite. Events A 2+1 week NGS course will be offered by Titus Brown and others from 10 - 21 August. Applications are now open. EuroSciPy 2015 call for papers is open until 20 April. The conference will take place in Cambridge, UK on 26-30 August. Opportunities The Berkeley Institute for Data Science (BIDS) is inviting applications for Postdoctoral Researchers in Data Science. Apply before 20 April. Read More ›

Teaching in Yangon
Ben Marwick / 2015-03-25
On Sat 7 March I spent a full day teaching a Software Carpentry workshop at the University of Yangon with 23 archaeologists from the Department of Archaeology. The workshop is part of a training component of an archaeological research project funded by the Australian Research Council, the University of Washington and the University of Wollongong. The group included graduate students, tutors and lecturers. Archaeology in Myanmar has a strong art history flavour, partly due to its British colonial heritage (where archaeology and art history are often paired, compared to the US, where archaeology is usually a sub-field of anthropology) but mostly due to the country's extreme isolation from the rest of the world, where archaeology has taken a scientific turn in recent decades. This isolation takes several forms: travel restrictions that make it difficult for locals to travel overseas, and until recently, for foreigners to visit; small library budgets that make it difficult for university libraries to keep their collections and subscriptions current; and slow and unreliable internet connectivity that makes browsing the web, watching videos, and downloading files a lengthy, uncertain and frustrating process. All of this meant that the group's familiarity with using computers for research was lower than what might be expected from a Western audience, and so we adapted the SWC materials to accommodate this. We knew we wouldn't get through as much as a typical workshop, but we had the advantage of everyone starting at an equivalent skill level, so the sticky notes all went up and down at much the same time and we had a pleasant and relaxed atmosphere. Read More ›

Weekly Update: March 16 - March 22, 2015
Anelda van der Walt / 2015-03-22
Highlights The next online lab meeting will take place on 1 April at 10:00 and 19:00 Eastern time. Remember to sign up. Resources Jenny Bryan contributed a Gapminder data package, now available through CRAN, which might be helpful in setting up for Software Carpentry workshops. Data Carpentry developed new material for teaching dplyr using their ecological dataset. Opportunities Students can apply to participate in our Google Summer of Code projects until 19:00 UTC on 27 March 2015. We have several exciting projects available. Insight Data Science launched a new seven-week fellows program in Health Data Science. Read More ›

April 2015 Lab Meeting
Greg Wilson / 2015-03-20
The next Software Carpentry online lab meeting will take place on Wednesday, April 1 (no, really) at 10:00 and 19:00 Eastern time. (As usual, we will hold the meeting twice to accommodate people in different time zones.) Please sign up on this Etherpad to let us know whether you'll be attending, and if so, at what time. We'll post an agenda next week; if there's anything you'd particularly like to discuss, please let us know. Read More ›

Weekly Update: March 7 - March 15, 2015
Anelda van der Walt / 2015-03-17
Conversations Get some fantastic teaching tips from novice and seasoned trainers. Join the conversation by adding your tips. Useful advice is given about what to pack when you teach. Your contributions can potentially help fellow trainers. The Steering Committee has voted to remove SQL from the list of core topics for workshops. Instructors should still teach it if they think it's right for their audience, but they may also now use that time for more programming, testing, or other topics. Read More ›

Workshop at iPlant
Uwe Hilgert / 2015-03-17
The iPlant Software Carpentry Workshop in February at the University of Arizona in Tucson was an awesome realization of iPlant's and BIO5's collaborative nature. Bringing together iPlant, BIO5, the UA and Software Carpentry, this workshop served a large group of students and staff from a wide variety of backgrounds and a wide array of interests. 53 participants registered within 36 hours of publicizing the workshop. Participant demographics were as follows: Read More ›

And Now We Are Three
Greg Wilson / 2015-03-17
The four core topics that every Software Carpentry workshop is supposed to teach are automating tasks using the Unix shell, structured programming in Python, R, or MATLAB, version control using Git or Mercurial, and data management using SQL. In practice, many workshops omit the fourth, either because instructors want to put more time into the first three, or because they don't think SQL is relevant to their learners. The Steering Committee has therefore voted to take SQL out of the core. This doesn't mean that it can't or shouldn't be taught: it's still useful for many researchers to know, and the best way we've found to introduce key ideas in data management like atomic values, keys, and how to handle missing information. However, if instructors and learners would rather cover something else, they can do so. Read More ›

What Do People Want to Learn?
Tiffany Timbers / 2015-03-15
In the planning phase of organizing a Software Carpentry workshop for my home department of Molecular Biology & Biochemistry I started to wonder what participants want to learn. I designed a short survey to answer this question, and from my small department, ~20% (30 people) filled it in. Here's what they said: Read More ›

Teaching Tips
Greg Wilson / 2015-03-15
Last week's post on what's in your bag generated so many useful comments that we'd like to follow it up with another: what tips do you have for new instructors? The ones we've collected so far are listed below; please tell us what else we should tell people who are about to teach for the first time (and what else we should remind experienced instructors about). Read More ›

2015 Post-workshop Instructor Debriefing, Round 5
Sheldon McKay, Rayna Harris / 2015-03-13
At our fifth round of post-workshop debriefing this week, we discussed workshops held at the New York Academy of Sciences, the University of Oslo, and the University of British Columbia. This was a very instructive meeting with important lessons learned from the perspective of both new and veteran instructors. One of the key take-home lessons is that new instructors would benefit from attending an instructor debriefing prior to doing their first workshop. Read More ›

What's In Your Bag?
Greg Wilson / 2015-03-11
What do you have in your knapsack when you travel to teach a workshop? My list is: Read More ›

Weekly Update: Feb 28 - March 6, 2015
Anelda van der Walt / 2015-03-09
Highlights: Do you want to help shape the future of Software Carpentry? The Software Carpentry Foundation is now calling for volunteers to serve on various standing committees. NumFOCUS has been selected as a Google Summer of Code mentoring organisation. Students can apply between 16 and 27 March 2015. A Contributor Covenant was added to our lessons and other repositories to promote a harassment-free environment for contribution to Software Carpentry. Resources: An early release version of Bioinformatics Data Skills is available via O'Reilly. Vince Buffalo's book is highly recommended for both novice and experienced bioinformaticians. Opportunities: Scientific Software Engineer positions are available at the UK Met Office. Changes: Daisie Huang will be taking over from Jess Hamrick to maintain our Git lessons with Ivan Gonzalez. Our thanks to Jess for all her help. Read More ›

Shape The Future of Software Carpentry in an SCF Standing Subcommittee
Katy Huff / 2015-03-05
The Software Carpentry Foundation Steering Committee was elected to pursue ambitious goals driven by the voices and expertise in our community. We are therefore thrilled to announce a set of community-driven subcommittees aimed at initiatives essential to the Software Carpentry Foundation. Today, we are asking for representatives from the community to support this effort and shape the future of Software Carpentry. Please step forward to volunteer for the standing committee most important to you. Committee members will be asked to dedicate a few hours per month. To join us, please email the SCF Steering Committee with your name and a few sentences about your interest. Read More ›

Workshop in Krakow
Leszek Tarkowski / 2015-03-04
Last weekend we (Paulina Lach and Piotr Banaszkiewicz, both from AGH; Klemens Noga from Cyfronet; and myself from Cztery Bity) organized a workshop in Kraków. We decided to do it ourselves, as a group of certified Software Carpentry instructors, since that was simply a faster and more effective solution than finding an institution to act as a host. However, we received significant support from a few organisations. The PhD Students' Association at the Jagiellonian University offered the venue. Thanks to the support provided by ACC Cyfronet AGH and Cztery Bity we were able to offer modest catering. We were also supported by great helpers: Iwona Grelowska, Jakub Kruczek and Tomasz Jonak, all from AGH, and Marcin Klimek, from Cztery Bity. Read More ›

Funding Software Carpentry Workshops
Noam Ross / 2015-03-04
Software Carpentry workshops are taught by volunteers, but hosts need to fund instructor travel and accommodation, as well as our administrative fees of $750-$1250. While these costs are low, it can sometimes be a challenge to get funding for your first workshop, especially if others are unfamiliar with SWC. Here are some successful strategies that hosts have used to fund workshops: Read More ›

NumFOCUS Accepted as Google Summer of Code Mentoring Organization
Raniere Silva / 2015-03-03
A few weeks ago we announced that we would help NumFOCUS apply to Google Summer of Code (GSoC) as a mentoring organization. Yesterday the list of mentoring organizations was announced, and NumFOCUS was selected. Read More ›

The Most Viewed PLOS Biology Paper of 2014
Greg Wilson / 2015-03-03
We were very pleased to learn that the most viewed article in PLOS Biology in 2014 was "Best Practices for Scientific Computing", written by Dhavide Aruliah, C. Titus Brown, Neil Chue Hong, Matt Davis, Tommy Guy, Steven Haddock, Katy Huff, Ian Mitchell, Mark Plumbley, Ben Waugh, Ethan White, Paul Wilson, and yours truly. According to Stavroula Kousta, the article has been viewed more than 66,000 times. Read More ›

Ten More Instructors
Greg Wilson / 2015-03-02
It's a pleasure to welcome another ten instructors from the Southern Hemisphere to our team: Read More ›

The First Software Carpentry in Korea
Victor (Kwangchun) Lee / 2015-03-02
Last week, xwMOOC ran the first Software Carpentry workshop in Seoul. The workshop delivered not only an introduction to the Unix shell, Git and GitHub, Python, and SQL, but also computer science unplugged, rur-ple, Python for informatics, and cloud basics. In response to attendee feedback, the three-day workshop also covered attendees' concerns about the software business: what start-ups should know, what the software business environment looks like, and so on. Read More ›

Adding a Contributor Covenant
Greg Wilson / 2015-03-02
The Software Carpentry Foundation's Steering Committee has voted to add a Contributor Covenant to our lessons and other repositories. Like the Code of Conduct for our workshops, the Contributor Covenant's aim is to ensure that participation in Software Carpentry is a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Read More ›

Weekly Update: Feb 21 - Feb 27, 2015
Anelda van der Walt / 2015-03-01
Conversations: How do you fund your SWC workshops? Please join the conversation. Should we remove old material when adding new in order to keep the lessons manageable? Should people have to teach the standard material a couple of times before introducing their own? Please contribute your ideas. Resources: Remember to sign up for Software Carpentry's mailing lists. They are rich resources for scientific computing. Great R cheat sheets available at RStudio include sheets for dplyr, tidyr, Shiny, and R Markdown. A preview release of RStudio v0.99.315 is now available for testing and feedback. New features include, for example, the removal of the 1K-row limit in the Data Viewer. IPython 3.0 has been released - congratulations to the whole team. So you think you can data? The rOpenSci Hackathon is going to host a friendly data challenge between Hadley Wickham and Wes McKinney - please follow this GitHub issue for news. Quotable: "This is @swcarpentry: try to teach enough in two days that things people find on the internet make a little more sense." @jiffyclub Read More ›

Eleven New Instructors
Greg Wilson / 2015-02-28
It's a pleasure to welcome another eleven instructors to our team: Read More ›

Wrong Is Useful: Lessons as Packages
Greg Wilson / 2015-02-27
"What would Greg do? [pause] OK, now that we've ruled that out..." — overheard I wrote a post last July about using package managers like RPM, Homebrew, and Conda to track dependencies between lessons, so that a student could say something like conda install unit_testing and get a lesson on unit testing, along with the code, sample data, and other lessons it depends on. I also mused that it could help make research more reproducible: after all, a paper is just a lesson on something that's never been taught before. Read More ›

2015 Post-workshop Instructor Debriefing, Round 4
Sheldon McKay / 2015-02-27
We had our fourth post-workshop debriefing of the year this week, in which we discussed recent workshops at Jagiellonian University in Krakow, Poland, and Michigan State University. All four instructors from the Krakow workshop attended the debriefing, and we had a fairly in-depth discussion about that workshop. Read More ›

Improving Instruction
Greg Wilson / 2015-02-27
It's been quite a year for Software Carpentry instructor training: we had a great session at UC Davis (see also this excellent post from Rayna Harris) and another at the Research Bazaar kickoff in Melbourne. We've also started the twelfth round of online instructor training with 75 participants, making it our largest yet. All of this has led to a flurry of activity in our GitHub repositories. The comments and pull requests are very welcome, but we need to keep three things in mind: We already have more material than we can cover in two days. We are not our learners. No lesson survives its first presentation intact. Read More ›

Weekly Update: Feb 14 - Feb 20, 2015
Anelda van der Walt / 2015-02-22
Highlights: We're looking for mentors and projects for Google Summer of Code: please get in touch if you have questions or suggestions. Research Bazaar was again a highlight on Twitter this week, and Damien Irving summarised plans for Software Carpentry in Australia and New Zealand. Conversations: An interesting conversation about "Library Carpentry" started by James Baker took place on Twitter. New resources: A new Browsercast tutorial "Creating a workshop website" was published. Please let us know if you have any comments. An early release of "Effective Computation in Physics" has been made available. The book is relevant to scientists of all kinds. Opportunities: SciPy call for participation is open. Please submit your proposal for tutorials, talks, or posters. Call for Data Science Fellow applications through the Berkeley Institute for Data Science is now open. Read More ›

Applying to Google Summer of Code
Raniere Silva / 2015-02-21
During last week's Steering Committee meeting we talked about applying to Google Summer of Code as a mentoring organization, and decided to contact NumFOCUS to suggest that they apply as an umbrella organization for all the projects they support. NumFOCUS agreed, and for the past few days we have been racing to fill in the application. Read More ›

Software Carpentry Set to Explode in Australia and New Zealand
Damien Irving / 2015-02-21
Given the global success of Software Carpentry (10,000 learners and counting), it's easy to forget that the first two-day workshops were held only three short years ago. Excited by the potential of the project in those early days, Josh Madin (Macquarie University) and myself (University of Melbourne) pooled our funds and got Greg Wilson out to Australia to run the first ever workshops outside of Europe and North America. Since those initial Sydney and Melbourne workshops in February 2013, an additional 15 have been held around Australia and New Zealand and there are a dozen or so active local instructors. Read More ›

Managing GitHub Notifications
Noam Ross / 2015-02-20
SWC has many GitHub repositories for lessons, websites, and workshops, and many of our conversations about our work take place through GitHub issue threads. Since GitHub creates e-mail notifications for repositories and issues you participate in, you can quickly start getting loads of e-mail. While it's helpful to be kept abreast of developments on a project, if you get a flood of GitHub messages you won't see what's important. Here's a quick primer on fine-tuning your notifications so that you only see what you want, gleaned from this helpful conversation: Read More ›

Workshop at the University at Albany, SUNY
Thomas Guignard, Jeramia Ory / 2015-02-16
We recently ran a workshop hosted by the University at Albany, SUNY's Department of Informatics. The workshop ran on the weekend of January 31 and February 1, 2015, with the following schedule: Day 1 AM: Automating tasks with the Unix shell (instructor: Thomas Guignard). Day 1 PM: Building programs with Python (instructor: Jeramia Ory). Day 2 AM: Version Control with Git (Jeramia). Day 2 PM: Managing data with SQL (Thomas). Read More ›

Weekly Update: Feb 07 - Feb 13, 2015
Anelda van der Walt / 2015-02-13
Highlights: NeSI (New Zealand) became a SWF affiliate. SWC instructor training at the Research Bazaar in Australia created a lot of activity on Twitter this week - 50 trainees attended the workshop. Conversations: Daniel Chen made his "Project Cookie Cutter" bash script available and triggered a conversation about this. Please get in touch with us via GitHub or email us if you use cookie cutter project templates or have comments. Vacancies: The National Ecological Observatory Network (NEON) is looking for a Science Educator/Evaluator in Boulder, Colorado. Read More ›

Online Scientific Collaboration: The Sequel
Greg Wilson / 2015-02-12
Jon Udell's Internet Groupware for Scientific Collaboration taught me how to think about the web. He's now revisiting that report, and would like our help. Details are below; please give him a shout if you can help. Read More ›

NeSI Becomes Software Carpentry Affiliate
Greg Wilson / 2015-02-12
We are very pleased to announce that New Zealand eScience Infrastructure (NeSI) has become an affiliate member of the Software Carpentry Foundation. Read More ›

Science Educator/Evaluator Position at NEON
Greg Wilson / 2015-02-12
The National Ecological Observatory Network (NEON) is looking for a Science Educator/Evaluator (SEE). This person will coordinate development and implementation of NEON's Education and Public Engagement (EDU) program/product evaluation plans. The program includes web-based educational resources, university student programs, and citizen science programs, as well as learning resources for university students and faculty. The SEE will coordinate the development of measures for assessing implementation and outcomes of existing programs and new programs as they are developed. The SEE will also work with scientists/science educators to develop learning modules that are pedagogically appropriate for undergraduate students and assessment tools that enable evaluation of the effectiveness of these learning modules. An important part of this effort will be coordination of collection and analysis of data to assess NEON EDU program impact on a variety of audiences (including university students and faculty, scientists, the general public, educators (formal and informal), and other program stakeholders). The SEE will work closely with NEON EDU management and with contractors hired to provide external evaluation of EDU program goals. The SEE will work out of NEON HQ offices in Boulder, Colorado. For more information, please see the full advertisement. Read More ›

2015 Post-workshop Instructor Debriefing, Round 3
Greg Wilson / 2015-02-11
We had our third post-workshop debriefing of the year yesterday, in which we discussed several recent workshops. The most important point was probably that workshops are running more smoothly today than they did a year ago, even for first-time instructors. Read More ›

Cookie Cutter
Daniel Chen / 2015-02-10
I was first introduced to William Stafford Noble's paper "A Quick Guide to Organizing Computational Biology Projects" when Ivan Gonzalez and I taught at Harvard last November. Noble describes how scientists in Computational Biology should set up their project folders so code, results, outputs, figures, and papers are all in easily understandable locations. He also writes about how one should run experiments (using driver scripts) to make workflows reproducible, readable, and understandable to others (and your future self). Read More ›
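A minimal sketch of such a "cookie cutter", assuming a Noble-style layout (this is not Daniel's actual script; the directory names follow the paper's conventions):

```bash
#!/usr/bin/env bash
# new_project.sh: create a Noble-style project skeleton.
# Usage: ./new_project.sh my_analysis
set -e
PROJECT="$1"

# Separate homes for raw data, source code, binaries, results, and papers.
mkdir -p "$PROJECT"/{data,src,bin,results,doc}

# One dated results directory per experiment keeps runs distinguishable.
mkdir -p "$PROJECT/results/$(date +%Y-%m-%d)"

touch "$PROJECT/README.md"
echo "Created project skeleton in $PROJECT"
```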

Weekly Update: Jan 31 - Feb 06, 2015
Anelda van der Walt / 2015-02-09
Highlights: SWF gets 2 new affiliates (iPlant and Lab for Data Intensive Biology at UC Davis) and a new partner (University of Washington). Conversations: "...what matters most in teaching is the act itself - the verb": some observations on teaching and a book recommendation. Scientific coding and software engineering: what's the difference? A post by Daisie Huang: please add your comments. Rewarding Software Sharing by Mapping Scientific Software. A post by Chris Bogart: please share your ideas. Events: SWC instructor training and Intro to Web Programming at Lawrence Berkeley Lab: March 2015. rOpenSci Unconference: March 2015. Software Sustainability Institute's Collaborations Workshop: March 2015. Other News: Eight new instructors. Read More ›

Plot This
Greg Wilson / 2015-02-09
The most useful little guide to visualization I've ever found is the decision tree created by Andrew Abela, which you can find here. Do you want to show a comparison, a distribution, a relationship, or a composition? If it's a comparison, is it among items or over time? Each choice leads to a different kind of plot, and while you may not agree with all the choices, it makes the reasoning behind them concrete. Read More ›

Scientific Coding vs. Software Engineering
Greg Wilson / 2015-02-08
Daisie Huang recently wrote a great article for the Software Sustainability Institute's blog titled "Scientific coding and software engineering: what's the difference?". As a professional programmer who has become a scientist, rather than a scientist who's learned how to program, she has a fresh take on the differences between what those two groups do. Comments on her post would be very welcome. Read More ›

Rewarding Software Sharing by Mapping Scientific Software
Chris Bogart / 2015-02-07
Sharing software you write with other scientists can magnify the impact of your research, but there can be a surprising amount of sometimes thankless extra work involved. I work with a group at Carnegie Mellon's Institute for Software Research who have been asking scientists what that extra work is, and what motivates them to do it—despite a sometimes uncertain link between that extra work and the ways many of them are evaluated in their jobs. We're looking at ways of measuring and mapping software and its impacts, in order to help scientists demonstrate the positive impact that their work on shared software has on science. We're running an experiment that you can help with. Read More ›

2015 Post-workshop Instructor Debriefing, Round 2
Sheldon McKay / 2015-02-05
Greg Wilson and I recently hosted a second post-workshop debriefing session for January to capture more experiences and lessons learned from instructors in the field. This meeting was attended by 10 instructors covering four recent workshops. We discussed how the workshops went, what worked, what didn't, and what could be improved. Read More ›

Workshop in Illinois
Neal E. Davis / 2015-02-03
Last week (29-30 January), Matthew Turk (NCSA), Ivan Gonzalez (Martinos Center), David LeBauer (UIUC, and a new SWC instructor), and I (Neal Davis, CSE, UIUC) taught a SWC workshop co-sponsored by NCSA and CSE. We introduced two innovations into the workshop model, different from the one used in prior events I've participated in: Read More ›

Welcome Our Newest Instructors
Greg Wilson / 2015-02-03
It's a pleasure to welcome another eight instructors to our team: Read More ›

University of Washington Becomes Software Carpentry Partner
Greg Wilson / 2015-02-03
We are very pleased to announce that the eScience Institute at the University of Washington has become a partner of the Software Carpentry Foundation. Read More ›

Software Sustainability Institute's Collaborations Workshop 2015
Shoaib Sufi / 2015-02-03
The Software Sustainability Institute's Collaborations Workshop 2015 (CW15), which will be held on March 25-27, 2015 in Oxford, UK, focusses on software, best practice and the social side of working past the boundaries of traditional disciplines and roles to accelerate research outcomes—or put differently, interdisciplinarity done right! What will you learn by attending: The social and technical sides of interdisciplinary working. Examples of best practice of when it's been effective - and horror stories of when it has not. Examples of techniques and technologies that could be applied from one discipline to another - discipline hopping. Read More ›

Workshops in March at Lawrence Berkeley Lab
Greg Wilson / 2015-02-02
I will be teaching a two-day Software Carpentry instructor training course on March 10-11, 2015 at Lawrence Berkeley Lab, and a one-day course on web programming on March 13, 2015. Details are given below; the instructor training is open to LBL staff and to graduate students and staff associated with the Berkeley Institute for Data Science, while the web programming class is reserved for LBL staff alone. Read More ›

Weekly Update: Jan 24-30, 2015
Anelda van der Walt / 2015-02-02
Highlights: New steering committee selected. We have now taught ten thousand people. Conversations: Keeping momentum going after workshops. Factors and formulae in R lessons. Other News: First new instructors of 2015. Our first workshop in South Korea. Read More ›

Lab for Data Intensive Biology at UC Davis Joins Software Carpentry as an Affiliate
Greg Wilson / 2015-02-02
We are pleased to announce that the Laboratory for Data Intensive Biology at UC Davis has joined the Software Carpentry Foundation as an Affiliate Member for three years starting in 2015. "We've been long-term supporters of Software Carpentry, and Affiliate status lets us support the Software Carpentry Foundation in a tangible way," said Dr. C. Titus Brown, the lab director. "This status also gives us the opportunity to include Software Carpentry as part of a larger biological data science training program at UC Davis." Read More ›

rOpenSci Unconference in March 2015
Greg Wilson / 2015-02-02
The good folks at rOpenSci have just announced that their second annual unconference (which will actually be more like an unhackathon) is happening at GitHub's headquarters in San Francisco in March 2015. They're also hosting a Data Science Social on March 26 - please see their site for details. Read More ›

Our First Workshop in South Korea
Greg Wilson / 2015-02-02
We are very pleased to announce that our first workshop in South Korea will be taking place in Seoul later this month. Many thanks to Kwangchun (Victor) Lee for organizing and teaching, and to the Korea Radio Promotion Association for hosting. Read More ›

Nouns and Verbs
Greg Wilson / 2015-02-02
I've spoken and written many times about how puzzled I am that massive, open collaboration on lessons is so rare in the age of Wikipedia and open source software development. Hundreds of people have helped build the Wikipedia articles on Marvel Comics and the planet Mars, and hundreds more have helped build things like the Django web programming framework; why then are teachers still writing all their own slides and handouts? As is frequently the case, the answer might be that I've been asking the wrong question. When I brought this up last week in an online call organized by the Open Knowledge Foundation, Phil Barker said: "One difference between Wikipedia and OER [open educational resources] is that editing Wikipedia is about moving to consensus on facts, [but] editing teaching material is about tailoring it to specific local requirements (different teaching styles, different students, different curriculum standards)." Read More ›

iPlant Becomes Software Carpentry Affiliate
Greg Wilson / 2015-02-02
We are very pleased to announce that the iPlant Collaborative has become an Affiliate Member of the Software Carpentry Foundation. Read More ›

Announcing 2015 Steering Committee
Greg Wilson / 2015-01-31
The election for the Software Carpentry Foundation's Steering Committee is now complete. 122 of the 179 ballots mailed out were cast, and our new committee is: Read More ›

Interim Steering Committee Meeting: Dec 16, 2014
Greg Wilson / 2015-01-30
Software Carpentry Foundation Interim Board Meeting: Dec 16, 2014 Read More ›

Data Carpentry Genomics and Assessment Hackathon
Tracy Teal / 2015-01-28
If you're working or interested in genomics or assessment, we hope you'll consider applying for our upcoming Data Carpentry Genomics and Assessment hackathon. We're very excited about this event and the opportunity to develop lessons targeting genomics researchers and build assessment into the Data Carpentry curriculum. Travel support is available, so please apply to participate! It's a short application, and the deadline is this Friday, January 30th. If you have any questions about the event, please let us know. Dates: March 23-25, 2015. Location: Cold Spring Harbor Labs, NY. See the Call for Participation and the Application. Read More ›

Cast Your Vote
Greg Wilson / 2015-01-26
Voting is now open for Software Carpentry's new Steering Committee. If you are a member, you should have received a ballot by email from elections@electionbuddy.com. If you did not, please check your spam folder; if it is not there, please get in touch and we will sort it out as quickly as we can. Read More ›

Welcome Our First New Instructors of 2015
Greg Wilson / 2015-01-24
One of the best parts of this job is welcoming new instructors to our team. Many of this year's first group have already helped with workshops, and I hope to have a chance to teach with all of them before too long. Read More ›

The Other Ninety Percent
Greg Wilson / 2015-01-24
Ninety percent or more of learning a skill takes place outside formal lessons as people try things out for themselves and turn attention into habit. This works best if a mentor is on hand to answer questions and provide feedback, but our workshop format doesn't lend itself to that: in most cases, our instructors are back on a plane (or back in their own lab) as soon as they're done teaching, so our learners have to make sense of what they've just been shown on their own. Based on follow-up discussions and Jory Schossau's work, here are things they stumble over in order of increasing pain: Read More ›

University College London Becomes Software Carpentry Affiliate
Greg Wilson / 2015-01-22
We are very pleased to announce that University College London has become an affiliate of the Software Carpentry Foundation. We are grateful for their support, and look forward to working with them more closely. Read More ›

Improving the Balance
Greg Wilson / 2015-01-22
Jennifer Martin's recent article "Ten Simple Rules to Achieve Conference Speaker Gender Balance" reminded me that while we've been reporting the number of workshops we've run, and the number of people who've attended, we haven't reported the gender balance among our instructors in a while. I did a quick check, and the result was sobering. From a high of almost 30% eighteen months ago, our instructor pool is now only 18% female. It seems that as we grow, we are regressing to computing's unfortunate mean: from a high of 38% in the early 1980s, the proportion of computer science degrees awarded to women has dropped to 15-18%, and CS is the only STEM discipline where the gender balance has actually been getting worse over the past few decades. Read More ›

Call for ELIXIR Node Coordinators
Aleksandra Pawlik / 2015-01-22
One of the 2015 ELIXIR Pilot Projects focuses on supporting training in collaboration with the Data and Software Carpentry initiatives. The pilot "Working up and building the foundation for Data Carpentry and Software Carpentry within ELIXIR" includes a series of initial events across different ELIXIR Nodes. In order to achieve the goals of running Data and Software Carpentry (DC/SWC) workshops, promoting the DC/SWC teaching model, and developing and sharing training materials, it would be useful, if not essential, for each ELIXIR Node to have a DC/SWC Coordinator. Read More ›

Workshops in Oxford
Aleksandra Pawlik / 2015-01-21
The Wellcome Trust Centre for Human Genetics at the University of Oxford hosted its first Software Carpentry workshop on 13th and 14th January. Instructor and co-organizer Philip Fowler has blogged about it, and included both photographs and some bubble charts of the feedback he received. Separately, James Allen has blogged about the first and second days of a separate workshop that he ran at Oxford for people in atmospheric, oceanic, and planetary physics. This was James' first time teaching for us, and his summary of things that he thinks could be improved includes some ideas that we'll fold back into the main notes. Read More ›

Post-Workshop Instructor Debriefing, Round 1
Sheldon McKay / 2015-01-21
In the first two weeks of January 2015, 14 Software Carpentry workshops were held at various locations. Every workshop brings its share of lessons learned and new experiences, particularly with new instructors coming online. On January 14, Greg Wilson hosted a post-workshop debriefing meeting attended by 21 instructors from 10 recent workshops. We got together to discuss how the workshops went, what worked, what didn't, and what could be improved. Read More ›

Feedback from Workshop at UFSC
Diego Barneche / 2015-01-20
Last December we ran another R-based SWC workshop at the Federal University of Santa Catarina (Florianópolis, SC, Brazil). The primary target audience encompassed grad students from the Biology Faculty who had some experience in programming and R. Raniere Silva (an undergrad in Applied Maths at UNICAMP, Campinas, SP, Brazil) co-taught the workshop with me, and the Ecology grad students Renato Morais Araújo and Juliano Bogoni (Department of Ecology and Zoology) acted as our hosts and in situ organizers. Read More ›

Orwell, Dickens, and How We'll Know We're Done
Greg Wilson / 2015-01-17
I started working on a short capstone example last month to show learners how to get a badly-formatted reference list out of an Excel spreadsheet and into a relational database so that it would be easy to answer questions like, "Who has co-authored papers with whom?" I'd like to work up another capstone as well, but there's a problem: I can't actually do it myself for reasons that are both technical and political. Read More ›
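As a sketch of where such a capstone ends up (the database file and table here are assumptions, not the actual lesson), once the cleaned references live in an SQLite table authors(author, paper), the co-authorship question becomes a single self-join:

```bash
sqlite3 bibliography.db <<'SQL'
-- Pair up authors who appear on the same paper; the a.author < b.author
-- condition avoids self-pairs and duplicate mirrored rows.
SELECT a.author, b.author, COUNT(*) AS joint_papers
FROM authors AS a
JOIN authors AS b
  ON a.paper = b.paper AND a.author < b.author
GROUP BY a.author, b.author
ORDER BY joint_papers DESC;
SQL
```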

2015 Election: Adina Howe
Adina Howe / 2015-01-16
Hi! My name is Adina Howe, and I thank you for considering my qualifications to serve on the Steering Committee of the Software Carpentry Foundation. Software Carpentry has played a major role in my professional development, and I would argue an integral role in my ability to fulfill my dream of becoming a tenure-track professor. In 2008, I participated in a Software Carpentry workshop at Michigan State University as a postdoc. At the time, my research required some computational prowess that I did not possess. SWC provided me a foundation, resources, and mentors to help me improve my skills and discuss my frustrations, and eventually my successes! This impact (and great instruction and mentorship) inspired me to join as one of the first round of instructors (2011-ish). For more details, please feel free to peruse my complete vitae. Read More ›

2015 Election: Ivan Gonzalez
Ivan Gonzalez / 2015-01-16
I've been an instructor since September 2013 and have taught 10 workshops so far. I'm finishing a master's in Science Communication and work at the Martinos Center for Biomedical Imaging. I would propose three things to the new Committee: improve our outreach and communication strategies, expand Software Carpentry to other regions and languages, and strengthen the mentoring program for new instructors. Read More ›

2015 Election: Jonah Duckles
Jonah Duckles / 2015-01-16
Hi, I'm Jonah Duckles. I have been active in teaching Software Carpentry workshops (seven so far) and have found a joy in teaching and learning this curriculum that has made me passionate about the mission of the organization. In my job, I work to spread knowledge, skills, and competency in wide-ranging areas of scholarly computing. Serving on the Software Carpentry Steering Committee would be a natural extension of my current job, dovetailing well with my current responsibilities, and I'm excited to be considered. Read More ›

2015 Election: Tim Cerino
Tim Cerino / 2015-01-16
On the first day of my first programming job, my boss handed me 700 lines of spaghetti code written in PL/1 and asked me to "fix it." On paper, I was a Research Assistant at an Economic Policy think-tank. I had taken a few Computer Science classes and had coded for fun as a kid and in college. But at that moment I felt that I had entered the world of "real" coding. (Ultimately the "fix" was to completely discard the code and redo everything using proper testable modular code, but I digress...) Read More ›

2015 Election: John Blischak
John Blischak / 2015-01-15
I feel I would be a good addition to the Steering Committee of the Software Carpentry Foundation because 1) I am a scientist who had to learn programming on the job, just like our target audience, 2) I have experience performing many roles within SWC, most importantly as the maintainer of the R lesson materials, and 3) I aim to increase the collaborative aspect of our lesson development process. Read More ›

2015 Election: Jason Williams
Jason Williams / 2015-01-14
"You can trust in Jason, he's your friend!" — Decent People I can't recall ever running in an election - rule by fiat is what "just feels right" to me. I have on occasion thought about running for office but I usually lack ambition (although I'm attracted to power). The fantasy of being the candidate people vote for even though he is clearly against their interest just makes my pedipalps tingle. So, when I learned about the Software Carpentry Foundation election, I thought, "let's try that!" Read More ›

2015 Election: Jeramia Ory
Jeramia Ory / 2015-01-14
At SciPy 2013, I had the good fortune to attend a tutorial led by Matt Davis and Katy Huff. As an undergraduate educator in the sciences, I was immediately impressed with the people and excited to try their approach with my students. Matt encouraged me to complete instructor training, which I did in April of 2014. Read More ›

2015 Election: Sheldon McKay
Sheldon McKay / 2015-01-14
I started my career as a molecular biologist working in the data-intensive field of functional genomics, eventually becoming a full-time bioinformatician. With the perspective of a person who transitioned from the wet lab to informatics, I have often served as a liaison between researchers and software developers and truly enjoy empowering scientists by teaching them computing skills to accelerate their research. Over the past decade, I have contributed to a variety of outreach and training efforts in scientific computing. When I learned about Software Carpentry, its mission resonated with me and I wanted to get involved. I completed instructor training in May 2014, have been an instructor at six workshops since then, and continue to teach about one workshop every six weeks. I also serve as a volunteer topic maintainer for SQL. I am very proud to be associated with the Software Carpentry Foundation and have a lot of experience to offer. I would like to contribute more to our ongoing success by becoming a member of the steering committee. Read More ›

2015 Election: Karin Lagesen
Karin Lagesen / 2015-01-14
I have been involved with Software Carpentry since I attended a workshop in Oslo, Norway, in 2012. I started out as a student, then went through the second round of instructor training, before being an instructor at my first workshop in 2013. Since then, I have organized one workshop and been an instructor at five others. I have also been involved with revamping the pre-assessment forms. This work has been very fulfilling and challenging on many levels, and I now want to expand on my efforts by volunteering for the steering committee. Read More ›

Practical Computing for Biologists (and Other Scientists)
Greg Wilson / 2015-01-14
We are big fans of Steve Haddock and Casey Dunn's book Practical Computing for Biologists, which covers everything we do and more (and is suitable for all kinds of scientists, not just biologists). We were therefore very pleased to learn that they are running a course at Friday Harbor Laboratories this summer. The outline is below, and applications are due on February 1st. Read More ›

Language Wars and Others
Greg Wilson / 2015-01-14
We often get asked, "Why do you teach [X]? Why don't you teach [Y]?" where X and Y are random permutations of Perl, Python, R, MATLAB, Julia, C++, and Javascript, or equally random permutations of different version control systems or text editors. There are three answers: Read More ›

Thanks to RStudio
Greg Wilson / 2015-01-13
Our thanks to RStudio for their generous donation to help us run Software Carpentry workshops. Inspired by the innovations of R users in science, education, and industry, RStudio develops free and open tools for the R community. These include the RStudio development environment as well as the shiny, ggvis, and dplyr packages (among many others). RStudio also offers enterprise-ready professional products to make it easier for teams to scale and share work. Read More ›

2015 Election: Aleksandra Pawlik
Aleksandra Pawlik / 2015-01-13
My first degree was in Computer Science. When I started university I had very little idea about programming. I struggled a lot. I graduated very discouraged and convinced that I just couldn't do it. But then eight years later Greg asked me if I'd like to be a Software Carpentry instructor. I finished the instructor training and I realized that had I been taught differently, maybe now I'd have a great career as a software developer. But then, I wouldn't be writing this blog post. Read More ›

Introducing Software Carpentry
Greg Wilson / 2015-01-13
As we said in December, we're planning to use the latest incarnation of Browsercast as a web-native alternative to screencasting for recording and presenting our slideshows. Katy Huff has just recorded the first of these, which you can watch on GitHub. We'd be very grateful for feedback both on its content and on Browsercast in general. Read More ›

2015 Election: Raniere Silva
Raniere Silva / 2015-01-12
A year and a half ago I read the announcement of the Mozilla Science Lab by Mark Surman, and that is how I discovered Software Carpentry. Months later, I was participating in the 6th round of Instructor Training, and since then I have: taught at all 6 workshops held in Brazil, improved the lessons and helped new instructors do so too, suggested changes to the new layout of our lessons and helped in the transition, and publicized Software Carpentry. Volunteering with Software Carpentry is one of the things that I'm really proud of doing, and I'd be honored to be a member of the SCF Steering Committee. Read More ›

January 2015 Lab Meeting
Greg Wilson / 2015-01-12
We will hold our next online lab meeting at 11:00 and again at 19:00 Eastern time on Thursday, January 22, 2015. People who are standing for election will have a chance to talk about what they hope to do and to answer questions from other members, and we'll present updates on our finances and on our new workshop and lesson templates. If you're planning to attend, please add yourself to this Etherpad. We look forward to seeing you then. Read More ›

Instructor Training at UC Davis
Greg Wilson / 2015-01-12
On Tuesday and Wednesday of last week, we ran a live instructor training class at UC Davis. Over 40 people from all across the country got a lightning introduction to the basics of educational psychology and instructional design, and had a chance to hear how and what we teach. Read More ›

2015 Election: Katy Huff
Katy Huff / 2015-01-09
Software Carpentry has been part of my life for over six years. I began as an instructor, organizer, and curriculum developer, and have been honored to grow as a researcher alongside Software Carpentry as it evolved over that time. In these years, I have organized and taught over a dozen workshops and have, for the last four months, served as an interim Steering Committee member. I have also made an effort to expand the reach of Software Carpentry as a co-author of both the Best Practices paper and a new O'Reilly book, "Effective Computation in Physics: Field Guide to Research in Python." Of course, none of this pays the bills. For that, I am a nuclear engineer... Read More ›

2015 Election: Matt Davis
Matt Davis / 2015-01-09
In January 2012 I had the good luck to be in one of Greg Wilson's first workshops after he rebooted Software Carpentry in the current two-day format. I ended up as a helper in that workshop, and a month later I was in Toronto teaching Python as Software Carpentry's first volunteer instructor. Over the past three years I have: taught at more workshops than I can remember; helped transition our materials from SVN to GitHub; written lessons and ipythonblocks; recruited and mentored new instructors; and publicized Software Carpentry at conferences. Volunteering with Software Carpentry has been one of the most rewarding things I've ever done and I'd be honored to continue helping as a member of the SCF Steering Committee. Read More ›

Research Software Engineer Position at the Oxford e-Research Centre
Aleksandra Pawlik / 2015-01-06
Research Software Engineer position available at the Oxford e-Research Centre, University of Oxford, UK (Java, Python, Web-app, Database, Semantic-web). Read More ›

2015 Election Nominations
Greg Wilson / 2015-01-05
As we announced last year, an election will be held on January 26-30 for the seven positions on the Steering Committee of the Software Carpentry Foundation. If you are a qualified instructor who has taught at least twice in the past two years, or have done a significant chunk of non-teaching work for Software Carpentry, you can both stand for election and vote. We strongly urge you to consider standing: if you're willing and able to commit to giving the Foundation 3 hours a week, you'll help thousands of scientists. It'll be fun, too: few things in life are more satisfying than working with a dedicated bunch of people to make something useful happen. In order to stand for election, you must write a blog post to introduce yourself to the community by Friday, January 16. This post must be around 500 words long, can be written in any format (e.g. question and answer, paragraph text), and must be titled, "2015 Election: Your Name". It should explain: what your background is, what your previous involvement with Software Carpentry has been, and most importantly what you will do as a member of the Steering Committee to contribute to the growth and success of Software Carpentry. Read More ›

2015 Election: Damien Irving
Damien Irving / 2015-01-05
Damien Irving has withdrawn from the election in order to focus on completing his thesis. We're grateful for his work on the interim Steering Committee, and wish him the best of luck with his PhD. Read More ›

The Future and Funding of Science
Greg Wilson / 2015-01-04
I was talking with friends over the holiday about the future of science and how it might one day be funded. Since it'll be ten years before I'm proven wrong, it seems like a good topic with which to start the new year. Read More ›

Projects, Projects, Projects
Greg Wilson / 2014-12-28
We have updated our projects page with links to: things we're building ourselves, and things that our members are building. The first list includes the templates for lessons and workshop websites, a tool for managing workshops (for which we're using Django), the latest version of Browsercast, and more. The second list has everything from active papers to utilities for simulating vacuum thermionic energy conversion devices. If you'd like to help with the first, or if you're a member and would like your project listed in the second, please get in touch. Read More ›

Welcome Aboard
Greg Wilson / 2014-12-23
A lot of people qualified as instructors this fall and winter, thanks in part to the live sessions we ran in Charlottesville, Norwich, and Seattle. They join the 86 other people who received their badge this year; we look forward to seeing them all run workshops before 2015 is over. Read More ›

Interim Steering Committee Meeting: Dec 2, 2014
Greg Wilson / 2014-12-19
Software Carpentry Foundation Interim Board Meeting: Dec 2, 2014 Read More ›

Standing for Election
Greg Wilson / 2014-12-18
From 26-30 January, an election will be held for the seven vacant positions on the inaugural Steering Committee of the Software Carpentry Foundation. This will be one of the biggest steps in the project's journey from two guys staying up until 3:00 am fourteen years ago to write lessons on Perl for scientists at Los Alamos to a mature open project run by the volunteers it belongs to. If you are a qualified instructor who has taught at least twice in the past two years, or have done a significant chunk of non-teaching work for Software Carpentry, you can both stand for election and vote. We strongly urge you to consider standing: if you're willing and able to commit to giving the Foundation 3 hours a week, you'll help thousands of scientists get more done in less time and with less pain. It'll be fun, too: few things in life are as rewarding as building something, and our members are building something extraordinary. In order to stand for election, you must write a blog post to introduce yourself to the community by Friday, January 16 (i.e., a full week before the start of the election). This post: must be around 500 words long, can be written in any format (e.g. question and answer, paragraph text), and must be titled, "2015 Election: Your Name". You can submit your post as a pull request to this website's repository or by email. It should explain: what your background is, what your previous involvement with Software Carpentry has been, and most importantly what you will do as a member of the Steering Committee to contribute to the growth and success of Software Carpentry. The last point is the most important. If you have experience managing money, we need a Treasurer; if your passion is helping new instructors or figuring out how well we're doing, we need people to lead mentorship and assessment, while if you come from a part of the world that hasn't seen much Software Carpentry activity yet, you might want to take the lead in getting us going there. (Actually, the point that's really most important is that everyone will still be very welcome to volunteer in other ways, and that doing so will be as valuable as ever. We will still need topic maintainers, help with the website, and many other things, and I hope that having more people coordinating things will actually make it easier for you all to lend a hand.) If seven or fewer nominations are received, those who were nominated will be automatically appointed to the Steering Committee and no formal election will be held. Vacancies on the Steering Committee can be filled at any time at the Committee's discretion. The regular positions on the Steering Committee (Chair, Vice-Chair, Secretary, Treasurer and then any others the Committee feels it needs) will be decided by a vote at the Committee's first meeting. Read More ›

All I Want for Christmas is a Pull Request...
Greg Wilson / 2014-12-18
As we said back in October, we're splitting the existing lesson repository into smaller and more manageable pieces. To do that, we have defined a new template for lessons, and have been extracting the history of the existing material from the current repository. (We wanted to get the entire history of each lesson so that people would receive credit for the work they've done.) The second step has taken longer than planned, but we now have all of the core novice lessons in repositories of their own: Read More ›

Who Are We?
Greg Wilson / 2014-12-15
For the last three years, I've been storing information about instructors, workshops, and other things in a small SQLite database so that I can look things up and generate statistics when I need to. I can't publish it, since it contains personal identifying information, but since I had to write a script to migrate the data to the tool we're building to manage workshops, it only took another few minutes to create a partly-redacted version of the data. ("Partly" because someone who was really keen could work backward from workshop URLs to instructors' names, cross-reference, and recover the names of some fraction of our instructors. Since that information is all public anyway, though, I don't think I've introduced any new risks.) The SQL source for the database is here; with it, you can regenerate the database using: Read More ›
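The exact command is in the full post; as a minimal sketch, assuming the SQL dump is saved as swc.sql and the database file is to be called swc.db (both names are assumptions, not taken from the original post), the sqlite3 command-line tool rebuilds it like this:

# feed the SQL dump into a fresh SQLite database file
sqlite3 swc.db < swc.sql

# sanity check: list the tables that were created
sqlite3 swc.db ".tables"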

Guidelines for Extracting History
Aaron O'Leary / 2014-12-15
As discussed previously, we are currently extracting individual lessons from the bc repository to make them more modular, which will ease use, contribution, and maintenance. This post presents some guidelines for extracting individual lessons and how to contribute to lessons in the meantime. Read More ›

Feedback from WiSE in Krakow
Aleksandra Pawlik / 2014-12-13
A week ago Krakow in Poland hosted the Software Carpentry workshop for Women in Science and Engineering. The registration filled up within 72 hours which clearly shows the need for such events. Read More ›

UCL Research Software Dashboard Developer
James Hetherington / 2014-12-13
The University College London Research Software Development Initiative is seeking a full-stack web developer to work on its Research Software Dashboard project from January 2015 to July 2015. This is a new project, starting from scratch, to develop software to curate, promote, and manage the University's wide portfolio of cutting-edge scientific and scholarly software. The project will provide an overview of the research software output of the college for scientists, managers, funders, investors and clients, including both open-source software and software being commercialised through the university's business and consulting arms. It will integrate with the University's code management infrastructure, based on GitHub Enterprise, software testing infrastructure based on Jenkins, and commercial software sales platform e-Lucid. Software is an increasingly important scholarly output for research alongside publications, and this project will help retain UCL's leadership in this important aspect of twenty-first century research. Those interested in being involved in this important project on a freelance or contractor basis should get in touch with James Hetherington (j.hetherington@ucl.ac.uk); for full details, please see the full description. Read More ›

UC Berkeley Postdoctoral Position in Nuclear Engineering
Rachel Slaybaugh / 2014-12-13
The Department of Nuclear Engineering at the University of California, Berkeley is searching for one, possibly two, high-caliber researchers to work with Prof. Rachel Slaybaugh's group in the area of computational neutronics. Fields of highest relevance are Computer Science, Applied Mathematics, and Nuclear Engineering. Prof. Slaybaugh's group researches methods and algorithms for solving the Boltzmann transport equation more effectively. These methods are often inspired by the physics of the problem at hand, developments in computer hardware, or both. Ongoing work involves deterministic solution methods, Monte Carlo methods, and hybrid methods in which deterministic solutions are used to accelerate Monte Carlo solutions. Potential projects include Angle-informed Hybrid Methods, Deterministic "Plug-and-Play" Research Environment Creation, Improving Multigroup Cross Sections for Hybrid Methods, and Monte Carlo on Graphical Processing Units. We seek a candidate who can start as early as January 2015, but the appointment start date is flexible. The initial appointment is 100% time for one year, with the possibility of renewal for a second year, dependent upon job performance and funding. Starting salaries are in the range of $50,000 to $60,000 per year, commensurate with qualifications and experience. For more information, please see the full description. Read More ›

Results of Software Sustainability Institute Survey
Greg Wilson / 2014-12-13
The Software Sustainability Institute's recent survey of researchers at top UK universities is out. Headline figures are: 92% of academics use research software; 69% say that their research would not be practical without it; 56% develop their own software (worryingly, 21% have no training in software development); and while 70% of male researchers develop their own software, only 30% of female researchers do. For the full story, see this post on their blog. Read More ›

Feedback from the MSc Clinical Bioinformatics Workshop
Aleksandra Pawlik / 2014-12-13
Early in November we put on a Software Carpentry workshop for the students of the MSc course in Clinical Bioinformatics run by the University of Manchester, UK and the British National Health Service (NHS). The course combines an academic curriculum with a work-based programme. The students (who are already qualified professionals) are based at various clinical units in the UK and meet only a few times to attend short, intense training sessions. The instructors at the Software Carpentry workshop were Aleksandra Nenadic, who taught for the first time, and myself; we were helped by Mike Cornell and Andy Brass. Read More ›

Templates: We Live, We Learn
Greg Wilson / 2014-12-09
We have partially converted four of our core lessons to the new lesson template, and are making a few tweaks as a result. The most important of these is to move the source files for a lesson's web pages out of a pages sub-directory and into the root directory, so that (for example) the first topic in a lesson will be ./01-intro.md rather than ./pages/01-intro.md. Doing this means that the Markdown source files will be in the same directory as the HTML pages compiled from them. That's generally considered a bad thing, since (a) it makes it harder for people to tell source files from generated files and (b) the chances of accidentally deleting source files when you just meant to delete generated files go up significantly. We think it's the right choice in this case because: Read More ›
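A rough sketch of the layout change, using the example file name from the post (where the generated HTML ends up in the old layout is an assumption on my part):

# old layout: sources live in a sub-directory, away from the output
lesson/pages/01-intro.md   ->  compiled to  lesson/01-intro.html

# new layout: sources sit beside the pages generated from them
lesson/01-intro.md         ->  compiled to  lesson/01-intro.html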

Software Carpentry Returns to Edinburgh
Mike Jackson / 2014-12-08
Last week, EPCC's ARCHER training team ran another Software Carpentry workshop here in Edinburgh, on 3rd and 4th of December. The workshop provided attendees with an introduction to version control and Git, building programs with Python, automating tasks with Make, and how (and how much) to test programs. These were set within the context of best practices for scientific computing. Read More ›

New Accessibility Guidelines
Greg Wilson / 2014-12-05
Software Carpentry values the participation of every member of the scientific community and wants all attendees to have an enjoyable and fulfilling experience. Thanks to Pauline Barmby, we now have an accessibility checklist for instructors and workshop hosts. Please have a look and send us fixes and suggestions for improvements. Read More ›

Google to Support WiSE Poland
Aleksandra Pawlik / 2014-12-05
We are happy to announce that Google will support the Software Carpentry workshop for Women in Science and Engineering which will take place this weekend in Krakow! We are grateful for their support, and look forward to the event, for which registration filled up within 72 hours! Read More ›

Announcing the Lesson Validator
Raniere Silva, Andy Boughton / 2014-12-05
Thanks to the work of Andy Boughton and Raniere Silva, we now have a validator that determines whether lessons fit the new lesson template. This post explains what it does, where there is room for improvement, and how you can help. Read More ›

International Workshop on Software Engineering for High Performance Computing in Science
Greg Wilson / 2014-12-04
The 2015 International Workshop on Software Engineering for High Performance Computing in Science is being held in conjunction with the International Conference on Software Engineering in Florence, Italy in May 2015. Topics to be covered include: Read More ›

Cape Town South Africa Workshop
Jonah Duckles / 2014-12-04
Last week we ran the biggest Software Carpentry workshop in South Africa to date. We had more than 80 participants in two rooms, each using the novice Python material. Pre-workshop assessment surveys showed that we'd have a mostly novice crew, but with a few outliers on the upper end. We accepted slightly more than our 80-person enrollment cap, and encouraged those with more advanced skills to consider becoming SWC workshop instructors themselves and to help us out as helpers in the meantime. Read More ›

An Advanced Short Course in Leeds
Andrew Walker / 2014-12-04
Helped by Aaron O'Leary, Peter Willetts, Marlene Mengoni and Jo Leng, Martin Callaghan, Devasena Inupakutika and I recently delivered a modified Software Carpentry workshop at the University of Leeds. Aimed at environmental scientists from across the UK and funded by NERC, the course included an extra day where small groups got to develop tools for environmental data analysis. Read More ›

Summarizing the News
Greg Wilson / 2014-12-03
We made several big announcements this morning, so here's a short summary to guide you through them all: The bylaws for the Software Carpentry Foundation. Our organizational membership scheme. New rules for organizing and running workshops (including fees). Plans for instructor training (which also explains why we're delaying the start of the next online course by a month). Why we need more admin support and the tool we have started building to support that. Plans for mentorship and assessment, which are two of the things that elected members of our Steering Committee will be asked to take on... An update on our new lesson format (and links to some examples). The date of our first election. We'll talk about all of this at the Dec 4 lab meeting (which takes place at 10:00 and again at 19:00, both times Eastern — see this Etherpad for connection details). Hope to see you there... Read More ›

Software Carpentry Foundation: Workshops
Greg Wilson / 2014-12-03
THIS POST IS OUTDATED AND NO LONGER ACCURATE!!! As we said in the previous posts in this series, our interim Steering Committee has adopted bylaws for the Software Carpentry Foundation and a plan for organizational membership. Those memberships won't cover all of our central costs (such as instructor training), so we are going to start charging a fee for each workshop we help organize rather than just asking people to make a donation. In exchange, we will match hosts with instructors, handle registration, manage assessment, follow up to make sure people's travel expenses have been paid, and all the other running around that needs to happen behind the scenes. One thing that won't change is that anyone who wants to organize and run a workshop on their own will always be free to do so without charge provided they satisfy a few simple conditions. In fact, we strongly encourage groups to get to the point where they can do this regularly, and to share their experiences with the community so that we can all help teach good lab practices for scientific computing. Read More ›

Software Carpentry Foundation: Organizational Membership
Greg Wilson / 2014-12-03
THIS POST IS OUTDATED AND NO LONGER ACCURATE!!! As we said in the previous post in this series, our interim Steering Committee has adopted bylaws for the Software Carpentry Foundation. They have also agreed on four tiers of membership so that universities, companies, government labs, and other entities can help support and guide our work. The authoritative version is stored in this public GitHub repository, but in brief, the tiers are: Partners, who make a significant long-term contribution to organizing and delivering workshops; Affiliates, who are organizing workshops and helping with admin, but not at the same level; Sponsors, who underwrite the cost of particular workshops that would otherwise not have backers; and Donors, who simply wish to make cash or in-kind contributions to help with general operations. Each membership tier is described below. The Steering Committee and Advisory Council will revisit them in a year's time to ensure they're meeting everyone's needs, and we will always be willing to discuss other arrangements. Please mail board-inquiries@software-carpentry.org if you would like more information, or would like to start a discussion about how you could help. And of course, individual donations to the cause will always be welcome: Read More ›

Software Carpentry Foundation: Governance
Greg Wilson / 2014-12-03
I am pleased to announce that our interim Steering Committee has adopted bylaws for the Software Carpentry Foundation, which is the final step in us becoming an independent organization. The authoritative version is stored in this public GitHub repository, along with other key documents that will be outlined in the next couple of blog posts. In brief, the SCF has four parts: the Membership, which includes active instructors and others contributing directly to the project; the Steering Committee, which is elected by and from the membership and is the SCF's primary decision-making body; the Advisory Council, which includes representatives from our major partners; and the Executive Director, who is an employee of the SCF responsible for overseeing its daily operations. The Executive Director answers to the Steering Committee, but is not a member of it. I have accepted the position of interim Executive Director, and the first elections for the Steering Committee will be held early in 2015. These bylaws, and the agreement with NumFOCUS that enables us to operate as a 501(c)3 non-profit, are the structure we need to run a successful independent foundation. The next priority is fundraising: I've been without a salary since my contract with Mozilla ended in October, which clearly isn't sustainable. The next post in this series describes the different ways in which organizations can support us; we're close to signing deals with several groups, so I'm confident we'll start the new year with our finances in order. Read More ›

Plans for 2015: Workshop Organization
Greg Wilson / 2014-12-03
As I write this, 228 people have been certified to teach Software Carpentry workshops, of whom 136 have taught in the past twelve months. That's amazing, but as I said in a previous post, growth in one part of a pipeline inevitably turns another into a bottleneck. In our case, that bottleneck is organizational: Arliss Collins and Giacomo Peru are stretched to the limit handling requests and lining up instructors, and we're still not keeping up with demand. This kind of work can't be done by volunteers in bits and pieces: the startup overheads are significant, as is the context required to manage each conversation, and there are times when mail really needs to be answered right now, not when it's convenient. We are therefore asking our Partners to organize workshops in their region, and hope that others will shoulder some of the burden as well. We're also trying once again to build a tool to simplify workshop management. I have started a simple Django application called Amy to keep track of who wants a workshop, who can teach what, who our learners have been, and so on. The data model is mostly done, and I've included a dump of our existing database with personal information redacted to aid further development and testing. All Amy can do now is display information; what we need next is the ability to add and edit data. If you'd like to help, please fork the project and send us pull requests—we'd be grateful for your help. Read More ›
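If you haven't used the fork-and-pull workflow before, it looks roughly like this (a sketch only: the clone URL, branch name, and commit message are illustrative placeholders, not taken from the project):

# fork Amy on GitHub first, then clone your fork
git clone https://github.com/YOUR-USERNAME/amy.git
cd amy

# do your work on a feature branch
git checkout -b add-editing
# ...edit code, then record the change...
git commit -a -m "Allow workshop records to be edited"

# push the branch to your fork, then open a pull request on GitHub
git push origin add-editing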

Plans for 2015: Mentorship and Assessment
Greg Wilson / 2014-12-03
The previous posts in this set looked at instructor training and workshop organization. In this one, I'd like to look at mentorship and assessment, which are two of the biggest challenges we need to address in the coming year, and are good examples of the kinds of tasks that Steering Committee members will be asked to take on. Azalee Bostroem did a great job summarizing why we need to mentor our new instructors (and keep those who've been with us a while up to date with changes to our lessons and procedures). The question is, how? Or more precisely, where will the hours come from? I said in September that I would organize a weekly meeting for instructors from recent and upcoming workshops. Only one took place, and saying "we just need to try harder" won't make the necessary hours appear. The same is true of assessment. Jory Schossau has done valuable service analyzing survey data and interviewing bootcamp participants, and Daniel Chen and others are working hard to revise the instructors' post-workshop questionnaire, but despite several attempts, we haven't found anyone willing to fund a systematic look at what we're actually doing and what impact it's having. Once again, we can either say "we need to try harder" or come up with an alternative plan. Read More ›

Plans for 2015: Lessons
Greg Wilson / 2014-12-03
The previous three posts in this set looked at instructor training, workshop organization, and the twin challenges of mentorship and assessment. In this final one, I'll summarize the state of the changes we're making to our curriculum. We described the new template for lessons back in October, and since then, a handful of people have been working to improve it and to extract our existing lessons from the 'bc' repository and convert them to the new format. The first step has taken longer than planned: we want to be sure we get the entire history of each lesson, so that people receive credit for the work they've done, and that's proving to be a slog. The good news is, you can now see what the result will look like. Our novice lesson on SQL now lives in this repository, and you can view its rendered form in this GitHub Pages site. There's still a lot to do—the learning objectives need to be cleaned up, the challenges all need meaningful names, and there's clearly lots of scope for improving the lesson's appearance—but the pieces are there. Read More ›

Plans for 2015: Instructor Training
Greg Wilson / 2014-12-03
Instructor training has been going well: it looks like more than 50% of the people who participated this summer and fall will complete the course and start teaching for us, which is a new record. What's even better is that we have 136 people signed up for the next online course. Even if only half of them complete it, it will enlarge the pool of instructors by more than a third, and broaden our coverage both geographically and across disciplines. The question now is when that course should start. Teaching a group that size will take two full days a week, and in the short term, I need to focus on building partnerships, getting grant applications out the door, and overseeing the election of our Steering Committee. Read More ›

Our First Election
Greg Wilson / 2014-12-03
The Software Carpentry Foundation's bylaws state that the Steering Committee must be elected annually. With a new year upon us, the interim committee has decided that the first such election will take place in the week of Jan 26-30, 2015. Here are a few key points: Every qualified instructor who has taught at least twice in the past two calendar years is automatically a member of the SCF, as is anyone who has done 30 days or more work for the SCF in the past calendar year, and anyone who has, in the opinion of the Steering Committee, made a significant contribution in the past year. All SCF members may both stand for election and vote in it. We'll explain how to do both after the next interim committee meeting on December 16. As discussed in an earlier post, members of the Steering Committee are expected to volunteer at least 2-3 hours a week to help mentor our instructors, conduct assessment, oversee major changes to the curriculum, manage our finances, and so on. Read More ›

What About MOOCs?
Greg Wilson / 2014-12-02
We frequently get asked whether Software Carpentry would work as a MOOC. The answer is that I think it can work well if it's what Siemens and Downes actually had in mind when they coined the term. What they had in mind wasn't people watching videos and then doing robo-graded exercises; instead, their connectivist model of learning assumed that participants would use the internet to collaborate in exploring ideas, rather than as a faster form of television. I'm definitely excited about the Siemens and Downes kind of MOOC. In particular, I believe that instructors who don't have time to teach a full workshop might give us an hour a week to help people via one-to-one or one-to-few sessions via Skype and screen sharing. There was a lot of enthusiasm among the instructors for this when we tried it in the spring of 2012; that experiment wound down because we lacked critical mass, but we're five times larger now, and I think it would be worth trying again. The most interesting question for me is where this fits. Should we start people off this way? Should people do the first day in person (so that we can get them through software setup and configuration issues), then do the rest online? Should this be used as the "day 3" follow-on that everyone keeps asking for? We'd like to try all of this and more; if you'd like to help, please let us know. Read More ›

Reminder: Lab Meeting on Thursday
Greg Wilson / 2014-12-01
Just a reminder that our last online lab meeting for 2014 will be held at 10:00 am Eastern time on Thursday, December 4, and will be repeated at 7:00 pm Eastern on the same day. We will be making several big announcements, so if you're planning to attend, please add yourself to the Etherpad at https://etherpad.mozilla.org/swc-labmeeting-2014-11 so that we have an idea of numbers. We look forward to seeing you Thursday. Read More ›

Goalposts for the Digital Humanities
Greg Wilson / 2014-12-01
As a follow-on to last month's post about courses at the British Library, I asked some people who are teaching digital humanists where their goalposts are, i.e., what they think is the minimum someone in the humanities should know about working with computers and digital data. The brief responses were interesting: Read More ›

How to Manage Confidential Data
Greg Wilson / 2014-11-27
A couple of days ago, a member of our community wrote: I am very interested in working on improving research analytics in government using the principles developed by Software Carpentry. A key stumbling block I run into with my peers is their skepticism about these principles being able to protect data that is confidential or restricted use. Read More ›

Translating Software Carpentry into Korean
Greg Wilson / 2014-11-25
This is pretty amazing: a group has translated the core Software Carpentry lessons into Korean: Introduction: http://statkclee.github.io/xwmooc-sc/intro.html The Unix Shell: http://statkclee.github.io/xwmooc-sc/novice/shell/ Version Control with Git: http://statkclee.github.io/xwmooc-sc/novice/git/ Programming with Python: http://statkclee.github.io/xwmooc-sc/novice/python/ Programming with R: http://statkclee.github.io/xwmooc-sc/novice/r/ Using Databases and SQL: http://statkclee.github.io/xwmooc-sc/novice/sql/ Extras: http://statkclee.github.io/xwmooc-sc/novice/extras/ It's a remarkable and encouraging achievement—congratulations and thanks to the team that did the work: Victor KC Lee (이광춘): Translation Lead Jungsu Han (한정수): Translator Chungyeong Moon (문춘경): Illustrator Ri Jeong Kim (김이정): Graphic Designer Hwan Beom Kang (강환범): Support We still need to figure out how to manage this: Gabriel Devenyi's post outlined our options, but someone will need to take the lead to set something up so that versions in various languages can stay in sync. Read More ›

Congratulations to Data Carpentry
Greg Wilson / 2014-11-24
In case you missed the announcement last week, our sibling organization Data Carpentry has received funding from the Moore Foundation to support its activities and growth, and Dr. Tracy Teal (a long-time contributor to Software Carpentry, and one of the founders of Data Carpentry) has accepted a position as Data Carpentry's project lead. We're very excited by both developments, and are looking forward to continuing to work with them. Read More ›

Announcing WiSE Krakow!
Aleksandra Pawlik / 2014-11-23
After running several Software Carpentry workshops for women in science and engineering in North America, we are moving east: on 6th and 7th December we will run a WiSE workshop in Krakow, Poland. Like previous WiSE workshops, this one is open to women at all stages of their research careers, from graduate students, post-docs, and faculty to staff scientists at hospitals and in the public, private, and non-profit sectors. The instructors at the workshop will be Paulina Lach of Krakow's University of Science and Technology AGH and Aleksandra Pawlik, the Software Sustainability Institute's Training Leader. The helpers are Agnieszka Celińska (Pedagogical University of Krakow), Iwona Grelowska (University of Science and Technology AGH), Patrycja Radaczyńska (Creativestyle), and Anna Ślimak (Lunar Logic). Read More ›

Instructor Training Stats
Greg Wilson / 2014-11-22
The Software Carpentry Foundation's board asked me for some stats on instructor training, and I thought other people would find them interesting as well: Read More ›

Adding a Projects Page
Greg Wilson / 2014-11-20
It's long overdue, but we've finally started a page to showcase the projects that our instructors are involved in. If you'd like to add one, please send us a pull request or mail us for instructions. (Turns out, we do pretty cool stuff...) Read More ›

The New Instructor Post-Assessment Questionnaire
Greg Wilson / 2014-11-19
As mentioned two weeks ago, a group led by Daniel Chen has been revising the post-workshop survey for instructors so that we can get a better idea of who's actually teaching what, how long it's taking, and how it's going. The latest draft can be viewed here, and is included below. We'd be grateful for one last round of feedback—in particular, we're concerned about (a) the overall length and (b) how closely the headings are tied to our current curriculum. Please let us know what you think in comments on this post. Read More ›

Interim Board Meeting: Nov 18, 2014
Greg Wilson / 2014-11-18
Software Carpentry Foundation Interim Board Meeting: Nov 18, 2014 Read More ›

Close Enough for Scientific Work
Greg Wilson / 2014-11-18
The discussion around last month's post "Why We Don't Teach Testing (Even Though We'd Like To)" has been one of the most interesting in Software Carpentry's history. Inspired by that, and by discussion at WSSSPE 2.0, we are launching a collaborative book project called Close Enough for Scientific Work in which scientists will show one another how they test their software. Contributions should be aimed at sophomores in science and engineering, and each should be sized to fit a one-hour lecture. While the format of each entry will vary according to its content, we expect most will follow this template: Read More ›

Lessons, the Repository Split, and Translations
Gabriel A. Devenyi / 2014-11-15
Continuing the recent run of posts about the repo split, templates, and metadata, Software Carpentry now needs to consider how to handle translated lessons. The core Software Carpentry lessons have been translated by bilingual instructors: into Korean by Victor KC Lee (and friends) and into Spanish by Francisco Navarro (and friends). With the upcoming repo split, I think it's a good time to examine the various options for how we might handle translations generally. Read More ›

Workshop Summary: Gerstein Science Information Centre, University of Toronto
Pauline Barmby / 2014-11-14
Last week we ran a Python-based SWC workshop at the University of Toronto's Gerstein Science Information Centre. The advertised audience was graduate students, post-docs and other researchers in science, engineering and medicine at the University of Toronto. The workshop had about 20 learners. Most were grad students or postdocs in life sciences. The pre-workshop assessment said that about three-quarters of learners had some programming experience with about half of those having some Python; maybe half of the total had some familiarity with the command line. Almost none had experience with version control or SQL. The instructors were Pauline Barmby, Greg Wilson, and Thomas Guignard; helpers were Sahar Rahmani, Luke Johnston, Daiva Nielsen, Tom Wright, and Fei Xu. Our host was Erica Lenton. Read More ›

Announcing November 2014 Lab Meeting
Greg Wilson / 2014-11-14
Our last online lab meeting for 2014 will be held at 10:00 am Eastern time on Thursday, December 4, and will be repeated at 7:00 pm Eastern on the same day. (We had originally planned to have it on November 27, but that would conflict with American Thanksgiving.) If you're planning to attend, please add yourself to the Etherpad at https://etherpad.mozilla.org/swc-labmeeting-2014-11 so that we have an idea of numbers. We look forward to seeing you in a few weeks. And yes, we're calling it the November lab meeting even though it's in December, because it just wouldn't be programming without an off-by-one error... Read More ›

Replacing the Teaching Blog
Greg Wilson / 2014-11-13
We've been using a WordPress blog to manage instructor training for the past two and a half years, and while it's served us well, it's clear that we've outgrown it. It's also clear (to me at least) that we should get instructors started on GitHub earlier, so replacing WordPress with a GitHub-backed blog seems like it would serve a dual purpose. Our requirements are: Read More ›

Why It Matters
Greg Wilson / 2014-11-11
Sometimes I forget that it isn't obvious why scientists ought to learn to program—or why anyone else ought to. Being more productive, getting a better job... Those are all good reasons, but as Bret Victor points out, if we focus on those, we risk losing sight of what matters most. To explain that, I have to quote Matthew Crawford's thought-provoking (and sometimes infuriating) Shop Class as Soulcraft: Read More ›

Workshop at the University of Virginia
Stephen Turner / 2014-11-11
We pulled off our day-long data analysis bootcamp with hardly a hiccup yesterday. The schedule looked something like this: AM pt 1: intro to AWS & intro to Unix Shell; AM pt 2: data analysis in Unix: alignment, quantitation of RNA-seq data; PM pt 1: intro to R; PM pt 2: data analysis in R: QC, differential expression, etc. Read More ›

Workshop at GeoSim (Potsdam)
Tiziano Zito / 2014-11-11
I had never taught a Software Carpentry workshop before (though I have taught Advanced Scientific Programming in Python many times), so when the hosts for this one asked if I would be willing to do a full-week workshop, I said they were crazy ;) They convinced me to do it, but I said I could only manage it if I could recycle the materials I use for our summer school. They agreed, and I managed to convince a buddy in Berlin to help me with tutoring. Read More ›

Ongoing Learning with User Groups
Noam Ross / 2014-11-10
For the past two years I've run the UC Davis R User's Group (D-RUG). In this post, I'll (1) outline the structure of D-RUG, (2) summarize some lessons learned, and (3) discuss how such users' groups could act to support and complement SWC's workshops. Read More ›

Amdahl's Law and Software Carpentry
Greg Wilson / 2014-11-07
Amdahl's Law says that the speedup you can get by parallelizing a computation is limited by how much of the computation can't be sped up. For example, if 10% of a program's run time is inherently sequential, then even if you have an infinite number of processors, you can't speed it up more than 10 times. The same rule applies to organizations like Software Carpentry. We now have almost 200 certified instructors; even working in pairs, and each teaching only once a year, that's enough to run two workshops a week, and we're training more all the time. But someone has to train them, and match them with workshops, and design new templates for lessons, and talk to potential sponsors, and those central activities are now limiting what we can accomplish. Read More ›
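For the record, the usual statement of the law: if a fraction $p$ of the work can be parallelized across $n$ processors, the overall speedup is

$$S(n) = \frac{1}{(1 - p) + p/n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}$$

so with 10% of the run time inherently sequential ($p = 0.9$), the speedup can never exceed $1/(1 - 0.9) = 10$, no matter how many processors you add.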

You Should Read Juha Sorva's Thesis
Greg Wilson / 2014-11-06
If you really want to dig deeper into educational research and how it applies to teaching programming, you should grab a copy of Juha Sorva's PhD thesis. The UUhistle system he built is interesting, and the research he did with it is thought-provoking, but what's really great is the summary of educational research in the first third of the thesis. I've extracted that part with his permission, and I can't recommend it highly enough. Read More ›

Instructor Training at TGAC
Greg Wilson / 2014-11-06
Hayley London has written a great summary of the instructor training at TGAC in Norwich in October. Aleksandra Pawlik, Bill Mills, and I enjoyed meeting everyone, and I look forward to returning soon. Read More ›

Why Institutional Partnerships Make So Much Sense
Damien Irving / 2014-11-06
When I first got involved in Software Carpentry, I liked nothing more than to lament the fact that universities don't teach their undergraduates (or early post-graduates) fundamental programming skills. It was one of the main reasons I felt so compelled to help out. By running our two-day workshops, I felt that we were basically picking up the slack until such time that universities woke up to themselves and started teaching this stuff. At this point, I felt that (like many non-profit organisations striving to make the world a better place) Software Carpentry would have achieved its mission by essentially making itself redundant. Fast forward a couple of years and I'm now part of a new University of Melbourne department tasked with the job of teaching programming fundamentals to post-graduates. To my surprise, I've found that in this progressive new world we actually need Software Carpentry more than ever. The interim board of the Software Carpentry Foundation is seeking to form partnerships with organisations like my own, so I thought this was an opportune time to share our experiences. Read More ›

An R Workshop at the University of Sydney
Diego Barneche / 2014-11-04
Last week we ran an R-based SWC bootcamp at the University of Sydney. The primary target audience encompassed grad students from the Department of Psychology who had little to no experience in programming. While some had limited previous experience with programming languages (e.g. MATLAB or R), most of them had been exposed only to statistical software such as SPSS. Read More ›

Interim Board Meeting: Nov 4, 2014
Greg Wilson / 2014-11-04
Software Carpentry Foundation Interim Board Meeting: Nov 4, 2014 Read More ›

A 'Joel Test' for Grassroots Programming Groups
Greg Wilson / 2014-11-04
Back during the first dot-com bubble, Joel Spolsky wrote an article titled "The Joel Test: 12 Steps to Better Code" that listed 12 questions you can ask to estimate the maturity of a software development team: Read More ›

Software Carpentry Foundation: FAQ
Greg Wilson / 2014-11-03
As we announced two weeks ago, we are setting up an independent foundation to manage Software Carpentry's continued growth. This week, we passed an important milestone when we signed a fiscal sponsorship agreement with NumFOCUS. There are still lots of details to sort out, but here are brief answers to some of the questions we've received so far: Read More ›

Revamping the Instructor Survey
Daniel Chen / 2014-11-02
In September, Greg Wilson wrote a blog post on "building better teachers" that was inspired in part by an issue brought up by John Blischak on reporting time spent on lessons. In order for us to capture this information, we decided to revamp our post-workshop instructor survey. Read More ›

Particle Physicists Pulling Themselves From The Swamp
Peter Steinbach / 2014-10-31
What does it mean to work on a modern particle physics experiment like ATLAS (wikipedia, public) or CMS (wikipedia, public) at the Large Hadron Collider in the 21st century? It's fun, it's collaborating with great and interesting people, it's challenging, it's making you enjoy finding things out, it's what I always wanted to do. Also: it is painful, discouraging, and tends to suck the life out of a young mind. Confused? Let's rewind... Read More ›

Why We Don't Teach Testing (Even Though We'd Like To)
Greg Wilson / 2014-10-30
If you haven't been following Lorena Barba's course on numerical methods in Python, you should. It's a great example of how to use emerging tools to teach more effectively, and if we ever run Software Carpentry online again, we'll do it her way. Yesterday, though, when she posted this notebook, I tweeted, "Beautiful... but where are the unit tests?" In the wake of the discussion that followed, I'd like to explain why we no longer require people to teach testing as part of the Software Carpentry core, and then ask you all a favor. Read More ›

Pandoc and Building Pages
Greg Wilson / 2014-10-29
Long-time readers of this blog and our discussion list will know that I'm unhappy with the choices we have for formatting our lessons. Thanks to a tweet from Karl Broman, I may have an answer. It's outlined below, and I'd be grateful for comments on usability and feasibility. Read More ›

Why Software Matters
Greg Wilson / 2014-10-28
Why does software matter to scientists? It may seem obvious to people who read this blog, but that's like saying that the answer to, "Why opera?" is obvious to the sort of person who pays a month's rent to get a decent seat at Covent Garden. Why does software matter? And why does it matter whether it's written well? Read More ›

Lost in Space
Greg Wilson / 2014-10-27
You probably haven't seen the 1998 movie Lost in Space, or if you have, you've suppressed the memory—it was awful. But I do know one guy who enjoyed it. His name was Joe, and he had worked on the software used to create its special effects. Ten minutes into the film he took out his Walkman (a primitive form of iPod), put on his headphones, and spent the next two hours head-bobbing to a mix of Bob Marley and Smashing Pumpkins. Read More ›

British Library Courses
Greg Wilson / 2014-10-27
I had a chance to catch up with James Baker at the British Library on Friday, and discovered that they're running an amazing series of short classes on digital skills for librarians. With his permission, I've posted their outline below, along with a few excerpts from their FAQ. Some of it is site-specific, but I think a lot would be relevant elsewhere. (Programming in Libraries was developed alongside these lessons for the Programming Historian and tweaked for the library audience; interested readers should also check out the practical elements of Managing Personal Digital Research Information, which are worked through on this public wiki.) If you'd like more information, please mail digitalresearch@bl.uk. Read More ›

A New Lesson Template, Version 2
Greg Wilson / 2014-10-23
Update: this post now includes feedback from participants in the instructor training session run at TGAC on Oct 22-23, 2014. Please see the bottom of this page for their comments. Thanks to everyone for their feedback on the first draft of our new template for lessons. The major suggestions were: Read More ›

Presenting the Novice R Materials and Future Plans for the SWC R Community
John Blischak / 2014-10-20
Approximately seven months after our initial meeting, the SWC R community has developed the first set of R lessons for use both in workshops and for self-directed learning from the SWC website. These novice R lessons are a translation of the current novice Python lessons. Read More ›

Num Wrongs Plus Plus
Tommy Guy / 2014-10-17
I was teaching Git to a room of roughly 25 students on day 2 of a Software Carpentry workshop when we ran into a problem that feels like a case study in why it's hard to move science to safer practices. Read More ›

Welcome More New Instructors
Greg Wilson / 2014-10-16
We are very pleased to welcome another new batch of instructors to our team: Read More ›

A Research Software Petition
Greg Wilson / 2014-10-15
"We must accept that software is fundamental to research, or we will lose our ability to make groundbreaking discoveries." If you agree—and I hope you do—then please take a moment to add your name to this petition posted by the Software Sustainability Institute, and then help to spread the word by blogging, tweeting, and telling your friends. Read More ›

Yet Another Template for Lessons
Raniere Silva / 2014-10-14
After the splitting the repository post, Gabriel Devenyi and Greg Wilson wrote some suggestions for what the new lesson repositories should look like (see Gabriel's post about metadata and Greg's post about overall file structure). From my experience at the Mozilla Science Lab sprint, I don't like Gabriel's preq metadata, because I don't believe it helps very much. I also don't like Greg's proposal to duplicate some files in every Git repository, so here are some changes that I suggest. Read More ›

A Self-Recorded Workshop
Damien Irving / 2014-10-14
Among the many great lessons contained in Greg Wilson's recent post on building better teachers, perhaps one of the most important was that in order to improve our collective teaching standards, we really need to see each other in action: Read More ›

Of Templates and Metadata
Gabriel A. Devenyi / 2014-10-14
As an appendix to the splitting the repository post, Greg recently posted a straw man template for how lessons might be structured after the repo split. He followed up afterwards with more details. There are a lot of good ideas there on how we can encourage good structure for lessons and help learners and instructors alike going forward. Read More ›

Interim Board Meeting: Oct 14, 2014
Greg Wilson / 2014-10-14
Software Carpentry Foundation Interim Board Meeting: Oct 14, 2014 Read More ›

A New Template for Lessons
Greg Wilson / 2014-10-14
Note: this post has been superseded by this one. Please post comments and feedback there. We blogged two weeks ago about a new template for workshop websites. It's now time to start thinking about what lessons will look like: as we said at the last lab meeting, we're going to break the current lesson repository into smaller and more manageable pieces, but we need to decide what those pieces will look like first. The post below is our current thoughts; comments and/or follow-on posts about alternatives like those already written by Gabriel Devenyi and Raniere Silva would be very welcome. Read More ›

Announcing the Creation of the Software Carpentry Foundation
Greg Wilson / 2014-10-08
In order to foster Software Carpentry's continued growth, we are pleased to announce that we are creating an independent Software Carpentry Foundation (SCF). Like other non-profit open source foundations, it will decide Software Carpentry's overall scope and direction, manage its finances, and hold its intellectual property. In order to work through the details, we have assembled an interim board drawn from a wide cross-section of our community: Jenny Bryan (University of British Columbia), Neil Chue Hong (University of Edinburgh / Software Sustainability Institute), Carole Goble (University of Manchester / ELIXIR UK), Josh Greenberg (Sloan Foundation, non-voting), Katy Huff (University of California Berkeley), Damien Irving (University of Melbourne / Research Platforms), Adam Stone (Lawrence Berkeley National Laboratory), Tracy Teal (Michigan State University / Data Carpentry), Kaitlin Thaney (Mozilla Science Lab), and Greg Wilson (Software Carpentry). This group's mandate is to draft the SCF's initial bylaws and get the foundation legal standing, then arrange the transition to the first permanent board some time early in 2015. Until then, we will continue to do what we have always done: teach scientists and engineers how to use computers to do more research in less time and with less pain. Read More ›

ARCHER Software Carpentry workshop at The University of Edinburgh
Mike Jackson / 2014-10-07
ARCHER, the UK's national supercomputing service, offers training in software development and high-performance computing to scientists and researchers across the UK. As part of our training service we are running a 2 day Software Carpentry workshop at EPCC, The University of Edinburgh, UK, on 3-4 December. Read More ›

Ideas to Improve Instructor Training
Azalee Bostroem / 2014-10-05
Have you ever learned something new and then had it appear in other areas of your life? After a summer at SWC thinking about how to train better instructors (and how to be a better teacher myself) I get to try discussion-based teaching this quarter at UC Davis. Read More ›

Studying Impact
Alexandra Simperler / 2014-10-04
I am a Software Sustainability Institute Fellow, and am using my fellowship to work on making software training better. My main interest is in teaching computational chemistry software and making it accessible to a wide range of scientists. Like Software Carpentry, the courses I teach are one-off teaching events: basically, we have a maximum of two days to help your professional development. Together with Greg Wilson, I am conducting research to analyse what impact Software Carpentry workshops have had on participants with respect to their professional skills. In particular, I would like to interview past participants in Software Carpentry workshops to analyse the impact those workshops have had on them. We cannot reimburse you for your time, but if you are willing to take part in an interview, your experience will help us make Software Carpentry better. The interviews can be done via telephone or Skype and can be arranged at a convenient time; we have ethics approval for this study, and every participant will be informed what exactly we do with their data. Please contact me by email if you wish to take part, and pass on this request to other workshop alumni. Thanks in advance, Alexandra Simperler Read More ›

A Reproducible Science Hackathon
Greg Wilson / 2014-10-04
NESCent is organizing a "Reproducible Science Hackathon" where participants will develop material and tools for teaching and facilitating a broad adoption of reproducible science. The deadline for application is next Friday (Oct 10th); see the full announcement for details. Read More ›

Congratulations to the Moore Investigators
Greg Wilson / 2014-10-04
The Gordon and Betty Moore Foundation has just announced $21 million in grants to fourteen investigators in the emerging field of data-driven discovery. Among the recipients are Titus Brown and Ethan White, who have both been key contributors to Software Carpentry. Congratulations to them and to the other award winners—we look forward to working with them in the years to come. Read More ›

Browsercast
Greg Wilson / 2014-10-04
As we mentioned back in April, Gabriel Ivanica spent the summer working on a Google Summer of Code project called Browsercast. His goal was to build a web-native alternative to screencasts; if you'd like to see the result, head over to the demo page, click on the eye icon in the upper left, and press the "play" button. I hope to use it, or something descended from it, to build searchable, accessible narrations of our lessons that will play well on everything from desktops through tablets to mobile phones. If you'd like to help, please fork the project on GitHub and dive in. Read More ›

A New Template for Workshop Websites
Greg Wilson / 2014-10-04
The first step in reorganizing the bc repository is making it easier (much easier) for people to create websites for workshops. The current instructions are almost 3000 words long, and even experienced GitHub users find the process daunting, so we're going to simplify things as much as we can, even if that means not doing things the "right" way. Read More ›

Welcome Our New Instructors
Greg Wilson / 2014-10-03
We are very pleased to welcome a new batch of instructors to our team: Read More ›

Interim Board Meeting: Sep 30, 2014
Greg Wilson / 2014-09-30
Software Carpentry Foundation Interim Board Meeting: Sep 30, 2014 Read More ›

Splitting the Repository
Greg Wilson / 2014-09-29
United Airlines messed up my travel again last weekend, so I finally had a chance to think some more about how Software Carpentry works and how we can make it work better. Having topic maintainers is one improvement; another, which was discussed at this month's lab meeting, is to break the bc repository that holds our lessons and workshop home pages into smaller and more manageable pieces. Read More ›

UCOSP as a Model
Greg Wilson / 2014-09-28
Software Carpentry's two-day workshops are just one of many ways to teach people practical skills. Term-long group projects are another model that I'm very fond of, and earlier this year, the four people who've been running the UCOSP program in Canada wrote a paper about what they've learned. Some of the lessons rhyme with what we've learned from Software Carpentry, but other insights are new. If you know of papers describing lessons learned from other innovative teaching projects, pointers in the comments section would be very welcome. Read More ›

September 2014 Lab Meeting Report
Greg Wilson / 2014-09-26
After a two-month break for a sprint and some holidays, we held another monthly lab meeting this week. About 50 people showed up to talk about issues large and small; the key points are below. Read More ›

Feedback from Imperial College London
Mike Jackson / 2014-09-24
On 16-17 September, EPCC's ARCHER training team headed down to Imperial College London to run a Software Carpentry bootcamp. My colleague Arno Proeme made his instructor debut, covering version control and Git and good programming practice, while I covered shell hints and tips, automation and Make, and testing. Read More ›

Learning Goals
Warren Code / 2014-09-23
A few weeks ago, Greg Wilson asked me how to better express the learning objectives listed in the Software Carpentry lessons. My main concerns with the existing goals are that they focus too much on specific skills rather than attitude changes, and that they are generally stated as lesson descriptions (more like instructor goals or a lesson outline) rather than as learning goals for participants. Read More ›

A Proposal for Topic Maintainers
Greg Wilson / 2014-09-18
I can lift ten pounds. I can even still lift a hundred pounds, though my back won't thank me, but I can't lift a thousand pounds. Similarly, while I was able to review changes to our lessons when only half a dozen people were contributing, there are now almost sixty pull requests queued up, some of which have been waiting for attention for several months. The solution we'd like to try, which is borrowed from other open source projects, is to give specific people authority to maintain specific parts of our core material. These maintainers will not be responsible for developing lessons themselves, though they may of course do so. Instead, their job will be to keep issues and pull requests moving by reviewing, managing discussion, merging into the master branch, and so on. Read More ›

How to Prepare for the Data Incubator
Tianhui Michael Li / 2014-09-17
At The Data Incubator, we receive thousands of applications to join our data science fellowship. Our admissions bar is very high and we are often asked, "What can I do to prepare for the fellowship application process?" Here are five important skills to develop and some resources on how to help you develop them. While we don't expect our applicants to possess all of these skills, most applicants already have a strong background in many of them. Read More ›

Interim Board Meeting: Sep 16, 2014
Greg Wilson / 2014-09-16
Software Carpentry Foundation Interim Board Meeting: Sep 16, 2014 Read More ›

Videos from Stanford
Greg Wilson / 2014-09-12
Stanford University has posted videos from the Software Carpentry workshop held there in August featuring Azalee Bostroem, Chris Lonnen, Dani Traphagen, and Chris Beitel. These are a great place to start if you'd like to do a little informal jugyokenkyu: August 14, 2014; August 15, 2014. Our thanks to Amy Hodge and Stanford University for making these available. Read More ›

September 2014 Lab Meeting
Greg Wilson / 2014-09-12
After taking a break in July for our sprint, and in August because people were on holiday, doing fieldwork, or both, we will resume our monthly online lab meetings this month. As always, we'll hold the meeting twice to accommodate time zones, childcare commitments, and the like; the first round will be 18:30-19:30 Eastern time on Thursday, Sept 25, and the second will be 11:00-12:00 Eastern time on Friday, Sept 26. We'll post connection details and the agenda closer to the time; if you have anything you'd like included, please send us mail. Read More ›

More Thoughts on Better Teachers
Azalee Bostroem / 2014-09-10
Over the last few weeks I read Greg's blog post on Building Better Teachers and watched his SciPy keynote, and I've been thinking a lot about how instructors can share their knowledge. Read More ›

Further Thoughts on Building Better Teachers
Justin Kitzes / 2014-09-10
On Greg's recommendation, I just finished reading Building a Better Teacher - it was an interesting book, and a well-written story. It turns out that I completely agree with Greg's assessment of the book's lessons and the challenges facing Software Carpentry, but I very much disagree with his proposed solutions! Read More ›

Software Carpentry Workshop at Universidade de São Paulo
Raniere Silva / 2014-09-09
Last week Alex Viana and I taught two Software Carpentry workshops at the Universidade de São Paulo, the biggest public university in Brazil. In both workshops we took the learners through Bash, Git and Python over the course of two days. Read More ›

An Update on Upcoming Bootcamps
Greg Wilson / 2014-09-06
So it turns out we have more events coming up than I realized: Read More ›

Building Better Teachers
Greg Wilson / 2014-09-04
Some books are intrinsically great. (I've read Going Postal half a dozen times, and enjoyed it just as much at each encounter.) Other books feel great because they hit you at the right time. Elizabeth Green's Building a Better Teacher is one of those: in a little over 300 pages, she takes a bunch of things I've been worrying about for the last two years and assembles them into a coherent whole. The end result is a road map of sorts for making our teaching more effective; the problem is, I don't know if it's a path we can actually follow. Read More ›

Open Source Comes to Campus
Shauna Gordon-McKeon / 2014-09-03
Open Source Comes to Campus helps the next generation of open source contributors get started by teaching one-day workshops aimed at undergraduate and graduate students from all disciplines. This workshop introduces students to the tools and culture of open source development. We familiarize students with tools such as IRC, issue trackers and version control; talk plainly about aspects of open source that might be confusing or intimidating; introduce them to open source professionals who can talk about career and educational opportunities; and help them make their first contributions to open source projects. OSCTC will be hosting events this fall at over a dozen schools, including the City College of San Francisco, Bucknell University, the University of Victoria, Hartnell College, the University of Washington, and the University of Connecticut. We're always looking for volunteers to mentor at our events, and if you want to bring an event to your school, please get in touch. Read More ›

Nature Interview with Kaitlin Thaney
Greg Wilson / 2014-09-03
Nature has just published a short interview about Software Carpentry with Kaitlin Thaney, the director of the Mozilla Science Lab, and two bootcamp attendees (Harvard's Rebecca Perry and USNIDA's Brian Sadacca). As always, version control seems to have been one of their big take-aways, but as Kaitlin says, two days isn't enough for people to become proficient: our bootcamps really are just the first step. Read More ›

Instructor Training at UC Davis in January 2015
Greg Wilson / 2014-09-03
Thanks to support from Prof. Titus Brown, I will be running an intensive in-person version of our instructor training course at UC Davis on January 6-7, 2015. This will be followed on January 8 by a third day of activities focusing on teaching computation and biology; for details, please see Titus's post and keep an eye on this blog for more information as well. Read More ›

Software Carpentry workshop at Universidade Federal do Rio Grande do Sul
Raniere Silva / 2014-08-30
At the end of this week, Alex Viana and I taught a Software Carpentry workshop at the Universidade Federal do Rio Grande do Sul. We had around 30 learners with very different levels of experience (from high school students to senior professors of the university) and took them through Bash, Git and Python over the course of two days. Read More ›

Fall 2014 Bootcamps (So Far)
Greg Wilson / 2014-08-30
Our calendar for the fall is filling up pretty quickly: we've added the following bootcamps in the last couple of weeks, and have more on the way. If you'd like us to run one where you are, please fill in this form and we'll get the ball rolling. Read More ›

The New MATLAB Teaching Materials
Damien Irving / 2014-08-29
When Software Carpentry started running bootcamps back in 2012, Python was used exclusively for the programming lessons. While these lessons were as language-agnostic as possible (i.e. the materials focused on transferable programming concepts as opposed to specifics of the Python language), people soon expressed an interest in running bootcamps using other languages. R very quickly established itself as a regular alternative to Python, but it wasn't until early 2014 that the first ever MATLAB bootcamp was held (see here and here for blog posts about the event; the official event page is here). Read More ›

Software Carpentry at Cambridge University
Rob Beagrie / 2014-08-29
At the beginning of this week Thomas Kluyver and I taught a Software Carpentry bootcamp at the University of Cambridge. We had around 35 learners with very different levels of experience and took them through Bash, Git and Python over the course of two days. Read More ›

Software Carpentry at Brazilian Open Science Conference
Raniere Silva / 2014-08-23
Last week, Raniere Silva and Alex Viana attended the Brazilian Open Science Conference, where they ran a Git course for Software Carpentry and Alex gave a talk about Software Carpentry (like the one that Damien Irving gave at PyCon Australia). Our workshop was a great time, and judging from the feedback, every student liked it. One of the students said: "Very didactic, the instructors showed much care, dedication and attention with the students." And another: "Very good, from the basic to complex in a easy way." Read More ›

The Fifth ANGUS Course
Greg Wilson / 2014-08-21
Titus Brown recently blogged a summary of the fifth run of the Analyzing Next Generation Sequencing (ANGUS) course at Michigan State. It includes some interesting observations on what's working and what needs to be improved, and some thoughts on assessment—he'd welcome feedback. Read More ›

Conversations About Teaching
Greg Wilson / 2014-08-18
Over the last few days, there have been four related discussion threads on the Software Carpentry mailing lists about what we use, what we teach, and how we teach it. Together, they highlight what we're doing well and where we need to do better. Read More ›

A MOOC on Practical Numerical Methods with Python
Greg Wilson / 2014-08-14
As announced at SciPy'14 last month, Prof. Lorena Barba will be teaching a MOOC titled Practical Numerical Methods with Python this fall, and the course site is now open for registration. Having worked through her excellent 12 Steps to Navier-Stokes notebooks, I think this will be a great course, and I urge everyone interested in the subject to check it out. Read More ›

News from Australia
Greg Wilson / 2014-08-13
Damien Irving summed up Software Carpentry activities in Australia in a short talk at PyCon Australia last week, and talked a bit as well about lessons learned. Next stops: Perth and Darwin! Read More ›

Three Bootcamps for Librarians
Cam Macdonell / 2014-08-13
Over the past month and a half I have been fortunate to (co-)lead three bootcamps targeted at librarians, a relatively new audience for Software Carpentry. These bootcamps have been located in Edmonton, Toronto, and New York, and in the same way that the term "librarian" is very broad, encompassing many different disciplines and skill sets, the three bootcamps had their share of similarities and important differences. I'll start with the similarities and then discuss each bootcamp's differences. Read More ›

UCL Research Software Development is Hiring
James Hetherington / 2014-08-11
The University College London Research Software Development team is hiring. Read More ›

Inessential Weirdness in Software Carpentry
Greg Wilson / 2014-08-11
Sumana Harihareswara, of the Wikimedia Foundation, has started compiling a list of inessential weirdness in open source. (It was inspired by this article, which made me blush: I was a howler more than once in my youth.) My question is, what inessential weirdnesses are we guilty of that aren't already in Sumana's list? Read More ›

The Research Software Engineer AGM and Hackday
Simon Hettrick / 2014-08-08
The Research Software Engineers community was founded to support the people who develop the software used in research. If you want to be a part of the community, come to our AGM and hackday, which takes place on 15-16 September at King's College London. It's a free event, thanks to our sponsor Maudsley Digital. Read More ›

Sustainability
Greg Wilson / 2014-08-04
I took part in a meeting about sustainable scientific software at last month's SciPy conference. Much of it was taken up with discussion of getting scientists recognition for building software, but there was also some interesting debate about what "sustainability" actually means. After talking to a few people in the software engineering research community, I'd like to propose that: Read More ›

The Real Purpose of Sprints
Aleksandra Pawlik / 2014-08-04
"The real purpose of sprints isn't the code, it's getting to know people..." This is what Greg Wilson told me after the first-ever Mozilla Science Lab global sprint. Indeed, over the course of two days, dozens of people all over the world met virtually and in person to develop and improve Software Carpentry teaching materials, infrastructure and a number of MSL-related projects. Read More ›

Summer Sprint Summary
Greg Wilson / 2014-07-29
Last week, the Mozilla Science Lab hosted its first-ever global sprint. Dozens of people joined in from 18 cities (and a few from home) to work for two days on a wide variety of projects, ranging from new lessons to tools for mining the scientific literature. The MSL blog has all the details, and will have a series of posts about what was done over the next couple of weeks, but here are a few Software Carpentry-specific highlights: Read More ›

Feedback from Cranfield
Mike Jackson / 2014-07-28
On 21-23 July, EPCC's ARCHER training team visited a sun-drenched Cranfield University to run a Software Carpentry bootcamp and an Introduction to Scientific Programming in Python. The three days combined a traditional bootcamp with a new one-day course, Introduction to Scientific Programming in Python. Read More ›

Using a Package Manager for Lessons and Papers
Greg Wilson / 2014-07-21
I've been musing for a couple of years now about ways in which we could re-purpose off-the-shelf software engineering tools and techniques to serve the needs of teachers. One theme, which I touched on in my SciPy 2014 talk, is to get people to patch shared learning materials in the way they patch Wikipedia articles and open source code. Another is to use package managers like RPM, Homebrew, and Conda to track dependencies between lessons, so that I could say something like conda install suffragette_movement and get a lesson on the struggle for women's voting rights, along with the other lessons and materials it depends on (or updates and links to those other lessons if I already have some of them installed). Read More ›
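
To make that idea concrete, here is a toy sketch of how such a resolver might order lessons so that prerequisites come first. The prerequisite names are invented for illustration, and a real package manager would of course add versioning, fetching, and conflict handling on top of this:

    # Toy lesson-dependency resolver: lessons declare prerequisites,
    # and we walk the graph the way a package manager would.
    LESSONS = {
        "suffragette_movement": ["primary_sources_intro", "timeline_basics"],
        "primary_sources_intro": [],
        "timeline_basics": [],
    }

    def install(lesson, installed=None):
        """Return lessons in an order that satisfies all prerequisites."""
        if installed is None:
            installed = []
        for prerequisite in LESSONS[lesson]:
            install(prerequisite, installed)
        if lesson not in installed:
            installed.append(lesson)
        return installed

    print(install("suffragette_movement"))
    # ['primary_sources_intro', 'timeline_basics', 'suffragette_movement']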

GSoC Projects at Summer Sprint
Raniere Silva, Piotr Banaszkiewicz, Gabriel Ivanica / 2014-07-21
As you may know, the Mozilla Science Lab's first-ever two-day sprint takes place in the next few days, and all GSoC projects related to the Mozilla Science Lab will be part of it. For more information about the plans for the GSoC projects, continue reading this post. Read More ›

Summer Sprint FAQ
Greg Wilson / 2014-07-21
The Mozilla Science Lab's first-ever two-day sprint is almost upon us, so here's a short FAQ to tell you who can take part and how. Read More ›

SciPy 2014 Talks and Lessons
Greg Wilson / 2014-07-21
Talks from SciPy 2014 are now online, and include several from people associated with Software Carpentry. I particularly enjoyed Lorena Barba's keynote, in which she discussed how she's using the IPython Notebook in a flipped classroom. (I gave a talk too, but if you're a regular reader of this blog, you'll have heard most of it already.) Read More ›

Bootcamps in Cyprus and Jordan
Alan O'Cais / 2014-07-17
Last month, we held two bootcamps in Cyprus and Jordan organised by the LinkSCEEM project. Both camps were part of larger workshops, with the final two days of the workshop in Cyprus focussed on the OpenMP and OpenACC APIs for shared-memory multi-processing and accelerators. It was held June 10-13 with 35 participants coming from 6 different countries (Cyprus, Greece, Lebanon, Jordan, Israel, and Egypt), with all the non-local participants being funded by the project. Read More ›

ARCHER Software Carpentry Bootcamp at Imperial College London
Mike Jackson / 2014-07-11
ARCHER, the UK's new national supercomputing service, offers training in software development and high-performance computing to scientists and researchers across the UK. As part of our training service we are running a 2 day Software Carpentry bootcamp at Imperial College London, UK, on 16-17 September. Read More ›

Translating Software Carpentry into Portuguese
Raniere Silva / 2014-07-08
We are very pleased to announce that we will run a half-day workshop/advertising session at Open Science, Open Issues, an open science conference in Rio de Janeiro, and for that we will translate our Git lesson into Portuguese. If you would like to help, please head over to this GitHub repository, which has instructions and contact information. Read More ›

Our First High School Workshop at Rockefeller University
Daniel Chen / 2014-07-07
Camille Avestruz, Ivan Gonzalez, Timothy Cerino, and Daniel Chen all had the great opportunity to teach Software Carpentry's first zero-entry workshop to high school students. We were able to teach at Rockefeller thanks to the scientific foresight of Jeanne Garbarino and the rest of the Rockefeller team, along with Arliss Collins, Greg Wilson and the SWC team. Lastly, thanks to Gabriel Perez-Giz for volunteering his time to help during the workshop. The main goal of this workshop was to expose tomorrow's scientists to scientific computing as early as possible. For example, as genomics data for biology continues to grow, we are beginning to see biologists shift from pipetter to data scientist. Our goal was not to teach everyone all the skills needed to dive into retrieving, cleaning, and analyzing genomics or astronomy data the next day, but rather to show them what is possible with computers, expose them, perhaps for the first time, to the idea that the GUI may not always be the best tool for the job, and give them a foundation of knowledge and concepts for perpetual self-learning. We followed the traditional SWC workshop materials, adapting the pace as needed. Bash, Python, and Git were covered. Read More ›

Scientific Groupware Revisited
Greg Wilson / 2014-07-05
16 years ago, Jon Udell wrote a white paper titled "Internet Groupware for Scientific Collaboration" that profoundly changed how I thought about the web. In many ways, it seems as futuristic today as it did then: despite all the technological advances of the last decade and a half, a two-way web built on top of a universal canvas is still mostly a dream. Udell and others are now working on Thali, an attempt to create a truly distributed web. It, and projects like Ward Cunningham's Smallest Federated Wiki, are a more truly open model for science than the de facto centralization typified by Google, Facebook, Twitter, and GitHub, but it's not clear we'll choose the long-term robustness of the former over the short-term convenience of the latter. I asked Jon if I could re-post IGSC here. He said yes, but on reflection, I realized that doing so would run counter to the spirit of what he's been trying to tell us. Instead, I urge you to click on this link and marvel at the everyday miracle that follows. Read More ›

Feedback from the bootcamp at Istituto Nazionale di Fisica Nucleare in Pisa
Aleksandra Pawlik / 2014-06-30
At the beginning of June (3-6), Software Carpentry supported the leaning tower of Pisa. Well, not really. We actually supported the students at the Istituto Nazionale di Fisica Nucleare at the University of Pisa in becoming more effective with their computational skills. The instructors were Rémi Emonet and Aleksandra Pawlik. The main organisers and hosts were Chiara Roda and Luca Baldini. Needless to say, running a bootcamp in early June in Tuscany does have its perks. The memories of gelati and focaccine di ceci are still very vivid. How can you not love being a Software Carpentry instructor? Read More ›

Summary of June 2014 Lab Meeting
Greg Wilson / 2014-06-27
At our monthly lab meeting yesterday (June 26), we discussed a wide range of topics (though only about half as much as was on the agenda). Detailed notes and votes are below; the key points are: We would like to do what we can to help the careers of people who volunteer for Software Carpentry. In particular, we'd like to find a way to highlight people who have taught lots of bootcamps, contributed lots of material, or both, without discouraging people who haven't, or getting into a situation in which bootcamp hosts ask, "Why aren't you sending your most experienced people to us?" We've opened a GitHub issue for people who'd like to discuss this topic, and we'd be grateful if you could share experiences with other systems and suggestions about what we could do. We will have a meeting on Thursday, July 3 to discuss plans for the July 22-23 sprint. The time and connection details will be posted here on July 2. People were either neutral or in favor of switching to Python 3. There was concern about compatibility, but it seems that most scientific packages are now Py3-compatible (and that there are some that only work with Py3, and not with Py2). In practice, this will mean: changing what people have on their machine when they leave the bootcamp, and minor modifications to our lessons (e.g., print as a function). By default, Mercurial launches a GUI diff/merge tool when there's a conflict rather than displaying a textual diff. Novices seem to find side-by-side merge easier to work with, which raises the question: should we use a graphical tool for diff and merge when teaching Git as well? Again, most people were either neutral or in favor, subject to ease of installation. We'll try this at a couple of workshops in the coming months and report back. Should we install Matt Davis's ipythonblocks module by default? Nobody was opposed, but it turned out that only a handful of people actually use it in teaching. We'll poll our instructors to find out more about usage before making a final decision. In order to make it easy for people who want to teach regular expressions to do so, we'll integrate the software that runs regexpal.com (which ironically is down at the time of this writing) into the bc repo so that instructors can create a page in their bootcamp site where learners can play around. Jonah Duckles has created a pull request that would change the way installation instructions are managed. Instead of editing _includes/setup.html, instructors would add a list of topics to the header of index.html that Jekyll's templating logic would use to control what is and isn't displayed. Most people were in favor, so we'll merge this once the PR is finished. Many people are still not happy with the way our lesson materials are organized, or with what pre-requisite knowledge we actually expect. We will organize a meeting for next Thursday (July 3) to talk about options, then open a discussion issue on GitHub if we can come up with concrete alternatives. Raniere Silva has been working over the last few weeks to generate EPUB and PDF from our materials. The goal is to allow anyone who wants to produce book-form notes to do so for use in a regular class, to give to learners after a bootcamp, or for self-study. He's been making good progress, but there are still lots of small problems (particularly with the PDF). We'll be hacking on this at the sprint. Read More ›
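
For readers who haven't followed the Python 3 transition, the print change mentioned above is small but representative of the edits our lessons would need:

    # In Python 3, print is a built-in function and must be called with parentheses:
    print("hello", "world")    # works in Python 3
    # The Python 2 statement form, which lessons would have to drop:
    # print "hello", "world"   # a SyntaxError in Python 3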

Our IUSE Proposal Was Rejected
Greg Wilson / 2014-06-27
We got word a few days ago that our proposal to the NSF's Improving Undergraduate STEM Education program had been rejected. The panel summary agreed that software training is a good idea, but the panel was not convinced by our plans to shift from training grad students to undergrads. In particular, they were not convinced that Unix-based workshops would be best for undergrads, and felt that not being embedded in the regular curriculum was a weakness. These are fair criticisms: most undergraduates only use GUIs and cannot navigate the terminal to save their lives. The perception that Software Carpentry is Unix-specific therefore hurt the proposal, and showed that we didn't clearly explain our focus on underlying concepts. The fact that we wouldn't be part of the regular curriculum is more difficult to address. The fact is, it really is hard to get undergrads to do things that interfere with earning grades, just as it's often hard to get grad students to do anything that doesn't immediately lead to a publication. Some of their concern about impact also seemed to be due to our concentrating on REU students, a self-selected bunch that is already motivated. At the same time, though, the panel did not connect the results from years of Software Carpentry workshops with the expected impact of this effort. This indicates that the proposal did not effectively communicate how well our experience to date has laid the groundwork for efforts like this. We're obviously disappointed by this rejection, but we've learned some useful lessons, and we hope that they will inspire others to put forward proposals of their own. Read More ›

Reminder: Lab Meeting Tomorrow
Greg Wilson / 2014-06-25
Our monthly online lab meetings will take place tomorrow (Thursday, June 26) at 10:00 am and 7:00 pm Eastern time. Please have a look at this Etherpad to see what we'll be discussing and voting on. We look forward to seeing you there. Read More ›

Translating Software Carpentry into Spanish
Greg Wilson / 2014-06-22
We are very pleased to announce that a group of volunteers has begun translating our lessons into Spanish. If you would like to help, please head over to this GitHub repository, which has instructions and contact information. Our thanks to Fran Navarro, David Perez-Suarez, Javier J. Gutiérrez, Iván González, Daniel Domene, Zuri Bauer, and Jorge Bernabé for organizing this, and to everyone else for lending a hand. Read More ›

Help Us Build an Admin Tool for Bootcamps
Greg Wilson / 2014-06-20
It takes a lot of work behind the scenes to match instructors to bootcamps, keep track of who taught (or learned) where, and manage our mailing lists. Our current cobbled-together tools are at their breaking point, so Atul Varma, a developer at Mozilla, put together a simple browser-on-the-desktop app to help administrators find instructors based on geography and skills. We'd like many other features—in particular, bulk upload of attendance lists from bootcamps—so if you would like to help us out, please clone this repository and send us fixes and additions. Many thanks to Atul for getting us started—we're looking forward to not grepping for instructors any more :-). Read More ›

Reflections on Claremont
Bill Mills / 2014-06-19
The Claremont Colleges bootcamp just wrapped a couple of hours ago, and judging from my pile of happy sticky note responses, I think we nailed it pretty good. Things were a bit fast on the first day of intro shell and Python, but I think we hit something pitch perfect for day two on intermediate Python and Git. As this Python-heavy workshop progressed, some questions started coming into focus from student responses, discussions with instructors and organizers, and examinations of just how to teach this ascendant skill in scientific computing. To be clear: I have never taught our Python material, and I consider myself a fairly casual Pythonista. This is a debate that the masters need to consider, but I'm hoping here to use my agnostic outsider's perspective to help set the conversation up. Read More ›

Teaching Support IT Job at UCL Physics and Astronomy
Ben Waugh / 2014-06-18
We're looking for someone with IT skills, and interests in teaching and physics, to work with us at the Department of Physics and Astronomy at UCL (University College London). This is a system manager job with an emphasis on supporting our teaching, and will involve a wide range of responsibilities, including managing a Linux cluster and interfacing PCs to lab equipment as well as providing first-line support for the university Windows environment. Other responsibilities may include maintaining and developing an IPython notebook server for teaching, and working with teaching staff to apply other innovative tools. The application deadline is midnight on Sunday 6th July. Applicants must already have the right to work in the UK. Read More ›

Engineering-Focused Bootcamp
Jeff Shelton / 2014-06-18
Summary: In advance of a bootcamp to be held for a Midwestern university's mechanical engineering department this fall, I've started reworking the traditional Software Carpentry (SWC) curriculum, and will be creating new teaching materials to accompany the workshop. Content changes are being made to accommodate the instructional needs of engineering students at the sponsoring university, while remaining consistent with the basic tenets and goals of the SWC movement. Comments and suggestions from the community on this matter are welcomed and encouraged, particularly in the area of PowerShell scripting (at which I'm currently a novice). I'm also looking for instructors who might be willing to help teach such a bootcamp in the Detroit area this fall. Read More ›

Reminder: June 2014 Lab Meeting
Greg Wilson / 2014-06-17
The next Software Carpentry lab meeting will take place next week at 10:00 am and 7:00 pm Eastern time on Thursday, June 26. (My apologies to people in Australia: I promised we'd hold the evening meeting first this time around, but unfortunately that slot conflicts with instructor training.) Our agenda is up on this Etherpad; we'll be discussing plans for the July sprint and voting on several pull requests and proposals. If you'd like to cast a few votes, please take a few minutes before the meeting to look through the items that are on the agenda. Read More ›

An Update on Our Sprint Plans
Greg Wilson / 2014-06-15
Plans for our two-day sprint in July are coming together: full details are being compiled on our Etherpad, but we hope the summary below will give you a taste of what we're hoping to do. Read More ›

A Success Story from KAUST
Tareq Malas / 2014-06-15
Our ACM student chapter at King Abdullah University of Science and Technology (KAUST) is active in supporting the diverse research community at our university. Toward the end of 2013, we decided to offer Python tutorials to our community. Since we had no information on the background and interests of the KAUST community, we crafted a short survey to help us tailor our tutorials. The following image shows a summary of the responses we got: Read More ›

Fixing 14 Repositories
Will Trimble / 2014-06-15
We cloned our workshop repository at the beginning of the workshop at Spelman to give everyone an identical directory tree and contents to explore, and to distribute the sample data files. On the first day of our workshop, about a third of the learners (all running Windows) could not clone the workshop repository: git clone was failing for them with the error: Read More ›

A Double Handful of Bootcamps
Greg Wilson / 2014-06-15
The next couple of days are going to be busy, with bootcamps in Amman and Toronto, and at Duke, UC Davis, and the Claremont Colleges. The week after will be just as crowded: we'll be at the universities of Reading and Notre Dame, the Canadian Applied and Industrial Mathematics Society, Rockefeller University in New York, and an event for women in science and engineering in Philadelphia. As always, we're grateful to our hosts and instructors for making all of this possible. Read More ›

Registration Open for Instructor Training in Norwich in October
Greg Wilson / 2014-06-09
As we announced last week, we are running a two-day version of our instructor training course at The Genome Analysis Centre in Norwich, UK, on October 22-23, 2014. Registration is now open, and we hope to see you there. Read More ›

Planning Our Summer Sprint
Greg Wilson / 2014-06-09
As a warm-up for our lab meeting on June 26, we have some more information to share about our two-day sprint in July. Please see our Etherpad for details, including more about sites and projects. Read More ›

ARCHER Software Carpentry Bootcamp and Introduction to Scientific Programming in Python
Mike Jackson / 2014-06-09
ARCHER, the UK's new national supercomputing service, offers training in software development and high-performance computing to scientists and researchers across the UK. As part of our training service we are running a 3 day Software Carpentry bootcamp and Introduction to Scientific Programming in Python at Cranfield University, UK, on 21-23 July. Read More ›

Announcing June 2014 Lab Meeting
Greg Wilson / 2014-06-09
The next of our monthly online lab meetings will take place on Thursday, June 26, at 10:00 am and 7:00 pm Eastern time. (As always, we'll hold it twice to accommodate people's work schedules and time zones.) We will update this Etherpad with agenda items as the date approaches. Read More ›

Research Computing Facilitator Jobs in Wisconsin
Greg Wilson / 2014-06-05
The Advanced Computing Infrastructure at the University of Wisconsin-Madison seeks to hire two Research Computing Facilitators (RCFs). The ideal RCF will apply a background of academic research and the use of computation to assist campus researchers in enhancing their research endeavors through the use of on- and off-campus compute resources. Additionally, RCFs should possess the interpersonal skills to interface with researchers (faculty, graduate students, etc.) and campus technical staff from a variety of organizations and compute-related services, including the campus's large-scale computing center. Primary activities include consultations with researchers and the production of education/outreach materials distributed on the web and in person. Experience teaching research computing topics, such as those in Software Carpentry bootcamps, will be a plus. Please see the full description for more details. All activities will be bolstered by participation in the NSF ACI-REF program, where a primary goal is to build an inter-campus network of facilitators and technical staff. Read More ›

Keeping the Bootcamp Fun Alive!
Jeff Hollister / 2014-06-05
In the latter part of 2013, I started the process of co-hosting a bootcamp with Peter August and Judith Swift, both from the University of Rhode Island (URI). During our discussions it became quite clear that, as fantastic as the bootcamps are, it would be even better if we could extend the opportunities for learning. We tossed around several ideas but eventually decided we would try a special topics course at URI. Thus, Scientific Computing and Programming for Coastal Resource Management was born. We opened the course up to any of the bootcamp participants who had an interest; official enrollment in the course was not required. In the end we had about 10 participants who attended regularly, including a few students who enrolled for credit. Pete August (URI), Adam Smith (URI), and I co-taught the course. Read More ›

Introducing Scientists to Testing and Code Review
Greg Wilson / 2014-06-05
As part of our two-day sprint in July, the khmer project will be offering a mentored open source contributathon. This will provide an opportunity for people interested in trying out the "GitHub flow" model, in which contributions are submitted for review using a pull request. Since the project has lots of unit tests and fairly high code coverage, people can also see how testing and code coverage interact with software development in practice. For more information, please see Titus Brown's blog post. Read More ›

Instructor Training in Norwich, October 2014
Greg Wilson / 2014-06-05
We are pleased to announce that we will be running a two-day version of our instructor training course at The Genome Analysis Centre in Norwich, UK, on October 22-23, 2014. Registration details will be available shortly, but if you are interested in taking part, please mark your calendar and keep an eye on this blog, or the announcement page, for details. Our thanks to the good folks at TGAC for making this possible—we look forward to an engaging couple of days. Read More ›

Collected Links
Greg Wilson / 2014-06-05
A lot of people have been writing about Software Carpentry on other blogs of late. Here, in no particular order, are a few of their posts: Damien Irving: A vision for data analysis in the weather and climate sciences Deb Paul: Tales from A Data Carpentry Workshop: In Demand Tracy Teal: Inaugural Data Carpentry Workshop Oxana Sachenkova: Software Carpentry at SciLifeLab, Sweden Rob Davey: From the workshop to the workstation: Software Carpentry training from a bioinformatics perspective Stephen Turner: Collaborative lesson development with GitHub Karin Lagesen: The one where I went to Sweden Damien Irving (again): Software Carpentry for Biomedical Imaging Read More ›

Teaching Librarians in Montreal
Dhavide Aruliah / 2014-05-28
Preston Holmes, Jessica Hamrick, Luke Lee, and I helped deliver a Software Carpentry bootcamp during the PyCon sprints in Montreal in April 2014. The audience consisted of roughly 35 librarians coming mostly from the Montreal area. Read More ›

Learning to Teach Never Ends
Aleksandra Pawlik / 2014-05-28
A month ago I took part in the first face-to-face Software Carpentry Instructor Training, run by Greg Wilson from Software Carpentry and Warren Code from the University of British Columbia. Unlike most of the 40 participants, I had already completed the instructor course which Greg regularly runs online. My primary aim in attending the face-to-face training was to observe and learn how to run it, because the plan is that the Software Sustainability Institute (which I work for) will support running such training in the future. So I attended the training in Toronto as a certified instructor and also as a Software Carpentry co-admin in the UK. Wearing different hats allowed me to look at the event from different perspectives, and this is probably why it took me so long to write this post. Read More ›

Announcing Two More WiSE Bootcamps
Greg Wilson / 2014-05-26
Thanks to generous support from Waterfront International and Stevens Capital Management, we are pleased to announce that we are running two bootcamps for women in science and engineering this summer: one in Philadelphia on June 24-25, and the other in Toronto on July 8-9. Both bootcamps are open to female graduate students, post-docs, and faculty in science, engineering, and medicine, particularly those interested in analyzing large data, and build on previous WiSE bootcamps in Boston in 2013 and Berkeley earlier this year. Read More ›

Summary of May 2014 Lab Meeting
Greg Wilson / 2014-05-24
Our monthly online lab meeting took place this past Thursday (May 22), and for the first time it included voting on pull requests and other issues. All the notes from the Etherpad are included below, but the high points are: The Mozilla Science Lab is hiring a developer and a community manager. People and projects that would like to take part in our global sprint on July 22-23 are invited to sign up on this Etherpad. We won't just work on Software Carpentry curriculum and tooling: related projects are welcome to use this opportunity to bring their communities together as well. Arliss Collins and others will try to simplify the workflow for creating and managing bootcamps. (Right now, we rely on five different online systems, and most of the administrators we work with at host institutions can't make heads or tails of them.) If you would like to help, please let us know. We will add people who want to be helpers at bootcamps to our instructors mailing list rather than creating yet another list for reaching them. We voted on the following pull requests: Extra lessons will go under existing directories (e.g., novice/git/) rather than in a top-level extras directory. We won't try to standardize usage of "parameter" and "argument", since most instructors use them idiosyncratically and/or interchangeably. We will incorporate the new lessons on Mercurial as soon as they're done. We will merge the lessons on scikit-learn, Python string formatting, common Python error messages, and setting up SSH keys for GitHub. We won't include the lesson on tmux—people felt it was too specialized—and will ask that the lesson on text data mining in the shell be re-worked. We also voted on the following proposals for new lessons: Using Excel properly, using Make to manage data pipelines, and regular expressions were all approved. The draft lesson on creating and syndicating data on the web was deferred (only a few people had looked at it). People liked the idea of lessons on statistics with Pandas and managing geospatial data, but we will need volunteers to take the lead. Our next lab meeting, on June 26, will primarily be devoted to planning for our July 22-23 sprint. We look forward to seeing lots of you at both. Read More ›

Data Science Study Invitation
Greg Wilson / 2014-05-24
Katie Kuksenok, a graduate student at the University of Washington, is interviewing academic researchers who do data science to explore the barriers and challenges they face. If you would be willing to take 20-30 minutes to be interviewed, please get in touch: among other things, her findings will be used to help improve curriculum for projects like Software Carpentry. Read More ›

Lab Meeting Reminder
Greg Wilson / 2014-05-21
Our monthly lab meeting will take place on Thursday, May 22 at 10:00 am and again at 7:00 pm, Eastern time. There's a lot to discuss, so please take a moment beforehand to go through the lab meeting Etherpad and see if there are any pull requests or enhancement proposals that you are particularly interested in. We look forward to seeing you online. Read More ›

Behind the Scenes
Greg Wilson / 2014-05-20
A lot of people work behind the scenes to organize bootcamps, keep our website going (and readable), and generally make it possible for the rest of us to do what we do. I'd therefore like to offer some long-overdue thanks to Amy Brown, Arliss Collins, Abby Cabunoc, Ivan Gonzalez, Jon Pipitone, David Rio, and Raniere Silva for all their hard work: we wouldn't have come this far without them, and we're very grateful for everything they've done. Read More ›

A Lot of Bootcamps in the Works
Greg Wilson / 2014-05-20
Last week, we ran our first bootcamp in Brazil—many thanks to Raniere Silva for this writeup about what worked and what didn't. We'll visit several other new countries over the next three months, as well as running a bunch of other events—we'll post again as registration opens up for these: Read More ›

Our First Data Carpentry Workshop
Karen Cranston / 2014-05-14
Update: for more information on Data Carpentry, please see their web site. On May 8 and 9, 2014, 4 instructors, 4 assistants, and 27 learners filed into the largest meeting space at the National Evolutionary Synthesis Center (NESCent) for the inaugural Data Carpentry bootcamp. Data Carpentry is modeled on Software Carpentry, but focuses on tools and practices for more productively managing and manipulating data. The inaugural group of learners for this bootcamp was very diverse. They included graduate students, postdocs, faculty and staff, from three of the largest local research universities (Duke University, University of North Carolina, and North Carolina State University). Over 55% of the attendees were women and research areas ranged from evolutionary biology and ecology to microbial ecology, fungal phylogenomics, marine biology, and environmental engineering. One participant was even a library scientist from Duke Library. Read More ›

Job Openings at the Mozilla Science Lab
Greg Wilson / 2014-05-14
The Mozilla Science Lab is looking for a community manager to build and scale existing community outreach efforts, and for a developer to lead technical prototyping efforts and engage with our community about technical projects. Possible office locations for these positions include Brooklyn, Toronto, London, Vancouver, and San Francisco, but we will consider remote working opportunities for the right candidate. For more information, or to apply, please see these postings. Read More ›

Technical Training Officer (LVT Training Officer) position at TGAC
Aleksandra Pawlik / 2014-05-14
The Genome Analysis Centre (TGAC) based at Norwich, UK is looking for a Linux and Virtualisation Technical Training Officer who will join the Training and Outreach Team. Read More ›

Agenda for This Month's Lab Meeting
Greg Wilson / 2014-05-13
The next Software Carpentry lab meeting will be held at 10:00 Eastern on Thursday, May 22, and repeated at 19:00 Eastern on the same day. (You can use this site to translate those times into your time zone.) We will be voting on several small additions to the lesson material—please see the Etherpad for the list—and discussing plans for future additions, and the multi-site sprint in July. Conference call details will be on the Etherpad on Thursday morning; please add other agenda items to the pad, and we look forward to seeing you all then. Read More ›

A Training Veteran Weighs In
Christina Koch / 2014-05-08
I'm not sure if I qualify as an "instructional training veteran", but having participated in both the online Software Carpentry (SWC) instructor training, and two versions of the Instructional Skills Workshop (ISW) (on which the latest "live" SWC training was modeled) at the University of British Columbia, I'd like to share a few comments comparing my experience with both and what I've valued from each. Feel free to add your own impressions of the SWC or other instructional training in the comments at the end. Read More ›

Knocking on the Future's Door
Greg Wilson / 2014-05-08
Once again I feel like I'm knocking on the future's door but nobody's answering. The task we set ourselves seemed simple: produce a nicely-formatted PDF of the Version 5 lessons to give learners as a reference (and to print as a book to give instructors when they finish their training). Fifty years after the creation of the first computer typesetting systems, you'd think this would be easy. It's not, and the reasons why highlight yet again why so many scientists would rather keep playing the kazoo than learn to play the violin. Read More ›

Assessment Results: First Batch
Jory Schossau / 2014-05-08
As many of our instructors and participants know, Software Carpentry has been giving surveys for a number of months at every possible workshop. This takes a surprising amount of coordination and attention, much of which we owe to our two wonderful administrators: Amy Brown and Arliss Collins. The surveys are being administered before and after each workshop in an effort to gather difference data which could answer questions such as "How useful was this workshop to participants?", "Who benefits most from these workshops?", or "What ways could we improve the workshop?" We can also use this information to ask about professional demographics, such as "What fields are most represented by our participants?" and "What's the typical amount of programming experience?", both of which could help inform a tailored workshop experience to better suit the audience. Read More ›

Playing the Kazoo
Greg Wilson / 2014-05-05
Yesterday, Matt Davis quoted Peter Wang as saying, "A violin is to a kazoo as Python is to Excel." To which I replied, "Exactly: anyone who wants to make music can play a kazoo right away without days of training." The difference between these two points of view lies at the heart of Software Carpentry. As I said in a post two years ago: Read More ›

A Multi-Site Sprint in July
Greg Wilson / 2014-05-05
We'll be holding our first-ever global sprint on July 22-23, 2014. This event will be modeled on Random Hacks of Kindness: people will work with friends and colleagues at sites around the globe, then hand off to participants west of them as their days end and others' begin. We will set up video conferencing between the various locations and a show-and-tell at the end (and yes, there will be stickers and t-shirts). We have booked space for the sprint at the Mozilla offices in Paris, London, Toronto, Vancouver, and San Francisco. If you aren't in one of those cities, but are willing to help organize in your area, please add yourself to this Etherpad. We'll hash out the what and how at the next lab meeting—it's a community event, so we'd like the community to choose what to sprint on—but please get the date in your calendar: it just wouldn't be a party without you. Read More ›

How to Improve Instructor Training
Greg Wilson / 2014-05-02
We ran a three-day intensive version of our instructor training course in Toronto earlier this week. The 40 attendees seemed to find it useful, and I'm very grateful to UBC's Warren Code for co-teaching, and to the University of Toronto's Jennifer Campbell for her presentation on MOOCs and flipped classrooms, but there are a few things we'll do differently next time around. They're listed below in no particular order; comments from attendees (and everyone else, particularly people who've been through the online instructor training) would be very welcome. Read More ›

Wise as Athena...
Greg Wilson / 2014-05-01
Katy Huff has written an article for the Berkeley Science Review about our recent bootcamp for women in science and engineering in California. It's a good summary of what Software Carpentry is all about as well—please check it out. Read More ›

PyCon 2014 Videos
Greg Wilson / 2014-04-27
A double handful of people associated with Software Carpentry gave talks at PyCon 2014 in Montreal two weeks ago. Thanks to Sheila Miguez and Will Kahn-Greene (who run the excellent PyVideo site), you can now view their presentations: Read More ›

April 2014 Lab Meeting
Greg Wilson / 2014-04-25
We held another online lab meeting yesterday, which covered a fairly wide range of topics. The notes are below; comments about what we missed, and suggestions for the next lab meeting, are very welcome. Read More ›

Position: Systems Integration Developer at UW-Madison
Lauren Michael / 2014-04-24
The Center for High Throughput Computing (CHTC) at the University of Wisconsin-Madison provides compute resources and services to campus researchers at UW-Madison, and also develops the HTCondor computational scheduling software, which is used all over the world. We are seeking to hire a Systems Integration Developer, who will help us to accelerate the work of researchers through computational middleware scripting and the design of workflow tools. Please see the online posting at http://www.ohr.wisc.edu/WebListing/Unclassified/PVLSummary.aspx?pvl_num=78987 for details. Read More ›

Mr. Biczo Was Right
Greg Wilson / 2014-04-23
I didn't have nearly enough time to enjoy everything that was going on at PyCon 2014 last week. One event I particularly regret missing was a sprint organized in part by the folks at Scrapinghub. They got a bunch of people to write little scrapers to go through old conference websites, pull out speakers' names, and run them through a gender identification library in order to plot changes in gender balance per conference over time. The code is all available on GitHub, and Gayane Petrosyan (currently at the Hacker School in New York City) has started plotting some of the results. Read More ›
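
The repository linked above has the real code; purely as an illustration of the kind of scraper involved, here is a minimal sketch. The URL and the page markup are invented for illustration, and the gender-identification step is left abstract because the post doesn't name the library used:

    # Minimal scraper sketch: fetch a conference schedule page and pull out
    # speaker names. Assumes speakers are marked up with class="speaker";
    # real conference sites differ, so each scraper needed its own tweaks.
    import requests
    from bs4 import BeautifulSoup

    def speaker_names(url):
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        return [tag.get_text(strip=True) for tag in soup.select(".speaker")]

    # Example (placeholder URL):
    # names = speaker_names("https://example.org/pycon-2010/schedule")
    # Each name would then go through a gender-identification library
    # to estimate the conference's gender balance over time.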

Math Authoring Gap and MathUI
Raniere Silva / 2014-04-23
In a previous post, Greg Wilson wrote about the writing gap between people who value the "I can see what I'm doing" of Microsoft Word and those who care more about the "I can tell what I did" of version control. For STEM (science, technology, engineering and math) folks there is also a gap around math expressions: you can use LaTeX to deal with them, but for beginners who have spent around 20 years using Microsoft Word, moving to LaTeX can be challenging. Did you know that some people are working to close this gap for math expressions? Read More ›

Import Lesson
Greg Wilson / 2014-04-23
Lorena Barba has done it again. Having created a wonderful 12-step introduction to the Navier-Stokes equations using the IPython Notebook, she has now published AeroPython, which teaches the use of potential flow for aerodynamic analysis via the panel method. You don't have to know what that is to appreciate the beauty of what she has built—I certainly don't, at least not yet—but I'd like to explore something that isn't in those notebooks. Before doing that, though, I need to introduce an acronym that never caught on. Read More ›

GSoC Projects for 2014
Greg Wilson / 2014-04-22
We're very pleased to announce that two students will be working on Google Summer of Code projects we put forward, and that one of our regular contributors has had a GSoC project accepted as well. The projects are: Read More ›

Software Carpentry bootcamp at GARNet
Aleksandra Pawlik, Christina Koch / 2014-04-22
In the second week of April, the University of Warwick in the UK hosted its first Software Carpentry bootcamp. The bootcamp was organised by GARNet, a UK-based research network for the UK Arabidopsis and wider plant research community. GARNet facilitates collaboration and interaction between different researchers and supports skills development. Read More ›

Office Hours for Code as a Research Object
Kaitlin Thaney / 2014-04-22
Looking to learn more about how to get a DOI for your code? Interested in the underlying technology used to hook up GitHub (a code hosting service) and figshare (an open data repository)? Join us for our online office hours on Thursday, April 24. We'll be shedding light on the technical build done as part of our "code as a research object" project. We'll be holding two sessions, one at 11 am ET for those in North America, Europe, and Africa, and another at 4 pm ET for our Australasian colleagues. These sessions are open to all, and are your chance to ask your implementation questions, gain a better understanding of how the prototype works, and peek under the hood at the codebase. Have a question? Add it to the etherpad, where you can also find useful background on the project as well as dial-in information. We hope you'll join us. Read More ›

Workshops at SESYNC
Greg Wilson / 2014-04-19
Applications are now open for the 2014 Computational Summer Institute at the National Socio-Environmental Synthesis Center (SESYNC). Small teams of researchers are invited to apply for spaces at the one-week institute, which will be held July 7-11 in Annapolis, Maryland. The workshop will offer participants hands-on training in managing the lifecycle of their data and code with a focus on using open source tools, including R. Topics will include Read More ›

Workshop at University of Southern Denmark, Odense
Steven Koenig / 2014-04-18
The workshop at the University of Southern Denmark, Odense was my first workshop for Software Carpentry, and the same is true for Luis. We had both taught before, Luis more than me, but it was still exciting to see how this workshop would go, given that we had never taught together before. While I taught Bash, regular expressions and Git, Luis taught Python, unit testing and SQL. Initially, Make was also on that list, but due to time constraints it was dropped to make room for a second Git lecture. Read More ›

Changing the Channel
Greg Wilson / 2014-04-18
A lot of open source projects use an antiquated-but-reliable chat system called IRC for long-running conversations. We've had an IRC channel for Software Carpentry for a while, but it's never had very much traffic. It also hasn't done anything to help people get involved in the open science community at large. We are therefore retiring it, and encouraging people—all kinds of people, not just instructors and bootcamp alumni—to join us on the Mozilla Science Lab IRC channel instead. We hope this will become a place where people can swap tips on teaching programming to scientists, discuss new forms of publication, seek help on specific research problems, and everything in between. Read More ›

Do Not Be Worried
Greg Wilson / 2014-04-16
Sage advice for our instructors and learners from an eight-year-old: Read More ›

Summarizing Our Instructors' Skills
Greg Wilson / 2014-04-15
We've been asking bootcamp participants to tell us about themselves for a while now, so it seems only fair to share some information about our instructors. First, of the 82 instructors who responded to our survey two weeks ago, how many are comfortable teaching which topic to novices, to intermediates, or not at all? Read More ›

Bridging the Writing Gap
Greg Wilson / 2014-04-06
A few months ago, we had an interesting discussion about what Software Carpentry should teach about writing and publishing in the 21st Century. One thing that came through loud and clear was the gulf between people who value the "I can see what I'm doing" of Microsoft Word and those who care more about the "I can tell what I did" of version control. Sites like Authorea, writeLaTeX, and ShareLaTeX are trying to bridge that gulf by giving people a WYSIWYG authoring tool in the browser that uses LaTeX as a storage format. This is pretty exciting: since it (potentially) allows collaborators to interact in whichever mode they prefer, it allows people to transition from one to the other instead of requiring them to make a great leap sideways. These sites currently allow people to save work on site or in Dropbox. It would be very cool if they also allowed people to save work in online version control repositories such as GitHub. Someone who isn't comfortable with version control could simply select "Save..." to push their changes, while someone who's already mastered pull requests and merging could interact that way, so that once again, the system could help people transition gradually from one mode to the other. Read More ›
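
As a sketch of that last idea: behind a WYSIWYG editor's "Save..." button, a tool could run the same three Git steps an experienced user would type. This is a minimal sketch assuming the document lives in a local clone with a configured remote, not any site's actual implementation:

    # What a "Save..." button backed by Git might do behind the scenes.
    import subprocess

    def save(message="Saved from the editor"):
        subprocess.check_call(["git", "add", "--all"])           # stage every change
        subprocess.check_call(["git", "commit", "-m", message])  # record a snapshot
        subprocess.check_call(["git", "push"])                   # publish, e.g. to GitHub

    # Note: a real tool would handle the no-changes case and push failures
    # instead of letting check_call raise.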

Does Continuous Publication Require Continuous Attention?
Greg Wilson / 2014-04-05
I read this post by Martin Fenner a couple of weeks ago. His thesis is that scientific publication is still very much a manual process, which makes publications relatively infrequent (and fairly painful) events. Instead, we ought to strive for continuous delivery: production of the "paper" (including release of associated code and data) should be fully automated so that authors can ship whenever they want with relatively little effort. Continuous delivery is popular among software developers, who frequently argue it's more efficient using diagrams like this: Read More ›

Uniting the Narrative
Bill Mills / 2014-04-04
The Victoria workshop that just wrapped a few hours ago was my first stab at teaching SWC live, and I have to say - few things feel better in teaching than the tiny victory of having material that is right on pitch. I've had the lurking notion for years that the hilariously overshot lectures ('I need to cover the syllabus in half time so I can go on sabbatical'), conference talks ('One slide per 30 seconds for a solid hour seems reasonable') and lab presentations ('A 5 by 5 grid of histograms on one slide will communicate ALL the science!') were the tragic offspring of time constraints and one-upmanship, and in desperate need of a pan-academia slide deck disarmament treaty. I always pushed my students to strip it down, just hit the highlights and leave the details for the paper, because I always suspected that that little bit of entertainer's savvy was how we could really reach people: not by fact-bombing the audience into a glassy torpor, but by electrifying their curiosity just enough that the energy that propelled them forward was their own. Read More ›

Summary of March 2014 Meeting to Discuss Novice R Material
John Blischak / 2014-04-04
Last week we had our first meeting to discuss the development of the novice R material. Our goal was to determine our plan for collaboratively creating the lessons. Read More ›

Announcing NBDiff
Greg Wilson / 2014-03-30
I am very pleased to announce the first release of NBDiff, a tool for diffing and merging IPython Notebooks that is now available on GitHub and PyPI. NBDiff was created by a team of students at Concordia University in Montréal; our thanks and congratulations to Shurouq Abusalah, Tavish Armstrong, Marwa Malti, Lina Nouh, Boris Pipev, Selena Sachdeva, and Richard Tang. For more information, see the NBDiff web site. Read More ›

Updating Our Checklists
Greg Wilson / 2014-03-29
We have updated our checklists for bootcamp administrators, hosts, instructors (and lead instructors), and helpers. Further improvements are welcome, and we'd appreciate comments on this post suggesting items we could put in a checklist for learners. Read More ›

One of Our Inspirations
Greg Wilson / 2014-03-27
Last week's post about our original logo reminded me that I'd never blogged about one of the books that inspired Software Carpentry in the first place: Rubin Landau and Paul Fink's A Scientist's and Engineer's Guide to Workstations and Supercomputers: Coping with Unix, RISC, Vectors, and Programming. You'd have to add a few zeroes to the speeds, bandwidths, and data sizes they discuss to bring it up to date, but other than that, it's held up surprisingly (or depressingly) well. Read More ›

Building a Minimal Online Presence
Greg Wilson / 2014-03-27
Titus Brown and Ethan White led a half-hour discussion today of the minimal online presence people in academia ought to have. Notes from the call are below; to make a point-form story even pointier, please make yourself findable and shareable. Read More ›

Changing Our Core Curriculum
Greg Wilson / 2014-03-27
We currently say that people must teach task automation, structured programming, version control, and unit testing in order to use the Software Carpentry name. However, many bootcamps either don't teach unit testing at all, or only cover it briefly, while many of us teach data management using SQL, which also seems like a core lab skill for scientific computing. We asked instructors this week whether we should formalize the change, and they responded as follows: Read More ›

Announcing Our Next Lab Meeting
Greg Wilson / 2014-03-27
Our next online lab meeting will be held on Thursday, April 24, 2014. As always, we'll run it twice to accommodate different time zones; please keep an eye on this blog, our Twitter account, and the mailing lists for connection details. Read More ›

What Tools Do You Use to Get Your Job Done?
David Rio / 2014-03-23
Some of you may have heard of usesthis.com. On this site, people are interviewed about the tools they use to get their job done. The format is very straightforward: participants answer these questions: Read More ›

Not on the Shelves
Greg Wilson / 2014-03-23
Every few years, I indulge in a bit of sympathetic magic by writing reviews of books that don't actually exist in the hope that it will inspire someone to write them. Previous versions written in 1997, 2003, and 2009 led to Beautiful Code, Making Software, The Architecture of Open Source Applications, and a few other books as well. I'd welcome comments on what isn't in this list that you really wish you could read. Read More ›

Empirical Software Engineering Papers
Greg Wilson / 2014-03-19
When I teach scientists programming, I frequently cite empirical studies in software engineering to back up my claims about various tools and practices making people more productive. No good, short survey of those papers exists—writing one has been on my to-do list for several years—but I hope the pointers below will be a useful substitute. Read More ›

Our Original Logo
Greg Wilson / 2014-03-18
How old is Software Carpentry? So old that the only surviving copy of our original logo is an unanimated GIF: Read More ›

Data Science Workshops in Seattle
Greg Wilson / 2014-03-18
Via Sumana Harihareswara: the Community Data Science Workshops are a series of project-based workshops being held at the University of Washington for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free and open source software, and civic media. The workshops are for people with no previous programming experience. The goal is to bring together researchers and academics as well as participants and leaders in online communities. The workshops will all be free of charge, and participants from outside UW are encouraged to apply. There will be three workshops, held from 9am-4pm on three Saturdays in April and May. Each session will begin with a lecture and technical demonstrations in the morning, followed by a lunch graciously provided by the eScience Institute at UW. The rest of the day will be devoted to group work on programming and data science projects, supported by more experienced mentors. For more information, see the full announcement. Read More ›

PyCon is Just a Month Away
Diana Clark / 2014-03-17
With PyCon just under a month away, there are a few things we want to make sure everyone knows about: Read More ›

You and Jimi Hendrix
Greg Wilson / 2014-03-14
I had a discussion a couple of weeks ago about software development tools and processes with some undergraduate students I'm mentoring. They asked why I'm so finicky about putting things under version control, writing unit tests, and creating tickets to keep track of what still needs to be done. The short answer is, because that's what Jimi Hendrix would have done. Read More ›

A Letter from John von Neumann
Greg Wilson / 2014-03-14
Plans to standardize and publish codes of various groups have been made in the past, and they have not been very successful so far. The difficulty is that most people who have been active in this field seem to believe that it is easier to write a new code than to understand an old one. This is probably exaggerated, but it is certainly true that the process of understanding a code practically involves redoing it de novo. The situation is not very unlike the one which existed in formal logics over a long initial period, where every new author invented a new symbolism. It took several decades until a few of these found wider acceptance, at least within limited groups. In the case of computing machine codes, the situation is even more difficult, since all logics refer, at least ideally, to the same substratum, whereas the machine codes frequently refer to physically different machines. I think, nevertheless, that if a competent mathematician...is interested in working on this problem, he ought to be encouraged. The task may turn out to be a thankless one, but it is always possible that a competent and energetic man will in the end come up with something useful. My personal doubts are limited to the near future, and even if it will take a non-trivial number of years to produce something, there is no harm in starting early. — John von Neumann, in a letter to Marston Morse, April 23, 1952 Read More ›

Software Carpentry at TGAC
Aleksandra Pawlik / 2014-03-14
In February The Genome Analysis Centre (TGAC) in Norwich hosted their first Software Carpentry bootcamp. TGAC is keen to develop their training programme and facilities intensively, and Vicky Schneider, who leads the Scientific Training, Education & Learning Programme, strongly supports Software Carpentry. We are looking forward to more bootcamps in Norwich, UK. Read More ›

Everything Old is New Again
Greg Wilson / 2014-03-14
Yesterday, the New York Times R&D Lab announced streamtools, a web-based graphical tool for working with streams of data. It lets users create dataflow systems to re-mix and process the streams that its creators believe most data will soon consist of. Streamtools looks nice, but I do have some questions: Read More ›

Collaborative Lesson Development - Why Not?
Justin Kitzes / 2014-03-14
A few weeks ago, Greg Wilson asked me: Why is there so little open, collaborative development of lesson plans and curricula? Is there something that makes teaching different from coding (e.g., open source software) and from writing (e.g., Wikipedia)? A dozen emails later, I can't claim that we're much closer to a definitive answer, but we have come up with a working hypothesis. We'd be very interested in feedback. Read More ›

John Hunter Technology Fellowship 2014
Greg Wilson / 2014-03-12
The John Hunter Technology Fellowship aims to bridge the gap between academia and real-world, open-source scientific computing projects by providing a capstone experience for individuals coming from a scientific educational background. The program consists of a 6-month project-based training program for postdoctoral scientists or senior graduate students. Fellows work on scientific computing open source projects under the guidance of mentors who are leading scientists and software engineers. The aim of the Fellowship is to enable Fellows to develop the skills needed to contribute to cutting-edge open source software projects while at the same time advancing or supporting the research program they and their mentor are involved in. The Fellow receives an award of $33,000, which is meant to support the Fellow for 6 months of full-time work. The Fellowship can be started in the July 2014-January 2015 time frame, and applications are due May 15, 2014. For more information, and to apply, please see the full announcement. Read More ›

Reproducibility Workshop at XSEDE
Greg Wilson / 2014-03-07
The reproducibility@XSEDE workshop is a full-day event scheduled for Monday, July 14, 2014 in Atlanta, Georgia. The workshop will take place in conjunction with XSEDE14, the annual conference of the Extreme Science and Engineering Discovery Environment, and will feature an interactive, open-ended, discussion-oriented agenda focused on reproducibility in large-scale computational science. We hope to help promote a culture of reproducibility within the broad community of stakeholders connected to computation-enabled research, and expect our work to lead to recommendations addressed to this community. This event will build on the 2009 Yale Data and Code Sharing Roundtable, which culminated in a declaration "demanding a resolution to the credibility crisis from the lack of reproducible research in computational science". To find out more, please see the workshop website, or send questions to reproducibility@xsede.org. Read More ›

Anatole France, Updated
Greg Wilson / 2014-03-07
Anatole France (1844-1924) Then The law, in its majestic equality, forbids the rich and poor alike to sleep under bridges, to beg in the streets, and to steal bread. Now The Internet, in its majestic equality, allows every scientist to analyze massive data sets using web services and cloud computing. Read More ›

Learn How to Teach People to Program
Greg Wilson / 2014-03-04
We are very pleased to announce that we will be running two one-day workshops on how to teach programming during the PyCon sprints in Montreal on April 14 and 15, 2014. The workshops will be led by Greg Wilson, who will show participants how to use basic concepts from educational psychology and instructional design when teaching beginners how to code. The event is open to everyone, not just to PyCon attendees; please register here, or mail us if you have any questions. Read More ›

A Workshop for Librarians at PyCon
Greg Wilson / 2014-03-04
We are very pleased to announce that we will be running a two-day workshop on programming skills for librarians during the PyCon sprints in Montreal on April 14 and 15, 2014. The workshop will introduce participants to core skills for managing and manipulating data, including task automation, version control, modular programming, and databases. The event is open to everyone, not just to PyCon attendees; please see this page for more information, register here, mail us if you have any questions, and help spread the word to your friends and colleagues. Read More ›

SSI Collaborations Workshop and Hackday
Aleksandra Pawlik / 2014-03-03
The Software Sustainability Institute is running the annual Collaborations Workshop (CW14) on 26-28 March in Oxford. The event brings together researchers, software developers, managers and funders to explore important ideas in software and research and to plant the seed of interdisciplinary collaborations. The theme this year is "Software in your reproducible research" and we also have a hackday! Read More ›

Summary of Feb 2014 Lab Meeting
Greg Wilson / 2014-03-03
We held a lab meeting last week. 48 people attended in two sessions (to accommodate different time zones); the main points are summarized below. Read More ›

Software Carpentry on the CBC
Greg Wilson / 2014-03-01
Greg Wilson was recently interviewed by CBC Radio's "Spark" program: you can listen to the interview online, or catch it live this Sunday at 1:00 pm. Read More ›

The Open Scoop Challenge
Greg Wilson / 2014-02-25
Sooner or later, in every discussion of open science, someone will say there's a risk of being scooped if you share your code or data with your peers. I think this is pure FUD: in all the years I've worked with scientists, I have never met anyone this has actually happened to. But absence of proof is not proof of absence, so I'd like to issue a challenge. If someone has ever beaten you to publishing a result by taking advantage of software or data that you made publicly available, I'll send you a Software Carpentry t-shirt. You'll need to provide specifics, but I won't share those details with anyone without your permission. Please mail me if you'd like to chat. Read More ›

Software Carpentry: the University Course
Damien Irving / 2014-02-24
I recently submitted my final assignment for the Specialist Certificate in Teaching for Graduate Researchers. For this final task, we were asked to design a curriculum document for a university subject that we would like to see taught in the future. The first thing that came to mind was Software Carpentry, so here's my take on what a bootcamp would look like, if it were to become an actual university course. Read More ›

Lab Meeting (Feb 2014)
Greg Wilson / 2014-02-23
We're having our first lab meeting of 2014 this week. As usual, we'll hold it twice to accommodate different time zones: once at 20:00 Eastern on Wednesday, February 26, and again at 11:00 Eastern on Thursday, February 27. Connection details will be posted on the meeting's Etherpad, and everyone is welcome both to attend and to add agenda items to the 'pad beforehand. Read More ›

From Training to Engagement
Greg Wilson / 2014-02-21
I was interviewed about Software Carpentry earlier this week, and the interviewer's second question was, "Don't scientists all learn how to program these days as part of their education?" The answer, even today, is "no": the average scientist might know more about calculus and statistics than someone who did a degree in marketing or graphic design, but she probably doesn't know any more about how to build software and share data on the web. Brent Gorda and I started Software Carpentry in 1998 to fix that. Our goal was to teach our colleagues the equivalent of basic lab skills for scientific computing so that they could get more done in less time and with less pain. As the project grew, I realized that this problem wasn't specific to scientists: almost everyone who uses the web spends hours or days doing things that a few simple programs could do for them faster and more reliably. Read More ›

Lessons Learned Has Been Published
Greg Wilson / 2014-02-19
I'm pleased to announce that "Software Carpentry: Lessons Learned" has been published on F1000Research. A paper like this is necessarily incomplete, and so is any acknowledgments list, but I'd like to thank the following for their feedback: Read More ›

Our Biggest Event Ever
Greg Wilson / 2014-02-13
If you have friends or colleagues in or near Montreal, please let them know that we're running our largest bootcamp ever on April 14-15 in conjunction with PyCon 2014. Material will include our usual topics: the Unix shell (and how to automate repetitive tasks); Git and GitHub (and how to use version control to track and share work); Python or R (and how to grow a program in a structured, testable, reusable way); and databases (and the difference between structured and unstructured data), but we'll split people between several rooms based on how much they already know so that each group can move at its own pace. Register now! Read More ›

rOpenSci Hackathon
Karthik Ram / 2014-02-12
The rOpenSci team has been cranking out a large number of software tools over the past several months. As regular blog readers are aware, our software packages provide programmatic access to a diverse and extensive trove of scientific data. More recently we've expanded our efforts to build more general-purpose and cross-domain tools. These include tools for reading, writing, integrating and publishing data, a unit testing platform for data, and a mapping engine that can visualize various kinds of spatial data. Many of our projects are inspired by ad hoc discussions with other scientists and software developers both online (often on Twitter and GitHub) and offline. Several of these folks are now regular contributors to the project. To foster more such collaborations and drive new software innovations, we are excited to announce our first developer meeting next month at GitHub's headquarters in San Francisco. This meeting is made possible by support from the Alfred P. Sloan Foundation and GitHub. Read More ›

An Online Peer Instruction Tool
Greg Wilson / 2014-02-09
Peer instruction is a teaching technique originally developed by Eric Mazur and colleagues in the early 1990s. Study after study has shown that peer instruction works better than conventional lecturing, but to the best of my knowledge, no online learning platform directly supports peer instruction. I'd like to fix that, and I hope some of our readers would like to help. Read More ›

Wrapping Up Round 7 (and a Reminder About Instructor Training)
Greg Wilson / 2014-02-09
We had the wrap-up meetings for Round 7 of instructor training this past Thursday. The full summary is on the training blog, but the key points are: We need to include a lesson early in the course on the structure of the materials in our repository, and how to configure things for a workshop. We'll do a trial run on Thursday, February 20, in the usual time slots (10:00, 14:00, and 19:00 Eastern); please have a look at the Etherpad for connection details on the day. People want a chance to watch more experienced teachers in action. One possibility would be to set up something like SmarterCookie or Edthena, where teachers upload videos of themselves in action so that other people can comment on what they do and how they do it. We will think about ways and means and see what we can set up in the next couple of weeks. Our thanks once again to everyone who stayed with us through what turned out to be a longer-than-usual run of the course: I look forward to teaching with you all some day soon. Finally, there are still seats available for the live three-day version of the instructor training course, which will run in Toronto on April 28-30, 2014. If you'd like to learn more about teaching in general, and teaching programming in specific, please sign up. Read More ›

Keeping Track of Problems
Greg Wilson / 2014-02-05
Setup and configuration is often the biggest challenge bootcamp participants face, particularly on Windows. Our installation instructions are pretty good, but to make them better, Justin Kitzes has started building a wiki page on GitHub to keep track of what's gone wrong and what fixes have worked. Many thanks to Justin for getting the ball rolling—contributions would be very welcome. Read More ›

Workshops at the Data Science Centers
Greg Wilson / 2014-01-30
We are pleased to announce that we will be running workshops simultaneously at the data science centers at UC Berkeley, New York University, and the University of Washington on March 17-18, 2014. These workshops will introduce participants to the skills they should master before they tackle anything with "cloud" or "peta" in its name, including task automation, version control, modular programming, and structured data management. Novice and intermediate learners alike are welcome; for more information on topics and other details, please mail us or check out the registration pages for the three sites: Read More ›

Workshop for Women in Science and Engineering: April 14-15 at LBL
Greg Wilson / 2014-01-28
We're very pleased to announce that registration is now open for our second workshop for Women In Science and Engineering, which will be held at Lawrence Berkeley National Laboratory on April 14-15, 2014. We have a stellar roster of instructors and room for over 100 learners, so please sign up early and pass the word on to your friends and colleagues. Read More ›

Workshops at PyCon in Montreal This April
Greg Wilson / 2014-01-28
We are pleased to announce that registration is now open for our largest event ever: on April 14-15, 2014, in association with PyCon 2014 in Montreal, we will run three Software Carpentry workshops in parallel, along with a master class in next-generation sequencing for bioinformaticians and an introduction to R for Python programmers. (Note that these workshops are during the two days after PyCon, so you can attend both.) Read More ›

Teaching Online (Sort Of) in 2014
Greg Wilson / 2014-01-28
We have tried several times to teach Software Carpentry online, but the results have always been disappointing. We are therefore going to try something different this spring. Instead of running a purely online course for several hours a week, we are going to run regular two-day workshops in four selected cities with a mix of in-person and remote instructors. Read More ›

Research Transparency Job at UC Berkeley
Greg Wilson / 2014-01-26
Interested in improving the standards of rigor in empirical social science research? Eager to collaborate with leading economists, political scientists and psychologists to promote research transparency? Wishing to stay abreast of new advances in empirical research methods and transparency software development? The Berkeley Initiative for Transparency in the Social Sciences (BITSS) is looking for a Program Associate to support the initiative's evaluation and outreach efforts. The candidate is expected to engage actively with social science researchers to raise awareness of new and emerging tools for research transparency. If this sounds like fun, apply now. Read More ›

Feedback from the First MATLAB Bootcamp
Aleksandra Pawlik / 2014-01-24
Jan 28: Michael Croucher has also written a good post about what worked and what didn't at this bootcamp. Last week Software Carpentry paired up with MATLAB to run a bootcamp at the University of Manchester. Apart from shell and version control, the remaining core topics (verification and testing, modular programming with algorithm design) were taught using MATLAB. Read More ›

Why Not a MOOC?
Greg Wilson / 2014-01-19
We got mail yesterday asking us whether we were going to run Software Carpentry as a MOOC. The short answer is, "No." The full answer has several parts: Read More ›

Wrapping Up in Narragansett
Ivan Gonzalez / 2014-01-18
I've just come back from the Software Carpentry bootcamp in Narragansett, RI, held at the Coastal Institute of the University of Rhode Island and the US Environmental Protection Agency. I taught with Patrick Fuller and Jeff Hollister, who was also the organizer. The local helpers were Betty Kreakie and Bryan Milstead. The bootcamp was R-based and lasted two and a half days. The people at the Coastal Institute are using this bootcamp as the starting point for a class on "Computing for Natural Resources" next semester, so the extra half-day was used to talk about this coming class and to add a lesson on data visualization using R. Besides the standard syllabus, we extended the lessons on R a bit, added a short lesson on databases, and ran a longer-than-usual session on testing. Read More ›

Feedback from the First Cambridge R Bootcamp
Aleksandra Pawlik / 2014-01-18
The first R Software Carpentry bootcamp in Cambridge took place on January 7 and 8, 2014. Hosted by the Centre for Mathematical Sciences, the bootcamp was organised by two SSI Fellows, Stephen Eglen and Laurent Gatto. There were 24 participants and a long waiting list of those who wanted to learn R, version control and make. Read More ›

Introducing the Image Novice Module
Michael Hansen / 2014-01-16
The image novice module provides a simple image manipulation interface for beginners that allows for easy loading, manipulating, and saving of image files. It has been developed collaboratively with Greg Wilson, Tony Yu, and Stéfan van der Walt. Below, I show how to use the module for some basic image manipulation. Read More ›
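To give a flavour of the interface, here is a minimal sketch based on the variant of the module later shipped as skimage.novice; the file names are placeholders:

```python
# A minimal sketch of the novice interface, based on the variant later
# shipped as skimage.novice; 'sample.png' is a placeholder file name.
from skimage import novice

picture = novice.open('sample.png')    # load an image from disk
print(picture.size)                    # (width, height) in pixels

# Brighten the red channel of every pixel.
for pixel in picture:
    pixel.red = min(pixel.red + 50, 255)

picture.save('sample-brightened.png')  # write the modified image out
```

The point of the design is that learners manipulate whole pictures and individual pixels through plain attributes, with no arrays, dtypes, or colour-space details in sight.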

Publishing on the Web
Greg Wilson / 2014-01-15
Back in December, I asked a question on the Software Carpentry issue track: What should we teach about writing/publishing papers in a webby world? It led to a lively discussion about how the web is changing scientific publishing, the tools people can use to take advantage of those changes, and the gap between what's possible and what publishers (and senior academics) will actually accept. Many commenters dove immediately into the details of specific tools, but Justin Kitzes nailed it when he said: Read More ›

From 0 to 1 to 10
Greg Wilson / 2014-01-15
Andromeda Yelton has written a great blog post about her experiences teaching a bootcamp for librarians in Toronto earlier this week. It includes this insightful observation: Read More ›

Code Review, Round 2
Greg Wilson / 2014-01-14
Short version: we will launch a second pilot study of code review for scientific software in February 2014, in which experienced mentors will teach small groups of scientists how and when to do reviews. We are currently looking for both mentors and groups to take part. Read More ›

It's Not Just the Tools that Differ Between the Two Cultures
Paul Wilson / 2014-01-10
In the conversation that followed Philip Guo's blog post on the two cultures of computing, most comments focused on comparing the tools used by the two cultures, as Philip did to draw his contrast. Some even wondered if Software Carpentry instructors should perhaps become more versed in MS Excel to meet potential students where they are. In so doing, we lost sight of the fact that even with the same tools, the two cultures would likely use them differently. It's not about the tools, but about the users. Read More ›

Job Opportunity at University College London
Amy Brown / 2014-01-09
The University College London Research Software Development Team is recruiting a research software developer. Their ideal candidate will have experience creating and maintaining research software in one or more fields, and will "design, extend, refactor and maintain scientific software across all subject areas". Sound like you? Have a look at James Hetherington's post about the position. Good luck! Read More ›

Test-Driven Development in Scientific Computing
Jeff Carver / 2014-01-07
Aziz Nanthaamornphong and Dr. Jeffrey Carver at the University of Alabama are conducting a survey of software development practices among computational scientists. This survey seeks to understand test-driven development in computational science and engineering domains, and should take approximately 15 minutes to complete. The survey can be found at: http://goo.gl/9T2BIq This survey study has been approved by the University of Alabama Institutional Review Board (IRB); for questions contact Jeffrey Carver. Read More ›

Mental Models and Vicious Circles
Greg Wilson / 2014-01-07
A few days ago, Konrad Hinsen asked this on our discussion list: Is anyone aware of teaching methods that aim at either developing or verifying the students' mental model of some non-trivial procedure? If so, have they been used for teaching the use of computers? Read More ›

'Best Practices' Has Been Published
Greg Wilson / 2014-01-07
I'm very pleased to announce that our paper Best Practices for Scientific Computing has been published in PLOS Biology. It presents 24 specific things scientists can do to get more done in less time with less pain: Read More ›

We Need More of These
Greg Wilson / 2014-01-07
Michael Nygard's Release It! is one of my favorite books about the practical side of computing—I gave it five stars when it first came out, and while it's too advanced for most of the scientists we want to teach, there's something from it that I'd like to share: the diagram titled "Interaction of Patterns and Antipatterns" from page 43: Read More ›

Introducing Arliss Collins
Amy Brown / 2014-01-06
The incredible expansion of Software Carpentry over the last few months has been exciting and a little overwhelming, so we're pleased to introduce the newest member of the Mozilla Science Lab team, Arliss Collins. Arliss's background is in engineering and project management, and she'll be working with me, Kaitlin and Greg on bootcamp management, infrastructure development and maintenance, and communications. Read More ›

Tools, Conversations, and Cultures
Greg Wilson / 2013-12-31
Philip Guo is best known these days for The Ph.D. Grind, but I first met him through his Online Python Tutor. He helped teach a bootcamp for librarians last August, and recently wrote a blog post about the two cultures of computing that he encountered there. On one side he sees users who treat software as a tool for getting things done; on the other, he sees programmers who hold conversations with their software. The former use stand-alone GUIs backed by binary file formats (like Word and Excel), while the latter use command-line interfaces and text (like the shell, LaTeX, and Python). He writes: Read More ›

Our Store Is Open
Greg Wilson / 2013-12-31
Show your Software Carpentry pride with a shirt, sticker, button, or coffee mug from our CafePress store. And Happy Hogmanay: may 2014 be kind to you and yours. Read More ›

Catch and Hold
Greg Wilson / 2013-12-27
I'm a big fan of Mark Guzdial's work on computing education. Last week, he tweeted this: Google is made of people who succeed in current CS teaching model. Hard for them to realize that it's wrong for most people. #GoogleCSFirst Read More ›

Oxford, One Year On
Greg Wilson / 2013-12-25
Philip Fowler recently posted a nice study of the impact last year's Software Carpentry bootcamp at Oxford had on its participants. The key finding for me is that 10 of the 13 respondents to his survey agreed or strongly agreed with the statement, "A year on, the workshop really encouraged me to change how I do my research." If you'd like to help us gather similar information from participants at other bootcamps, please get in touch. Read More ›

So How Is Instructor Training Going?
Greg Wilson / 2013-12-19
We recently wrapped up the latest round of instructor training, which makes this a good time to look at how well the program is doing. Here are stats on the first six rounds: Read More ›

Andromeda's Advice
Greg Wilson / 2013-12-19
We're very pleased to announce that Andromeda Yelton will be coming to Toronto in mid-January to help teach a bootcamp for librarians. Her advice on how to do this is online, along with her reflections on what she's learned herself. There's lots of good stuff in both, and we're looking forward to lots of new ideas. Read More ›

Code as a Research Object
Kaitlin Thaney / 2013-12-11
The Mozilla Science Lab's newest project extends our existing work around code as a research object, exploring how we can better integrate code and scientific software into the scholarly workflow. The project will build and test a bridge to allow users to push code from their GitHub repository to figshare, providing a Digital Object Identifier for the code. We will also be working on a best practice standard (like MIAME for code) so that each research object has sufficient documentation to be used meaningfully. For more information, please see the full post on the MSL website. And please join us for our December 12 community call, where we'll be talking with Ed Lazowska about the newly-announced data science centers, and with Arfon Smith and Mark Hahnel about this project. Read More ›

There Ought to Be a Badge
Greg Wilson / 2013-12-10
I feel like there should be a Software Carpentry badge for drawing one of these for someone for the first time. — Amy Brown on Twitter Read More ›

Release 2013.11
Aron Ahmadia / 2013-12-10
We are pleased to announce the release of v2013.11 of our lesson materials. This release represents nearly 2 years of collaborative lesson development from 54 contributors, and features lessons for using the Unix shell to automate repetitive tasks, modular programming in Python and R, personal and collaborative version control with Git, unit testing, and working with databases. Contributors include: Read More ›

News from the SSI
Greg Wilson / 2013-12-09
Two announcements from the Software Sustainability Institute caught our eye this week: Their next Collaborations Workshop will be held on March 26-27, 2014, in Oxford. This annual gathering brings together researchers, software developers, managers, funders and more to explore important ideas in software and research and to plant the seeds of interdisciplinary collaborations. Sixteen new SSI Fellows have been selected: congratulations to them all! Read More ›

Mozilla Science Lab Community Call for December 2013
Kaitlin Thaney / 2013-12-09
The last community call of the season for the Mozilla Science Lab will take place this Thursday, December 12. The call is open to the public and will start at 11 am ET. Call in details can be found on the call etherpad and on the wiki. Read More ›

Two to the Fifth New Instructors
Greg Wilson / 2013-12-05
We are very pleased to welcome 32 new instructors from six different countries to the Software Carpentry team: we look forward to working with them all in 2014 and beyond. Joshua Ainsley Camille Avestruz Philipp Bayer Nichole Bennett Cliburn Chan Emily Davenport Neal Davis Gabriel A. Devenyi Jonah Duckles Jordan Fish Julian Garcia Molly Gibson Ivan Gonzalez Joshua Herr James Hetherington Chris Holdgraf Damien Irving Ted Kirkpatrick Christina Koch Igor Kozlov Luke Lee Matthew Lightman Yuxi Luo David Perez-Suarez Bill Rowell Martin Schilling Raniere Silva Rachel Slaybaugh Shoaib Sufi Gayathri Swaminathan Amanda Whitlock April Wright Read More ›

Advanced Python for Biologists
Martin Jones / 2013-12-05
When I was writing Python for Biologists (see my previous guest post) I was extremely ruthless about leaving material out. I wanted the book to cover pretty much the same set of language features that I teach in my instructor-led introductory programming courses, which meant I had to limit the scope of the book to what an average, motivated beginner could take in in a full-time week of study. This restriction meant that, while I think the overall balance of material in Python for Biologists is still the most useful set for novice programmers, there are big topics that didn't make the cut. There's no mention of object-oriented or functional programming, nothing about Python's extremely elegant comprehension syntax, and nothing on recursion. As I was writing, I made a list of all these omissions, with the promise to myself that I would write about them some day. Read More ›

Feedback from Edinburgh
Mike Jackson / 2013-12-05
On the 3rd of December, Software Carpentry returned to Edinburgh with EPCC hosting a bootcamp as part of its involvement in both the PRACE Advanced Training Centre and The Software Sustainability Institute. Read More ›

Software and Research Session at AGU 2013
Aleksandra Pawlik / 2013-12-02
A Town Hall Session on "Software and Research" proposed by four Software Sustainability Institute Fellows will be held at the American Geophysical Union (AGU) Fall Meeting in San Francisco. If you are around, come and join the session on Thursday, 12 December 2013, at 12:30 PM in Moscone West Room 2004. Read More ›

WiSE Bootcamp at Lawrence Berkeley National Laboratory
Greg Wilson / 2013-11-29
We are very pleased to announce that our second bootcamp for women in science and engineering will be held at Lawrence Berkeley National Laboratory on April 14-15, 2014. This event is a sequel to our first WiSE bootcamp in Boston this past June, and we hope to once again fill three rooms with over 120 learners. If you are interested, please mark the dates in your calendar—registration will open in January. Read More ›

DiRAC Driving Test Comes to Edinburgh
Nick Brown / 2013-11-28
My colleague Mike Jackson recently posted about the DiRAC driving test. DiRAC is the UK's integrated supercomputing facility for theoretical modelling and HPC-based research in particle physics, astronomy and cosmology and is used by numerous researchers with diverse backgrounds. Whilst much of their work is very different, one commonality is that it often requires in-depth technical and software engineering techniques. The idea of the driving test was therefore to ensure that all users have the required knowledge for effective use of the consortium's machines. Read More ›

Things I Wish Someone Had Told Me About Scientific Computing
Lynne Williams / 2013-11-26
When I set out on my path into neuroscience, the idea of scientific computing was not even something that came up in passing. However, with little warning, I started spending my days at the computer writing code to analyse brain data. Cleaning neuroimaging data is a multi-step process and I wanted to bring out the most from the data. To do that, I thought, would require using the best bits from a slew of neuroimaging preprocessing tools. Some were better at removing extraneous noise, while others were better at image morphing. But entering the commands one by one was tedious and error-prone and the formats for the different functions were not always compatible. There had to be a better way. So, writing some kind of code became my only option. Read More ›

Registration Now Open for Instructor Training Course
Greg Wilson / 2013-11-25
As announced last week, we are offering a live version of our instructor training course on April 28-30, 2014, at the Mozilla office in Toronto. Registration is now open, and costs only $80 (though participants must cover their own travel and accommodation). Over the course of three days, 36 participants will be introduced to ideas from educational psychology and instructional design, and shown how to use them to teach programming to scientists (and everyone else). It will be the ninth offering of the course, but the first done live, and we hope to see many of you there. Read More ›

Centre for Doctoral Training at Southampton
Greg Wilson / 2013-11-24
Hot on the heels of the announcement of three new data science centers in the US comes this: £350 million for PhD training centers in the United Kingdom. One of them, at the University of Southampton, will focus on next-generation computational modelling, and we're looking forward to working with Hans Fangohr, Ian Hawke, Seth Bullock, and everyone else involved. Congratulations! Read More ›

The Art of Cold Calling (Updated)
/ 2013-11-21
This is an update to an earlier post about how to approach potential bootcamp hosts. Newly-trained instructors have asked how we go about approaching potential workshop hosts. The short answer is, however we can. The longer answer is, we collect names from journal articles, our Twitter followers, people we bump into at conferences, or (increasingly) people who've been through bootcamps, then send emails like the one below. But there's an art to it: Read More ›

What to Say at a Bootcamp, After It's All Said and Done
Damien Irving / 2013-11-19
I'm going to be teaching my first ever bootcamp in the coming days. My biggest fear (and there are many!) is that the participants will walk out of the room on the final afternoon and go straight back to their old habits, never taking the time to incorporate what they've learned into their daily workflow. In an attempt to avoid this eventuality, I've planned a rousing concluding address to explain why the content taught at a Software Carpentry bootcamp is so important. It goes something like this... Read More ›

How Software Carpentry Helped Me Write a Paper
Joshua Ryan Smith / 2013-11-19
A paper I wrote titled Increasing the Efficiency of a Thermionic Engine Using a Negative Electron Affinity Collector was recently published in the Journal of Applied Physics. I found that applying some of the best practices advocated by the Software Carpentry project eliminated much of the drudgery and tedium which leads to manually induced error in developing the software and composing the manuscript. As a result, I was able to focus more on the scientific problems instead of on tedious bookkeeping-type issues. Read More ›

Thanks from Woods Hole
Greg Wilson / 2013-11-17
We ran a bootcamp last week at Woods Hole Oceanographic Institute. It seems to have gone well: I just want to say that Will and Ross are rock stars. They may have started something big here in Woods Hole, a sort of revolution. They have taught us how to step back from our machines, think about what we are doing, and join forces with our neighbors. We will return to our respective projects on Monday morning in a more efficient and collaborative way. Thank you Software Carpentry. — James Manning Read More ›

Moving Forward with Assessment: Interviews
Jory Schossau / 2013-11-17
We would like to quantify realistic impact after the dust settles from a bootcamp. In particular, we'd like to know how (or whether) Software Carpentry has improved scientists' workflow three months or more after the bootcamp, on the theory that if people are still using what they learned, they must have found it useful. Our most recent steps toward this goal include obtaining Institutional Review Board (IRB) approval for human subjects research through Michigan State University and beginning to recruit volunteers for interviews at the three-month mark. Read More ›

Workshop at SIGCSE 2014
Greg Wilson / 2013-11-16
We are very pleased to announce that Fernando Pérez, Peter Norvig, and Greg Wilson will be running a three-hour workshop on teaching with the IPython Notebook at SIGCSE 2014 in Atlanta on March 5, 2014. We'll assemble notes in this GitHub repository, and we look forward to lots of stimulating discussion. Read More ›

Instructor Training in Three Days
Greg Wilson / 2013-11-16
Update: see this post for information on registration. We are pleased to announce that we will be running a live version of our online instructor training course on April 28-30, 2014, at the Mozilla office in Toronto. Over those three days, 30 participants will be introduced to ideas from educational psychology and instructional design, and shown how to use them to teach programming to scientists (and others). The course will be free, though we will require a deposit to discourage no-shows and attendees will need to cover their own travel and accommodation costs. Registration will start in January—please keep an eye on this blog for news. Read More ›

Creating a Forum
Greg Wilson / 2013-11-16
Every few months, someone asks why we don't have some kind of forum or bulletin board for Software Carpentry. The answer is that we've tried setting one up, but there's never been enough traffic to keep it going. We're a lot larger now than we used to be, though, so we decided at this week's lab meeting to try again starting in January. Half a dozen people have volunteered to be stewards, and we've opened an issue on GitHub to discuss the features people want, experiences with other forums, and so on. Please add comments to that issue to let us know what you think. Read More ›

Citing Us In Your CV
Greg Wilson / 2013-11-16
People volunteer to teach Software Carpentry bootcamps for many reasons. We're always grateful that they do, and would like them to get all the credit they deserve. Here are some of the ways instructors currently mention us in their CVs: Read More ›

Women in Tech Workshop at PyData NYC
Greg Wilson / 2013-11-13
Software Carpentry has always focused on scientists at the graduate level and above, but getting people into the pipeline is even more important than getting them through it. Last week, thanks to backing from NumFOCUS and others, a group of volunteers helped do just that by running Python workshops for high school girls in conjunction with PyData NYC. Judging from Julia Evans' writeup, it was a great success. We look forward to seeing some of their alumni in our classes some day... Read More ›

Data Science Centers at UCB, UW, and NYU
Greg Wilson / 2013-11-13
Yesterday, the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation (who are Software Carpentry's main backer) announced $37.8M to support new data science centers at UC Berkeley, the University of Washington, and New York University. As UCB's Fernando Perez says in his blog post, it's taken a lot of work to get this off the ground, but the impact will be enormous. We hope to help. Drew Conway's Venn Diagram shows "Hacking Skills" as one of the three principal components of data science: Read More ›

Report on the PLOS/Mozilla Code Review Pilot
Greg Wilson / 2013-11-12
Marian Petre and I have written a report on the pilot study of scientific code review done jointly by PLOS and Mozilla; you can read this summary on the Mozilla Science Lab blog, or grab the full report from arXiv. Our thanks to everyone who participated; we hope to be able to do a longer and more in-depth pilot early in 2014. Read More ›

Our First Bootcamp in Adelaide
Philipp Bayer / 2013-11-08
We ran our first bootcamp at the University of Adelaide from the 24th to the 26th of September 2013. Teachers and helpers came from all over the place: Ben Morris from the University of North Carolina acted as Principal Instructor, along with Jerico Revote from Monash University, Diego Barneche from Macquarie University, Philipp Bayer from the University of Queensland, Nicholas Crouch from Flinders University, and Nathan Watson-Haigh from the Australian Centre for Plant Functional Genomics (ACPFG), who also acted as the main organizer (thanks for that!). Read More ›

Software Carpentry's Scope
Greg Wilson / 2013-11-02
To help people figure out what's in and what's out as we're reorganizing our material, I've drafted an explicit description of our scope inspired in part by Astropy's idea of an "affiliated" package, which is, "...an astronomy-related Python package that is not part of the astropy core package, but has requested to be included as part of the Astropy project's community." Domain-specific tools for machine learning and computational fluid dynamics aren't built into LAPACK or NumPy: they're stored in separate repositories and released in their own packages. Similarly, domain-specific lessons don't belong in Software Carpentry. Instead, they should build on our lessons, just as computer vision libraries build on matrix and image libraries. Our examples should be drawn from biology, physics, economics, and similar disciplines in order to appeal to our audience, but everything we include must be accessible to anyone who's done freshman science. Read More ›

Reorganizing
Greg Wilson / 2013-11-02
TL;DR: We're going to: split our lesson materials for absolute beginners and people with some previous experience; build some tools to manage those materials; update and clarify the guidelines for instructor training, Software Carpentry's scope, making contributions, and use of our name and logo; and use GitHub issues and blog posts to manage discussion of details rather than our mailing list. Read More ›

November 2013 Lab Meeting
Greg Wilson / 2013-11-02
The first Mozilla Science Lab meeting is happening at 11:00 am Eastern time on Thursday, November 14; its agenda and connection details are on this Etherpad, and you're all invited. Our monthly lab meeting will start immediately after the MSL meeting (hopefully around noon Eastern time, but it depends how long the MSL meeting runs). As always, we'll run our meeting again at 7:00 pm on the same day to accommodate different time zones, and our agenda is on this Etherpad. We look forward to seeing you then! Read More ›

You Keep Using That Word
Greg Wilson / 2013-10-17
The first Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE) is going to be held in conjunction with Supercomputing 2013 on Sunday, November 17. After reading the papers on the workshop site, my reaction was: Read More ›

The State of Open Science
Greg Wilson / 2013-10-17
As part of the biannual Mozilla Summit a couple of weeks ago, John Jensen presented a report on the health of the open web. The details were fascinating, and are summed up in a graphic showing how the various layers of openness that we all depend on are doing. I've reproduced it on the left, and summarized my view of how open science is doing by those same measures on the right: [Figure: matching scorecards for the Open Web and for Open Science, each rated on User Choice & Control, Social Activity, Economic Activity, Trust, Diversity of Services, Innovation, Content Freedom, Interoperability, and Access.] Read More ›

Python for Biologists
Martin Jones / 2013-10-14
Programmers, at least as found in scientific research institutions, generally fall into two groups. There are people with a computer science background, for whom programming is a way of Thinking About The World, and people with a scientific background, for whom programming is a way of Getting Things Done. Read More ›

Curriculum Design
Greg Wilson / 2013-10-14
I spoke with three different people about curriculum design last week, so this seems like a good time to summarize what I know about doing that properly, how we've actually been doing it for Software Carpentry, why the two are different, and what we hope to do in future. To set the stage, I need to talk about the medical program at McMaster University in Ontario. Read More ›

Enrolment Figures (Fall 2013)
Greg Wilson / 2013-10-09
Our total enrolment (or enrollment, if you're American) continues to grow steadily: if we stay on the curve below, we'll have our four thousandth learner by the end of this year. Read More ›

A new UK administrator for Software Carpentry
Mike Jackson / 2013-10-08
We are pleased to welcome Luke Tweddle of EPCC, The University of Edinburgh, to Software Carpentry. Luke is taking over as UK administrator from Mike Jackson on behalf of The Software Sustainability Institute. In his role, Luke will help UK researchers organise bootcamps for their research groups, institutions and communities; help the local organisers of bootcamps create and customize content; attract instructors and helpers; manage registration; advise on publicity; and provide support in all aspects of organising a bootcamp. If you'd like help organising a bootcamp, or if you're interested in becoming an instructor or helper, please get in touch. Read More ›

October 2013 Lab Meeting
Greg Wilson / 2013-10-04
25 people attended our monthly (ish) online lab meeting yesterday, during which we announced and discussed a wide variety of topics. The detailed summary is below the fold, but the key action items are: Please take a moment to read Kaitlin Thaney's post about what's going on with the Mozilla Science Lab. We hope to announce some new projects at MozFest in London in a couple of weeks, and more shortly after that. There's been a flurry of pull requests and comments on lesson material in the past couple of weeks, which is great news for our sustainability. To keep the momentum going, please either send at least one pull request on the teaching materials, or make at least one comment on someone else's pull request, by Monday Oct 14. In aid of this, please get in touch if you're willing to spend a bit of time in the next couple of weeks helping instructors-in-training learn a bit more about GitHub and pull requests. Finally, please each introduce us to one new potential bootcamp host: we have a bunch of new instructors, so we're now able to take on more load. Read More ›

Our Biggest Bootcamp Ever at PyCon 2014
Greg Wilson / 2013-10-04
Thanks to the generous support of the Python Software Foundation, we will be running our biggest bootcamp ever in conjunction with PyCon 2014 in Montreal on April 14-15, 2014. We will have two rooms with seating for 50 people and two more with seating for 80, so our plan is to run three regular bootcamps for people of varying skill levels, and one more "master class" for scientists who are ready to learn more advanced topics. Read More ›

Steven Koenig: What I've Learned
Steven Koenig / 2013-10-01
I have been a PhD student at Technische Universität München, Germany since May 2012. My research interest is biopolymers: how to produce them, and what to do with them. Where do computers come into play? Literally everywhere. And what do I need Software Carpentry for? Literally everything. Read More ›

Bootcamp Student Composition by Domain
Anthony Scopatz / 2013-09-30
Recently I was challenged to show that computational science is important to biologists, and, more importantly, that biologists care about learning how to develop quality software. I am not a biologist. While I find the subject fascinating I cannot make any broad claims on the topic or hope to speak for all biologists. So I turned to Software Carpentry. My experience teaching Software Carpentry and related bootcamps has been that biologists are always a significant portion of the student population. In order to demonstrate this, I asked for access to anonymized answers to part of the pre-assessment survey. Specifically, I was interested in the "Which of the following most closely aligns with your primary discipline?" question. With a big assist from Amy Brown I obtained the data I was looking for. Read More ›
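A hypothetical sketch of the kind of tally this involves, assuming the anonymized export is a CSV with a primary_discipline column (the file name, column name, and category label are made up for illustration):

```python
# Hypothetical sketch: tally bootcamp learners by primary discipline.
# The file name, column name, and 'Biology' label are assumptions.
import pandas as pd

responses = pd.read_csv('pre_assessment.csv')
counts = responses['primary_discipline'].value_counts()
print(counts)

# What fraction of respondents identify as biologists?
share = counts.get('Biology', 0) / counts.sum()
print('Biologists: {:.1%}'.format(share))
```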

The Future: Today
Greg Wilson / 2013-09-27
Two papers appeared on my radar this week that give a taste of the kind of science we want everyone to be able to do in five years' time. The first, by Aide et al, is "Real-time bioacoustics monitoring and automated species identification"; the second, by Omberg et al, is "Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas". They're both excellent pieces of 21st Century science, and they caught my eye because: They're open access: I was able to read them on the off chance that they'd be interesting. They show how a few scientists with serious software skills can accelerate the research of hundreds of others, provided those others have a few skills themselves. Read More ›

Code and Data for the Social Sciences
Greg Wilson / 2013-09-26
Matthew Gentzkow and Jesse Shapiro have written an excellent guide, "Code and Data for the Social Sciences". It's short (only 38 pages), very readable, and full of practical advice for scientists of all stripes: Read More ›

Michael Hansen: What I've Learned
Michael Hansen / 2013-09-25
My first exposure to Software Carpentry was during a bootcamp at Indiana University in March of 2012. I volunteered to be a helper, and found myself really enjoying helping motivated students learn how to incorporate the material into their own research. I've since volunteered as an instructor at a handful of other bootcamps (including one at IU again this past summer). In this post, I'll briefly discuss some "lessons learned" from Software Carpentry. Read More ›

SSI Fellowship Programme 2014
Aleksandra Pawlik / 2013-09-24
The Software Sustainability Institute is offering Fellowships with £3000 funding for travel, collaboration and running events, including Software Carpentry bootcamps. The Fellowship Programme 2014 recognises outstanding UK-based researchers who use and develop software. Online applications are open until September 27th, 2013 at 5:00 PM BST. Read More ›

How Much Testing Is Enough?
Greg Wilson / 2013-09-24
When we teach testing, we are sometimes asked, "How much is enough?" The short answer is, "It depends." To help construct a longer answer, we'd like to find out how much testing scientists do in other contexts. For example, the Cosmic Origins Spectrograph (COS) and the Space Telescope Imaging Spectrograph (STIS) are two of the instruments on the Hubble Space Telescope. During Cycle 21 (which starts this month, and runs for about a year), they will spend roughly 25% and 67% of their time respectively on calibration: Read More ›

Sarah Supp: What I've Learned
Sarah Supp / 2013-09-21
I have to admit that when I first decided I was going to learn programming skills, it was for academic survival. I was decidedly not excited about taking a course. As a graduate student, I was lucky that Utah State University had a Programming for Biologists class, taught by Ethan White. Most universities still lack the infrastructure to teach these skills to scientists, outside of signing up for courses in the computer science department. Within the first week, I learned that programming is really like playing a series of logic games, and that it could actually be quite fun. Aside from practical skills, the most important thing I learned was not to be intimidated by computational problems. Read More ›

Lex Nederbragt: What I've Learned
Lex Nederbragt / 2013-09-21
I am a biologist with no formal training in computational science. A couple of years ago, the increasing size of my data forced me to stop using Excel, and switch to the Unix command line and scripting for my analyses. In the process, I learned what I needed mostly by just doing it, but also from books, websites, and of course Google. Almost exactly one year ago, I attended a Software Carpentry bootcamp. I had heard about Software Carpentry and its bootcamps through Twitter, started following their blog and became convinced that this was something I wanted to attend. At some point, I fired off an email to Software Carpentry asking what it would take to have a bootcamp at our university, the University of Oslo in Norway. The answer came down to 'get us a room and advertise the event, and we'll provide teachers'. This, in fact, was what happened, and as teachers, we got Mr Software Carpentry himself, Greg Wilson, who taught together with a local teacher (Hans Petter Langtangen). Read More ›

Software Skills and Hummingbird Diversity
Sarah Supp / 2013-09-21
I am an ecologist. Traditionally, many ecologists spend part of the year outdoors, in the field, to collect data, and then spend the rest of the year analyzing that data and writing papers. With new sensors that passively collect audio, weather, or geolocation data, even a single field season can yield massive amounts of data. Add to that the pressing need for more long-term data (aggregating multiple years) and the problem of managing and analyzing the data becomes increasingly difficult for scientists without basic computational skill sets or strong computational collaborators. Read More ›

Scientific Computing at AIMS
Anthony Scopatz / 2013-09-15
Close on the heels of another bootcamp in Madison, WI, for the past two weeks I had the pleasure of teaching a 30-hour scientific computing course at the African Institute for Mathematical Sciences (AIMS) in Cape Town, South Africa. The course was an expansion and refactoring of Software Carpentry material to fit AIMS's needs. You can find the repository of source material here. I could gush about how I fell in love with Cape Town, the unexpected preponderance of vegan cafes in Muizenberg, the sea, the sand, the mountains, and la dolce vita I am otherwise used to from Santa Barbara. I will not. AIMS is about the students. Let's examine them after a brief tour of AIMS itself. Read More ›

Konrad Hinsen: What I've Learned
Konrad Hinsen / 2013-09-11
All of my research has been based on computation, starting with my Master's thesis (1989) on the dynamics of colloidal suspensions. Back then, computational physics concentrated on simple systems, and scientists usually wrote their own simple software from scratch. When I started to work on biomolecular simulations five years later, I discovered the sad state of non-trivial scientific software, written by generations of PhD students and postdocs who learned Fortran "on the job" and had the prime goal of getting their research project done with a minimum of programming effort. Read More ›

Diego Barneche: What I've Learned
Diego Barneche / 2013-09-09
I have been a PhD student at Macquarie University since late 2011. I'm broadly interested in global patterns of biological diversity, macroevolution, statistical and mathematical modelling as well as programming for science (mostly R based but hoping to get into Python very soon). At present I'm focused on quantitative niche-based approaches that can better bridge macroecology and macroevolution, mainly through theoretical approaches such as the Metabolic Theory of Ecology and stochastic population dynamics. I've recently become an instructor for the Software Carpentry group, which taught me how to combine teaching, an old passion, with improving my programming skills and writing better scientific code, which is something newer. Read More ›

Teaching Librarians at Harvard
Greg Wilson / 2013-09-02
Last weekend, we ran a bootcamp at the Harvard-Smithsonian Center for Astronomy aimed primarily at librarians. It was a bit of a new venture for us, and there were some hiccups (some things were taught too fast, some instructors used too much jargon), but overall it seems to have gone well, and we hope to do more like this soon. See also: Gerry Walden, Bootcamp...for Librarians!; Philip Guo, Teaching Librarians Programming; and Matthew Rutley, Teaching Python at Harvard with Software Carpentry. Read More ›

Share Your Code With the Molecular Ecologist Blog
Kimberly Gilbert / 2013-09-02
Almost everyone has written or needed code to accomplish a task. Whether it is to automate a process or create the perfect figure, these scripts are useful, yet do not always merit publication. However, sharing these more widely would save the rest of the community from repeating the effort. To help get all this code into the public sphere, the Molecular Ecologist blog is starting a new series of posts: anyone can send in a piece describing their code, and we'll put it up on the blog. The code itself will be placed on our GitHub page for easier access. Submissions can be sent to molecularecologist@gmail.com and viewed at https://github.com/TheMolecularEcologist. Or if you are already on GitHub, join our organization to submit and maintain your code in that manner. More detailed information can be found on the blog itself. Read More ›

Our Plan for the Science Lab
Greg Wilson / 2013-09-02
Kaitlin Thaney, the recently-appointed Director of the Mozilla Science Lab, recently posted some thoughts about the lab's mission and plans. Long story short, its focus is to improve communication and interoperability in open science—paraphrasing William Gibson, the future is already here, but the pieces don't play nicely with each other yet, so the lab is going to help make and strengthen social and technical connections. If you have thoughts on how to accelerate this, we'd like to hear from you. Read More ›

Introducing Citation Files
Greg Wilson / 2013-09-02
Robin Wilson, of the University of Southampton, recently posted a note on the Software Sustainability Institute's blog about CITATION files. In brief, he (and we) would like to encourage scientific programmers to put a plain text file called CITATION in the root directory of each project, and to use it to tell readers how best to cite that software. The example Robin gives is: Read More ›
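The excerpt above cuts off before Robin's example. Purely as an illustration (this is not Robin's actual file, and every name and URL below is made up), a CITATION file might read:

    To cite ExampleTool in publications, please use:

    A. Researcher (2013). ExampleTool: software for analysing
    example data, version 1.2. University of Somewhere.
    https://example.org/exampletool

The point is simply that the file is plain text, lives at the top of the repository, and spells out exactly the citation the authors want.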

Open Access Button Hackathon is Sept 7-8
Greg Wilson / 2013-08-30
Millions of people a day are denied access to the research they both need and paid for because of paywalls. It doesn't have to be like this, but we need your help to change it. We're making a tool to help change the system called the Open Access Button. The button is a browser-based tool which tracks every time someone is denied access to a paper. We then display this, along with the person's location, profession, and story on a real time, worldwide, interactive map of the problem. It gets better though. While creating pressure to open up scholarly and scientific research, we help people work within the current broken system by helping them get access to the paper they need. We started building a prototype at the BMJ Hack Weekend, and came in third place. But we're not finished yet and our launch is coming up fast! To help build it we're hosting a hackathon on September 7-8 in London. If you're a developer, have an eye for design, or both, we'd love to see you. If you're not in the UK, you can join in from anywhere in the world—just sign up. And if you want any more information about the project, email us at oabutton@gmail.com or read more about the project. Read More ›

Jonathan Dursi Joins Compute Canada
Greg Wilson / 2013-08-30
Jonathan Dursi, a long-time supporter of Software Carpentry and one of our first instructors, has been named Interim Chief Technology Officer for Compute Canada. Congratulations! Read More ›

Two Cheers for GitHub
Greg Wilson / 2013-08-23
There's been a micro-flurry of excitement in the scientific world about GitHub's recent announcement that they will now render tabular text files (i.e., CSV and TSV) on their site. Coming on the heels of their support for GeoJSON, this is still more evidence that they're serious about becoming a platform for working with data. So why only two cheers? Because they're going about it in the wrong way, and as a consequence, they'll deliver a lot less value than they could, a lot later than they could. Read More ›

What Makes Good Code Good at INTECOL13
Mike Jackson / 2013-08-23
On 21st August, I attended INTECOL13 in London's Docklands. The 11th International Congress of Ecology, hosted by the British Ecological Society and INTECOL, is one of the world's premier conferences for ecologists. At the invitation of Matthew Smith from the BES Computational Ecology Specialist Interest Group and Microsoft Research, and Greg Wilson from Software Carpentry, I ran the latest instalment of The Software Sustainability Institute's "what makes good code good" discussions... Read More ›

Instructor Training Statistics
Greg Wilson / 2013-08-23
We started running an online training course for Software Carpentry instructors last fall. It's been going pretty well: 72 people signed up for the four rounds we ran between September 2012 and March 2013; 41 of them completed. 39 enrolled in the fifth round that started in May 2013; we expect one third to complete it over the next couple of weeks. 23 enrolled in the sixth round that started this month. An average completion rate of 40% is pretty low compared to most university courses, but it compares favorably with the 5-10% completion rates of most MOOCs, and 9 of the 29 who dropped out of the first four rounds re-enrolled in a later round. Almost everyone who tells us why they dropped out blames real life getting in the way, but there's undoubtedly sampling bias in that: some people just disappear and stop responding to email. The curriculum is now pretty stable (though we'll continue to tweak it based on feedback from learners and lab meetings), and feedback from people who've gone through it has mostly been positive. If you'd like to take part in a future round, please get in touch. Read More ›

August 2013 Lab Meeting
Greg Wilson / 2013-08-23
We held another of our semi-regular online lab meetings yesterday, complete with a theme song. We ran the meeting twice to accommodate different time zones; those present were: Read More ›

Video Interviews from SESYNC Workshop
Greg Wilson / 2013-08-22
The folks at SESYNC interviewed Titus Brown, Kaitlin Thaney, and Greg Wilson while they were in town for a workshop a few weeks ago, and the video is now up on their web site. It's a nice, short summary of the problem we're trying to solve and the approaches we're taking. Read More ›

Creating Assessments and What to do With the Data
Jory Schossau / 2013-08-20
Our new assessment questionnaires are up for your review, so I thought I'd talk a little bit about the work behind the scenes, and some lessons learned. Over the past couple of months I've had the opportunity to interview bootcamp participants who attended the Michigan State University Software Carpentry Bootcamp a little over one year ago, in May of 2012. The goal of the interviews was two-fold: to develop assessment questionnaires, and to gather data. Read More ›

Bootcamp Questionnaires
Greg Wilson / 2013-08-19
Thanks to some hard work by Jory Schossau and others, we have developed three standard questionnaires for bootcamp participants: a pre-assessment for learners, a post-assessment for learners, and a post-assessment for instructors. We hope to deploy these questionnaires starting in September, so we would be very grateful for feedback. Do you understand what we're asking? Do the responses provided include the answer you would give? As an instructor, would you find it useful to know these things about your learners before a bootcamp started? And what have we missed entirely? Read More ›

August 2013 Lab Meeting
Greg Wilson / 2013-08-19
We will have an online lab meeting on Thursday, August 22, 2013 to discuss changes to the website, plans for the coming few months, and common questions that really ought to be answered in the FAQ or somewhere else. Everyone is welcome to join in; to do so, head over to this Etherpad at either 11:00 or 19:00 Toronto time, and hop on the conference call described there as well. (We're holding the same meeting twice to accommodate different time zones; please try to sign in a few minutes before the top of the hour so that we can start more or less on time. Please also check the World Clock Meeting Planner to see when the times listed are for you.) Read More ›

Our New Operations Guide
Greg Wilson / 2013-08-18
Amy Brown, our indefatigable administrator, has put together some checklists for people involved in organizing and running bootcamps: Read More ›

Summary of Host Survey
Greg Wilson / 2013-08-14
Earlier this summer, we asked people who had hosted Software Carpentry bootcamps to give us some retrospective feedback. Their answers are summarized below, and we'll be adjusting our operations in response. How much time did you put into organizing your bootcamp? Answers were split evenly between "a day or two" and "several days". What was the total cost? With the exception of our Australian double-header (where travel costs were clearly an outlier), the answers were in the $700-3000 range. Would you like to host another one? 22 said "yes", 3 said "no", and 6 can't decide yet. How do you feel about asking bootcamps to contribute $1500 toward central overheads? 12 support this approach; 5 support it, but not at this price; 7 are neutral; and 3 are opposed. What level and format would you prefer? Complete beginners / 2 days: 7; complete beginners / 3 days: 5; intermediates / 2 days: 13; intermediates / 3 days: 2; other: 3. Read More ›

What We Cover in Instructor Training
Greg Wilson / 2013-08-13
The sixth round of our online training class for Software Carpentry instructors is kicking off this week with 28 participants. Our main text is How Learning Works, an up-to-date summary of research results in educational psychology and instructional design, along with the chapter Mark Guzdial wrote for Making Software explaining why teaching programming is hard. Over the next 14 weeks, participants will work through both, and do the following: Read More ›

Report on the Indiana Bootcamp
Greg Wilson / 2013-07-26
Mike Hansen, Jeff Shelton, and Aleksandra Pawlik posted a summary of their recent bootcamp at Indiana University on the Software Sustainability Institute blog, which we've reposted below. It includes some reflections on what works and what doesn't when using IPython Notebooks for teaching, and similarly with virtual machines. Read More ›

Miscellaneous Videos
Greg Wilson / 2013-07-20
People have been talking about Software Carpentry in a bunch of different venues recently. Katy Huff and Matt Davis did a tutorial at SciPy'13 that was captured in these three videos; Damien Irving gave a talk at PyCon Australia; Luis Figueira talked about learning how to learn; and Ariel Rokem and Shreyas Cholia talked about their experiences teaching in New Zealand in a recent Mozilla community call. (You need to enter a name—any will do—on the launch page for the recording, and then scroll ahead manually to 00:04:54 to hear them because the Flash player won't let me hyperlink directly to a particular time mark. The irony of both hasn't escaped us...) Read More ›

Welcome Our New Instructors
Greg Wilson / 2013-07-19
We're very pleased to welcome a dozen new instructors to the Software Carpentry team: Diego Barneche, John Blischak, Daniel Falster, Rich FitzJohn, Steven Koenig, Ben Morris, Randy Olson, Karthik Ram, Jory Schossau, Will Trimble, Bogdan Vera, and Jens von der Linden. Read More ›

The Fourteenth Anniversary
Greg Wilson / 2013-07-19
Fourteen years ago this October, I had the good fortune to attend a one-day workshop titled Open Source/Open Science at Brookhaven National Laboratory. A lot of the tools we work with have changed since then, but the central message remains the same: when it comes to science, the opposite of "open" isn't "closed" — it's "broken". Read More ›

DiRAC Driving Test Ready to Roll
Mike Jackson / 2013-07-18
Since June 2012, The Software Sustainability Institute and Software Carpentry have been working with the DiRAC consortium to develop a "driving test" or basic software skills aptitude test. The test is now ready to be rolled out across DiRAC. Read More ›

Data Science Workflows
Greg Wilson / 2013-07-18
Half a dozen of us got together yesterday morning to chat about The Bad Data Handbook, what the curriculum for a Software Carpentry-style bootcamp for data scientists ought to be, and a bunch of other things. The Etherpad is unfortunately down right now, so you can't read David Flanders' excellent notes, but as an outcome, I'd like to ask you all to do a bit of homework. Read More ›

Feedback from Bath
Mike Jackson / 2013-07-18
This week saw Software Carpentry head to Bath in England's South-West. The bootcamp was organised by Alex Chartier, as part of his role as a fellow of The Software Sustainability Institute. Chris Woods from the University of Bristol joined me as co-instructor, making his bootcamp instructing debut. Read More ›

Biological Computing User Stories
Greg Wilson / 2013-07-17
Three years ago, when we rebooted Software Carpentry, we wrote some brief descriptions of our intended audience and how we thought we could help them. One of the exercises we did at the SESYNC meeting last week (described briefly here) was to write some similar before-and-after stories for biologists who do computing. The descriptions we got are listed below; some of my takeaways are: Pretty much everyone's starting point is Excel... The data hand-off problem is as important as the data processing problem. The stories themselves are listed below—we'd like to hear what resonates and what's missing. Read More ›

Computational Competence for Biologists
Greg Wilson / 2013-07-16
On July 8 and 9, I had the pleasure of taking part in a two-day workshop at SESYNC to discuss what we ought to teach biologists about computing. It was a relatively small meeting, but the participants spanned the range from computer scientists and systems engineers to bioinformaticians, field biologists, and a few odd ducks like me. Read More ›

eResearch New Zealand 2013 Bootcamp Roundup
Shreyas Cholia / 2013-07-14
New Zealand is an amazing place - incredibly beautiful, with a very friendly vibe. We (Ariel Rokem and Shreyas Cholia) had the privilege of traveling all the way from California, to run a bootcamp down in Christchurch, as part of the eResearch NZ 2013 conference. Read More ›

The Oslo Bootcamp
Karin Lagesen / 2013-07-10
Karin Lagesen writes: The bootcamp took place at the University Library at the main campus of the University of Oslo. We had set a limit of 40 people, and we had a total of 32 attendees signing up. Sign-up was quite slow at first, and at one point we actually wondered whether we would have to cancel the event. But as we got into June, sign-ups picked up. We had decided to ask people to pay $20 to sign up (which was subsequently converted into coffee), and this might have caused people to not sign up until they were sure they could join. We ended up with 27 actual attendees the first day and 25 on the second day. The attendees were for the most part related to biology in some way—not surprising considering we had advertised it in connection with the Galaxy 2013 community conference and on several bioinformatics-related email lists. Also, most of them were either doing their PhDs or working on a post-doc. Both Lex and I feared lots of issues with installation, but even though the pre-bootcamp survey showed that almost half of the attendees were bringing Windows laptops, things went surprisingly smoothly. Read More ›

Workshop for e-Infrastructure Trainers
Greg Wilson / 2013-07-06
Simon Hettrick, of the Software Sustainability Institute, recently posted this: If you have an interest in training researchers how to use computing, software or data, you should attend the workshop for e-Infrastructure trainers on 14 August 2013 at the newly opened Hartree Centre. For more information, please see the full announcement. Read More ›

WiSE Bootcamp Roundup
Greg Wilson / 2013-07-05
It's been over a week since our first bootcamp for women in science and engineering wrapped up in Boston, and feedback has been coming in pretty steadily. In no particular order: Read More ›

Sloan Foundation Proposal Round 2
Greg Wilson / 2013-07-05
From our "better late than never" department: we submitted a proposal to the Sloan Foundation last August to create what we're now calling the Mozilla Science Lab. As regular readers already know, it was accepted; we're now more than six months into the project, so in keeping with our own earlier practice, and with the examples of ImpactStory, rOpenSci, NISO, and Dr. Holly Blik, I'm posting the proposal here. I hope it's useful to other people seeking funding for ventures like ours, and I'd be happy to answer questions. Read More ›

After WISE
Terri Yu / 2013-06-27
Our first bootcamp for women in science and engineering is just wrapping up in Boston. It's gone very well—we'll be posting a summary soon, but in the meanwhile, here are a few things attendees can do as follow-up: Read More ›

Software Carpentry: Lessons Learned
Greg Wilson / 2013-06-20
With contributions from: Azalee Bostroem (Space Telescope Science Institute) Chris Cannam (Queen Mary, University of London) Stephen Crouch (Software Sustainability Institute) Matt Davis (Space Telescope Science Institute) Luis Figueira (Queen Mary, University of London) Richard "Tommy" Guy (Wave Accounting) Edmund Hart (University of British Columbia) Neil Chue Hong (Software Sustainability Institute) Katy Huff (University of Wisconsin) Michael Jackson (Edinburgh Parallel Computing Centre) W. Trevor King (Drexel University) Justin Kitzes (University of California, Berkeley) Stephen McGough (University of Newcastle) Lex Nederbragt (University of Oslo) Tracy Teal (Michigan State University) Ben Waugh (University College London) Lynne J. Williams (Rotman Research Institute) Ethan White (Utah State University) Abstract Over the last 15 years, Software Carpentry has evolved from a week-long training course at the US national laboratories into a worldwide volunteer effort to raise standards in scientific computing. This article explains what we have learned along the way, the challenges we now face, and our plans for the future. Read More ›

The Twelve Bar Blues of Open Science
Greg Wilson / 2013-06-19
Most musicians can play along with a twelve-bar blues once they know what the key and tempo are. Many kinds of scientific work are equally well structured: the results aren't predictable—it wouldn't be research if they were—but the equipment setup, sample preparation, note-taking, statistics, and write-up are (mostly) structured in ways that other scientists are familiar with. This lets them pick up each other's projects more quickly; it also gives scientists more time to do what's unique about a particular project because they don't have to spend time thinking about the things that aren't. Read More ›

UMass Amherst Bootcamp: Perspective from a Helper
Terri Yu / 2013-06-18
Background A few weeks ago, I helped out with the Software Carpentry bootcamp at UMass Amherst. It was my first time being a helper. Previously, my exposure to Software Carpentry was the MOOC-style online Spring 2011 class, which I took during physics graduate school. I was among the 5-10% of students to finish the course. The low completion rate was one of the reasons Software Carpentry switched to doing live, on-site bootcamps. A couple of years later, I got in touch with Greg Wilson and he mentioned that there was a bootcamp coming up in my area. He invited me to join as a helper, I agreed, and I suddenly found myself on several mailing lists. I had no idea what to expect, but the bootcamp turned out to be quite fun and I met many interesting people—including a core Python developer. Post-bootcamp, I had many thoughts about the logistics and teaching. The staff invited me to write these up in a blog post. Read More ›

Salk Institute Feedback
Greg Wilson / 2013-06-17
Preston Holmes recently posted a detailed analysis of his experiences helping out at our Salk Institute bootcamp. It contains a lot of insights, and is well worth reading. Read More ›

Bootcamp in Bristol, September 12-13, 2013
Mike Jackson / 2013-06-15
Our bootcamp at the University of Bristol on September 12-13, 2013, is now open for registration. Please register here if you would like to take part. Read More ›

Announcing the Mozilla Science Lab
Greg Wilson / 2013-06-14
We're very excited to announce the launch of the Mozilla Science Lab, a new initiative that will help researchers around the world use the open web to shape science's future. The Science Lab will foster dialog between the open web community and scientific researchers. Together they'll share ideas, tools, and best practices for using next-generation web solutions to solve real problems in science, and explore ways to make research more agile and collaborative. Read More ›

June 2013 Lab Meeting
Greg Wilson / 2013-06-13
Earlier this week, Software Carpentry had its first online lab meeting since October 2012. In attendance were: Read More ›

North Carolina Bootcamps
Ben Morris / 2013-06-09
Last month, we held two Software Carpentry workshops back-to-back in Durham, North Carolina, one at the National Evolutionary Synthesis Center and a second organized by Cliburn Chan for the biostats group at Duke just three days later. These workshops were taught by Jenny Bryan, Elliott Hauser, and me, with Karen Cranston also teaching at NESCent. The workshops were both somewhat atypical: NESCent was a smaller workshop (only 11 participants) whose attendees work together and share an interest in evolutionary biology, and at both NESCent and Duke we presented in R rather than Python. Overall, both workshops were very successful, and I learned a lot, both about R and about teaching a Software Carpentry workshop. Read More ›

Running Bootcamps
Greg Wilson / 2013-06-07
It's been eighteen months since we started running two-day bootcamps. We've grown rapidly: Read More ›

Thoughts on an Advanced Bootcamp at Boulder
Ted Hart / 2013-06-07
Alex Viana and I recently wrapped up a bootcamp at the National Center for Atmospheric Research (NCAR) in Boulder, CO. It was a lesson in the advantages of being light on your feet when it comes to teaching for Software Carpentry. Our normal audience tends to be scientists with varying degrees of familiarity with the tools that we teach. This time we taught about 20 people, all of whom were experienced software engineers, some of whom had probably been programming longer than Alex or I had been alive. We had planned on starting the morning with some shell and then moving into Git. A quick survey of the room found that everyone was well versed in the shell, so we abandoned that lesson altogether and went straight to teaching Git. Read More ›

Amsterdam Bootcamp
Onno Broekmans / 2013-06-06
What started out as an unexpected proposal from Greg became very much alive on May 2nd, as 40 physicists poured into the classroom at VU University Amsterdam and started up their laptops. After introducing the teachers (Stefano Cozzini and Justin Ely) and the helpers, we were off: the very first Software Carpentry bootcamp in the Netherlands! In the months leading up to the bootcamp, I had been surprised at the sense of timeliness the bootcamp generated. People here clearly see the need for computational training, most importantly young scientists themselves—available tickets were mostly gone within a week. Read More ›

Feedback and Experiences from Southampton
Robin Wilson / 2013-06-05
I have recently organised a bootcamp at the University of Southampton, as part of my Software Sustainability Institute fellowship. I hadn't actually attended a bootcamp before (I was meant to, but unfortunately had to cancel) so it was a little nerve-wracking doing all of the organising - it all worked out well in the end though. Read More ›

Software Carpentry at INTECOL13
Mike Jackson / 2013-06-02
On 21st August 2013, we'll be attending INTECOL13 in London's Docklands. The 11th International Congress of Ecology, hosted by the British Ecological Society and INTECOL, is one of the world's premier conferences for ecologists. At the invitation of Matthew Smith of the BES Computational Ecology Specialist Interest Group and Microsoft Research, Software Carpentry and The Software Sustainability Institute have a talk and a session. Read More ›

From a Helper to an Instructor
Aleksandra Pawlik / 2013-06-02
This blog post is actually a bit overdue, as the first time I instructed at a Software Carpentry bootcamp was over a month ago in Manchester. However, now that I have had the opportunity to teach twice (most recently in Krakow), I feel better placed to share my experience of taking the step up from being a helper to being an instructor. Read More ›

The Great Licenceathon
Greg Wilson / 2013-05-30
The Software Sustainability Institute's Simon Hettrick writes: Intellectual Property (IP) is a thorny issue in academia, generally because people don't know who owns what. There's a lot of hearsay, rumour and—frankly—misunderstanding about IP ownership, so we've decided to run an experiment. And we're going to need some guinea pigs… … To get a better understanding of how academia deals with IP ownership, we're running The Great Licenceathon. This will see people at different organisations working out who owns their IP. They'll then blog about their experiences and we'll pick through those blogs to see whether there's some common lessons we can learn. … If you want to know who owns your IP, and you have time to ask a few questions at your organisation, then please get in touch. Read More ›

Krakow Bootcamp Experience
Aleksandra Pawlik / 2013-05-30
On the weekend of 18th and 19th May, Krakow saw its first Software Carpentry bootcamp. The main organiser was Damian Marchewka from the PhD students' association at the Jagiellonian University. Karin Lagesen came from Oslo to co-teach with me, and four local helpers (Eryk Ciepiela, Maciej Czuchry, and Klemens Noga from ACC Cyfronet, and Leszek Tarkowski from Infotraining) provided great support throughout the whole bootcamp. There were 28 attendees - most of them postgraduate students, plus several faculty members. They represented a range of disciplines, from mathematics and theoretical physics to biology, genetics and medicine. Read More ›

What Does Victory Look Like?
Greg Wilson / 2013-05-26
A lot of changes are happening to science as I write this. Crowdsourcing, open reviews, citeable data, new ways to measure contributions, automatically tracking the provenance of every calculation—each has emerged from its chrysalis and is just waiting for its wings to dry so that it can take flight. Our job is to give scientists the skills they need to nurture these ideas. For the last post in this series, I'd therefore like to talk about what victory will look like—i.e., how scientists' lives will be different if and when the things we teach become as routine as doing statistics. My vision is simple: Read More ›

What Does Done Look Like?
Greg Wilson / 2013-05-26
After recent posts about where we are, our infrastructure, and our plans for the summer break, it seems like a good time to raise our sights and ask how we want Software Carpentry to run once it's all grown up, and what the practice of science will look like when every grad student in the world goes through our training. (Pause for maniacal supervillain laugh.) Read More ›

Our Infrastructure
Greg Wilson / 2013-05-25
As described in these posts from May and October 2012 (which build on this one from April 2012), we use a complicated collection of tools to manage our work: Read More ›

Where We Are (More or Less)
Greg Wilson / 2013-05-24
In January 2012, John Cook posted this to his widely-read blog: In a review of linear programming solvers from 1987 to 2002, Bob Bixby says that solvers benefited as much from algorithm improvements as from Moore's law: "Three orders of magnitude in machine speed and three orders of magnitude in algorithmic speed add up to six orders of magnitude in solving power. A model that might have taken a year to solve 10 years ago can now solve in less than 30 seconds." Read More ›

Planning for the Break
Greg Wilson / 2013-05-24
We've had a busy nine months: 55 bootcamps since we restarted bootcamps last September, with 19 more in the next two months. We're taking a break then to catch our breath: we only have one event scheduled between mid-July and the end of August, which will give us time to reorganize our online materials and figure out what we're going to do in 2013-14. The list below outlines things that I'd like to see done during that break. I'm sure you have others; I'd be grateful if you'd add them as comments, along with your thoughts on things we shouldn't do, or should do differently. Read More ›

Feedback from the Oxford DTCs
Jonathan Cooper / 2013-05-24
Having helped at two previous bootcamps (Oxford last year and UCL last month), a couple of weeks ago I had my first attempt at organising one at the University of Oxford, primarily aimed at students from the Doctoral Training Centres. Mario Antonioletti and Shoaib Sufi from the Software Sustainability Institute kindly did most of the instructing (I only ran two sessions), and Mario has already written a blog post with his perspective. Here I want to report the feedback we received from attendees and give some of my thoughts on the experience. Read More ›

Browsercast
Greg Wilson / 2013-05-24
David Wolever (a freelance developer here in Toronto) has released the first version of BrowserCast, an IPython Notebook plugin that allows you to synchronize a voice-over soundtrack with a step-by-step reveal of a notebook, thereby creating a screencast-style presentation in the browser. He has created two traditional (video) screencasts showing why BrowserCast is cool and how to create a presentation with BrowserCast, but if you really want to get a feel for it, you can use the bookmarklet that lets you give it a try with your own notebooks, or grab the plugin from GitHub and take it for a test drive. Read More ›

Wrapping Up at UC Davis
Greg Wilson / 2013-05-17
Jenna Lang has posted a great wrap-up on the bootcamp at UC Davis — with Python cookies! Read More ›

Experiences with the Oxford DTCs
Mike Jackson / 2013-05-17
Mario Antonioletti has posted his experiences on being a first-time instructor at our bootcamp for the Oxford doctoral training centres, our second in Oxford, last week. Read More ›

Stanford Bootcamp Recap
Ariel Rokem / 2013-05-16
On May 6th-7th, we hosted a bootcamp at Stanford University. Participants were students, post-docs and staff affiliated with the Center for Neurobiological and Cognitive Imaging and the Neuroscience Graduate Program and both units provided support for the workshop. Bob Dougherty, the research director at the CNI, helped raise a substantial portion of the funds to support the bootcamp (coffee!) and Prof. Miriam Goodman, from the Department of Molecular and Cellular Physiology, helped to get the neuroscience program on board and to secure additional funding from that program. Prof. Goodman was a participant in a SWC bootcamp at Berkeley last year and was eager to get SWC to Stanford. Instructors were Paul Ivanov, Bernhard Konrad and myself. Read More ›

Announcing Hack4ac
Greg Wilson / 2013-05-16
Hack4ac is a one-day hackathon in London, England, on July 6. Its goals are to demonstrate the value of the CC-BY licence within academia (we are interested in supporting innovations around and on top of the literature) and to reach out to academics who are keen to learn or improve their programming skills to better their research (we're especially interested in academics who have never coded before). It looks like fun—if you're in the area and interested in getting involved, they'd be happy to have you there. Read More ›

A Mention in Science Careers
Greg Wilson / 2013-05-14
Vijee Venkatraman has written a good article for Science Careers titled "When All Science Becomes Data Science", which mentions Software Carpentry. Read More ›

Git vs. Subversion and Feedback in General
Greg Wilson / 2013-05-10
Software Carpentry's mission is to help scientists teach other scientists how to be better programmers. If we want to do that successfully, we need to be scientists ourselves. In particular, we need to base what we teach on evidence, not anecdotes or personal preferences. For example: we taught Git at the Toronto bootcamp last week, and once again I think our learners would have absorbed more if we'd taught Subversion. Why? Well, take a look at this diagram by Oliver Steele (which I found on this page written by Nick Quaranto): Read More ›

Make It Easier to (Re)use Your Data
Greg Wilson / 2013-05-03
Software Carpentry has focused on computing for most of its 14 years (primarily because that's what I'm most familiar with) but it's increasingly clear that we need to tackle other parts of the research cycle. One is the new ideas clustered around publication, discovery and metrics, which I'll discuss in a future post. The other is data management; we only touch on the topic right now, but it's as important to most scientists as crunching numbers, and how best to do it is changing rapidly. Luckily, a few of our friends have written a guide for the perplexed: Read More ›

More Detailed Feedback from Melbourne
Greg Wilson / 2013-05-03
The hosts of our February bootcamp at the AMOS conference in Melbourne have collected some more detailed feedback from participants. I'm pleased that two thirds thought the content was just right, and even more pleased that 83% thought version control "must be taught". Read More ›

Translucent Badges
Greg Wilson / 2013-05-02
Digital badges are a hot meme right now. They let anyone, anywhere, issue credentials that are finer-grained than degree certificates or driver's licenses. Want people to know that you can change the oil in a car? There's a badge for that. Or that you can speak conversational Frisian? There's a badge for that too. And "backpack" sites make it easy to aggregate badges, including the ones that show you've created content for Software Carpentry, that you're qualified to teach it, or that you've helped to organize and run a bootcamp. Read More ›

A Rational Computing Process: How and Why to Fake It
Greg Wilson / 2013-05-02
Parnas and Clements' 1986 paper "A Rational Design Process: How and Why to Fake It" [1] is one of the most widely read in the history of software engineering. In it, they argued that designing software according to some particular process isn't what matters; what does is creating documentation after the fact to make it look as though a rational process was followed so that other people can retrace the designers' thinking without heroic effort. Acknowledging the messiness of reality wasn't new: a century ago, Poincaré wrote that most mathematicians figured out the proof after they had figured out the answer, and everyone knows that the description of the experiment that's put in the paper is almost never how the experiment was actually done [2]: Read More ›

Pre-Assessment Results
Greg Wilson / 2013-04-30
Of the 29 people who responded to a brief questionnaire before a recent bootcamp, we have: 18 graduate students, 4 postdocs, 4 staff, 1 faculty member, 1 member of the general public, and 1 special student. They describe their current expertise as: 1 master, 10 have written a few programs, 17 have written a little code, and 1 has no programming experience. Here's how they thought they could do on some simple tasks. Write a short program to read a file containing columns of numbers separated by commas, average the non-negative values in the second and fifth columns, and print the results: 11 could do it easily, 15 could probably struggle through, and 3 wouldn't know where to start. In a directory with 1000 text files, create a list of all files that contain the word "Drosophila", and redirect the output to a file called results.txt: 12 could do it easily, 10 could struggle through, and 7 wouldn't know where to start. Check out a working copy of a project from version control, add a file called paper.txt, and commit the change: 3 could do it easily, 4 could struggle through, and 22 wouldn't know where to start. A tab-delimited file has two columns: the date, and the highest temperature on that day. Produce a graph showing the average highest temperature for each month: 19 could do it easily, 0 could struggle through, and 10 wouldn't know where to start. Read More ›
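For a sense of scale, here is a minimal Python sketch of one possible solution to the first task above; the file name data.csv, and the assumptions that every row has at least five numeric columns and no header, are ours, not the survey's:

    # Average the non-negative values in the second and fifth columns
    # (indices 1 and 4, counting from zero) of a comma-separated file.
    import csv

    def column_averages(filename, columns=(1, 4)):
        sums = dict((c, 0.0) for c in columns)
        counts = dict((c, 0) for c in columns)
        with open(filename) as handle:
            for row in csv.reader(handle):
                for c in columns:
                    value = float(row[c])
                    if value >= 0:
                        sums[c] += value
                        counts[c] += 1
        return dict((c, sums[c] / counts[c]) for c in columns)

    print(column_averages('data.csv'))

Someone who "could do it easily" should be able to produce something like this in a few minutes.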

An Update on Cumulative Enrolment
Greg Wilson / 2013-04-29
It's been a busy few months, and the next three promise to be busier still. Somewhere in there we helped our two thousandth learner, and if everything goes well, we'll reach 2500 by mid-summer. Many thanks to the entire team for their hard work. Read More ›

Sound Software Competition
Luis Figueira / 2013-04-27
We are very excited to announce the first competition for SoundSoftware.ac.uk prizes for reproducibility in audio and music research. Our goal is to promote the development and release of sustainable and reusable software and datasets alongside published research. In a few of our recent papers and presentations we've seen that much audio and music research work is published without the accompanying software implementations. One reason is a lack of self-confidence - the fear that one's code is not good enough to share. (You can read more about this in our ICASSP 2012 paper, Towards Software Reuse in Audio and Music Research and in the Institute's post Haters Gonna Hate - why you shouldn't be ashamed of releasing your code). We believe that one way of countering this fear is to promote the idea that sharing is a worthwhile goal in itself. With this initiative we aim to reward researchers who strive for reproducibility in their work and, most importantly, to raise awareness of the importance of publishing both the software and the data to allow other researchers to build on a researcher's publications. We see this as a chance to discuss the issues around trust in research, such as reproducible publication, open access, open source and licensing. The prizes we're offering vary, but typically we're going to provide article processing charges to make a publication open access, or travel bursaries for researchers to attend conferences or workshops to present their work. Although the SoundSoftware project is UK-based, researchers from other countries can also take part - in which case SoundSoftware can help with travel costs to a UK workshop or conference, or a visit to a UK institution. The deadline for submitting an entry is 19 May 2013. Entries will be analysed by an external panel who will look at ease of reproducibility, quality of sustainability planning and potential to enable high quality research in the UK audio and music research community. These prizes are a new initiative for us, but we hope to run them regularly. If you have published your research work so that others can reproduce your research outputs, or if you know fellow researchers from the audio and music community that do so, you can win a prize! More information about the prizes can be found on the SoundSoftware website. Read More ›

Bootcamp Recap: Middle East and South Africa
Aron Ahmadia / 2013-04-24
Although running a Software Carpentry Bootcamp is a rewarding experience on its own, sometimes we get the opportunity to travel somewhere really interesting as volunteer educators. This March, I put on bootcamps at the American University of Beirut in Lebanon, King Abdullah University of Science and Technology in Saudi Arabia, and Stellenbosch University in South Africa. For all three bootcamps, I taught as primary instructor, with a total of approximately 12 hours of instruction spread out over 2 days. I adapted content from Software Carpentry as well as previous classes and lectures I've taught in scientific computing, reproducibility, and software development. The first two workshops were taught in teaching laboratories, but more than half the students brought their own Linux and OS X laptops. I was surprised by the operating system mix at Stellenbosch University: almost all of the students brought Windows laptops! Fortunately, we had an almost seamless experience using Continuum's Anaconda package to provide out-of-the-box Python with NumPy, SciPy, matplotlib, and (of course) IPython. Another tool that was invaluable for me in teaching the bootcamps was Etherpad. We created a different one for each of the bootcamps, and it was especially useful when students didn't have a clear view of the screen or ended up slightly behind the others. My advice for anybody running a bootcamp abroad is: plan ahead (especially by installing software in teaching laboratories), recruit local helpers, and stay flexible! Bonus! Software Carpentry in the Laboratory: Ulrich Buttner and the New Microfluidics Thrust Area Lab brought a fun problem to work on at the second bootcamp, taking live camera data and using image processing to translate pictures into extremely sensitive pressure measurements. Pictures and code from solving the problem are available here. Acknowledgements: I owe a big thanks to the hosts and volunteers for each workshop. American University of Beirut: George Turkiyyah (host) and Mike Hamam (IT support). King Abdullah University of Science and Technology: David Ketcheson (host), Lisandro Dalcin (guest instructor), Enas Yunis (volunteer), and Damian San Roman Alergi (photos). Stellenbosch University: Stéfan van der Walt (host and guest instructor). Read More ›

Manchester Once Again
Greg Wilson / 2013-04-24
Mike Jackson has posted a summary of Software Carpentry's second bootcamp in Manchester this month. We're hoping to visit again before the year's end—please keep an eye on this blog for announcements. Read More ›

Software Carpentry at SciPy 2013
Matt Davis / 2013-04-23
Several members of Software Carpentry's vast network will be traveling to Austin, TX in June for SciPy 2013. We are especially excited that Katy Huff and Matt Davis will be teaching a tutorial titled “Version Control and Unit Testing for Scientific Software.” The tutorial is aimed at beginners and the only prerequisite experience will be basic use of Python and the shell. Matt and Katy will cover why testing and version control are good ideas for working scientists and show how to get started with each. Here's the tutorial abstract: Writing software can be a frustrating process but developers have come up with ways to make it less stressful and error prone. Version control saves the history of your project and makes it easier for multiple people to participate in development. Unit testing and testing frameworks help ensure the correctness of your code and help you find errors by quickly executing and testing your entire code base. These tools can save you time and stress and are valuable to anyone writing software of any description. This collaborative, hands-on tutorial will cover version control with Git plus writing and running unit tests in Python (and IPython!) using the nose testing framework. Attendees should be comfortable with the basics of Python and the command line but no experience with scientific Python is necessary. If you're coming to SciPy and you'd like to meet with a representative of Software Carpentry to talk about helping us or organizing a bootcamp at your institution, please get in touch! Read More ›
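For readers who haven't met nose before, the style the tutorial teaches looks roughly like the sketch below: plain Python functions whose names start with test_, which the nosetests command discovers and runs automatically. The running_mean function is our own stand-in example, not something taken from the tutorial itself:

    # test_stats.py -- run with: nosetests test_stats.py
    def running_mean(values):
        """Return the cumulative mean after each value."""
        means, total = [], 0.0
        for i, value in enumerate(values, 1):
            total += value
            means.append(total / i)
        return means

    def test_single_value():
        assert running_mean([2.0]) == [2.0]

    def test_two_values():
        assert running_mean([1.0, 3.0]) == [1.0, 2.0]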

Spreadsheets, Retractions, and Bias
Greg Wilson / 2013-04-19
Just in case there's any misunderstanding: I'm not suggesting that scientists should use Excel. Now, with that out of the way… Guy Deutscher's wonderful book Through the Language Glass devotes several pages to the Matses people of South America: Their language…compels them to make distinctions of mind-blowing subtlety whenever they report events… [Verbs have] a system of distinctions that linguists call "evidentiality", and as it happens, the Matses system…is the most elaborate that has ever been reported… Whenever Matses speakers use a verb, they are obliged to specify—like the finickiest of lawyers—exactly how they came to know about the facts they are reporting… There are separate verbal forms depending on whether you are reporting direct experience (you saw someone passing by with your own eyes), something inferred from evidence (you saw footprints on the sand), conjecture (people always pass by at that time of day), or hearsay (your neighbor told you he had seen someone passing by). If a statement is reported with the incorrect evidentiality form, it is considered a lie. I thought about this in the wake of reactions to two reports this week of errors in scientific work. The first was the discovery by Herndon and others of mistakes in Reinhart & Rogoff's widely-quoted analysis of the relationship between debt and growth—mistakes that were due in part to bugs in an Excel spreadsheet. Quite a few of the scientists I follow on Twitter and elsewhere responded by saying, "See? I told you scientists shouldn't use Excel!" This week's other report didn't get nearly as much attention (which is fair, since it hasn't been used to justify macroeconomic policy decisions that have impoverished millions). But over in Science, Ferrari et al. point out that an incorrect normalization step in a calculation by Conrad et al. produced results that are wrong by three orders of magnitude. What's interesting to me is that none of the comments I've seen about that incident have suggested that scientists shouldn't use R or Perl (or whatever Conrad et al. used—I couldn't see it specified in their paper). It brings to mind this classic XKCD cartoon, but with spreadsheets on the receiving end of the prejudice. At this point, I'd like you to take a deep breath and re-read the disclaimer at the start of this post. As I said there, I'm not suggesting that scientists should use Excel. In fact, I've spent a fair bit of the last 15 years teaching them alternatives that I think are better. What I want to point out is that we don't actually have any data about whether people make fewer, the same, or more errors using spreadsheets or programs. The closest we come is Raymond Panko's catalog of the errors people make with spreadsheets (thanks to Andrew Ko for the pointer), but as Mark Guzdial said to me in email, "In general, language-comparison studies (treating Excel as a language here) are really hard to do. How do you compare an expert Excel spreadsheet builder to an expert Scheme programmer to an expert MATLAB programmer? Do they know how to do comparable things at comparable levels?" It's plausible that spreadsheets are more error-prone because information is more likely to be duplicated, or because they put too much information on the screen (all… those… cells…). It's equally plausible, though, that programs are more error-prone because they hide too much, and overtax people by requiring them to reason about state transformations in order to "see" values.
I personally believe that if spreadsheet users do make more errors, it's probably not because of the tool per se, but because they generally know less about computing than people who write code (because you have to learn more in order to get anything done by coding). But if I had to say any of these things in Matses, I'd be obliged to use the verb inflection meaning, "An internally consistent story for which I have no actual proof." Proof is something we'd like to have more of in Software Carpentry: scientists tend to pay closer attention if you have some, and on the whole, we'd rather teach things that are provably useful than things that aren't. Some people still argue that proving X better than Y is impossible in programming because there's too much variation between different people, but I don't believe them, if only because it's possible to prove that some things work better than others in education. (The study group we run for people who want to become instructors looks at some of this evidence.) Empirical studies of programmers are still fewer and smaller, but we actually do know a few things, and are learning more all the time. If Software Carpentry does nothing else, I'd like it to teach scientists to respond to claims like "spreadsheets are more error-prone than programming" by demanding the evidence: not the anecdote or the "just so" story, but the evidence. Because after all, if we want to live in this world, shouldn't we put our own house in order first? Read More ›

Feedback from Arizona
Amy Brown / 2013-04-19
Julie Messier, a plant ecologist and doctoral candidate at the University of Arizona who organized the recent bootcamp there, has posted about the bootcamp on her blog. Read More ›

Feedback from UC Berkeley
Justin Kitzes / 2013-04-16
Last weekend we held a bootcamp at the University of California, Berkeley, targeted at environmental scientists and ecologists. We had a great group of attendees (31 registered, 28 at start, and 24 at end) and instructors/helpers, and, as a major bonus, had no major technical or logistical emergencies to deal with (but see below for some minor ones). Before the workshop, we had participants fill out a brief pre-workshop survey that we set up on Google Drive. The idea was to gather some initial impressions that would help us tailor the workshop content. We learned a few particularly useful tidbits from this exercise. First, about half the attendees had tried Python before and half had never used it. Second, nearly a third had used some form of version control before (big surprise), and half of these had used Git. Third, the most requested topics, by far, were related to importing and manipulating different types of data and to integrating Python with R and/or ArcGIS. As for feedback, we once again had the students use the Etherpad at the end of each day. On the first day, we asked three questions: What was Good, What was Bad, and What Lingering Questions were left. This last question turned out to be very useful, as the instructors and helpers dutifully wrote lengthy responses to all of these questions on the morning of the second day. You can see how this played out on the raw text archive of the Etherpad. We then repeated the Good and Bad questions at the end of the second day. Below is the feedback summarized across both days. We asked each attendee to try hard to come up with a unique comment each day, and to also add +1 to other comments that they agreed with.
The Good
Well-structured / Good organization +12
Very good to have helpers to avoid getting stuck +11
Self-contained examples and exercises +10 [Ed: We provide "empty" IPython notebooks for live coding and also "complete" versions, with all the answers, for later reference.]
Learning about folder structures and good workflow habits was incredibly helpful!! +10 [Ed: we capped the workshop with a "reproducible workflow" lesson that required students to use all the skills they had learned]
This should be extended to a full-semester course, all 1st-year grad students should have to take it +6
Pace +5
Excellent overview of many topics +5
Interactive +3
Intro to git - i feel like i have what i need to go for it +3
Good clear explanations +2
Etherpad +2
Great to see what is possible to do with the tools we've learned +2 [Ed: For some inspiration, I showed a bigger project at the end that applied the skills we taught in a more complex setting.]
Emphasis on the goals of producing reproducible work efficiently rather than on specific programming skills +2
Smart, experienced, helpful teachers +2
It filled in a lot of holes that I had from learning everything on my own +1
Great to actually set up the SSH for github or bitbucket +1 [Ed: unfortunately, most people had some sort of problem getting this set up - in the future, we suggest using https in a bootcamp setting.]
Bash help was really clear
Places to go for more info were provided
Understanding the links between the utility of Python and my research
The Bad or Confusing
Small desks +8 [Ed: We were in a "standard" classroom with individual chairs and attached desks.]
The bash/unix part was slow and thin, python part was thick and (too) fast +8
Learning everything on a linux virtual machine on a windows PC is not super efficient in the long run, since i will either be running this on a pc or will learn how to actually use linux +5 [Ed: we steered all Windows users to a Linux VM since the instructors didn't really feel capable of supporting or troubleshooting a Windows stack.]
Feels like we've spent a lot of time but just scratched the surface of programming in python (might be helpful to also have a step 2, i.e. a next level course some time soon) +3
Too fast overall +3
Maybe separate into two weekends or three days? A little bit too intense +3
Maybe it could be useful to have something read before the workshop so we could all be in a similar level +2
My brain hurts (eyes too) +1
The room is kind of cold? Need more coffee? +1
Hard to process everything at once +1
Unsure how much I can alter my workflow now... +1
If you have any issues along the way it is hard to catch back up +1
By the time we got to testing i was brain dead....would be nice to do that earlier +1
Not bad, but would be helpful -- a brief tutorial on piping I/O between programs +1
Some terminology is used that we might not be clear on. not always clear which terms are key to understand and which aren't +1
Should set up guest additions and shared folders for virtual machines before hand on windows
More overview of other langs and why/when to use python
Need more time for exercises
A good overview of file manipulation operations in bash; covering examples of common workflows would be helpful...
Too much time spent on troubleshooting installations; perhaps provide pipelines for each person to test their installations prior to arrival?
The range of abilities is probably frustrating for people on both ends of the spectrum, too slow or too quick
Thanks to instructors Karthik Ram and Geoff Oxberry and helpers Matthew Brett and Jessica Hamrick. Read More ›

Feedback from the EGI Forum
Mike Jackson / 2013-04-16
At last week's EGI Forum in Manchester, we ran a day of bootcamp "highlights". These were 1.5 hour taster sessions drawn from bootcamp sessions. Our sessions covered: using version control to record provenance and collaborate more easily; using testing to help ensure your software, and results, are correct; and data management, using a NoSQL database to manage your data more easily. We had 15-20 attendees for each session, mostly from a computer science or software development background, and in systems support or technical roles. The interests of the attendees were in the specific technologies covered in each session (Git, Python and nosetests, MongoDB) rather than the concepts underlying these, as most attendees were already familiar with the concepts. The attendees viewed the sessions as useful. On the good side, their comments included:
Keep up the good work
The content was useful
[Know] what a NoSQL database is
Hands-on very clear
Useful to start up very quickly to use MongoDB in python
And on the bad side:
Depending on the audience you could merge in more git commands
The speed of the hands on part was a bit slow (for someone knowing unix and having used other version control systems)
Would have appreciated more inputs as to: when should I choose MongoDB or any other NoSQL db, as compared to a relational db: typical use cases; configuration of MongoDB; performances expected: should I declare some indexes, how, why? How to set up a "map and reduce" configuration of MongoDB?
The attendees were given SSH logins to a virtual machine with the required tools pre-installed. As we were co-located with a conference, we wanted to allow attendees to drop in without any advance preparation. Use of a VM avoided any installation woes, though for our third session we had to pair up the attendees, as the VM began to deny SSH connections: it interpreted too many simultaneous SSH logins as a possible hack attack (a component called 'fail2ban' was the culprit). I found that using 'nano', rather than my preferred editor 'emacs', acted as a good brake, so I was less inclined to commit the instructor sin of making things happen by "magic". While the sessions were very tightly constrained time-wise - in effect only about 1h10 for "live coding" - I felt it was a good way to convey one or two key concepts and messages, and to give attendees the flavour of what to expect if they attend a bootcamp - sort of like a movie trailer! Read More ›
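For a flavour of the MongoDB session's starting point, a first pymongo session looks something like this. A sketch only: it assumes a MongoDB server running locally, the database and collection names are invented, and the insert call uses the pymongo 2.x API of the era (current releases prefer insert_one):

    from pymongo import MongoClient

    client = MongoClient('localhost', 27017)  # connect to a local server
    papers = client['demo_db']['papers']      # created lazily on first write

    # Documents are just dictionaries; no schema has to be declared up front.
    papers.insert({'title': 'An example paper', 'year': 2013})

    for doc in papers.find({'year': 2013}):   # query by example
        print(doc['title'])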

Installation Revisited
Greg Wilson / 2013-04-08
Regular readers (and anyone who has attended a bootcamp) will know that getting learners' laptops set up is the single biggest headache we have. Titus Brown has just posted a discussion of what goes wrong, and some ways we could try to improve things. If you have suggestions to contribute, please add comments there. Read More ›

Evaluation Revisited
Greg Wilson / 2013-04-08
Caitlyn Pickens (a graduate student at Michigan State who is interning with us this summer) has just posted an article about the validity of different kinds of assessment. Caitlyn's goal is to build some tools we can use to gauge the impact Software Carpentry is having—like almost everything involving human beings, it turns out to be more complicated than it first appears, and her post helps lay out the background knowledge needed to understand why. Read More ›

A Bootcamp in Toronto May 9-10, 2013
Greg Wilson / 2013-04-08
We are offering a bootcamp at Mozilla's Toronto office on May 9-10, 2013. Please register here if you would like to take part. Read More ›

Announcing a Bootcamp for Women in Science and Engineering
Greg Wilson / 2013-04-07
On June 24-25, 2013, Software Carpentry will run a computing skills bootcamp in Boston for women in science, engineering, and medicine. With three rooms and six instructors, it will be one of the biggest events we've ever done. Bootcamps alternate short tutorials with hands-on practical exercises. Learners are taught tools and concepts they can use immediately to increase their productivity and improve confidence in their results. Topics covered include the Unix shell, version control, basic Python programming, testing, and debugging—the core skills needed to write, test, and manage research software. This bootcamp is open to women at all stages of their research careers, from graduate students, post-docs, and faculty to staff scientists at hospitals and in the public, private, and non-profit sectors. Registration is only $20; to sign up, or find out more, please visit the registration page or email team@carpentries.org. And if you would like to volunteer to help during practical sessions on either or both days, please contact us as well. This bootcamp has been made possible by support from the Alfred P. Sloan Foundation, the Mozilla Foundation, Microsoft, Intel, the King Abdullah University of Science and Technology, the Python Software Foundation, NumFOCUS, and several generous individuals. "Armed with a single introductory C++ course, I did a master's degree and several years of consulting work on spatial simulation models before taking Software Carpentry. Long, slow, frustrating experiences left me well prepared to appreciate this course. What has changed? I work more quickly, and re-use my own code; I find more errors, and spend less time fixing them; I trust my results more; I don't mind revisiting and revising old work; collaborators and potential employers are more impressed; and I'm happier." — Josie Hughes develops models of mountain pine beetles and other outbreaking forest insects Read More ›

An Image Analysis Success Story
Frank Pennekamp / 2013-04-05
I am an alumnus of the bootcamp in Paris in June 2012. I have been working on a paper about digital image analysis for experimental studies in ecology and evolution that work with small-scale model organisms such as aquatic microbes, insects or fish. Besides explaining the principles of image analysis, we provide scripts in Python, R and ImageJ so people can directly use the methods. The paper has now been published, and I'd like to share our success with the Software Carpentry community: "Implementing image analysis in laboratory-based experimental systems for ecology and evolution: a hands-on guide". The paper explains the basic ideas behind digital image analysis, such as different approaches to image segmentation, and how to create an automated and validated workflow to extract object counts and morphology measurements from thousands of images. This considerably eases the work of biologists, who otherwise spend hours counting and measuring objects manually under a microscope. Additionally, we provide scripts written in Python, R and the macro language of ImageJ so everyone can adopt the presented methods and perform image analysis on a set of test images (and then on their own images). A Software Carpentry bootcamp introduced me to Python programming in general and to some of the libraries used (scikits-image), and I also got a lot out of your tutorials, which helped me to write the scripts. [Figure: overlay of one test image used to count and measure Tetrahymena thermophila cells.] Read More ›
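As a flavour of such a workflow, here is a minimal sketch using scikit-image's current API (the file name is hypothetical, and the paper's published scripts differ):

    from skimage import io, filters, measure

    # Segment an image automatically, then count and measure the objects.
    image = io.imread("cells.png", as_gray=True)    # hypothetical input file
    binary = image > filters.threshold_otsu(image)  # automatic threshold
    labels = measure.label(binary)                  # connected components
    for region in measure.regionprops(labels):
        print(region.label, region.area, region.centroid)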

Connecting Bootcamp Content to Motivation and Best Practices
Paul Wilson / 2013-04-03
As our next Software Carpentry bootcamp quickly approaches, we have been struggling with many of the usual questions of what to teach and what not to teach. We expect a broad array of skills and experience (pre-assessment is yet to be completed), and want to make sure that all the learners leave with a clear motivation to build on what they learn. As we perused other bootcamp agendas and curricula, we were concerned about the heavy focus on programming, and on Python in particular. There has been much discussion about what the real goal of this part of a Software Carpentry bootcamp is and should be. There is not enough time to teach true novices how to program, it is presumptuous to assume that people are coming to learn Python—it's not a Python bootcamp, after all—and we are drawing from a broad cross-section of people across a large research campus. But there must be a reason that every bootcamp dips its toe into teaching programming. Then it occurred to me that we have already spent a lot of effort to refine the motivation and connect the practices and the goals—it's all in the Best Practices paper. There is even more there than we can do in a single bootcamp, but upon reflection it provides a clear narrative connecting the "why" and the "how". Even better, it provides some catchier and slightly enigmatic titles for the different sections of the bootcamp curriculum. I was concerned that a dry list of topics such as "Python variables", "Python data structures", "Python flow control", and "Python functions" would turn some people away, as it looks too much like just-another-programming class. Instead, invoking the language from the paper, we can cover topics like "Write Code for People", in which we discuss variables and data structures without forgetting our motivation for choosing good variable names and convenient data structures. Or we can have "Don't Repeat Yourself", in which we introduce the ideas of modularizing your own code by refactoring frequently used snippets into functions, and of seeking third-party modules that do what you want. Our instruction on testing will be covered in a learning module called "Plan for Mistakes". Our instructors are now challenged to make their material evoke these themes as students work through the exercises, focusing not just on how to write a simple function in Python, but on why to do it and how to do it in a way that meets this motivation head on. We'll report back in a month to tell you how it went! Read More ›
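A minimal sketch of what "Don't Repeat Yourself" looks like in practice (the data and function are invented for illustration, not taken from the bootcamp material):

    # Repetitive version: the same mean-centering logic pasted twice.
    temps_a = [12.1, 15.3, 14.8]
    temps_b = [11.0, 13.9, 16.2]
    mean_a = sum(temps_a) / len(temps_a)
    centered_a = [t - mean_a for t in temps_a]
    mean_b = sum(temps_b) / len(temps_b)
    centered_b = [t - mean_b for t in temps_b]

    # "Don't Repeat Yourself": factor the logic into one well-named function.
    def mean_center(values):
        """Subtract the mean so that datasets can be compared directly."""
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    centered_a = mean_center(temps_a)
    centered_b = mean_center(temps_b)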

Using the IPython Notebook as a Teaching Tool
Greg Wilson / 2013-03-24
I had a fruitful discussion with Jon Pipitone today about using the IPython Notebook for teaching. Long story short, there are several possible approaches, but we can see problems with each. To set the stage, here are the two "pure" models most widely used for teaching programming today:

Frozen: the instructor embeds snippets of code in slides (which can be in LaTeX, PowerPoint, HTML, or some other format).

Advantages:
- Easy to interleave code and commentary.
- Easy to draw on top of code to highlight, make connections with commentary, etc.
- Easy for other people to re-use.
- Easy to add presenters' notes.

Disadvantages:
- The code isn't immediately executable, so some combination of manual checking and tooling has to be used to ensure that the output shown is up-to-date with the code, that the code shown is up-to-date with any hand-out files, etc.
- It's really easy for instructors to race through slides too fast for learners to follow.
- "Watch and listen" is a passive learning model, and the more passive learners are, the less they learn.

Live Coding: the instructor types into an interpreter as she's teaching; the only thing on the screen is her code and its output, and everything else is delivered verbally.

Advantages:
- Allows responsive improvisation: the instructor can answer "what if?" questions much more easily.
- Constrains the speed of presentation (somewhat).
- Facilitates "sideways knowledge transfer", e.g., learners can pick up keyboard shortcuts and other "hows" of coding, etc.
- Learners learn more if they are typing in code as they follow along.

Disadvantages:
- Learners now have to type and watch at the same time; the former often distracts from the latter (particularly if they make typing mistakes that they can't spot themselves, so that they wind up with a triple burden of watching, typing, and debugging simultaneously).
- Learners walk away with just the code, not what was said about it, and code alone can be hard to re-understand.
- It discourages the use of diagrams (instructors can't doodle directly in the Notebook the way they would on a whiteboard, and "let me import this now" is clumsy).
- There's no obvious place to store the presenters' guide.

With practice, preparation, and the right A/V setup, instructors can use a hybrid model:

Studio Show: the instructor displays point-form notes on one screen and live coding on another. Pre-planned code examples are stored in a file; the instructor usually copies and pastes from there into the interpreter, but improvises interactively in response to questions. Students are either given the same file of planned code examples for copying and pasting on their machines, or something like Etherpad is used to give the same functionality.

Advantages:
- Gives instructors scaffolding ("here's what to teach next").
- Supports improvisation while allowing easy re-synchronization (instructor and learners can get back on track when/as needed).
- Easy to show diagrams along with sample code.
- An obvious place to store presenters' notes.
- Facilitates sideways knowledge transfer.

Disadvantages:
- Requires a double screen. (There isn't enough real estate to show code and slides side-by-side on a regular screen; toggling back and forth between slides and code is very distracting.)
- Allows the instructor to race ahead (but if learners can easily copy/paste code, this isn't as much of a problem as it is with the Frozen model).
Unfortunately, the requirement for two independent screens makes Studio Show impossible in most situations: in the last two years, I've only been able to do this twice. Could the IPython Notebook give us something like the Studio model on a single screen? Here are some options:

Frozen with Replay: the instructor has a notebook in which point-form notes are interleaved with code cells. As she lectures, she re-executes the code cells to show that they produce the output shown.

Advantages:
- Easy for other people to re-use.
- Easy to check that the code samples are in sync with the commentary and the output shown (just "run all").
- Easy to keep diagrams beside code and commentary.

Disadvantages:
- No obvious place to add presenters' notes, since everything in the notebook is visible to everyone. (However, the next release of the Notebook should allow authors to add CSS classes to cells. Once that lands, we'll be able to do show/hide buttons as an add-on, which will address this.)
- Easy for instructors to race through things faster than learners can follow (since they're not typing, just re-executing). This is a minor issue compared to the next two problems.
- Makes "what if?" risky, because every execution of every cell modifies the server process's state, and a single divergence from the planned path can invalidate every subsequent cell's output. This can be addressed by putting chunks of "reset" code into the notebook to get the interpreter's state back to where it needs to be before each example, but: that's an extra burden on learners, who have a hard time distinguishing "core example" code from "getting us back in order" code (particularly when the latter is usually not actually necessary); and there's the risk that learners will come away thinking that "reset" code is actually necessary, and will include it in their programs (because after all, that's what they've seen).
- It makes learning a passive experience once again: learners are hitting "shift-enter" once in a while, instead of just watching, but that's not much of a difference from just watching.

Live Coding II: start with an empty notebook and start typing in code as learners follow along.

Advantages:
- Works better than a conventional "ASCII in, ASCII out" interpreter: pretty-printed input interleaved with blocks of output, inline rendering of graphs and images, and extras like Matt Davis's blocks are a big step forward.

Disadvantages:
- As with command-line live coding, learners have to type and watch, wind up with just the code (not the commentary), and there's no obvious place to put the presenter's guide. It also discourages the use of diagrams.

Live coding is hands-down the better of these two approaches: it does put more of a burden on the instructor (who has to remember the "chord changes" that are coming up) and on the learners (who have to keep up with the typing), but the interactivity makes it a clear win. The question is, how can we improve it?

Sync With Instructor: at the press of a button, the learners' notebooks are replaced by clones of the current state of the instructor's notebook.

Advantages:
- Lets learners (rather than instructors) choose between "follow on autopilot" or "type along" (or mix the two).
- Easy for a learner to catch up if she has fallen behind.

Disadvantages:
- Requires significant engineering effort (as in, unlikely to arrive this year).
- Doesn't address the diagrams/presenters' notes problem.
Gradual Reveal: pre-load both instructors' and learners' notebooks with notes, code, and diagrams, but have a "show next" button to reveal the next cell.

Advantages:
- Learners get everything: notes, diagrams, code, etc. (And so do instructors.)
- Learners are able to type along, do exercises inline, etc. (with the caveat below).

Disadvantages:
- Once again, any "what if?" can invalidate all subsequent cells. However, there's at least the possibility of deleting the offending cell(s) and doing "run all" to resynchronize. This might work particularly well if "run all" only re-ran cells that have been revealed: learners could try an exercise in the last cell of their notebook, and if they stray too far from the intended path, delete that cell and "run all" to re-sync before the instructor moves on.

Lots of Little Notebooks: have one idea per notebook, with no more than half a dozen snippets of code.

Advantages:
- Shorter dependency chains, so less need to reset/resync.
- Makes it easier for instructors to improvise: they can skip over mini-notebooks if their audience already knows the material or they're running short of time.
- We can do it now without any changes to the Notebook.

Disadvantages:
- Doesn't address the other issues I've raised: how much is pre-loaded, where do instructors' notes go, etc.

The fundamental tension here is between using the notebook as a laboratory tool, and using it as a replacement for either or both of live coding and PowerPoint and its imitators. There's no doubt in my mind that it's better than text-only interpreters for the former, but it still has a ways to go before it's credible as competition for the latter, or as a single-tool replacement for the combination of the two. I'd welcome your thoughts on where to go from here. Note: PowerPoint has lots of detractors, but I think most of their criticism is misguided. Edward Tufte and others say that by encouraging point-form presentations, PowerPoint also encourages bland idiocy, but in my mind, that's like blaming fountain pens for bad poetry: any tool can be abused, and it isn't PowerPoint's fault if people don't use its whiteboarding capabilities. Many other people dislike it because it's closed-source, not web-native, and doesn't play nicely with version control. These criticisms are true, but the alternatives that most proponents of this point of view offer—some based on LaTeX, most based on HTML, CSS, and Javascript—are much more strongly biased toward the point-form idiocy Tufte et al criticize than PowerPoint ever was. Yes, you can use an external tool to draw a diagram, export it as an SVG or PNG, and link to that from your slideshow, but most non-fanatics can see that PowerPoint is proof that going the long way around the houses like that isn't actually necessary. If we want people to take the Notebook (or anything else) as a credible alternative to today's specialized slideshow tools, that's the ease of use we have to match or beat. Read More ›
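To make the "reset code" problem concrete, here is a tiny sketch (the variable names are invented) of how one improvised cell can invalidate every later one:

    # Planned cell 1:
    rate = 0.5
    # Planned cell 2 (its saved output assumed rate == 0.5):
    decay = [rate ** t for t in range(5)]
    # Improvised cell, answering a learner's "what if?":
    rate = 2.0
    # Re-running the planned cells no longer reproduces the stored output,
    # so "reset" chunks like the next line have to be sprinkled through the
    # notebook even though they aren't part of any example:
    rate = 0.5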

Testing Image Processing
Greg Wilson / 2013-03-17
Testing has always been part of Software Carpentry, but it's also always been one of our weak spots. We explain that testing can't possibly uncover all the mistakes in a piece of software, but is useful anyway, then talk about unit testing and test-driven development. Separately, in the extended program design example, we demonstrate how to refactor code to make it more testable. What we don't do is show people how to test the science-y bits of scientific software. More specifically, our current material doesn't contain a single example showing how to check the correctness of something that does floating-point. You won't find much mention of this in books and articles aimed at mainstream programmers either: most just say, "Oh, round-off," then tell you to use an almostEquals assertion with a tolerance, without telling you how to decide what the tolerance should be, or what to do when your result is a vector or matrix rather than a single scalar value. I'd like to fix this, but there's a constraint: whatever examples we use must be comprehensible to everyone we're trying to reach. That rules out anything that depends on knowing how gamma functions are supposed to behave, or what approximations can be used to give upper and lower bounds on advection in fluids with high Reynolds numbers. What might work is simple image processing:

- It's easy to see what's going on (though using this for our examples does create even higher barriers for the visually impaired).
- There are a lot of simple algorithms to test that can go wrong in interesting, plausible ways.
- We're planning to shift our intro to Python to be media-based anyway (using Matt Davis's ipythonblocks and Mike Hansen's novice submodule for scikit-image).
- People can learn something useful while they're learning about testing.

How do experts test image processing code? According to Steve Eddins, who writes image processing code at The MathWorks and blogged about a new testing framework for MATLAB a few days ago: Whenever there is a floating-point computation that is then quantized to produce an output image, comparing actual versus expected can be tricky. I had to learn to deal with this early in my MathWorks software developer days. Two common scenarios in which this occurs:

- rounding a floating-point computation to produce an integer-valued output image
- thresholding a floating-point computation to produce a binary image (such as many edge detection methods)

The problem is that floating-point round-off differences can turn a floating-point value that should be a 0.5 or exactly equal to the threshold into a value that's a tiny bit below. For testing, this means that the actual and expected images are exactly the same...except for a small number of pixels that are off by one. In a situation like this, the actual image can change because you changed the compiler's optimization flags, used a different compiler, used a different processor, used a multithreaded algorithm with dynamic allocation of work to the different threads, etc. So to compare actual against expected, I wrote a test assertion function that passes if the actual is the same as the expected except for a small percentage of pixels that are allowed to be different by 1. All right, but how do you decide how many is "a small percentage"? Quoting Steve again: There isn't a general rule. With filtering, for example, some choices of filter coefficients could lead to a lot of "int + 0.5" values; other coefficients might result in few or none.
I start with either an exact equality test or a floating-point tolerance test, depending on the computation. If there are some off-by-one values, I spot-check them to verify whether they are caused by a floating-point round-off plus quantization issue. If it all looks good, then I set the tolerance based on what's happening in that particular test case and move on. If you tied me down and forced me to pick a typical number, I'd say 1%. Perhaps not a very satisfying answer... To which I replied: this is a great answer, because it mirrors what scientists do with physical lab experiments:

- Get a result.
- Go through the differences between actual and expected to see if they can explain/understand "why".
- Make a note of their tolerances for future re-use.

As we say in our classes, programs ought to be treated like any other kind of experimental apparatus. My question now is, what rules of thumb do you have for testing the science-y bits of your code? We'd welcome replies as comments or email. Read More ›
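For the record, here is a minimal sketch in Python and NumPy of the kind of assertion Steve describes (the function name and the 1% default are ours, not his):

    import numpy as np

    def assert_images_close(actual, expected, max_diff_fraction=0.01):
        """Pass if the images match except for a few pixels off by one.

        Quantizing a floating-point result can legitimately flip a small
        number of pixels by +/-1, so allow at most max_diff_fraction of
        pixels to differ, and only by one.
        """
        diff = np.abs(actual.astype(np.int64) - expected.astype(np.int64))
        if diff.max() > 1:
            raise AssertionError("some pixels differ by more than one")
        fraction = np.count_nonzero(diff) / float(diff.size)
        if fraction > max_diff_fraction:
            raise AssertionError("%.2f%% of pixels are off by one (limit %.2f%%)"
                                 % (100 * fraction, 100 * max_diff_fraction))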

Cumulative Enrollment
Greg Wilson / 2013-03-17
One measure of how well we're doing is the number of people we've helped. Here's what the last 15 months have looked like: Read More ›

Snowstorms and Blackouts in Virginia
Greg Wilson / 2013-03-15
Despite inclement weather, last week's bootcamp at the University of Virginia went well. According to Steve Crouch, 33 researchers spent two days learning some useful computing skills from Steve, Carlos Anderson, and Ben Morris. Many thanks to Stephen Turner for hosting it—we hope to be back soon. Read More ›

New Camps Coming Up
Amy Brown / 2013-03-14
We have arranged plenty of bootcamps in the last few weeks; here's what's coming up:

- American University of Beirut: Mar 20–21 (registration not yet open)
- King Abdullah University of Science and Technology: Mar 24–25 (not yet open)
- University College London: Apr 4 and 8 (not yet open)
- EGI Forum, Manchester: Apr 11 (open)
- University of Wisconsin - Madison: Apr 29–30 (not yet open)
- University of Oxford: May 9–10 (restricted)
- NESCent: May 16–17 (not yet open)
- Duke University: May 20–21 (not yet open)
- University of Massachusetts Amherst: May 23–24 (not yet open)
- University of Alberta: May 30–31 (open)
- Tufts University: June 3–4 (not yet open)
- University of Southampton: June 3–4 (restricted)
- University of Oklahoma: July 1–2 (restricted)
- Indiana University: July 11–12 (not yet open)
- Utah State University: July 16–17 (not yet open)
- University of Notre Dame: July 18–19 (not yet open)
- University of Iowa: September 5–6 (not yet open)

And of course we're always looking for more eager researchers who want to organize a camp at their institution. If you're interested, please contact us for more information about what's involved. Read More ›

Second Round at Lawrence Berkeley
Justin Kitzes / 2013-03-13
Last week we finished up our second workshop at Lawrence Berkeley Lab. This time around, we tried a more free-form version of the feedback exercise, in which students live-typed their feedback into an Etherpad at the end of each day. This went very well, as it gave us time to gather unique comments, get a rough idea of the amount of support for each comment, and respond to some of the comments in real time. We asked three questions: what was good, what was bad/confusing, and what we didn't do or talk about that we should have.

The Good:
- Introduction to Python and how to install things with command lines
- Understanding Python objects, classes, functions, methods, etc.
- Thinking about how to structure the directories for a project
- Hearing how regular Python users use Python (+1)
- Just getting finally to write some .py code!
- Real-world example-driven exercises
- Knowledge of helpful data types in Python
- IPython notebook and Etherpad are amazing (+1)
- The party on the Etherpad !!!!! (+1 +1) [Ed: the students had some fun "passing notes" in the Etherpad]
- Patient, fast, specific help (+1)
- An explanation of git from people who use git (+1 +1 +1)
- Inside structure of git
- You put a lot of things in my brain in a very short time (+1)
- Standalone scripts is my new goal for my code!
- Start thinking about how to test code for reusability
- Danish and coffee / snacks / food juice

The Bad or Confusing:
- Python version issues, especially multiple Python versions on same system (+1 +1 +1)
- Relationship between Anaconda/IPython/Enthought/system Python
- Use of IPython Notebook vs console, roles in real-world code development (+1)
- Glossary/handout on Python would have been helpful for beginners
- Most of the Unix tutorial was extremely basic (+1 +1 +1) [Ed: there were two follow-ups to this comment in which students said they needed the basic tutorial badly.]
- Worrying about whether to abandon perl and R for python - then I'd have to start the learning curve all over.
- Too fast for beginners, but noticed that others were bored (+1)
- Not enough 'signposting' during talks/lessons
- It's over now :\

Things We Didn't Do or Talk About But Should Have:
- Online "getting started with Unix" tutorial before class (+1 +1 +1)
- Making and annotating/labelling plots, more Matplotlib (+1 +1)
- Discussion of computational and memory efficiency
- "Best practices" for developing and maintaining code (+1 +1 +1 +1)
- Fitting data (+1 +1)
- How to share code, etc. with those who don't use version control
- Working with specific types of data (i.e., DNA)
- More about modules and importing, common bugs, etc.
- Is there a better way to find help than google and stackoverflow?
- Overhead when working with extremely large data sets

Thanks to Shreyas Cholia and to instructors Matthew Brett, Cindee Madison, and Ariel Rokem and helpers Paul Ivanov and Matt Terry. Read More ›

A New Testing Framework for MATLAB
Greg Wilson / 2013-03-12
Steve Eddins just announced a new unit testing framework for MATLAB. Based in part on lessons learned from his earlier mUnit framework, it has everything you'd want from a modern framework: setup and teardown, analysis, and a bag full of helpful assertions. If you're using MATLAB, you should definitely check it out. Read More ›

First Round at Lawrence Berkeley
Greg Wilson / 2013-03-05
We just wrapped up a two-day workshop at the Lawrence Berkeley National Laboratory, and another is due to start tomorrow. Here's what worked and what didn't from the first one:

Good:
- Doing examples ourselves
- Liked Python examples
- Interactive
- Greg kinda knows this stuff
- Like the history lessons etc.
- Went through a lot of things
- Liked IPython Notebook
- Like intro to Git
- Intro to shell
- Examples are incremental
- Good mix of theory and hands-on
- Sticky notes
- Everything is open source
- Liked keeping functions small
- Liked red-green-refactor

Bad:
- Little more structure please
- Lost with how to use Git
- Wanted a cheat sheet
- Wanted more Python
- Not a break Monday aft.
- Not clear how to apply all tips to all things
- Wanted more structure Monday aft. (and I was teaching it)
- Wanted cookies in the afternoon
- Too many things at once
- Poor choice of room
- Need a book or handouts
- Didn't see Python interact with R and MATLAB
- More specific to specific groups
- Bringing Python back to the shell etc.
- How to save terminal session/notebook/etc.

Many thanks to Shreyas Cholia, Adam Stone, Nina Lucido, Geoff Oxberry, Matthew Brett, Paul Ivanov, and all our learners for making it a success. Read More ›

Teaching with ipythonblocks at UW
Matt Davis / 2013-03-02
The recent University of Washington bootcamp was my first chance to try teaching Python with ipythonblocks, and it was a great success! The students reported that the immediate feedback from ipythonblocks was a great way to see what their code was doing. We were able to move quickly through topics like for-loops and if-statements in relative comfort, and even cover slicing with ease. I had always hoped that the opportunities for creativity with ipythonblocks would appeal to students. At the UW bootcamp we did observe people, especially those who grasped the concepts quickly, striking off on their own to experiment with larger grids and more complex algorithms for varying block color. I think this is a great advantage of the visual nature of ipythonblocks; there is only so much creative fun to be had playing with lists, but far more possibilities with what is essentially a grid of pixels. My only regret is that we ran out of time, so the students didn't get to make Starry Night or practice with files and functions as much as I'd have liked. I think it wouldn't be too hard to expand the lesson I'd planned so that we spent all day playing with ipythonblocks, but of course we'd lose time for other things. Read More ›
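For readers who haven't seen ipythonblocks, here is a minimal sketch (not the actual UW lesson) of why loops, conditionals, and slices give instant visual feedback:

    from ipythonblocks import BlockGrid

    # A 10x10 grid of black blocks, displayed inline in the IPython Notebook.
    grid = BlockGrid(10, 10, fill=(0, 0, 0))
    for block in grid:
        if block.col % 2 == 0:      # learners can *see* the if-statement work
            block.green = 255
    for block in grid[0, 0:5]:      # slicing selects the first five blocks of row 0
        block.red = 255
    grid.show()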

Washington Went Well
Greg Wilson / 2013-03-01
Last week, we put on the largest Software Carpentry bootcamp ever at the University of Washington: three rooms, six instructors, ten helpers, and 93 students spent two days together learning about Python, Git, the Unix shell, and a bunch of other useful things. Many thanks to Prof. Bill Howe for making it all possible—we look forward to coming back soon. Read More ›

Feedback from UW Room B
Matt Davis / 2013-03-01
Doug Latornell and Matt Davis taught in "Room B" at the recent University of Washington bootcamp. It was a great group of very enthusiastic students! Thanks also to our very capable helpers Jake Vanderplas and David Leen! Here's our traditional good/bad feedback:

Good:
- Learning about Python and NumPy, especially the practice exercises.
- We talked about best practice issues in software development.
- Great holistic crash course on Python and git. Did well with a range of expertise in the room.
- Overall great intro to many great topics.
- The one-on-one help and the exercises.
- Pace was quick, almost too fast, but not quite!
- Intuitive approach.
- Intro to IPython Notebook.
- Very well structured. Good collection of topics for introduction. Staff very responsive.
- Very helpful and useful.
- The existence of Software Carpentry and the integrated exercises.

Bad:
- Too much shell stuff (already shell proficient).
- Difficult to keep up with instructor when switching between screens.
- Too much typing while trying to listen.
- Would have preferred basic Python scripting instead of NumPy intro.
- matplotlib and NumPy were a little dry.
- The Python part was rushed and we didn't have time to see all of the code.
- Command line tutorial was slow (but that was probably necessary for some).
- A bit too fast in some content.
- NumPy section could use more exercises/interaction.
- Wanted to learn about SciPy.
- The software engineering felt too rushed -- if I made an error in my code I easily fell behind and was lost. Wanted just Python.

Read More ›

Alternative Teaching Models
Greg Wilson / 2013-03-01
Our two-day bootcamps are working well, but that doesn't mean they're the best—or only—way to teach basic computing skills to scientists. We've been kicking around other ideas recently, and we'd welcome your input as well.

Become part of a regular university course. Use the first few days of a course to teach Software Carpentry stuff, and the rest to teach "computational methods in [XYZ]". Counter-argument: SWC doesn't really fit a traditional lecture format, but can fit with in-class projects and discussions, follow-me tutorials, etc., and any course that includes SWC will have to be a fairly special and highly interactive course. We have experimented at Columbia with running a two-day bootcamp at the start of a regular course, followed by 2 hours/week for the remainder of the semester—we should be able to report back on how well it works by April or May.

Offer a two-week residential course. Some people think a week or two of full-time work is enough to turn a neophyte into a skilled practitioner. Others think that is virtually impossible without several hundred hours of homework, but (a) we can accelerate that process and (b) if people already have some basic scripting, we can accelerate it further. Either way, this would have to include time for students to apply ideas to their own research problems. For example, our Trieste bootcamp last year involved one week of Software Carpentry and one week of applying those new skills to the students' actual research projects. Helping students apply this to their work at the end of the intensive learning period was super awesome. But participants also may have done well because they were carefully selected. And it costs a lot more to implement, both in money and in instructors' and learners' time. This may basically come down to a "many but shallow vs. deep but few" argument. Variations on this idea include: a course requirement in the first year of a PhD; part of a domain-specific summer school; and 90-minute sessions on specific topics taught during a conference.

How do we keep people engaged after bootcamps? Some ideas:
- Offer a robust set of links to online communities by domain.
- Show people where and how to get help online: "how to ask a question online so that you'll get the answer you need".
- Point people to hackerspaces: they are full of professional software developers, people there are motivated to share and learn, and they love scientists.
- Online office hours.

We've also thought about creating an online community using Facebook, Piazza, Stack Exchange, or a combination of mailing lists and Google groups. However:
- We tried this with a self-hosted forum in the fall of 2010 and the winter of 2011, with very little uptake (despite active instructors).
- Does keeping the community small (just SWCers) encourage the timid folks to participate more than they would on something like StackExchange? We need a large population to generate enough conversation to make it worth coming back to.
- All the answers are out there (e.g., on StackExchange); people just need to know how to find them.
- Will scientists actually use any of this? The whole mailing list model is foreign to many.

Read More ›

Workshop for High-Energy Physics at UCL, Part 2
Ben Waugh / 2013-02-27
Three months after part 1, the second half of the UCL Software Carpentry workshop for High-Energy Physics took place on Friday 15th February 2013. The experimental format with two sessions three months apart might have worked better if, as envisaged, we had managed to get an online discussion going in the gap. Unfortunately this did not work out. Despite several attempts at chasing up the participants, no-one registered or posted anything on the course web site. After talking to some of the students, I think their interest was just not great enough to overcome the bureaucracy involved in registering on the UCL Moodle server and the tendency to focus on their other (assessed) courses and research projects. Attendance was down from 65% to 30% (6 of 20 students), although those who did attend were clearly motivated, worked hard and asked thoughtful questions. With two helpers (James Hetherington and HEP postgrad Sam Cook) and myself as instructor, we had a student:staff ratio of 2:1. Most of the morning session was spent getting back to where we were at the end of part 1, with various technical problems needing to be solved all over again. Surprisingly, everyone came back after lunch and the afternoon session was, I think, rather more successful. Taking advantage of the small student numbers, we divided the class into two teams of three, and set them to work on different parts of the same code base. One team wrote code to retrieve a list of particle four-momenta from a collision event, while the other team worked on generating a list of oppositely-charged pairs from a list of particles. Due in part to the nature of their tasks and the input they had to work with, the first team took a fairly ad-hoc approach to testing their code, while the second wrote a fairly comprehensive set of unit tests. In between iterations (of about 20 minutes each) they talked about what they had done and what problems they had encountered, and after each iteration they pushed their code to the shared repository on GitHub, merging their changes where needed. During the last half hour I pulled the merged code from both teams into my local repository and used a separate Git branch (based on my own version of the code) to plot the results. Then I rushed through a list of topics we had not managed to cover, including testing floating-point numbers. While there is a lot more material I would have liked to cover, and could perhaps have covered by imposing a more rigid structure on the event, I suspect that the most effective learning experience was the rather chaotic extended exercise in collaborative code development. What I would like to do next year is find a way to scale this up to what I hope will be a larger class. Read More ›
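As a rough sketch of the kind of code and test the second team was working toward (names and structures invented here, not their actual code):

    # Build oppositely-charged pairs from a list of particles, plus the
    # sort of unit tests the team could run with nose.
    def opposite_charge_pairs(particles):
        """Return all (p1, p2) pairs whose charges sum to zero."""
        return [(p1, p2)
                for i, p1 in enumerate(particles)
                for p2 in particles[i + 1:]
                if p1["charge"] + p2["charge"] == 0]

    def test_muon_pair():
        mu_plus = {"name": "mu+", "charge": +1}
        mu_minus = {"name": "mu-", "charge": -1}
        assert opposite_charge_pairs([mu_plus, mu_minus]) == [(mu_plus, mu_minus)]

    def test_same_charge_gives_no_pairs():
        mu_plus = {"name": "mu+", "charge": +1}
        assert opposite_charge_pairs([mu_plus, mu_plus]) == []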

A Bootcamp for Women in Science and Engineering
Greg Wilson / 2013-02-27
This summer NumFOCUS and Software Carpentry are hosting a software skills bootcamp for women in science and engineering in Boston. We are seeking sponsorship from companies and others who would like to help raise the $6000 needed to make this happen. We'd appreciate your support, and would be grateful if you'd help spread the word. Read More ›

Wrapping Up in Melbourne
Greg Wilson / 2013-02-15
Two long but useful days at Melbourne Uni have just wound down, and five dozen meteorologists and climate scientists have given us feedback on a well-run bootcamp:

Good:
- splitting big problems into small pieces
- SQLite is easy
- functions instead of loops
- Python was easy
- naming functions after pets
- hands-on application at the same time as instructor
- morning tea
- dictionaries
- explaining why, rather than just what
- swapping variables
- using our own computers
- copy/paste onto Etherpad
- arrays in NumPy
- sticky notes
- a bit of motivational speaking
- location (I'm not from Melbourne, eh)
- Subversion FTW!
- Git + BitBucket
- Python Notebook
- hey, there's useful stuff online
- helpers
- red/green/refactor
- really practical examples of test-driven development
- Greg tells stories
- lot of participation
- talk to other people during the breaks
- regular expressions
- cheap!
- seeing how programmers think
- realizing how much room for growth I have
- scared in the first ten minutes...
- the material is free
- peer instruction

Bad:
- WiFi
- didn't do any meteorological data sets
- the jokes
- viewing angle
- installing stuff
- couldn't use some advanced stuff (I'm a beginner)
- coffee
- running right after conference
- squeezing four half days into two full days (maybe try 3?)
- not enough exercises
- not long enough
- content could have been more focused
- didn't do NetCDF files
- a bit of motivational speaking
- didn't see comparison of Python with R
- lot of spread of material/ability
- longer breaks (I'm so tired)
- databases irrelevant to me
- IPython/NumPy confusing
- wasn't in New Zealand
- not enough time on version control
- wasn't in Fiji
- hated SQL
- Greg speaks waaaaaaaaay too fast
- learned a lot about Vi hopping
- pretty intense, dude
- took a long time to restart after breaks
- more on plotting
- lighting in the room is uneven
- more about linking Python to other things
- ...after first ten minutes, it was OK

Read More ›

Expanding Our Bootcamp Types
Matt Davis / 2013-02-15
Prompted by a recent email, I gave some thought to our plans to expand the types of bootcamps we offer, especially when it comes to a sort of "next level" bootcamp, and whether we might offer bootcamps that introduce traditional software developers to the world of scientific programming: There is significant discussion on the best way to expand Software Carpentry. Longer bootcamps, advanced bootcamps, etc. At some point we need to start actually experimenting with these things, but we have many constraints, most notably a severe shortage of person-hours to devote to developing curriculum and staffing events. Scientists, it turns out, are very busy people! We focus on teaching software engineering to scientists, but I think it would be an interesting experiment to try to identify the overlapping interests of Software Carpentry graduates and web developers and run a bootcamp based on that. I could see people getting a lot out of the co-mingling at such an event. Picking curriculum could be a challenge. Do we keep it general or pick a subfield? Even in our regular bootcamps we find it useful to change lessons based on the audience. Biologists are into SQL; physicists are more interested in fast numeric processing. And the more specific we make things, the fewer people we interest. My mind goes immediately to scientific Python, but then that's my specialty. If you have thoughts on what to teach in a "next level" bootcamp, please leave them in the comments, and if you'd like to host one, please get in touch! Read More ›

More News from the UK
Greg Wilson / 2013-02-14
Two pieces of news from the UK: The folks at Queen Mary, University of London just ran a two-day bootcamp for students in Media and Arts Technology, and will be running another such event soon. Mike Jackson at the Software Sustainability Institute has posted some quick alternatives to bootcamps. If you'd like help with or advice on these, please get in touch. Read More ›

Registration for Amsterdam Bootcamp is Open
Amy Brown / 2013-02-14
Registration is now open for the upcoming bootcamp for physicists at VU University in Amsterdam. Read More ›

Second Dry-Run of DiRAC Driver's License Exam
Mike Jackson / 2013-02-13
Back in August we did an alpha test of our driver's licence for DiRAC in conjunction with The Software Sustainability Institute. In the spirit of iterative development, we revised our test in light of our experiences, and two weeks ago we did a second dry-run. Four researchers based at University College London kindly agreed to take part as examinees, Dr. Jeremy Yates of DiRAC made the local arrangements, and James Hetherington of UCL, a newly-appointed fellow of The Software Sustainability Institute, provided local assistance. In light of the alpha test, we'd made the following changes:

- Examinees were told that they could use the web, "man" pages, and any other resources a software developer uses day-to-day: we're not testing recall but assessing real working practices.
- After consultation with Jeremy, we replaced Subversion with Git for the version control tasks.
- To avoid having to set up examinee-specific accounts, we provided a local Git repository to examinees as part of a ZIP file containing all the exam material.
- The expectation to use version control at each stage, to add answers, was made more explicit.
- We added an example of a Python test function and a command-line invocation of "nosetests", for examinees who haven't written a Python unit test or used "nosetests" before.

We allotted one hour to the dry-run but, on the day, extended this to two hours. Within this time the examinees attempted all the exercises (version control, shell, automation and Make, and unit testing) bar a code review exercise. Experiences and observations included:

- Examinees learned what they needed on the fly (especially for Make and Git), looking everything up online and discussing it. The examinees felt that a primer or more preparation time would have been useful.
- We allowed examinees to share hints and tips, as asking colleagues is a great way to get help. However, in the test, examinees shouldn't share the actual answers to the exercises!
- We allowed examinees to write shell scripts in Python or Ruby, rather than constraining them to Bash, as we're assessing knowledge of concepts, not specific tools.
- A non-standard Makefile, with no default target, and a less-than-clear question meant that the Make question devoured a lot of time and had to be curtailed. While some examinees had used Make before, this had been to compile code, not to regenerate data files.
- Running the test remotely is possible, but examinees (and the examiner!) need a local expert in the room with them.

Despite these challenges, the examinees stated that they'd learned a lot and that the test was valuable in highlighting things that they didn't know. This is a very good outcome, and one which we'd hope such a test would achieve. We are now implementing changes in light of the above and will be doing a third dry-run in the week beginning 25th February. We have also drafted a certificate which all researchers who complete the test will receive (along with a Mozilla open badge) when it goes live. We look forward to reporting on our experiences of our next dry-run. Read More ›
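The kind of example added to the exam material might look like this (a sketch; the real exam's example may differ): a function under test plus a test function that nose discovers automatically when you run "nosetests".

    # A nose-style unit test: "nosetests" finds and runs any function
    # whose name starts with "test_".
    def mean(values):
        return sum(values) / float(len(values))

    def test_mean():
        assert mean([1, 2, 3]) == 2.0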

Partnering with the SSI
Greg Wilson / 2013-02-12
We are very pleased to announce that the Software Sustainability Institute has agreed to coordinate Software Carpentry activities in the UK. The SSI became involved in Software Carpentry in 2011 when they started developing online lectures for us on advanced shell techniques and file management in Python. In April 2012, they participated in the first general UK bootcamp at University College London; a fortnight later, in conjunction with the Digital Institute at Newcastle University and SoundSoftware, they delivered the first bootcamp to be run entirely by UK tutors. They have since delivered the majority of bootcamps run in the UK and been instrumental in helping us grow in Europe. In their new role, the SSI will help UK researchers organise bootcamps for their research groups, institutions and communities; help the local organisers of bootcamps create and customize content; attract instructors and helpers; manage registration; advise on publicity; and provide support in all aspects of organising a bootcamp. If you'd like help organising a bootcamp, or if you're interested in becoming an instructor or helper, please get in touch. Read More ›

UBC Went Well
Greg Wilson / 2013-02-11
Ted Hart and Ethan White just wrapped up a bootcamp at the University of British Columbia, and by all accounts it went very well:

Good:
- learn terminal
- explanations are clear
- SQLite is awesome
- learn some Python
- following along on screen
- enthusiasm
- content was relevant
- practical software
- SQLite is very helpful in data management
- Etherpad
- SQLite
- SQLite: could use it tomorrow
- like lots of problems with partners
- IPython & SQLite
- liked different levels as beginner so could get help
- Etherpad ++ for collaborative note-taking
- current & up to date
- Git
- helpers and red light green light
- pace was good
- great intro to lots of methods (handled volume of info well)
- free and open source software
- good overview of things to improve workflow
- collaborative note-taking and chatting and small problems
- Git from the command line (Linux guy)
- Git & SQL, can use and share now
- hands-on problems well balanced with learning
- good job balancing info and level variation

Bad:
- better examples of why Python is cool
- dissecting existing code
- want to be inspired by an example of something cool but not far enough to implement (longer)
- cool projects
- cheat sheets
- brief one-line explanation of everything before class & tailored workshops to specific disciplines
- too hard to keep up
- installation is much more difficult than R
- fewer things so that we can actually use it
- new to programming: more exercises
- not as clear how to use shell so less useful
- shell: examples of why we should care, e.g., working across files
- need to know more about IPython
- lot of information, too much at times (fun stuff was at end)
- didn't get to interconnecting the pieces (collaborative Git project) [group repo to play with]
- group by OS or other similar interests
- would have liked to use Git from terminal
- where are things coming from and going to
- IPython notebook disconnected from scripting workflow
- wanted SQL to R/Python (would have come back for another)
- more time to integrate everything
- why Python: spend a few minutes explaining why we chose it and what the alternatives are
- expected it to be more advanced and focused on integration (more info)
- "here's what you're going to learn" at beginning
- not far enough in Python to know where to go next
- some of the one-on-one help would have been good to share with the group
- more difficult problems more collaboratively

Read More ›

Correctness Isn't Compelling
Greg Wilson / 2013-02-11
The final report from the ICERM workshop on Reproducibility in Computational and Experimental Mathematics is now available, and its appearance has prompted me to explain why we don't put more emphasis on reproducibility in Software Carpentry. Long story short, it's because scientists don't care about it, because they're not rewarded for doing so. Here's the math:

- Assume five million scientific papers were published in the decade 1990–2000. (The actual number depends on what you include as a "paper", but our reasoning holds.)
- Of those, perhaps a hundred have been retracted because of honest computational irreproducibility ("honest", because fraud isn't part of this argument).
- That means the odds that a scientist will have to retract a particular paper because someone noticed that her calculations couldn't be reproduced are one in fifty thousand.
- So if the average paper takes eight months to produce, and scientists work six-day weeks, that means it's only worth spending 115 extra seconds per paper on reproducibility as insurance.

Different assumptions and models will naturally produce different answers, but won't change the conclusion: given the system we have today, investing extra time to make work reproducible as insurance against error isn't economical. RR's advocates may respond, "That's why we're trying to change the system," but chicken-and-egg traps are notoriously difficult to break out of: if people don't care about the reproducibility of their own work, they're unlikely to check it when reviewing other people's work, and around and around we go. Trying to get them to be early adopters of new practices (which aren't yet rewarded consistently by their peer group) is therefore a very hard sell. This is more than just speculation. When we first started teaching Software Carpentry at Los Alamos National Laboratory in 1998, we talked a lot about the importance of testing to see if code was correct. People nodded politely, but for the most part didn't actually change their working practices. Once we started telling them that testing improved productivity by reducing re-work, though, we got significantly more uptake. Why? Because if you cut the time per paper (or other deliverable) from eight months to seven or six, you've given people an immediate, tangible reward, regardless of what their peers may or may not do. So here's my advice to advocates of reproducible research: talk about how it helps the individual researcher get more done in less time. Better yet, measure that, and publish the results. Scientists have been trained to respect data; if you can show them how much extra effort RR takes using today's tools, versus how much re-work and rummaging around it saves, they'll find your case much more compelling. Read More ›
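Spelling out that arithmetic (assuming an eight-hour working day; different assumptions move the answer around slightly):

    # Back-of-the-envelope value of reproducibility-as-insurance.
    papers = 5000000                     # papers published 1990-2000 (assumed)
    retractions = 100                    # honest computational irreproducibility
    odds = float(retractions) / papers   # one in fifty thousand

    work_days = 8 * (52.0 / 12) * 6      # eight months of six-day weeks
    work_seconds = work_days * 8 * 3600  # eight-hour days (an assumption)

    print(odds * work_seconds)           # roughly two minutes per paper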

Macquarie Went Well
Greg Wilson / 2013-02-08
Last week's bootcamp at Macquarie University went well: I fumbled the introduction to Python on Friday morning (which I haven't done in a long time), but with a bit of help from Eli Bressert and a great crew of helpers, we got the afternoon back on track. Here's the usual good vs. bad feedback:

Good:
- Understanding the syntax of Python
- The stories
- The flow of material
- Cygwin
- Linking programming to thinking
- Understanding size of manageable tasks
- Version control
- Sticky notes
- Embedding provenance in files
- Database concepts
- Learning good work habits
- Coding with loops
- Free!
- Let people make mistakes and then correcting
- Interactive nature of the learning experience
- I got to show my work to real human beings!

Bad:
- Installation
- The stories
- Python introduction (Greg screwed up)
- Some programs don't work on some people's computers
- Air conditioning
- Struggling to switch between languages
- Bit more practice with coding
- Python in Cygwin sucks
- "You should only work eight hours a day" — yeah, right
- Not enough on databases
- Couldn't see the screen
- Discovering R isn't as good as I thought it was
- Not having enough examples

Read More ›

We Have a Facebook Page
Greg Wilson / 2013-02-06
Software Carpentry finally has a Facebook page. Please drop by! Read More ›

The Missing Side of the Triangle
Greg Wilson / 2013-02-03
A few weeks ago, John Cook posted the following: "In a review of linear programming solvers from 1987 to 2002, Bob Bixby says that solvers benefited as much from algorithm improvements as from Moore's law. Three orders of magnitude in machine speed and three orders of magnitude in algorithmic speed add up to six orders of magnitude in solving power. A model that might have taken a year to solve 10 years ago can now solve in less than 30 seconds." A million-fold speedup is pretty impressive, but faster hardware and better algorithms are only two sides of the triangle. The third is development time, and while I think it has improved since 1987, I also think the speedup is measured in single-digit multiples, not orders of magnitude. Which brings us, again, to Amdahl's Law and the purpose of Software Carpentry. The time needed to produce a new computational result is D+R, where D is how long it takes to get the code to work and R is how long it takes that code to run. R depends on hardware and algorithms; as it goes to zero, the time required to get a new result is dominated by the time required to write, test, maintain, install, and configure software [1]. Reducing that is the "efficiency" part of our long-term aim to improve novelty, efficiency, and trust. [1] In practice, R doesn't go to zero for many interesting scientific applications, because scientists scale up their problems to keep running times constant. (As a colleague of mine once said, every simulation takes roughly one publication cycle to run.) Read More ›
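A few made-up numbers show how the arithmetic plays out:

    # Total time to a result is D + R; hardware and algorithms only divide R.
    D = 30.0   # days of development time (illustrative)
    R = 10.0   # days of running time (illustrative)

    for speedup in (1, 10, 1000, 1000000):
        print("%7dx faster: %.4f days total" % (speedup, D + R / speedup))
    # The total approaches D = 30 days: past a point, only reducing
    # development time helps.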

Features and Scope in Open Courseware
Greg Wilson / 2013-02-03
A couple of weeks ago, Brian Granger (one of the core developers of IPython) posted some thoughts on features and scope in open source software. In it, he enumerates some of the risks associated with constantly adding new features to a piece of software (open source or otherwise):

- Additional complexity in the code base (which makes future work more difficult).
- Increased "surface area" for bugs (the more features there are, the more places a bug might be lurking).
- Increased documentation and support time.
- It forces developers to specialize, which makes "big picture" thinking harder.
- Increased testing effort.
- A more complex user experience. (Microsoft Word, anyone?)
- Opportunity costs: time spent working on X is time not spent working on Y.

He then enumerates some things projects can do to throttle growth down to manageable levels:

- Explicitly list things that are not going to be implemented, i.e., define a limited scope for the project.
- Make features fight hard to be accepted and implemented by telling the community that the default answer is "no".
- Separate feature requests from other issues.
- Discuss costs and liabilities as well as benefits whenever a new feature is proposed.
- Be willing to remove things.

He also lists some questions projects can ask:

- What fraction of your user base will use the feature, and how often?
- Can it be added as a plugin?
- How difficult will it be to test, debug, document, and maintain the feature, and what fraction of your development team is capable or interested in doing this work?
- Can you implement the functionality in a more limited, but much simpler manner?

Everything Brian says applies directly to open courseware like Software Carpentry. People constantly suggest new topics that could be added to our material, but few of them say, "...and I'll write it," and more often than not, the topics are things that would interest only a fraction of scientists. We have therefore been cutting back on material rather than expanding it: as useful as Make, object-oriented programming, image manipulation, and disciplined use of spreadsheets are, they just aren't useful enough to justify expenditure of our scarcest resource—time. Read More ›

A Short Report from Tuebingen
Greg Wilson / 2013-02-03
Luis Figueira has posted a short summary of last week's bootcamp at the Max Planck Institute in Tuebingen, Germany. Judging by the number of people who have contacted us since to ask about volunteering at the next one, it appears to have gone very well. Read More ›

A Short Report from Utah State
Ethan White / 2013-02-02
(The first in an occasional series about other ways people are teaching computing to scientists.) My "Advanced Programming and Database Management for Biologists" course at Utah State is based almost entirely on Software Carpentry (though the full curriculum is obviously not just what we teach in two days). It is a blended classroom: for about 2/3 of the semester the students view SWC material before class; at the beginning of class I provide 5-10 minutes of additional introduction to the material or answer questions the students had after viewing the videos; and then they spend the rest of class working on problems related to the material. They also do an independent project related to their research, and the last 1/3 of the semester involves primarily me helping individual students work on their research projects (plus I present a set of more advanced or discipline-relevant topics that they choose by voting, e.g., the GPU peta cloud). It's been fairly popular for a rather nerdy graduate class, and it scores off the charts. Read More ›

Next-Generation Sequencing Course 2013
Greg Wilson / 2013-02-01
Analyzing Next-Generation Sequencing Data http://bioinformatics.msu.edu/ngs-summer-course-2013 June 10th–20th, 2013 Kellogg Biological Station, MSU Course sponsor: NIH. Instructors: Dr. C. Titus Brown, Dr. Ian Dworkin, and Dr. Istvan Albert. Course Description: This intensive two-week summer course will introduce attendees with a strong biology background to the practice of analyzing short-read sequencing data from Illumina and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq. No prior programming experience is required, although familiarity with some programming concepts is helpful, and bravery in the face of the unknown is necessary. Two or more years of graduate school in a biological science are strongly suggested. Faculty, postdocs, and research staff are more than welcome! Students will gain practical experience in: Python and bash shell scripting; cloud computing/Amazon EC2; basic software installation on UNIX; installing and running maq, bowtie, and velvet; and querying mappings and evaluating assemblies. Read More ›

A Bunch of Bootcamps
Greg Wilson / 2013-02-01
We've had a busy month: we've run bootcamps at the Max Planck Institute in Tübingen, the Technische Universität München, Mozilla's office in Toronto, McGill University, Columbia University (a double-header that's still running today), and the universities of Waterloo, Chicago, and British Columbia. We've also booked a bunch more, some of which are now open for registration:

University of British Columbia: February 4-5 (Open)
Macquarie University: February 7-8 (Full)
AMOS Conference (Melbourne): February 14-15 (Full)
University of Washington: February 25-26 (Full)
Lawrence Berkeley National Laboratory: March 4-5 & 6-7 (For LBL staff)
University of Virginia: March 7-8 (Coming soon)
Utah State University: March 23-24 (Open)
University of Arizona: April 4-5 (Open)
University of Manchester: April 18-19 (Open)
Telecom ParisTech: April 20-21 (Open)
Vrije Universiteit (Amsterdam): May 2-3 (Coming soon)
GEOMAR (Kiel): May 6-7 (For members of ISOS, IMAP and the SFB754)
Howard Hughes Medical Institute (Virginia): May 6-7 (For HHMI staff)
Stanford University: May 6-7 (Coming soon)
University of California Berkeley: May 6-7 (Coming soon, in R!)
Lawrence Berkeley National Laboratory: May 9-10 (For LBL staff)
University of California Davis: May 13-14 (Coming soon)
University of Oslo: July 3-4 (Open)
University of Bath: July 15-16 (Coming soon)

And we're hoping to finalize arrangements with the following sites real soon now: Clemson University, Duke University, Indiana University, Lancaster University, Memorial University (Newfoundland), Northwestern University, Pennsylvania State University, the Scottish Universities Physics Alliance, Tufts University, Tulane University, University College London, the University of Alberta, the University of Chicago, the University of Dundee, the University of Massachusetts, the University of North Carolina, the University of Notre Dame, the University of Oklahoma, the University of Pittsburgh, the University of Queensland, the University of Southampton, and the University of Wisconsin - Madison. We'd love to add even more pins to our map, especially in countries we haven't visited yet. If you'd like to help make that happen, please get in touch. And many thanks once again to everyone who has hosted a bootcamp, taught one, shown up to help, or come to learn—installation headaches be damned, we're making a difference :-) Read More ›

Teaching R at UBC
Ted Hart / 2013-01-30
This January we ran a second R-based bootcamp at the University of British Columbia over three days. It was taught by myself, Rick White, Bernhard Konrad, Davor Cubric and Jenny Bryan. The first day was held a week before the last two, and covered a basic introduction to R; the last two days were an advanced R workshop / Software Carpentry. We covered the following topics: version control with Git; regular expressions; advanced graphics with ggplot2; writing functions; reproducible research using knitr; S3 objects and classes; and R package creation. This probably represents a strong divergence from a normal Software Carpentry workshop in that we were heavily focused on giving people the advanced skills they were interested in learning in a particular computing environment, in this case R. So here are some thoughts on what worked and what didn't. What worked: A common programming environment: By far the most striking difference between this workshop and the one we ran in the past was that everything just worked. In fact we did away with helpers altogether because we didn't need them. As opposed to past workshops, where we tried to teach everyone in whatever environment they liked to use with R (console, Emacs, Tinn-R, Eclipse, etc.), we insisted everyone install RStudio. The beauty is that RStudio just works no matter what the platform, and because everything we taught was within R, everyone could just do everything. Frankly, it was glorious. Catering to people's needs: Before we began this workshop we sent out a comprehensive survey to all the people on the waiting list for the previous workshop. We asked them to rank the topics they were interested in, and this allowed us to teach effectively to our audience. While graphics might seem way off the map for us as an organization, it's what people wanted, and we were able to provide it. The language people use ties it together: We all walked away from the workshop feeling that showing people how to do things like version control and regular expressions in the environment they use day to day really helped solidify it for them. We didn't use regular expressions or Git abstractly from R, but showed them: "Here's how you can do these things in R, which is great because you use R every day." What didn't work: People don't always know what they want: While we had great luck catering to some needs (think ggplot2), people don't always know what they want. Despite high interest in packages and objects on our survey, most people left on the last afternoon, leaving only 5 for the last 2 sessions. Most people said: "I'll never make a package, so I don't care." It may also just have been too much material: we're considering trimming some sections and combining our R workshop into 3 days, 9-3 each day, on a Monday-Wednesday-Friday schedule. If it seems like enough material, make it shorter: Every section we taught ran over time. The sections that finished close to on-time did so in one of two ways: either we skipped student exercises and just lectured, having the students type along with us, or we kept an erratic pace. This led to people commenting that things went from easy to hard too fast: we'd linger too long on the basics at the beginning, then realize time was short and jump to the advanced material with no bridge in between. The lesson for us was that less material with more student exercises is always better. Read More ›

A Bootcamp at Mozilla
Greg Wilson / 2013-01-30
We ran a two-day bootcamp at Mozilla's office in Toronto last week for people from local research hospitals. It seemed to go well: only 28 of the 37 who'd registered showed up, but with a couple of exceptions because of scheduling conflicts, everyone who showed up on day 1 came back on day 2. Many thanks to Dhavide Aruliah, Yele Bonilla, Mike Conley, Gabriel Devenyi, Fan Dong, and Blake Winton for helping out.

Good:
- New tools
- Good teachers
- Good supply of paper/book references
- Useful simple bash scripts for analyzing text files
- Snacks: mmm...
- New plotting tools to replace Matlab
- Shell tools interesting
- Importance of version control
- Preview of IPython Notebook
- Stickies for feedback
- Good coverage of basics (didn't jump right in to Python)
- Impressing upon us that everyone's research is relevant
- Digressions on learning psychology

Bad:
- Not enough testing covered
- Python installation problems
- Some examples didn't work
- More hands-on Python needed
- Digressions on learning psychology
- Numerical analysis parts too mathematical
- Need short overnight exercises to support learning
- More signposts for what we're doing and why
- More exercises and application examples needed
- Not everything is placed on the Etherpad
- Getting working scripts not seamless
- Too damn cold

Read More ›

Novelty, Efficiency, and Trust
Greg Wilson / 2013-01-28
I've spent a lot of time trying to figure out what the "big picture" is for Software Carpentry. What are the best practices every scientist should master? What are the principles of computational thinking? How exactly are we helping people? My latest attempt to put this all in a nutshell has three strands: novelty, efficiency, and trust: By improving scientists' computing skills, we're helping them do things they couldn't do before. We're also showing them how to do things with less effort. Finally, scientists can have more confidence in results (both their own and others') that are produced our way. The second of these, efficiency, is what sets us apart from other "computing for scientists" initiatives. On one side, supercomputing and big data have opened up entirely new kinds of science, just as radio telescopes and PCR did. On another, advocates of open access and reproducible research are changing the way science is done, and by doing so, making it easier for people to check and build on each other's work. Our goal is to help them implement all of these things without superhuman effort—to do with ten minutes of scripting what used to take an hour of mousing around in a spreadsheet. Taking that to an extreme, you could say that our goal is to put ourselves out of business: to reduce the time scientists spend writing software to as close to nothing as possible, so that they can spend more time thinking about their science. I suspect, however, that some form of Parkinson's Law will kick in—that time spent programming will expand to fill the hours available. If every one of those hours is spent productively, we'll know we've done our job well.
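To make the "ten minutes of scripting" claim concrete, here is a minimal sketch of the kind of script we have in mind, computing a per-site average that might otherwise be assembled by hand in a spreadsheet. The file name measurements.csv and its 'site' and 'mass' columns are hypothetical, chosen only for illustration:

    import csv
    from collections import defaultdict

    totals = defaultdict(float)  # sum of masses seen for each site
    counts = defaultdict(int)    # number of observations for each site
    with open('measurements.csv') as reader:        # hypothetical input file
        for row in csv.DictReader(reader):          # assumes 'site' and 'mass' columns
            totals[row['site']] += float(row['mass'])
            counts[row['site']] += 1

    for site in sorted(totals):
        print("{0}: {1:.2f}".format(site, totals[site] / counts[site]))

Read More ›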

Visualizing Nuclear Fuel Inventories
Joshua L. Peterson / 2013-01-24
I was working on characterizing the used nuclear fuel in the United States and needed to calculate the isotopic inventories for 174,000 fuel assemblies. This seemed like an overwhelming project, so I decided to see if using a database could help me. With the help of the online lessons from Software Carpentry I was able to write a Python script that calculated the needed information and then stored the results in an SQLite database. I then used an open-source tool, RapidMiner, to visualize that data: (Image taken from Wagner et al, "Categorization of Used Nuclear Fuel Inventory in Support of a Comprehensive National Nuclear Fuel Cycle Strategy", ORNL/TM-2012/308.) Read More ›

How to Become an Instructor
Greg Wilson / 2013-01-23
As we've mentioned elsewhere, our instructors are volunteers who donate their time because it's fun, because it makes the world a better place, because they learn things themselves from teaching, and because it's good for their careers. But how did they become instructors? And how can you become one too? Starting with the first question, 21 of our 31 instructors are people who figured out how to do hygienic computational science before we met them (or in some cases, taught us some of what we now teach). The other 10 started as learners: they attended a bootcamp, then volunteered to help with another, and graduated from that to teaching. That's how we see the pool of instructors growing in future, so one of my main jobs this year is to regularize that. We run an online study group to teach the principles of educational psychology, and how those principles translate into classroom practice. We arrange for would-be instructors to help out at bootcamps, then co-teach with someone more experienced, and then run one themselves. We are (slowly) assembling an instructors' guide that lays out our core material and the pedagogy around it (in educational jargon, the "pedagogical content knowledge"). The third round of our online study group kicked off last week, and we expect to start another one in March. If you're interested in taking part, please let us know: we're always looking for good people. Read More ›

Record and Playback in the IPython Notebook
Greg Wilson / 2013-01-22
We decided last fall to start teaching Python using the IPython Notebook rather than a plain old command-line interpreter. It will take us a few months to complete the transition, but one of the things that we're hoping to do in the meanwhile is build a record-and-playback plugin for the Notebook. We think this will be more useful than the screencasts we currently have because: People can't pause a video, then copy and paste its text, because there is no text: there are just pixels arranged to look like letters. For the same reason, the code in videos is invisible to search engines. We can't diff and merge videos in version control the way we can diff and merge source code and Notebooks. Video is inherently less adaptable to new platforms (e.g., tablets and smart phones) than first-class content that's rendered by the devices themselves. As we explain in the issue we've opened for it on GitHub, the workflow we envision is: Author spends minutes, hours, or days incrementally building up the notebook. Author puts notebook in "mark time" mode. Author records audio with her favorite desktop tool while clicking "next" in the notebook to insert time marks into metadata. Author saves audio and inserts URL into notebook. Time passes. Audience opens notebook in browser (either a local or web copy—the audio file may be local or web as well). Audience presses "Play". Notebook incrementally reconstructs itself in sync with voiceover. Audience can use "Rewind", "Fast Forward", "Pause", "Go To Cell" (jumps the audio) or "Go To Time" (jumps the notebook) at will. We'd welcome your input: if you have any thoughts on design, interface, or implementation, please add comments to the issue.
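Since the plugin doesn't exist yet, here is a purely speculative sketch of what the playback metadata and sync logic might look like; every field name below is invented for illustration, not part of any existing Notebook format:

    # Hypothetical per-notebook metadata written in "mark time" mode.
    notebook_metadata = {
        "playback": {
            "audio": "http://example.org/lesson.ogg",  # URL the author inserts
            "marks": [0.0, 14.2, 31.8, 55.0]           # seconds at which "next" was clicked
        }
    }

    def cell_to_show(marks, position):
        """Index of the cell that should be revealed at a given audio position."""
        return sum(1 for mark in marks if mark <= position) - 1

Under this sketch, "Go To Cell" would seek the audio to marks[i], and "Go To Time" would use the inverse lookup shown above to rebuild the notebook. Read More ›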

Online Office Hours
Jon Pipitone / 2013-01-21
Starting this week we're trying something new: one-on-one online help sessions from instructors for anyone who has taken a bootcamp with us. Bootcamps are great for introducing scientists to useful technology and new ideas about how to automate tasks and organise code. Our bootcamps try as much as possible to be relevant to scientists' daily workflow, but they're only two-day events; what happens when participants go back to "real life" and start to fit what they've learned into their daily work? Online office hours are a way we can directly help participants as they tackle their day-to-day computing tasks. We see these as open-ended sessions where scientists can come to ask the "stupid" questions they face as they set up version control for the first time, or write unit tests, or use regular expressions, or... well, anything goes. We'll do this using Skype/Google+ Hangouts/online chat/screen sharing or whatever else works to get people the help they need. Our next online office hours session is this Thursday, January 24, from 4-6pm EST. You can register in advance in order to get a reminder email, or just show up. If you are an instructor and would like to help out, please send us mail. For more info, and to see when future office hours will be, visit the office hours page. Depending on the demand, and the supply of instructor hours, we'll try to hold office hours at least once a week. Read More ›

University of Chicago in January
Anthony Scopatz / 2013-01-16
This last Saturday and Sunday, Katy Huff and I had the pleasure of organizing a Software Carpentry workshop at the University of Chicago. We had 77 people attend in total, and by the end of the day on Sunday we remained 45+ strong. (Frankly, who doesn't want to listen to me rant about documentation at 4 pm on a Sunday afternoon?!) This weekend was unique in that we had a host of first-time instructors: John Blischak, Radhika Khetani, Patrick Fuller, Jed Brown, Will Trimble, and Sri Hari Krishna Narayanan. Of note, John Blischak was an attendee of the April 2012 bootcamp at the University of Chicago. Cait Pickens also came up from Michigan State University and did some extensive surveying of the students before and after the bootcamp, so be on the watch for her analysis when it becomes available. Support & Thanks: One thing that made running this bootcamp significantly different from, and easier than, all prior workshops I have been involved in was the generous institutional support from the Graduate Student Affairs office and Kalee Ludeks. In addition to providing free lunches catered by Z&H on both days, Kalee also helped us reserve a room, contact departments and graduate students, create a flier, and handle all of the other minutiae that go into organizing one of these events. I can safely say that without their help, this bootcamp would not have happened! So if you get an opportunity, please take the time to thank them. And, as always, thanks again to Greg Wilson and the Software Carpentry crowd for enabling this kind of activity. Pictures: For the visually inclined, here are some pictures taken by Cait of the workshop in progress. Read More ›

Montreal in January
Ross Dickson / 2013-01-14
This weekend was the first time I had instructed at a SWC bootcamp, and I'd only observed one other (Greg Wilson in Halifax in July 2012). By the new standards for instructors Greg's trying to put in place I would probably not even qualify, but hey, we're in a growth phase right now; we take a few chances, eh? I felt my inexperience acutely, but despite that I think I can say that the bootcamp went well, thanks to a great group of attendees and helpers and my co-instructor, Jessica McKellar. The site organizers were Alex Demarsh and Rolina van Gaalen, and the other helpers were Greg Ward, Julia Evans and Jonathan Villemaire-Krajden. Nominal registration charge: For this event we charged $20.00 for registration (which Eventbrite transmogrified into $22.19), with the question in mind "What does this do to the no-show rate?" We exactly sold out: forty tickets were bought, the last one going less than a day before the event. We didn't get a precise head count early on Day 1, but we estimated later that we had 35 to 37 out of 40 there. One person explicitly wrote in sick on Day 1, but made it out on Day 2. We also had some drop-off: by the time we wrapped up on Sunday afternoon we had 26 attendees in the room. We missed an opportunity there: had I thought of it in time, we could've gotten the die-hards to give us their names in Etherpad, then we could've diffed that against the original sign-up list and sent mail to the no-shows and the leavers asking what we could've done to retain them. By way of compensation for the registration charge, we arranged pizza and pop gratis for lunch both days. Alex and his colleagues produced coffee on demand, and also clementine oranges and cookies on Day 2. All very good! Setup issues: One reason we didn't get a better head count up front was that the instructors were still wrestling with network and video setup issues as the clock struck 9:00 on Day 1. We should've scheduled more than 30 minutes of set-up time before the students were to arrive. And unfortunately Jessica, who came with a Mac, was unable to get the Windows-centric university-owned video system to play nice, and she was condemned to delivering all her content from Ross's Windows laptop. With a maddeningly small keyboard. Syllabus and students: From the standard syllabus we did the shell, Python, revision control with Git, SQL, and unit testing. Once again we had a lot of people in the room who already knew some R --- epidemiologists and linguists, in this instance. My opinion was reinforced that, had we the right materials and experience, we could give them a more solid intro to good coding practices and testing if we were able to skip the "teach a language" step and give it to 'em straight in R. You can say "this is language-independent" all you like, but there's nothing like seeing it done with the tools you already use. Bioinformaticians comprised the second-largest bloc of attendees, and there was a smattering of singletons from other disciplines too, I think. That's another thing we didn't capture explicitly; I'm looking forward to having an entry survey worked out to alleviate this in the future. Here's the Good & Bad list that we asked them for on the afternoon of Day 2:

Negative:
- How do I apply database stuff?
- How do I import database data from Excel files?
- Would like more about the shell
- Need more basic shell commands, refresher, exercises?
- Want more about testing, please
- More about data extraction/conversion
- Could go deeper on Python
- No entry into scientific/mathematical computing (NumPy was not on the syllabus for this course)
- Hard to see how to transfer the knowledge to my tools
- Would be nice to combine the tools to produce one "big" project
- Cheat sheets not comprehensive enough
- Could use more on troubleshooting
- Want more examples from my field
- Attendees from one field might improve many things
- Maybe split the class on day 2 into basic (more Python) and advanced (testing)?
- More sharing of tool knowledge? Crowdsource?

Positive:
- Good breadth, lots of topics
- Liked the intro to general programming
- Version control detail welcome!
- Not too scary for newbies
- Hands-on component
- Helpful helpers
- Good coding style appreciated
- Liked the cheat sheets (we had paper cheat sheets for shell, Python, SQL)
- Different levels of Python exercises for different levels of student

Regarding that last comment, at one point Jessica recommended to students: http://bit.ly/sc-codingbat, http://bit.ly/sc-codingbat2, and http://rosalind.info depending on whether the student was feeling too challenged or bored. Despite the negative list being longer than the positive, almost all the (remaining) attendees seemed to me to be genuinely pleased with what they'd learned. Tools: The tools we had them install were Python 2.7 (no IPython); GitBash if on Windows, Git otherwise; the Firefox SQLite Manager; and we recommended Notepad++ for Windows and Smultron for Mac by way of text editors. We used Etherpad for out-of-channel communications and "group memory", which was voted a big win at the end of the event. I also handed out red and green stickies as status flags. They were used haphazardly; I think these semaphores might be Just One More Bleepin' Thing To Remember for the student, or else I didn't reinforce their use enough in the first hour. I'll probably try them again. The biggest tool problems we had were with GitBash. I decided about two weeks before the event to call for GitBash instead of Cygwin for the Windows users, to teach the shell, since Jessica was planning to use it for teaching Git. This was a mixed success. The installation is simpler than Cygwin and it uses the native Windows directory structure, but there are surprises in GitBash that we banged our shins on: 'more' is missing, but 'less' is there; 'man' is missing, but 'cmd --help' usually works (one helper thought that the absence of 'man' was actually a win; he has a point); 'seq' is missing; 'file' is missing; and the cut-and-paste facilities are inconvenient. And you don't need to set an executable bit on a script in order to execute it:

$ cat junk
echo "Hello bash"
$ ls -l junk
-rw-r--r-- 1 ross Administ 19 Jan 14 11:53 junk
$ junk
Hello bash

This last no doubt has something to do with the underlying Windows, but I feel like bash is now popular enough that it's spawning incompatible variants. *Sigh*. My advice to future bootcamp instructors is: try GitBash in the safety of your own home for a good long time before calling for it as an instructional tool. I'm still of two minds about it, myself.
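As a footnote to the missed opportunity mentioned above, the diff in question would only be a few lines of Python. A minimal sketch, assuming hypothetical one-name-per-line files signup.txt and etherpad.txt:

    # People who signed up but never appeared in the Etherpad (or left early).
    signed_up = set(line.strip() for line in open('signup.txt'))
    stayed = set(line.strip() for line in open('etherpad.txt'))
    for person in sorted(signed_up - stayed):
        print(person)

Read More ›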

Teaching Commercially
Greg Wilson / 2013-01-11
A couple of people have contacted us recently to ask about running Software Carpentry bootcamps for companies. Our material is all Creative Commons licensed, so anyone who wants to use it for corporate training can do so, and doesn't need our permission. What does need our permission is using our name and logo, since they're trademarked. We're happy to give that permission if we've certified the instructor [1] and have a chance to double-check the content [2]. It would be great if starving grad students could help pay their bills from this, in the way that many programmers earn part or all of their living from open source software, so if you'd like to give it a try, please get in touch. [1] If you'd like to join our instructors' roster, please get in touch: we run an online study group to help train people, and once you've gone through that and co-taught with us a couple of times, we'd be happy to badge you. [2] We say "double-check the content" because we've already had one instance of someone calling something "Software Carpentry" when it had nothing to do with what we usually teach. We've worked hard to create material that actually helps scientists, and to build some name recognition around it, and we'd like to make sure our name continues to mean something. Read More ›

PLoS Ad for Software Carpentry
Greg Wilson / 2013-01-10
The folks at PLoS Computational Biology kindly placed an ad for Software Carpentry on their home page today—thanks! Read More ›

The Art of Cold Calling
/ 2013-01-05
An updated version of this post is now available. A few people have asked how we go about approaching potential workshop hosts. The short answer is, however we can; the longer answer is, we collect names any way we can—from journal articles, from watching who follows us on Twitter, from just bumping into people—and then send emails like the one below. A few things to note: I always open by apologizing for adding to their inbox. (It's a Canadian thing...) Always establish a point of connection: "I read your paper on X" or "I was speaking to Y". This must be specific: "I recently read a paper of yours" sounds auto-generated (because it so often is). Explain how we're going to help make their lives better (e.g., "Your graduate students will be able to push your project XYZ ahead much faster if you let us help them"). Be specific ("Here's our usual two-day curriculum") so that they can figure out right away whether this is worth pursuing. Cite our backers (currently the Sloan Foundation and Mozilla), as this makes us more credible. And while the example below doesn't do this, it helps to mention a recent workshop run for someone they know, or at somewhere they'd find impressive—again, credibility. Don't hide the fact that they'll have to pay for travel and accommodation, but point out that they're not paying for people's time, which makes this training really, really cheap. Above all, keep it short. The message below takes 30 seconds or less to scan; add another few seconds for them to check the Cc: list (where possible, approach people in groups), and either they're hooked enough to hit 'reply' or they're not. It's worked pretty well for us: about 60% of emails are answered, over half of those answers are "Sure, let's talk more," and more than half of those discussions lead to workshops, which means that about 1/5 of emails turn into workshops. This is a lot more impressive than it might sound—most people in sales figure that 2-5% conversion on cold calls is outstanding. If you'd like to give this a try (i.e., email someone on behalf of Software Carpentry to try to start setting something up for 2013), please let me know—it's a useful skill to pick up. Hi, I hope you don't mind mail out of the blue, but I saw your recent paper on building a computational materials repository, and was wondering if you'd be interested in having us run a Software Carpentry workshop for your intended users --- we're scheduling workshops for the coming year right now, and it might be a way to help your community get more out of what you're doing. Our aim is to teach researchers (usually graduate students) basic computing concepts and skills so that they can get more done in less time, and with less pain. Our usual two-day curriculum includes things like: The Unix Shell Version Control Testing Structured programming with Python Databases Number Crunching but we're happy to tailor content to meet the needs of specific audiences. We're funded by the Sloan Foundation and Mozilla, and our instructors are volunteers, so the only cost to host sites is their travel and accommodation. (We can handle registration online, or leave it in hosts' hands.) We aim for 40 people per workshop, and look for 2-3 local helpers to assist during practicals.
Two independent assessments in the spring of 2012 confirmed that what we're doing accelerates participants' research, so if there's an upcoming meeting, conference, or get-together where a lot of your intended users will be together, we'd welcome a chance to chat at greater length. Thanks for your time—we look forward to hearing from you. Dr. Greg Wilson http://software-carpentry.org Read More ›

Why We Teach
Greg Wilson / 2013-01-04
Data Sharing and Management Snafu in 3 Short Acts: you have to laugh, because otherwise you'd cry... Read More ›

Advice From a Newbie No More
Greg Wilson / 2013-01-04
Adina Chuang-Howe recently wrote a great blog post titled "Advice from newbie to newbie", in which she gives some advice to her younger and not-yet-computationally-proficient self: Commit (and stop whining) Use the right tool for the job Take Command Install and use a text editor Learn to manipulate files Shortcuts! Learn a programming language Become addicted to print statements (I'd actually say "...to a debugger", but never mind) Do something relevant Start automating and most importantly of all: "You can do it!" The full article is great advice to anyone who's getting into computing, scientific or otherwise, and well worth reading. Which raises the question: what advice would you give your younger self? Read More ›

Computer Science Curricula 2013
Greg Wilson / 2012-12-23
Following a roughly 10-year cycle, the ACM and IEEE Computer Society jointly sponsor the development of a Computing Curricula volume on Computer Science. These volumes have helped to set international curricular guidelines for undergraduate programs in computing. Planning for the next volume in the series began in the summer of 2010, and the Version 0.8 (Ironman) draft was published a couple of months ago. It divides topics into: Tier 1 Core (every Computer Science curriculum should include all of this material, and every student should have to cover it); Tier 2 Core (curricula should include all or almost all of these topics, and the vast majority of students should cover them); and Elective (every curriculum should also include significant elective material). The material is also divided into "knowledge areas". Most, like "Algorithms and Complexity", are coherent and well-defined, but others, like "Platform-Based Development", are grab bags filled with odds and ends. So how does our material stack up against these recommendations? Overall, not badly (each line below gives the knowledge area and topic, the hours the draft recommends, and how much time we currently spend):

Algorithms and Complexity / Basic Analysis: 4 hours; we do 1/2 hour.
Architecture and Organization / Machine-Level Representation of Data: 3 hours; 1/2 hour for both this and the next topic.
Architecture and Organization / Memory System Organization and Architecture: 3 hours; (see above).
Computational Science / Data, Information, and Knowledge: elective; 1/2 hour.
Information Assurance and Security / Fundamental Concepts: 3 hours; nothing yet, but working on it.
Information Assurance and Security / Network Security: 5 hours; 15 minutes (SSH and keys).
Information Management / Query Languages: elective (really??); 1.5 hours (SQL).
Information Management / Information Storage and Retrieval: elective; 10 minutes (character encoding).
Networking and Communication / Introduction: 1.5 hours; 10 minutes (TCP and DNS).
Networking and Communication / Networked Applications: 1.5 hours; nothing.
Operating Systems / File Systems: elective (really??); 15 minutes.
Programming Languages / Functional Programming: 7 hours; 1/2 hour (first-class functions).
Programming Languages / Basic Type Systems: 5 hours; 10 minutes.
Software Development Fundamentals / Algorithms and Design: 11 hours; 1/2 hour [A].
Software Development Fundamentals / Fundamental Programming Concepts: 10 hours; 1 hour [B].
Software Development Fundamentals / Fundamental Data Structures: 12 hours; 1 hour [C].
Software Engineering / Software Processes: 3 hours; 15 minutes (mostly about agile, mostly as asides).
Software Engineering / Tools and Environments: 2 hours; 1.5 hours (version control and testing tools).
Software Engineering / Software Design: 8 hours; 15-30 minutes (mostly by example while teaching "Fundamental Programming Concepts").
Software Engineering / Software Construction: 2 hours; 15-30 minutes (as above).
Software Engineering / Software Verification and Validation: 3 hours; 1/2 hour (overlapped with "Tools and Environments").
Social Issues and Professional Practice / Intellectual Property: 2 hours; none, but we need to add something.

[A] The ACM/IEEE curriculum focuses on problem-solving strategies like divide-and-conquer, which aren't part of what we teach. It also includes abstraction, program decomposition, encapsulation, and interface/implementation separation, which we definitely do. [B] This heading includes the basics of imperative programming: loops, conditionals, file I/O, functions, and so on. It's the only place where there's pretty much a one-to-one alignment between our material and the curriculum's. [C] Arrays, strings, sets, and maps (dictionaries): check. Stacks and queues: we don't do that (although we would if there was time). References and aliasing: definitely, though I always wonder how much our learners actually understand. The biggest discrepancy is actually between our material and what appears under their "Computational Science" heading.
It is an odd beast, including: "Modeling and Simulation" (no problem there); "Processing", which is mostly about the practical implications of computer architecture; "Interactive Visualization", which rehashes the larger "Graphics and Visualization" knowledge area; and "Data, Information, and Knowledge", which does the same for the "Information Management" knowledge area. It's very revealing that version control, testing tools, and modular program design aren't included. The standard's authors would probably say that's because they're covered elsewhere, but the same is true of processing, visualization, and data management, all of which get special mention. It seems we have our work cut out for us... Read More ›

Sample Data Management Plans
Greg Wilson / 2012-12-21
Neil Chue Hong recently pointed us at DataONE's Data Management Planning page, which has two really useful things: a link to DMPTool, which will help you build a data management plan for your project; and five sample data management plans, all written for the NSF, which people can use as reference models. Data management is something we deliberately left out of the "Best Practices" paper—it would have increased the paper's length by half again—but what we should put together is some sample code management plans. Any volunteers? Read More ›

Code of Conduct
Greg Wilson / 2012-12-21
Software Carpentry is dedicated to providing a harassment-free learning experience for everyone, regardless of gender, sexual orientation, disability, physical appearance, body size, race, religion, or choice of text editor. Following the example of the Python Software Foundation, PyCon Canada, and other community members, we have therefore decided to adopt a code of conduct to make it clear what kinds of language and behavior are and are not acceptable at our events. If you'd like to know more, we couldn't possibly write anything better than this blog post by the Python Software Foundation's Jesse Noller. Read More ›

Minutes from 2012-12-19
Greg Wilson / 2012-12-19
Our second all-hands meeting was held on Wednesday, December 19, from 12:00-13:10 Eastern time.

Actions:
- Dhavide Aruliah: co-author bullet-points and cheat sheets for the Unix shell and for basic Python.
- Amy Brown: proof-read the Code of Conduct; add the Code of Conduct to the bootcamp "how-to"; create a graphic for the "Participant" badge; create PDF certificates for "Participant", "Instructor", and "Organizer"; add wording to the web site to clarify what does and doesn't qualify for use of the "Software Carpentry" name and logo; add wording asking people to let us know if they're using our material, and requiring them to check with us if they're using our name/logo; keep track of who's talking to what sites about bootcamps; get explicit retroactive signoff on contributions.
- Neil Chue Hong: contact OSS-Watch regarding enforcing trademarks for our material.
- Matt Davis: arrange bootcamps at Cal Poly and San Diego State; matplotlib material.
- Ted Hart: co-author bullet-points and cheat sheet for regular expressions; arrange a bootcamp at the ESA conference.
- Bernhard Konrad: co-author bullet-points and cheat sheet for testing.
- Emily Jane McTavish: co-author bullet-points and cheat sheet for regular expressions; create a ticket to arrange a bootcamp at U. Kansas.
- Geoff Oxberry: create a ticket to arrange a bootcamp at U. Delaware.
- Cait Pickens: co-author bullet-points and cheat sheets for Git and for basic Python.
- Karthik Ram: co-author bullet-points and cheat sheet for Git.
- Joon Ro: matplotlib material.
- Alex Viana: co-author bullet-points and cheat sheet for SQL.
- Ben Waugh: co-author bullet-points and cheat sheet for testing.
- Ethan White: co-author bullet-points and cheat sheet for SQL.
- Lynne Williams: co-author bullet-points and cheat sheet for the Unix shell.
- Greg Wilson: publish the Code of Conduct to the web site; retroactively issue "Participant" badges for 2012 bootcamp attendees; issue "Instructor" and "Organizer" badges; finish the Instructor's Guide by end of January 2013.
- All: check locally to ensure that it's OK to ask bootcamp participants to agree to a Code of Conduct; claim a ticket for "Arrange bootcamp at [site]"; introduce us to one more potential bootcamp site for 2013; send Greg a picture of yourself in a Software Carpentry t-shirt.
- Volunteers to help with online office hours: Dhavide Aruliah, Erik Bray, Shreyas Cholia, Matt Davis, Ross Dickson, Steve Haddock, Mike Hansen, Geoff Oxberry, Cait Pickens, Alex Viana, Ben Waugh, Ethan White.

Roll Call: Aron Ahmadia, Carlos Anderson, Dhavide Aruliah, David Ascher, Azalee Bostroem, Erik Bray, Amy Brown, Shreyas Cholia, Neil Chue Hong, Warren Code, Matt Davis, Gabriel Devenyi, Ross Dickson, Justin Ely, Steve Haddock, Ted Hart, Konrad Hinsen, Katy Huff, Trevor King, Ian Langmore, Chris Lasher, Emily Jane McTavish, Cameron Neylon, Geoffrey Oxberry, Cait Pickens, Karthik Ram, Anthony Scopatz, Alex Viana, Ben Waugh, Ethan White, Lynne Williams, Greg Wilson.

Minutes:
- Staffing: Amy Brown from Continuum to help with administration starting January 2013; Jon Pipitone to help with the web site and online tutorials starting January 2013.
- Moved to Git: Git tutorials to be folded together and put into lessons to teach to scientists.
- Code of Conduct: many sites are not sure that we can set a code of conduct for events that happen on campus. Our code of conduct is modeled on the PyCon Code of Conduct. Those who have hosted events, or know of events scheduled at US universities, please check and let Greg know what your local policy is; this might apply to overseas schools too (e.g., Saudi Arabia). Problems with inappropriate behaviour at bootcamps are rare, but it's good to let people know expectations in advance.
- Criteria for badges: criteria need to be clear to ensure that instructors are qualified; previous instructors will be grandfathered in. Badges for learners are important as part of the grant QA process. Obstacles: there is no way to know what attendees learned, and we are not allowed to publish identifying information without consent, so we would need sign-off giving permission to publish names. Greg: "We could do a participant badge with consent." A "driver's license test" may come in the future. We will use http://openbadges.org as a badging tool. Motion passed on "Participant" badges: we will contact 2012's attendees to tell them they can have a record of attendance, and will postpone assigning skill-specific badges until we have specific criteria (to be developed by the spring). Amy will design the PDF certificate. Past participants can apply for badges after attendance.
- Use of the "Software Carpentry" name: events are okay if they focus on four general areas: modular program structure, task automation, version control, and testing. If an event does not fall into one (or more) of these categories, then it cannot be called a Software Carpentry workshop/bootcamp or use the Software Carpentry logo. We are not trying to stop people we don't know from using our materials; we are trying to stop people who aren't teaching SWC content areas from using the SWC name. Greg still needs to speak with a lawyer about international trademark laws; Ian Mitchell has a draft policy, but it still needs legal review, and Greg will circulate it later. The trademark is only licensed in Canada currently. We need people to check with SWC central before using the name, even if we can't legally enforce it (thanks, Konrad!). Motion passed on "use of Software Carpentry". Neil Chue Hong will get in touch with OSS-Watch regarding enforcing trademarks for open source software.
- Volunteers to write three-line summaries of modules for pre-bootcamp circulation: people going to bootcamps are not entirely sure what they will be learning, so we need a summary of what SWC teaches to include in the agenda. A summary (for before the workshop) plus a cheat sheet are both required, and it makes sense for pairs to work on both because they will have to think about the same issues. Unix shell: Dhavide, Lynne. (Fernando and I gave out a UNIX cheat sheet at the last workshop; it's CC licensed, so it might be helpful as a starting point, and we liked it. That cheat sheet is CC BY-SA and SWC is CC BY, but that may not be a problem if you keep the cheat sheet separate.) Version control with Git: Cait, Karthik. Basic Python: Dhavide, Cait. Testing: Ben, Bernhard. SQL: Ethan, Alex Viana. Regular expressions: Emily Jane, Ted. [What portion of bootcamps include regexps? I don't know in general, but folks at UT really liked them last week. I like them too, but we didn't cover them. Do you also cover regexp testers? Kodos is super useful for Python. Like regexpal? I use regexpal as well.] Also grep and sed in shell scripting.
- Volunteers to write cheat sheets for modules for use in bootcamps: see the previous point; the volunteers for each section will do both the three-line summary and the cheat sheet.
- Volunteers to lead "sales" conversations with not-yet-scheduled bootcamps: go to the SWC GitHub repo and volunteer to help coordinate a workshop.
- Volunteers to make introductions to new sites: companies are okay, even though our focus is on the public sector; bootcamps do not have to be in English, and we welcome moving the material to other host countries. Amy will keep track of who is talking with which sites. [Katy: Argonne! Other Chicago-area school ideas? Other than UChicago, which we're doing, I can't bottom-line more than one more before this summer, but I could help at Northwestern or UIC, though I don't really have contacts at those places. Cait: I don't have contacts either (undergrad at Lake Forest College, north of the city, but no grad students there); Northwestern could be interesting. Geoff: Livermore? I could do Philadelphia-area sites also, with enough advance notice, since I grew up around there; I went to the University of Delaware and can talk to them. Emily Jane: University of Kansas. Ted: I have a proposal in for ESA 2013 (Ecological Society). Karthik: Yay, Ted; I'm maxed out on ESA workshops but will help run it if I don't have schedule conflicts. Geoff: I could do one at the AIChE conference; it's being run in San Francisco, so it's easy for me to add another couple of days. Karthik: I'm in SF and can help. Matt: Cal Poly, San Diego State.]
- Volunteers for online office hours (January-February, once a week): online tutorials were not well attended last summer, so we will test offering individual help instead. Greg and Jon will send out a call in the new year. Volunteers willing to give one hour per week: Steve Haddock ("sounds like fun to try"), Dhavide Aruliah (an hour on Wednesdays), Ethan White, Mike Hansen, Ben Waugh (who also has some funding to pay grad students to help), Matt Davis, Cait Pickens, Geoff Oxberry (after training in the study group), Ross Dickson, Erik Bray, Alex Viana, and Shreyas Cholia (assuming his travel schedule is not too hectic). Hosts, please encourage people to use the service once it goes online; the goal is to give people enough knowledge to ask their questions on something like StackOverflow. Lynne will track questions and responses and create an FAQ page for the web site.
- Volunteers for minor tasks: someone to write numpy/matplotlib material (#72); Joon Ro has just developed some material, and Matt may have a start on this as well. Developer's Certificate of Origin/Signed-off-by for Git(Hub) commits (https://github.com/wking/signed-off-by/blob/master/Documentation/SubmittingPatches): we need retroactive open-source clearance from content developers.
- Applying to the Sloan Foundation to have someone work full-time for SWC for a year.
- New teacher materials: the Instructor's Guide partially exists, and will hopefully be finished by the end of January with Jon's help. The next round of the study group starts in January; let Greg know if you would like to participate.
- Big-picture question (maybe folded into the discussion of long-term assessment): are two-day workshops the best model for making a difference? Steve: "I am becoming skeptical about how much sticks two months later." Greg: "If you can give a three-week or three-month one, tell me and let me know how it goes." Anthony: "We have done a couple of two-week classes (and an official university quarter-long course). The two-day format seems good at telling you what you should learn, but not necessarily at teaching it (with implications for curriculum). I like week-long formats, personally." Emily Jane: "I'm a fan of two-day workshops: long enough to get some real material in, but able to fit into busy schedules."
- Greg's last request: send Greg a nice hi-res photo of yourself in the SWC t-shirt. Read More ›

You've Shown Me the C, Now Where's the Python?
Greg Wilson / 2012-12-16
The W3C's Provenance Working Group recently published a new draft of their proposed standard for tracking provenance on the web. It's pretty dense stuff: even the primer, which uses the word "intuitive" five times to describe itself, is hard to follow if you've never been immersed in Dublin Core, TURTLE, and the like. That isn't a criticism—this stuff is intrinsically hard—but I think most scientists won't be able to see the forest for the trees. Which raises a question: if this is the C, where's the Python? I don't mean, "Where are the libraries?", but rather, if this is the low-level detailed language for describing provenance, where's the 80/20 version that'll do what most people need with much less palaver? Loren Shure and I talked about this briefly at the recent ICERM meeting, and if we can put something together that: is a strict subset of the W3C proposal, works with a variety of file formats (e.g., CSV, JSON, MAT, HDF5, PDF, and PNG), and requires people to add no more than a couple of function calls to their code, then I think we could actually get people to adopt it. Later: In reply to some early feedback, I think provenance needs to be stored in the files themselves, rather than beside the data, so that it's easier to move from place to place. I've been wrong before, though... :-)
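To make the wish concrete, here is a deliberately tiny sketch of what a "couple of function calls" interface might look like. Everything in it is hypothetical (invented for this post, not an existing library); the field names only loosely echo PROV terms, and, per the postscript above, a real version should probably embed the record in the output file rather than writing a sidecar as this sketch does:

    import getpass
    import json
    import time

    def record_provenance(output, inputs, tool):
        """Write a minimal provenance record next to an output file."""
        prov = {
            "entity": output,
            "wasDerivedFrom": inputs,   # names loosely borrowed from PROV
            "wasGeneratedBy": tool,
            "agent": getpass.getuser(),
            "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        }
        with open(output + ".prov.json", "w") as handle:
            json.dump(prov, handle, indent=2)

    # e.g., at the end of an analysis script:
    # record_provenance("figure4.png", ["survey.csv"], "plot_results.py")

Read More ›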

Three Non-trivial Use Cases for Git
Nicolas Limare / 2012-12-15
Today's guest post comes from Nicolas Limare. Here are three problems I encountered while introducing fellow grad students and post-docs to Git. These are situations where Git seems not to provide the solution we need, either because it requires the addition of an external service provider, unusual configuration steps not suited to beginners, or simply because the functionality is missing from Git. Or maybe I just overlooked the perfect command or option that would have answered all my wishes? 1. Simple Collaboration: Two or three people want to work together on a single repository, as they are used to doing with Subversion. No branches, no pull requests, no GitHub, just a shared bare repo on the lab server, cloned and sync'd onto everyone's own laptop via SSH. The problem is to allow proper read/write permissions for everyone. Basic UNIX groups are not practical, for example because they require going through the local sysadmin, and defining a new group, for every new project. In that case, my solution is to use POSIX extended ACLs:

setfacl -Rm u:USER:rwX ~/code/project.git
setfacl -Rm d:u:USER:rwX ~/code/project.git

(Note that the path must not be quoted in a way that stops the shell from expanding the tilde.) It is fairly clean, standard, and works perfectly, but these ACLs are usually not enabled by default on filesystems and sysadmins may not want to allow them. Moreover, non-admin users are not used to working with this kind of permission, and having to touch ACLs just to collaborate feels somehow abnormal. 2. Sharing Binary Data: I had a group of students working on a shared dataset. One person would produce the input data, and others would process it in a chain, each with their own program. Everything is in a Git repo, everyone's program is stored as source code and built with Make, the data is processed in a chain with Make too, and so far everything is fine. But the input data is regularly updated, and it consists of a hundred 10-megapixel JPEG images, so the size of the repository quickly becomes quite heavy on everyone's machine (JPEG is already compressed, and doesn't shrink any further in Git, and binary diff of JPEG files is, or was, inefficient). This huge weight is the problem. We want to keep the history and to be able to go back to previous versions of the data, but we absolutely don't want everyone in the group to have 25 different versions of a 100MB dataset on their own machine. The solution may be git annex, but it didn't exist at the time and it is not part of the standard Git toolset. It's certainly also possible by rebasing and squashing every time the input data is updated, but this git-fu is too complex for our simple needs. I want people to be able to configure their local repo so that it keeps less than a configurable amount of data locally, something like a permanent shallow clone. 3. Mirroring: I was also trying to sell Git as a good solution for replicating files between two machines, either for website or web app deployment, or to maintain a workspace in different places: a sort of improved rsync, with diff transfers and the full history. But to do that with Git, pushing updates from one machine to another, the receiving repo needs to automatically perform a checkout. This step requires some hook configuration (non-trivial for beginners), and the checkout must be very carefully configured to be performed in the right place, and to behave correctly when some files need to be deleted or local changes overwritten. This is too esoteric for a simple need.
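For the record, the hook in use case 3 can be quite small; what trips beginners up is knowing it is needed at all. Hooks are ordinarily shell scripts, but any executable works, so here is a minimal Python sketch (the work-tree path is hypothetical) that would be saved as hooks/post-receive in the bare repository and marked executable:

    #!/usr/bin/env python
    # Check the newly pushed content out into a deployment directory.
    import subprocess

    WORK_TREE = "/var/www/site"  # hypothetical deployment directory

    subprocess.check_call(
        ["git", "--work-tree=" + WORK_TREE, "checkout", "-f"])

As the post says, the fiddly parts are choosing that path correctly and deciding what should happen to local changes that "checkout -f" will overwrite. Read More ›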

Lorena Barba's Reproducibility PI Manifesto
Greg Wilson / 2012-12-15
Prof. Lorena Barba, of Boston University, presented an excellent manifesto at the ICERM meeting on reproducible research: I will teach my graduate students about reproducibility. All our research code (and writing) is under version control. We will always carry out verification and validation. For main results in a paper, we will share data, plotting script & figure under CC-BY. We will upload the preprint to arXiv at the time of submission of a paper. We will release code at the time of submission of a paper. We will add a "Reproducibility" declaration at the end of each paper. I will keep an up-to-date web presence. So say we all. Read More ›

Two R Workshops at UBC in 2013
Greg Wilson / 2012-12-13
We're pleased to announce two workshops in 2013 at the University of British Columbia:

Beginners R (January 11, one day, MSL 101, 4×1.5-hour blocks): R basics, workspace, working directory, and projects in RStudio; importing, exporting and cleaning data; data manipulation, aggregation and indexing; and working with a statistical model object. All of the above will include lattice graphics and the usefulness of data frames, and will focus on code, not the mouse.

Advanced R (January 21 & 22, MSL 101, 8×1.5-hour blocks, order to be determined): graphics with ggplot2; regular expressions, grep and gsub; writing functions in R; S3 objects in R, methods and classes; file handling from within R (interacting with the shell); reproducible research within R; version control within R; and the basics of creating an R package.

Registration for both is currently closed so that people who were waitlisted for our October workshop have first chance to sign up. If there are still seats free after that, we'll post a note here. Read More ›

IPython Funding: Hurray!
Greg Wilson / 2012-12-12
Via email: We are proud to announce that we've received funding from the Alfred P. Sloan Foundation that will support IPython development for the next two years. Thanks to this $1.15M grant, Brian Granger and Fernando Perez will be working roughly 3/4 of their time on IPython, Min Ragan-Kelley will be the lead project engineer, and Paul Ivanov and Thomas Kluyver will work as postdocs on the project. Furthermore, Matthew Brett and JB Poline of Nipy fame will be working part-time on the development of notebooks for applied statistics in collaboration with Jonathan Taylor from Stanford. This will also give resources to support two annual all-hands development sprints at Berkeley where I hope all our core developers will be able to attend, as well as funding for cloud resources so that we can provide paid support for our CI hosting and nbviewer beyond what the free plans allow (and further services that may arise over time). We'd like to particularly thank Josh Greenberg, our program director at the Sloan Foundation, for the phenomenal guidance and support he provided during the grant proposal preparation and detailed review process. Needless to say, we are extremely excited about this news. We'll have more announcements to make over the next few weeks, but now I need to get a lot of administrative wheels in motion quickly, as the funding officially starts Jan 1, 2013! - Fernando Perez Read More ›

Feedback from Edinburgh
Mike Jackson / 2012-12-12
This week saw the first bootcamp to be held at The University of Edinburgh. The bootcamp was organised by the The Software Sustainability Institute and EPCC as part of the PRACE Advanced Training Centre. We had 36 researchers from geosciences, astronomy, biology, statistics, chemistry and mathematics and based at 8 institutions across the UK. Azalee and myself led the workshop and we had 7 helpers including Nancy Giang of the University of Dundee who we recruited after she attended the Newcastle bootcamp back in October, and David Jones of the Climate Code Foundation. We also had as helper/observers Mike Mineter of The University of Edinburgh and Norman Gray of The University of Glasgow also came along to help and to see what a bootcamp is like, with a view to running bootcamps for the physics and geosciences communities. Feedback from the attendees was good though some felt we went too fast (on make, for example), or too slow (on bash). For once, it seemed like we weren't bogged down with installation woes. We'd e-mailed attendees many times to tell them to do it before they arrived (and that the weekend before is too late!) and provided a VMWare Ubuntu VM but it helped that a number of researchers came from the same department and their systems team had configured a VM for them to log into (a model to be promoted!). From Azalee: We had 6 helpers + 1 instructor free at all times on the first day to assist with installation (along with content) issues. At first I thought that this was way too many people, but it turned out to be the right number to address the issues of the class as they arose and give students the individual help they needed as they needed it rather than during the breaks. The diversity of experience and expertise of the helpers also contributed to the success of this workshop. One deviation from previous workshops I've attended was a practical at the end. We asked students to write a script which read in a file of student names and grades and calculated the mean. They then wrote tests for the function, put it under version control, and created a make file for it. People worked in groups of 2-5 people from their table for about 1.5 hours. We asked students to get as far as they could in that time. The mean function had already been presented in an earlier exercise. A few groups completed the assignment while others only got as far as version control, but it was great to see the students working together, sorting through their misconceptions, creating working code, and asking questions in a very supportive environment. I think that this demonstrated the intended work flow as well as demonstrating how everything we taught them fit together. The down side of this was that it was that much less teaching time to cover new content. We created a pre-workshop questionnaire asking students to rate their level as well as their interest in the planned curriculum. We also asked them which operating systems they would be using. This was very helpful in designing the material and supporting installations. I'm still looking for the best way to get text files to students. We tried the download section of bit bucket (a great place for students to use wget and curl) but with long urls this was cumbersome. I'm leaning towards a pre-workshop repository with just the code you want students to start with and a post-workshop repository with all of the answers. I'd love to hear other people's suggestions/experiences. 
I also struggled with balancing making the text big enough to be read by the class against the commands scrolling off the screen before the slower students could type them. Again, suggestions welcome. Here are the comments from the attendees, helpers and instructors, good and bad:

Good:
- Good to see all the different techniques available even if they will not all be used
- Learning more about Unix
- Impressed by version control/Mercurial/BitBucket, which will use more of (5)
- Got to know about Mercurial before choosing SVN
- Had some quite specific problems that people helped me with
- Learning about makefiles (2)
- Nice and experienced helpers, relaxed atmosphere
- Did not know a lot of the software, good to know more
- Learnt a lot in two days
- Problems with the computer, helpers helped out
- Interaction with people on the course and helpers were useful
- Fun
- Good intro to Python for a complete beginner
- Enjoyed being here and helping, and learning from helpers, also good to see how things work

Bad:
- Could not get the shared Mercurial to work
- Learned a lot but not sure what I will use
- Having to tell everyone a bad point
- Queue for coffee (2)
- Not enough on make, don't understand make
- Some sections went a bit too quick because I was not so familiar with it (2)
- Lot of things to learn
- Some Python could give more context, e.g. hypotenuse, which people may not be able to remember
- Sometimes people fell behind and could not catch up (2)
- Mixed dynamics, talk, helpers helping - cleaner structure between presentations and practical would be useful
- Sometimes too easy, too difficult at other times
- Went a bit slow for bash and zooming through other stuff later on
- More on bash would have been good
- Too much time on Python
- Going a bit fast sometimes but there were enough helpers to help out

Other observations from Azalee and myself, and from our helpers (who were great at taking notes as to how things were going), included:
- For bash, the concept of a home directory should be introduced explicitly.
- "history | grep 'some command'" is a good example of a pipe in action and a very useful command!
- There should be an explicit introduction to an editor (open file, edit, save, exit), otherwise attendees may just copy the instructor, which can cause problems when an instructor is zooming through his use of XEmacs (guilty!). This was especially an issue in the version control section, when some people forgot to put a commit message in the command line.
- Revision control commit messages should be provided via the editor and not at the command line, to reduce the risk of attendees reusing commit messages from previous commands in their shell history (or using messages like "Commit 1", "Commit 2", etc.).
- .py files should be introduced as soon as there is a need to input code fragments that extend over one line, e.g. a conditional, loop or function. This is a place where the IPython Notebook might be a happy medium, although it is one more item to install.
- Any code fragments we present must be "exemplary" - fully commented and idiomatic - i.e., exactly how we'd write them ourselves if doing it "for real", because, as has been commented, people learn by reading others' code!
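To make that last observation concrete, here is the sort of "exemplary" fragment we mean: a toy example of ours (the hypotenuse function mentioned in the feedback, not code from the workshop), idiomatic and commented the way we'd write it for real:

import math

def hypotenuse(a, b):
    '''Return the hypotenuse of a right triangle with legs a and b.'''
    return math.sqrt(a ** 2 + b ** 2)

print hypotenuse(3.0, 4.0)   # prints 5.0

David Jones wrote a blog post on his experience with more useful information and suggestions. Read More ›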

Some of the Things We've Learned About Teaching Git
Greg Wilson / 2012-12-11
We've tried teaching Git six or seven times now; here's what some of the instructors have learned.

Instructor #1: I spent too much time explaining the "stage." It's an important concept that sets Git apart from other VCSes, but for beginning students it's probably too advanced. I wish I had used a more realistic project with several files and a history of logs. Creating a new project live is useful in teaching how to set up a Git repository, but then I'm limited to working with a small history. I left the GitHub discussion until the end, and because we were tight for time I was not able to really get into it. I wish I had started to talk about GitHub early on, maybe as soon as their first commit.

Instructor #2: I should have left out the part where folks set up their Git SSH keys. It's just not worth the confusion, and only helps folks by letting them avoid typing in their passwords. Since more people understand passwords than SSH keys, just skip the SSH key setup part. Also, I think I could have crammed in more content if the examples early on had been better/more time-efficient. I would have liked to cover "git commit --amend", "git reset", "git merge --no-ff", various workflows, the importance of keeping your main branches clean and working in side branches, etc. I also would have liked to end the whole thing with a more elaborate (though straightforward, ideally) exercise that folks work on by themselves, encounter conflicts, etc. As it stands in my notes, they only really encounter a conflict once. Talking about local version control all the way through branches was good for me, but, as Instructor #1 suggests, it was helpful to introduce GitHub immediately after branches and as soon as remotes came up.

Instructor #3: Don't pretend it's SVN or anything: just teach Git as Git. Use lots and lots of graphics and demonstrations, especially demonstrations of opposing cases, e.g., here's a pull where everything went great, here's a pull with a conflict, or here's a push that goes smoothly, and here's what happens when your local copy is out of date. Read More ›

Things Are Going Well in Texas on Ada Lovelace's Birthday
Greg Wilson / 2012-12-10
Read More ›

What To Work On In 2013
Greg Wilson / 2012-12-09
Votes are in and tallied: here's what you'd like us to work on in 2013. Read More ›

Creating a Task List
Greg Wilson / 2012-12-08
We're going to start using GitHub to manage our content, and as a corollary of that, we're going to use its issue tracker to manage the things we're working on. I've already put a handful of medium-to-large items in it; suggestions would be welcome, particularly if accompanied by, "...and I'll do the work." :-) Read More ›

Why Be an Instructor
Greg Wilson / 2012-12-05
Our instructors are all volunteers—bootcamp hosts cover their travel and accommodation costs, but they're not paid for their time. So why do they do it?
- Make the world a better place. As I say in a lot of my talks, the two things we need to get through the next hundred years are more science and more courage. I don't know if we can do much about the latter, but we can sure help a lot with the former.
- Make their own lives better. Most of the time, we try to have astronomers teach astronomers, ecologists teach ecologists, and so on. These are people whose tools instructors might one day want to use themselves, so by making them more clueful, instructors are helping themselves.
- It's fun. How could it not be? You get to stand up and look smart in front of a bunch of smart people who actually want to be there, and you don't have to mark anything afterward.
- Build a reputation. Showing up to run a really useful workshop is a great way for people to introduce themselves to places they'd eventually like to work, and a great way to make contact with potential collaborators. This is probably the most important reason from Software Carpentry's point of view, since it's what makes our model sustainable.
- Get practice teaching. We're doing more every year to train instructors, and giving them chances to teach online as well—both of which are useful for people with academic careers in mind.
- Help get people of diverse backgrounds into the pipeline. Computer Science is 12-15% female, and that figure has been dropping since the 1980s. From what I've seen, there's a similar gender skew among computationally-oriented people in other sciences, which if left alone will be self-perpetuating. Some of our instructors are involved in part because they want to break that cycle and be a public example of a competent, confident woman programmer.
- Teaching forces you to learn new things, or learn old things in more detail than you already know. See for example "Graduate Students' Teaching Experiences Improve Their Methodological Research Skills".
- The more you know, the less you have to write yourself. Putting a grant application together? Have a site review coming up? We probably have slides for that... :-)
And why do people do online office hours? For some, the primary incentive is a little bit of guilt: we can't give everyone all the help they need during a two-day workshop, and online follow-up is a way to make up for that. For others, it's a chance to pick up and practice some skills that are increasingly in demand: everyone believes that a lot of teaching is going to move online in the coming years, and this is a way to figure out how to do it. Read More ›

Who Can Run a Software Carpentry Workshop?
Greg Wilson / 2012-12-05
A couple of people have mailed us in the past week to ask if they can use our materials and run a workshop on their own. The short answer is "yes"; the longer answer is "yes please" :-) All of our lessons are covered by the Creative Commons - Attribution 3.0 Unported license (often abbreviated "CC-BY"), which in brief means that you can share or remix the material however you want, as long as you cite us as the original source (e.g., by providing a link back to this site). There is one caveat, though. The name "Software Carpentry" and the Software Carpentry logo are both trademarked, so if you're doing something that doesn't use our material, or doesn't cover more or less the same things we do, you should call it something else, and use some other graphics. We're all in favor of teaching people iPhone game programming, multigrid methods using C++ and MPI, and scientific grant writing, but we've worked hard to establish an identity for Software Carpentry, and we'd like to protect that. If you do decide you want to try teaching this stuff, please give us a shout: we'd be happy to help if we can, or hook you up with an instructor in your area. We're also always happy to hear from people who'd like us to come and run workshops for them to get things started: again, just give us a shout to get the ball rolling. Read More ›

Sustainability
Greg Wilson / 2012-12-05
Software Carpentry wouldn't exist without support from the Sloan Foundation, Mozilla, and a lot of other supporters, but that support won't last forever. Over the next two years, we need to find a way to make this project self-sustaining; to achieve that, we're currently doing the following:
- Use a "host pays expenses" model so that our budget doesn't have to grow as we scale up. Status: mostly done.
- Make contribution easier, so that people who want to help can do so with little or no overhead. Status: we should finish migrating to GitHub by the end of the year, and using IPython Notebooks for teaching will help too, but the world still (desperately) needs a version control-friendly format for general teaching materials.
- Demonstrate career value to instructors, so that people can keep teaching for us even when they're crunched by other commitments. Status: we're adding testimonials all the time, and hope to have some bigger stories by the middle of 2013.
- Assess our impact in ways scientists will find credible. Status: we're currently exploring ways to do this, and would like to hear from anyone who'd like to help.
- Lobby funding agencies and journals to ask hard questions about software (or about software development processes). In the long term, we need to convert twenty percent of scientists to our way of thinking, but we can accelerate that by focusing on the right twenty percent. Status: we haven't started this yet.
- Get scientists to include training in budgets and schedules. Status: this is the longest shot of all, because it depends on funding agencies being amenable to the idea. In practice, it may have to wait until we reach our 20% target...
If you can think of other things we can do to ensure Software Carpentry's long-term viability, we'd like to hear from you. Read More ›

Six Years Later
Greg Wilson / 2012-12-05
I'm giving a short talk next week at an ICERM workshop on Reproducibility in Computational and Experimental Mathematics, which has prompted me to look back at a talk I gave six years ago at SciPy'06. The slides from that talk are included below; I'm now trying to decide whether I'm pleased or depressed by how little has changed. Read More ›

Our First Hackathon
Greg Wilson / 2012-12-05
We'd like to invite our friends to a week-long hackathon at Mozilla's office in Toronto in June to:
- create new teaching material
- give each other feedback on teaching (we'll run a workshop during the week)
- build new tools (like extensions for the IPython Notebook)
- get to know each other
We will have a small budget to help with travel and accommodation, but unfortunately most people will have to fund themselves. On the other hand, Toronto's beautiful that time of year, and there are lots of interesting places a visiting scientist could give a talk. If you're interested, please stay tuned for dates (which we should have in January). We hope to see you there! Read More ›

Moving Up and Moving Down
Greg Wilson / 2012-12-05
Our existing workshops/material are aimed at people who know enough to write a hundred-line script, but don't yet use version control or do any systematic testing. Many scientists either haven't gotten this far, or are well past it, and we'd like to start helping them. We think that means we need: "Software Carpentry for Complete Beginners", which would only cover basic programming and version control, and "Advanced Software Carpentry", which would cover parallel computing, integration with legacy code written in C/C++ and Fortran, and so on. The challenge, as always, is resources: doing this requires people to put together the content and teach half a dozen times to tune it and help show others how to teach it as well. It's a big commitment, but the potential rewards are pretty big too. If you're willing to commit a hundred hours or so to help make the world a better place and give your career a boost, please get in touch. Read More ›

See You at PyCon 2013
Greg Wilson / 2012-12-04
The list of talks for PyCon 2013 has been posted, and we're very pleased to see three by members of our team:
- Titus Brown: Awesome Big Data Algorithms
- Matt Davis: Teaching with the IPython Notebook
- Jessica McKellar: How the Internet Works
There will also be the first-ever Python Education Summit and a whole lot more—hope to see you there. Read More ›

European Grid Infrastructure is Organizing a Software Carpentry Workshop
Greg Wilson / 2012-12-01
The European Grid Infrastructure project is putting together a Software Carpentry workshop for Manchester in April; early details are in their post, and we'll post more details ourselves as they evolve. Read More ›

Good News About Software Carpentry (and More)
Greg Wilson / 2012-11-30
We're pleased to announce that Mozilla and Software Carpentry will be continuing and expanding our work to empower scientists with computer science and webmaking skills. We have been working with the Sloan Foundation to teach scientists the concepts, tools and techniques they need to use computers and the web to accelerate their research. In 2013-14, the program will offer an expanded set of workshops, online tutorials, and a new peer-to-peer mentoring program. In the same way that Mozilla's Webmaker program aims to move millions of people from using the web to making the web, Software Carpentry and Mozilla's new Webmaking Science Lab will provide scientists with the tools they need to fully leverage the power of the web in their work. To help advance these goals, Mozilla is now seeking to hire a Director for our Webmaking Science Lab, and is also offering a rotating three-month paid internship for a graduate student to help organize and run workshops, create new lessons and assess impact. Want to get involved? Apply or learn more about the Webmaking Science Lab Director position. Get in touch about our internships. Contact us about hosting a Software Carpentry workshop for universities, government organizations, or anyone else who'd like to do more science in less time, with less pain. Participate in an upcoming workshop. Read More ›

Alpha Testing Ideas for the IPython Notebook
Greg Wilson / 2012-11-27
Before I post any of these ideas on the IPython Notebook wiki, I'd be grateful for feedback from our learners and instructors. Most of these (maybe all of them?) would be done as extensions, rather than as patches to IPyNB itself. Please let us know what you think; I'll revise and then cross-post to the IPyNB folks.

Our Users

Anna teaches scientists how to program. She usually does this by coding live in front of a small class (20-40 people), but also wants to create recordings for people to view outside class. Farouk is a graduate student in chemistry who wants to learn how to program. He learns best when he can bounce ideas off other people.

Progressive Reveal

Context: Anna knows that showing learners too much is just as bad as showing them too little. In particular, putting an entire lesson in front of people at once is distracting: she would prefer to reveal one snippet at a time so that they're always concentrating on the right thing. Slideshows are one way to do this, but Farouk finds slideshows frustrating, because everything he is shown disappears to make room for something new.

Proposal: Add "progressive reveal" to the notebook, so that clicking on a button (or better yet, a key combination) will make the next section of the notebook visible. Visually, this will appear to append a cell (or a group of cells) to what's already in the notebook. However, the implementation should store everything in the notebook, and just use Javascript to show/hide sections. There should also be "unshow", "show all", etc. The authoring interface used to define what gets revealed in turn is probably harder to design and implement than the actual display code. When editing, Anna will want to see the entire notebook at once, but will also want some sort of visual hints to show what the revealed blocks will be (e.g., an outline box around each section). Anna will also want an equally-intuitive way to define and change the scope and order of what's revealed. Note: Anna will almost always reveal things in "append" order, but there are cases where she will want to reveal a block between already-visible blocks, i.e., go back and say, "So this is what we should have done at this point." There may also be cases where she will want to replace blocks rather than append to them, but there are good pedagogical reasons not to support this: the final display should be a record of everything that was said and done, not just the end result.

Multiple-Choice Questions

Context: Peer instruction is a scalable teaching technique in which:
- the instructor poses a multiple-choice question with 3-4 plausible answers
- each learner votes for an answer (typically using a clicker)
- learners discuss their answers in small groups (typically 3-4 people)
- the instructor presents and explains the correct answer
- learners discuss again in order to clear up one another's misconceptions
This process is inherently synchronous. In order to implement it online, Anna needs a way for learners to talk to one another in small groups, and a way for them to vote. The former is handled (badly, but handled) by Skype, Google Chat, and other tools; she'd like support in the notebook for the latter, both to support peer instruction, and also to handle "can we move on?" and "how are we doing?" questions in synchronous online classes being run in broadcast mode.

Proposal: One possibility is to add support for simple real-time voting to the notebook as part of the implementation of multi-user servers. (A possible starting point or inspiration would be Socraticqs.) However, this is a specialized enough need that it should instead be used as a test of the plugin API for the multi-user server and notebook: it should be possible to add voting, tallying, and display without modifying the core.

Auto-Interrupt Based on Lines or Time

Context: Farouk is learning how to write while loops. The results have been unsatisfying: infinite loops with print statements fill up his cells, while infinite loops without output are impossible to distinguish from a slow server or a bad network connection.

Proposal: Allow users to specify how long the back-end should be allowed to process a command, and/or how much output it should be allowed to produce. This can be piggy-backed on top of the existing Ctrl-C interrupt mechanism: when execution starts, a timer is started in the browser, and an interrupt key press event is faked when the timer expires. Something similar could be done based on how much output is received (e.g., "halt the process at 100 lines"), though this is probably less important to implement.

Synchronizing Test Output to Changes in Code Under Test

Context: Farouk is learning how to write unit tests in the notebook. He hasn't converted to test-driven development yet, so he:
- writes a function
- writes a few unit tests
- runs the unit tests
- tweaks the function
- repeats
He is frequently interrupted by other tasks (like answering his phone or updating Facebook). When this happens, he sometimes forgets what he was doing, and thinks that the currently-visible output of his tests is in sync with the code, when in fact the code has changed and the tests haven't been re-run.

Proposal: Farouk should be able to specify that some cells (the ones holding tests) are automatically run whenever changes are made to other cells (the ones holding the code under test). This is a restricted two-stage case of specifying arbitrary cell execution order: it may be enough to create two cell groups (one for code, one for tests) and re-execute everything in the test group whenever anything in the code group changes.

Timed Text Recording

Context: Anna has been recording screencasts to show learners how to use lists, write functions, and so on. She and Farouk both find them frustrating: the text in the videos is never as easy to read as they'd like, and it's impossible to search for text in the video, or to copy and paste it into Farouk's browser. Anna would like the "video" to play in the notebook itself, in the same way that ttyrec replays a shell session inside a terminal window. She would also like this replay to be synchronized with a soundtrack.

Proposal: Record text events (typing and program output) with millisecond-level timestamps in the notebook. This data should be stored in an auxiliary file outside the notebook itself. Provide a tool like the notebook viewer that will reconstruct a notebook character by character (and image by image) given such a file. Use something like Popcorn.js to synchronize this replay with an audio soundtrack.
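To make that last proposal concrete, here is a minimal sketch of the recording side. The event format (JSON, one object per line, in a file called session.events) is our assumption, not part of the proposal:

import json
import time

class EventRecorder(object):
    '''Log typing and output events with millisecond timestamps.'''

    def __init__(self, path):
        self.start = time.time()
        self.log = open(path, 'w')

    def record(self, kind, text):
        # Store one event as a JSON object on its own line.
        stamp = int((time.time() - self.start) * 1000)
        self.log.write(json.dumps({'ms': stamp, 'kind': kind, 'text': text}) + '\n')

    def close(self):
        self.log.close()

recorder = EventRecorder('session.events')
recorder.record('input', "print 'hello'")
recorder.record('output', 'hello')
recorder.close()

A replay tool would then read these events back in order, sleeping between timestamps, which is also what the soundtrack synchronization would key off. Read More ›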

Titus Brown on the Scripps Institute Bootcamp
Greg Wilson / 2012-11-25
Hot on the heels of Cait Pickens' posts about our workshop at the Scripps Institute comes this post-mortem from Titus Brown. Again, there's lots of good analysis, and many useful recommendations—particularly concerning installation and setup headaches (which continue to be our second-biggest problem). Read More ›

Cait Pickens on the Scripps Institute Bootcamp
Greg Wilson / 2012-11-24
Cait Pickens (a grad student from Michigan State University) has written a great series of posts about the bootcamp she helped teach at the Scripps Institute a couple of weeks ago:
- Preparation
- Day 1, Part 1
- Day 1, Part 2
- Day 2, Part 1
- Day 2, Part 2
- Day 2, Part 3
She also has a post describing how they used HipChat in the classroom, which is almost what I think we need to do peer instruction online. There's lots of good stuff to think about here—recommended. Read More ›

Who Wants To Build a Faded Example Tool for the IPython Notebook?
Greg Wilson / 2012-11-19
While I'm asking people to write code for me, here's a small addition to the IPython Notebook that someone should be able to knock off in an hour. The starting points are cognitive load theory and the idea of faded examples: long story short, one good way to teach is to show learners one example in detail, then ask them to fill in the ever-larger blanks in a series of similar partly-worked-out examples. Here's one that Ethan White did for our online study group:

def get_word_lengths(words):
    word_lengths = []
    for item in words:
        word_lengths.append(len(item))
    return word_lengths

This function creates and returns a list containing the lengths of a series of character strings. The list of character strings is passed in as the argument words, and the local variable word_lengths is initialized to an empty list. The loop iterates over each character string in the list. Each loop iteration determines the length of a single character string using the function len() and then appends that length to the list word_lengths. The final word_lengths list is returned by the function. An example of using it is:

>>> print get_word_lengths(['hello', 'world'])
[5, 5]

Given that, we would ask learners to fill in the blanks to make this version work:

def get_word_lengths(words):
    word_lengths = ____
    for item in range(len(____)):
        word_lengths.append(len(____))
    return word_lengths

and work up to:

def get_word_lengths(words):
    return [____ for ____ in ____]

I'd like to be able to put exercises like this into the IPython Notebook. In particular, I'd like a two-by-two display:

    code with blanks (read-only)    user's code (editable) [reset]
    expected output (read-only)     actual output (colorized) [run]

The column on the left is read-only. Whenever the user clicks the "reset" button, the code with blanks is copied over to the right column. Whenever the user clicks "run", the code is re-run, and the actual output is compared to the expected output (and if it's textual, differences are colorized to draw the eye). I think this is "just" a bit of Javascript—anyone want to take a crack at it and let me know?
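If it helps, here is a minimal sketch in Python of the "compare actual output to expected output" step, using the standard library's difflib; the real tool would do the equivalent in Javascript before colorizing:

import difflib

def diff_output(expected, actual):
    '''Return unified-diff lines showing where the outputs differ.'''
    return list(difflib.unified_diff(expected.splitlines(),
                                     actual.splitlines(),
                                     'expected', 'actual',
                                     lineterm=''))

for line in diff_output('[5, 5]', '[5, 5, 5]'):
    print line

Read More ›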

The Tool (I Think) We Need To Do Peer Instruction Online
Greg Wilson / 2012-11-19
Clay Shirky's recent essay "Napster, Udacity, and the Academy" has attracted a fair bit of attention. I've written about some of the things it doesn't discuss on my personal blog, but here, I'd like to use it as a jumping-off point for a description of a tool I'd like someone to write for us. Given that my last request produced working code in just a few hours, I'm hoping one of you will wow me again :-)

Our starting point is peer instruction, a scalable, evidence-based teaching model that replaces "sage on the stage" with the following interactive cycle:
1. Instructor poses a multiple-choice question.
2. Learners commit to an individual answer (typically by voting with a clicker).
3. Learners discuss their answers in small peer groups (typically 3-4 people) and re-vote.
4. Instructor presents the correct answer.
5. Learners discuss again (so that those who understood can clear up the misconceptions of those who didn't).

Chris Lee, at UCLA, has built a tool called Socraticqs to implement this model using a little web app running on the instructor's laptop (which learners connect to over WiFi). I'd like to go one step further and try to do this over the web—after all, it is supposed to be a medium for collaboration. Here's what I'm imagining:
1. Instructor broadcasts to learners via screen sharing, synchronized slides, or some kind of in-browser co-piloting like Towtruck.
2. Learners vote on a multiple-choice question over the web.
3. Learners are put into small groups, possibly based on their answers, for discussion. When this happens, the system automatically switches from 1-to-N broadcast to k-way all-see-all sharing. The instructor can drop into any of these discussions at any point.
4. After several minutes, the system pulls everyone back into whole-class mode so that learners can re-vote and the instructor can present the correct solution.
5. The system then switches people back to the same groups for wrap-up discussion.

The pieces of this all exist, more or less: we can do one-to-many broadcast and four-way split-panel talking-heads chats. What's lacking is the integration: in particular, leaving one Skype call or Google Hangout and joining another several times an hour will introduce a lot of friction and frustration. Based on our experiments earlier this year with online tutorials, I think this would be a much better online learning experience for most people. Does it already exist? If so, where? And if not, how hard would it be to build?
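One small piece, forming the discussion groups, is easy to sketch. Here is a toy Python version; the policy of mixing different answers within each group is our guess at what an instructor would want, not something the design above prescribes:

import random

def make_groups(votes, size=4):
    '''votes maps learner name -> chosen answer; returns a list of groups.'''
    learners = list(votes)
    random.shuffle(learners)                     # break ties randomly...
    learners.sort(key=lambda name: votes[name])  # ...then cluster by answer
    n_groups = max(1, len(learners) // size)
    groups = [[] for _ in range(n_groups)]
    for i, name in enumerate(learners):          # deal round-robin so each
        groups[i % n_groups].append(name)        # group mixes answers
    return groups

votes = {'ana': 'B', 'bob': 'A', 'cho': 'C', 'dee': 'A',
         'eli': 'B', 'fay': 'D', 'gus': 'C', 'hal': 'A'}
print make_groups(votes)

Read More ›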

Updating Our Reading List
Greg Wilson / 2012-11-17
We're planning to launch an update to this web site in the next few days, and as part of that, we're revisiting some of our content. For example, we'd like to shorten and update our recommended reading list—the current short list is below, and we'd welcome suggestions for additions. However, if you'd like to put something in, please suggest something to take out (so that the list doesn't become a Sunday stew).

- Jennifer Campbell, Paul Gries, Jason Montojo, and Greg Wilson: Practical Programming: An Introduction to Computer Science Using Python. Pragmatic Bookshelf, 1934356271, 2009. An introduction to programming using Python that includes material on building GUIs, working with databases, and a few other useful things.
- Michael Feathers: Working Effectively with Legacy Code. Prentice Hall PTR, 0131177052, 2004. If code is exercised by unit tests, changes can be made quickly and safely; if it isn't, they can't, so your first job when you inherit legacy code should be to write some. That's where this book comes in. Want to know three different ways to inject a test into a C++ class without changing the code? Or which classes or methods to focus testing on, or how to break inter-class dependencies in Java so that you can test one module without having to configure the entire application? It's all here, along with lots of other useful information.
- Chris Fehily: SQL. Peachpit Press, 0321118030, 2002. Describes the 5% of SQL that covers 95% of real-world needs. While it moves a little slowly in some places, the examples are exceptionally clear.
- Karl Fogel: Producing Open Source Software: How to Run a Successful Free Software Project. O'Reilly Media, 0596007590, 2005. This book is an excellent guide to how open source projects actually work. Every page offers practical advice on how to earn commit privileges on a project, get it more attention, or fork it in case of irreconcilable differences.
- Robert L. Glass: Facts and Fallacies of Software Engineering. Addison-Wesley Professional, 0321117425, 2002. Most of us have heard that maintenance consumes 40-80% of software costs, but did you know that roughly 60% of that is enhancements, rather than bug fixes? Or that if more than 20-25% of a component has to be modified, it is more efficient to re-write it from scratch? Those facts, and many more, are in this little book, along with references to the primary literature to back up every claim it makes.
- Steve Haddock and Casey Dunn: Practical Computing for Biologists. Sinauer, 0878933913, 2010. The best general introduction to "the other 90%" of scientific computing on the market today, this book covers all of the core material of this course and more.
- Hans Petter Langtangen: Python Scripting for Computational Science. Springer, 3540739157, 2007. The book's aim is to show scientists and engineers with little formal training in programming how Python can make their lives better. Regular expressions, numerical arrays, persistence, the basics of GUI and web programming, interfacing to C, C++, and Fortran: it's all here, along with hundreds of short example programs.
- Steve McConnell: Code Complete: A Practical Handbook of Software Construction. Microsoft Press, 0735619670, 2004. This classic handbook covers everything from how to avoid common mistakes in C to how to set up a testing framework, how to organize multi-platform builds, and how to coordinate the members of a team.
- Wes McKinney: Python for Data Analysis. O'Reilly, 1449319793, 2012. A practical introduction to data crunching in Python that covers a lot more than just statistics.
- Dan Pilone and Russ Miles: Head First Software Development. O'Reilly Media, 0596527357, 2008. Many people will find this book's cartoon-ish format and awkward jokes annoying, but it's still a good hype-free introduction to agile development practices.
- Deborah S. Ray and Eric J. Ray: Unix and Linux: Visual QuickStart Guide. Peachpit Press, 0321636783, 2009. A gentle introduction to Unix, with many examples.
- Robert Sedgewick: Algorithms in C. Addison-Wesley Professional, 0201756080, 2001. These books are a guide to all the other conceptual tools that working programmers ought to have at their fingertips, from sorting and searching algorithms to different kinds of trees and graphs. The analysis is far more accessible than that of many other textbooks, and while the author's use of C may seem old-fashioned in an age of Java and C++, it does ensure that nothing magical is hidden inside an overloaded operator or virtual method call.

Read More ›

Who Wants To Write a Little Code?
Greg Wilson / 2012-11-16
We have always steered away from building libraries to use in teaching—we want to show people the "real" stuff, and we can't afford to maintain things. The IPython Notebook has us rethinking that: now that we can display images inline, it would be wonderful if we could leverage Mark Guzdial and Barbara Ericson's work, and teach basic Python using simple image manipulation for examples (rather than text munging). Their research found that using images led to higher retention: more students stuck around, and the ones who did remembered and used more of what they'd learned. The problem is, libraries like PIL and scikit-image aren't novice-friendly or teaching-oriented. Again drawing from Guzdial and Ericson's work (which my co-authors and I did in Practical Programming), we want something like this:

>>> from skimage import novice            # special submodule for beginners
>>> picture = novice.open('kite.png')     # create a picture object from a file
>>> print picture.format                  # pictures know their format...
'png'
>>> print picture.path                    # ...and where they came from...
'/Users/example/kite.png'
>>> print picture.size                    # ...and their size
(400, 500)
>>> print picture.width                   # 'width' and 'height' also exposed
400
>>> picture.size = (200, 250)             # changing size automatically resizes
>>> for pixel in picture:                 # can iterate over pixels
...     if ((pixel.red > 0.5) and         # pixels have RGB (values are 0.0-1.0)...
...         (pixel.x < picture.width)):   # ...and know where they are
...         pixel.red /= 2                # pixel is an alias into the picture
...
>>> print picture.modified                # pictures know if their pixels are dirty
True
>>> print picture.path                    # picture no longer corresponds to file
None
>>> picture[0:20, 0:20] = (0., 0., 0.)    # overwrite lower-left rectangle with black
>>> picture.save('kite-bluegreen.jpg')    # guess file type from suffix
>>> print picture.path                    # picture now corresponds to file
'/Users/example/kite-bluegreen.jpg'
>>> print picture.modified                # and is now in sync
False

The key thing here is that the iterator variable aliases the pixel, but has extra information (its XY coordinates). Yes, this is a bit odd if you're used to standard image processing libraries, or worried about performance, but this is what works best for teaching. If this existed, we'd rewrite our intro Python material (variables, loops, conditionals, functions) on top of it. My question is, would any of our regular readers like to build it for us? Read More ›

We Apologize for the Interruption in Our Service
Greg Wilson / 2012-11-16
Our hosting service, Dreamhost, is relocating various domains to new servers. As a result, service at this site is suffering from longer-than-usual lag (90 seconds or more to load the home page, worse for pages with embedded videos). We've filed a problem report, and will try to get things back on track as quickly as possible. We will also start looking for a new hosting plan: slow-and-slower has become all too common here :-( Read More ›

Matt Davis's Great Californian Adventure
Greg Wilson / 2012-11-16
Matt Davis (who works at the Space Telescope Science Institute, and has become one of our regular instructors) taught three bootcamps in eight days back in October. He has finally recovered enough to write a description of what he did, how it went, and what he learned. The two biggest lessons are: Try to get to know your audience beforehand by talking to organizers and/or using surveys. Try to plan bootcamps so that the students come from similar backgrounds and, if possible, the same department or lab. Read More ›

Making a Difference at LBL
Matt Davis / 2012-11-16
Without any doubt, the best thing about teaching for Software Carpentry is making a difference for someone and getting to hear about it. One of our students at the Lawrence Berkeley Lab, nuclear engineer Dr. Bethany Goldblum, was impressed enough by Katy Huff's presentation of git that she wanted her team to start using version control. Bethany's description of her team's past code management practices is typical of academic projects, with many people modifying their own copy of a code until that "resulted in over 20 versions of the codes, each with bugs of its own." On their current project "a postdoc is in the process of developing the code, but now every time he makes a change, he uploads it to a different directory on our analysis machine and we just have multiple copies of almost the same code floating around. Unless one is careful, they may not know which code is the most recent version. I believe there is some fear there to undo anything that the original developer put in place." These are exactly the problems version control is meant to solve. With a project under version control there is only one canonical source of the code. Changes made by individual contributors are easily shared with the group by committing them back to the original source. The entire history of the project is always available with records of every change and why it was made. Special versions of the code, such as original copies or versions used for papers can be tagged with meaningful names to be easily accessible. Bethany and some of her team doing a science. Left to right: Nick Brickner (UC Berkeley), Lee Bernstein (LLNL), Bethany Goldblum (UC Berkeley) Bethany told us "I didn't even know version control existed until I attended the [UC Berkeley] Python BootCamp and didn't really understand how to use it until after the Software Carpentry workshop." Bethany's team now has their code in a git repository on the code hosting site Bitbucket (chosen because LBL provides free private hosting there). All of their changes are tracked and each member of the team always has an up-to-date version of the code. They are even using Bitbucket's issue tracker and wiki to organize tasks and documentation! Today Bethany's team is protected when inevitable mistakes like deleting or overwriting source files occur. They won't find and then forget about bugs. The whole team will benefit when someone adds a new feature. Today Bethany and her team are working like software engineers. This is what Software Carpentry is all about: helping scientists and engineers learn the tools and practices that make software development a tolerable, even enjoyable, task for those of us that do it every day. And nothing is more rewarding to the instructors who contribute their time and experience than a success story like this. Read More ›

This Is What We Do
Greg Wilson / 2012-11-15
Day 1 of the workshop at the Scripps Institute: a room full of biologists learning how to do better science faster by building things the right way. Read More ›

Workshop for High-Energy Physics at UCL
Ben Waugh / 2012-11-14
Following the success of the open bootcamp at UCL in April/May this year, we decided to have a go at introducing some Software Carpentry to the first-year postgraduate students in my own research group, High-Energy Physics (HEP). There is already a longstanding University of London intercollegiate lecture programme for these students, including some programming in C++ and Python. The new workshops aim to help participants consolidate and extend their existing Python knowledge, as well as learn about more general best practice in software development, by following an extended worked example using version control and test-driven development. In a variation on the "normal" two-day Software Carpentry format, we are running two one-day workshops about a term apart, with the first taking place last Thursday. Attendance was 65% (13 of 20 students) with a single instructor (me) and two other helpers: Sean Brisbane of the Oxford HEP group, where he intends to run something similar, and James Hetherington of UCL's new Research Software Development Team.

Students were able to use their own laptops, or use the provided UCL standard Windows desktop machines to SSH to Linux machines at CERN, UCL or their own institution. It took a while to get the Exceed X server configured appropriately on the Windows machines, and to get everyone connected to a suitable Linux machine, which also meant creating accounts on the UCL HEP cluster for students with no other Linux system available. We used PyROOT so that we could use the ROOT toolkit, which is ubiquitous in HEP, while still programming in Python rather than C++. Predictably there were various problems with incompatible versions of Python and ROOT, but these were all overcome. For version control we used Git, with an emphasis on its use as a personal extended undo facility: we didn't get as far as branching or setting up a shared repository. Nor did we get as far as I had hoped into unit testing, but we did some informal testing in the Python interpreter, and introduced the Python debugger. Next we will arrange some online tutorials and discussion forums, and use the feedback from these to guide the plan for the second workshop in February.

Pre-workshop survey results (counts across the scale "never heard of it / heard of it but never used it / have used it but don't understand it / use occasionally / use regularly / expert"):
- Bash (or other Unix/Linux shell): 2, 11
- Python: 4, 4, 4, 1
- version control: 3, 4, 4, 2
- unit testing: 11, 2

Good and bad:
- Good: applying Python knowledge. Bad: diversity of environments (OS, ROOT, Python versions) causes problems.
- Good: introduction to version control: wouldn't have discovered how to use it alone. Bad: better to have a worksheet so you can continue while others debug their system if yours is working.
- Good: learned about XEyes. Bad: needed another break in the afternoon.
- Good: everything covered was new to me. Bad: took too long to get everyone set up.
- Good: enjoyed subject-specific examples. Bad: need a worksheet so you can catch up if behind.
- Good: good to have an introduction to PyROOT. Bad: didn't get far enough into unit testing.
- Good: helpers were constructive and didn't make us feel like idiots. Bad: moved too fast at times.
- Good: liked tips on readable code.

Read More ›

FOSDEM 2013
Greg Wilson / 2012-11-14
Via Sylwester Arabas (one of the organizers): A day-long session ("devroom") on Free/Libre and Open Source Software (FLOSS) for scientists will be held during the next FOSDEM conference, Brussels, 2-3 February 2013 (http://fosdem.org/2013). We aim at having a dozen or two short talks introducing projects, advertising brand new features of established tools, discussing issues relevant to the development of software for scientific computing, and touching on the interdependence of FLOSS and open science. You can find more info on the call for talks at http://slayoo.github.com/fosdem2013/. The deadline for sending talk proposals is December 16th 2012; please send your submissions or comments to foss4scientists-devroom@lists.fosdem.org. Read More ›

A Mostly Successful Decade
Greg Wilson / 2012-11-14
A video of Fernando Perez's keynote at PyCon Canada, titled "Science And Python: retrospective of a (mostly) successful decade", has been posted. If you'd like to see what the IPython Notebook can do, and how it fits into a broader picture, it's well worth watching. Read More ›

Web 4 Science
Greg Wilson / 2012-11-13
Titus Brown (a long-time supporter of, and contributor to, this project) has a series of posts on his blog about using the web for science. As he's well ahead of the curve in doing this himself, I thought readers of this blog would be interested:
- opening remarks
- the awesomeness we're experiencing
- the challenges ahead
- strategizing for the future
- tech wanted
There's lots here to chew on—I'm sure he'd welcome comments from our readers. Read More ›

Pre-Assessment
Greg Wilson / 2012-11-13
One of the recurring problems in our bootcamps is that at any point, about 1/4 of people are lost, and 1/4 are bored. The only fix we can think of is to let people self-assess before arrival to determine whether this is the right thing for them or not (and to let instructors know more about who they're going to be teaching). We've used variations on the questionnaire below a couple of times with useful results; we'd appreciate feedback on:
- what else we should ask (please give us the actual question, rather than simply, "something about such-and-such")
- what we could take out
- how intimidating you think this might be to people who aren't confident in their skills—we'd really like to not frighten away the people who need us most

1. Career stage: Undergrad / Grad / Post-doc / Faculty / Industry / Support staff

2. Disciplines: Space sciences / Physics / Chemistry / Earth sciences (geology, oceanography, meteorology) / Life science (ecology and organisms) / Life science (cells and genes) / Brain and neurosciences / Medicine / Engineering (civil, mechanical, chemical) / Computer science/electrical engineering / Economics / Humanities/social sciences / Tech support/lab tech/support programmer / Admin

3. Platform: Linux / Mac OS X / Windows

4. A tab-delimited file has two columns: the date, and the highest temperature on that day. Produce a graph showing the average highest temperature for each month. (could do it easily / could struggle through / wouldn't know where to start) Language/tool I would use: ____________

5. Write a short program to read a file containing columns of numbers separated by commas, average the non-negative values in the second and fifth columns, and print the results. (could do it easily / could struggle through / wouldn't know where to start) Language/tool I would use: ____________

6. Check out a working copy of a project from version control, add a file called paper.txt, and commit the change. (could do it easily / could struggle through / wouldn't know where to start) Tool I would use: ____________

7. In a directory with 1000 text files, create a list of all files that contain the word Drosophila, and redirect the output to a file called results.txt. (could do it easily / could struggle through / wouldn't know where to start) Tool I would use: ____________

8. A database has two tables Scientist and Lab. The Scientist table's columns are the scientist's student ID, name, and email address; the Lab table's columns are lab names, lab IDs, and scientist IDs. Write an SQL statement that outputs a count of the number of scientists in each lab. (could do it easily / could struggle through / wouldn't know where to start) Tool I would use: ____________
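For calibration, here is one possible answer to question 5, sketched in Python; the file name data.csv is ours, since the question deliberately leaves it out:

import csv

def column_means(filename, columns=(1, 4)):      # 0-based: second and fifth
    '''Average the non-negative values in the given columns.'''
    totals = dict((c, 0.0) for c in columns)
    counts = dict((c, 0) for c in columns)
    with open(filename) as source:
        for row in csv.reader(source):
            for c in columns:
                value = float(row[c])
                if value >= 0:
                    totals[c] += value
                    counts[c] += 1
    return [totals[c] / counts[c] for c in columns]

print column_means('data.csv')

Read More ›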

More Oxford feedback
Mike Jackson / 2012-11-07
Following on Phil's analysis of the feedback from the first Software Carpentry workshop in Oxford, here are the round-the-room comments from the attendees, helpers and instructors, good and bad...

Good:
- Learned something.
- Mercurial, very useful yet simple. Can apply straight away.
- Practical at end seemed useful way of tying everything together.
- Doing things in synch with lecturers.
- Scientific approach to programming.
- Whole approach.
- Good overview of the key concepts and foundation to learn more.
- Focussed.
- Course as a whole is good.
- Thorough - basic to advanced - and see new ways to do basics.
- Good practice, can take forward.
- Best practice.
- Really good overview and intro to things not heard of before but would be very useful (version control, MDAnalysis).
- Make, and its use for things other than C.
- Other instructors chipping in on current instructor was good.
- Loved it, especially version control and make.

Bad:
- Keeping up with the instructors typing on the projectors, so slow the pace.
- Resources and case-studies from the simple examples to complex examples.
- Would like more of a chance to apply what was learned with the instructors around, or at least feedback on a plan.
- Most sessions ran over.
- Problems with later aspects of Python.
- Balance between beginner and advanced - always wasting someone's time.
- Knowing simpler Python commands might've helped beforehand.
- Sometimes a bit too fast.
- More time for exercises.
- Target the course a bit more to their area.
- Helps to start with, but good to have more courses and more on each area.
- More time to work on problems and work through case-studies.
- Downloading all the files for every exercise.
- Days away from work.

Read More ›

More Tips
Greg Wilson / 2012-11-06
Philip Fowler, who recently hosted a bootcamp at Oxford, has written a 5-point guide for people who are thinking about doing it themselves—we hope you'll find it useful. Read More ›

An Administrative Note
Greg Wilson / 2012-11-05
In order to maintain focus, I'm moving discussion of general education & technology issues over to my personal blog. I will continue posting items specifically related to teaching computational competence to scientists here. Read More ›

Winter School on Reproducible Research
Greg Wilson / 2012-11-04
It is a pleasure to announce that the 13th Geilo Winter School will take place January 20th-25th, 2013 at Dr. Holms Hotel, Geilo, Norway. As you can see from the attached flyer, the Winter School will give an introduction to reproducible science and modern techniques for scientific software development, a topic we hope many of you find interesting. The aim of the winter school is that participants will be able to apply the learned techniques to make their own research reproducible, and topics that will be covered include reproducible research, verification and validation, software testing, and continuous integration. More information is available on the winter school webpages, and participants are also encouraged to present their own research in a poster. The registration deadline is December 3rd, 2012. Read More ›

How to Help at a Bootcamp
Greg Wilson / 2012-11-03
Aleksandra Pawlik has written a five-point guide to helping out with Software Carpentry bootcamps:
- Familiarity is not enough.
- You're not just there to troubleshoot, you're there to troubleshoot fast.
- Expect the unexpected.
- Be proactive.
- IT all-rounder.
This, and Katy Huff's "How to Help at a Bootcamp", are tremendously valuable: if you have tips of your own to add, please send 'em in. Read More ›

Pelican Guts: on content management for Software Carpentry
Jon Pipitone / 2012-11-01
This morning I spent some time talking over the content management problem for the Software Carpentry website with Greg. To recap: we'd like to make it easier for anyone to contribute and manage the website content (lecture pages, new bootcamps, events, etc.). For example, there are plenty of bits of site building that could be automated so that contributing a new lecture or topic doesn't have to involve wiring in links to a table of contents, forward and backward links to other lectures/topics, and so on. Greg has already written some code that takes HTML pages for the lectures, bootcamps, and other miscellaneous pages, and knits them together to form basically the same sort of website you see here. Metadata about the ordering of lecture topics, for instance, is placed in HTML META tags, and this gets used to create forward/backward links and overall topic pages. It's a system that works, but it is newly hand-rolled, and "surely, since the whole idea of generating websites isn't new, there must be something already out there to do this for us that is being maintained and has already had a bunch of bugs squashed and... right?" Right?

Pelican looks like it might be something we could work with (he says, having spent about an hour looking through the documentation). It reads in pages written in Markdown or reST, and outputs static HTML knitted together in a blog-like format (complete with an RSS feed, tag cloud, yada yada). It's built to be extensible, so it would be possible for us to add in new functionality to do essentially what Greg has working right now: read in HTML, extract metadata, build custom SWC pages for lectures and bootcamps and anything else we'd want in the future. The question is, why use Pelican if we're mostly going to be using it to drive SWC-specific code? That is, as it stands, it doesn't seem to do much of what we need it to do for the site, although one could argue that at least we'd be reusing the guts, which should count for something (mmm, pelican guts). So, any opinions? Has anyone used Pelican (or anything like it) and added in lots of their own site-specific functionality? How'd it go?
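For the curious, the "read in HTML, extract metadata" step is easy to sketch with nothing but the standard library; this is a toy version assuming simple <meta name="..." content="..."> tags, and Greg's actual code may differ:

from HTMLParser import HTMLParser   # Python 2, matching the era

class MetaReader(HTMLParser):
    '''Collect <meta name="..." content="..."> pairs from a page.'''

    def __init__(self):
        HTMLParser.__init__(self)
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'meta':
            attrs = dict(attrs)
            if ('name' in attrs) and ('content' in attrs):
                self.metadata[attrs['name']] = attrs['content']

reader = MetaReader()
reader.feed('<html><head><meta name="lecture" content="3"/></head></html>')
print reader.metadata    # {'lecture': '3'}

Read More ›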

Oxford Wrap-Up (with charts!)
Greg Wilson / 2012-11-01
The first Software Carpentry workshop in Oxford has wrapped up, and by all accounts was a big success. Philip Fowler, the host, has posted about running the workshop ("everyone sat up a bit straighter during version control...") and also done an analysis of what participants think they learned. In addition, Aron Ahmadia has given us some detailed feedback on how we could improve things the next time 'round. Our thanks to everyone involved—we're looking forward to a return visit soon. Read More ›

Charging and Being Charged
Greg Wilson / 2012-11-01
As noted in the minutes from our meeting on Monday, one of the many things we want to improve about our workshops is the attendance rate. While we get near-complete turnout for some, we have 50% no-shows for others, which is disappointing for instructors, and unfair to people left on the waiting list. One idea is to charge $20 or so to help people decide if they're serious about taking part or not. However, we've been told in the past that as soon as we start charging, some universities will start to charge us for space, network access, and so on. I agree with their reasons—it would be chaos if random people could use their facilities for profit and not give any back to the university—but in some cases, what they'd charge us would far exceed what we'd get from learners. Here's one data point: Even with the proceeds going to charity it seems we would have to pay [the university's] "associate rate". For the room we used in the last workshop, which was quite crowded with 40 students + helpers, this would be £165 per day, so about £8 per attendee for two days. But here's another: [The university's] flat rate is $1500/day up to 50 people. They said it's mostly for insurance costs. Also, you have to do catering from the university, and that's $6.50 for coffee and a bagel for each break. What's the policy where you are? We'd be grateful for any hard data you can give us. Read More ›

Minutes from 2012-10-29 All-Hands Meeting
Greg Wilson / 2012-10-30
We held our first online all-hands meeting yesterday (Monday, October 29), and despite Hurricane Sandy, 28 people were able to attend. Minutes from the meeting are given below; we have a lot to do, but we're very excited to be doing it.

In attendance: Aron Ahmadia, Carlos Anderson, Azalee Bostroem, Erik Bray, Steve Crouch, Matt Davis, Ross Dickson, Justin Ely, Julia Gustavsen, Tommy Guy, Steve Haddock, Ted Hart, Konrad Hinsen, Katy Huff, Emily Jane McTavish, Trevor King, Justin Kitzes, Ian Langmore, Ian Mitchell, Aleksandra Pawlik, Ariel Rokem, Anthony Scopatz, Chang She, Joshua Smith, Laura Tremblay-Boyer, Ben Waugh, Ethan White, and Greg Wilson.

Minutes

Attendance and no-shows
- Attendance ranges from less than 50% to over 100% of those who register. The former case is disappointing (to instructors) and unfair (to those left on the wait list).
- Option: have people register in groups/teams. This worked well at UCL and elsewhere, but is difficult to implement on EventBrite. Try out team signup at upcoming workshops.
- Option: try a registration charge (e.g., $20) which we keep, return after attendance, or give to charity. Institutions may start charging us for space if any money changes hands, so find out if we can charge for attendance without being charged in turn. Returning money to attendees is difficult (getting a credit card number and only charging for no-shows is equally fraught). And what charity would we donate to? (Software Carpentry isn't a non-profit.)

Licensing
- Existing content is CC-BY / MIT licensed. We do not want to have to manage content with multiple licenses, so we need sign-off from contributors.
- The Linux "Submitting Patches" guide and GitHub's contributing guidelines are both examples we can copy.
- Incorporate a licensing agreement into the web site, and get retroactive signoff for existing material.

Workshop/participant level mismatch
- The most common complaints about workshops are "too fast" and "too slow".
- Telling people what we expect as background hasn't worked. People who don't know enough to absorb what we teach show up in the hopes that they'll get something, and are then lost; people who know much of what we're teaching show up in the hopes of learning something new, and are bored.
- Having people send us material for evaluation (e.g., a code sample) doesn't scale: "I think the people in my field able to produce a code example are way ahead of the group that needs the most help."
- Use a self-assessment? People will still undershoot/overshoot and show up anyway, and pre-tests scare away some of the people we most want to help.
- Ask them to work through a free online course before coming to us? MOOCs work best for people who already understand concepts and want to add information, which isn't our audience, and many people who are willing to try a two-day workshop probably won't commit to a full course up front.
- Experiment with pre-assessments for upcoming workshops, and make completion of the pre-assessment a condition for getting a ticket.

Impact Assessment
- We don't know what impact we're having (but funders would really like to).
- Collect names/email addresses of actual workshop participants to contact later.
- Design a lightweight post-workshop instrument for use 3 months after.

Migrating the Software Carpentry repository to GitHub
- Find examples of people doing collaborative course development using Git.
- We need to reach consensus on one-big-repo vs. lots-of-little-repos: produce short A-or-B position statements on Git organization by Nov 7 for a vote.

Online Lessons
- Broadcast video tutorials didn't work well, and forums hosted at Software Carpentry didn't do well in 2010.
- SciComp Stack Exchange isn't intended for novice questions about the shell, version control, etc.
- Local learning groups (once-a-week lunchtime sessions) seemed to work well. And when video tutorials did work, it was partly because participants were co-located, e.g., watching a single screen in a small group and talking amongst themselves out of band.
- Experiment with online office hours.
- Create a "Software Carpentry" tag on Stack Exchange.
- Re-launch our own forums? Given how many workshops we're running, and how closely they're scheduled, maybe this time there would be critical mass.

Targeting Specific Audiences
- Our mission is not to teach Python (or Bash, or Subversion), but to teach scientists how to think like programmers: to grow programs in a structured, repeatable way (create and combine lots of little tools); to manage and share what they build (version control, readable code, provenance); to be confident that it's right (defensive programming, testing, debugging); to automate, automate, automate (build systems, and the very idea of programming); and to do this using open source tools as far as possible.
- Using R instead of Python fits this mission (see Justin Kitzes' summary); teaching MPI doesn't.
- What about teaching these concepts in C or Fortran to people who already speak the language?
- Try using R instead of Python, and try using C or Fortran without teaching the language itself.

Workshop Sprint in 2013
- Initial idea: bring Europeans over to help run a bunch of workshops in North America in late March 2013, then send North Americans over to help teach another bunch in late June.
- Good for publicity, and a good way for people to meet each other and build ties.
- Are (some) people able to take 2-3 weeks to do this? Yes, particularly if they give academic talks at the same time as teaching workshops, which everyone should.
- Does the timing work? Exams, people leaving to work at field stations, etc.
- Find out who's available when, and find a budget.

Content Sprint in 2013
- Independently of the workshop sprint, bring people together for a one-week sprint on content, possibly at 2-3 sites (e.g., London, Toronto, Mountain View) connected by virtual presence.
- Find out who's available when, and find a budget.

Deferred
- IPython Notebooks experience report
- Git experience report
- What to do about Windows?
- Badging

Actions
1. Try out team signup at upcoming workshops. [pending new workshops]
2. Find out if we can charge for attendance without being charged in turn. [all]
3. Incorporate licensing agreement into web site. [GVW]
4. Get retroactive signoff for existing material. [GVW]
5. Experiment with pre-assessments for upcoming workshops. [pending design and new workshops]
6. Collect names/email addresses of actual workshop participants to contact later. [all]
7. Design lightweight post-workshop instrument for use 3 months after. [WC, MH]
8. Find examples of people doing collaborative course development using Git. [any?]
9. Produce short A-or-B position statements on Git organization by Nov 7 for vote. [KH + AA, MD, TG, TK, CS, JS]
10. Experiment with online office hours. [KH, others?]
11. Create "Software Carpentry" tag on Stack Exchange. [volunteer?]
12. Re-launch our own forums. [deferred]
13. Try using R instead of Python. [TH, JK, LTB]
14. Try using C or Fortran without teaching the language itself. [AA, KH]
15. Find out who's available when for workshop sprint. [GVW]
16. Find budget for workshop sprint. [GVW]
17. Find out who's available when for content sprint. [GVW]
18. Find budget for content sprint. [GVW]

Note: GVW would really appreciate a volunteer to tackle #15-18.

Initials: AA Aron Ahmadia, WC Warren Code, MD Matt Davis, TG Tommy Guy, MH Mike Hansen, TH Ted Hart, KH Konrad Hinsen, KH Katy Huff, JK Justin Kitzes, TK Trevor King, CS Chang She, JS Joshua Smith, LTB Laura Tremblay-Boyer, GVW Greg Wilson.

Read More ›

A List of Bioinformatics Courses
Greg Wilson / 2012-10-30
Titus Brown is compiling a list of bioinformatics courses. If you know one that should be added, he'd enjoy hearing from you. Read More ›

Position Available: Director, Webmaking Science Lab, Mozilla
Greg Wilson / 2012-10-29
Mozilla is seeking a dynamic individual to drive the vision and product strategy for its Webmaking Science Lab. This is a key strategic role reporting to the Executive Director. This individual will lead the charge in reaching out to the scientific community, helping to establish a vibrant community of scientists, developers, designers, and other individuals working to advance the practice of science on the web. A passion for open science, a keen understanding of the web as both a technology stack and culture, experience with existing web-based science toolsets, and enthusiasm for community-driven technology development are the initial key qualifications. As a senior member of the Mozilla organization, the successful candidate will be responsible for gathering partners, inspiring a team, building a community, shaping a vision, and leading the development and delivery of new products for conducting science both on and like the web. For details, or to apply, please visit http://careers.mozilla.org/en-US/position/oQuSWfwS. Read More ›

Usability Testing and Instructional Design
Greg Wilson / 2012-10-28
This is a story in several parts.

1. From Guido van Rossum's intermittently-updated blog about the history and design of Python:

    Python's first and foremost influence was ABC, a language designed in the early 1980s [that] was meant to be a teaching language, a replacement for BASIC, and a language and environment for personal computing. It was designed by first doing a task analysis of...programming...and then doing several iterations that included serious user testing...

    ABC's authors did invent the use of the colon that separates the lead-in clause from the indented block. After early user testing without the colon, it was discovered that the meaning of the indentation was unclear to beginners being taught the first steps of programming. The addition of the colon clarified it significantly: the colon somehow draws attention to what follows and ties the phrases before and after it together in just the right way.

This story and others like it were something of a revelation to me when I first encountered Python in the late 1990s. Usability testing of programming languages? Huh—why isn't everyone doing that?

2. So when an enriched syntax for loops was proposed in 2000, I conducted a little experiment. Given the statement:

    for x in [10, 20, 30]; y in [1, 2]:
        print x+y

would you expect to see:

(a) 'x' and 'y' move forward at the same rate: 11 22
(b) 'y' go through the second list once for each value of 'x': 11 12 21 22 31 32
(c) an error message because the two lists are not the same length?

All 11 of the people I tested voted for (b), which is not what the designer of this syntax had intended it to mean. I did a slightly larger experiment a few days later to compare a few other syntax proposals, and while it was both fun and informative, the practice never caught on.

3. A few years later, I discovered the Media Computation work of Barbara Ericson and Mark Guzdial at Georgia Tech. They weren't interested in syntactic details; they were tackling the much larger issue of retention: What and how should we teach to get more people into computing and keep them there (particularly people from underrepresented groups like women and non-whites/non-Asians)? What and how should we teach so that the people who do stick around remember more of what we've taught? They found that both kinds of retention could be improved by using a media-first approach to computing, i.e., by using examples like resizing images, red-eye removal, and sound editing right from the start. There are many reasons—it's more immediately useful than finding primes, more fun than sorting strings, and the visual output is often easier to debug—but what really mattered was that they had evidence to back up their teaching strategy. They turned their findings into a series of textbooks, and my colleagues at the University of Toronto and I borrowed many of their ideas for our introductory Python book.

4. We also borrowed some code, or at least its API. Ericson and Guzdial realized early on that novices needed a different kind of toolkit than experienced programmers. For novices, a picture was the clay out of which they would shape their understanding of what programming was. They didn't need high-performance edge detection operations; they needed a simple, single-step way to loop over the picture's pixels. The Python Imaging Library didn't cater to this kind of thing because it's actually not the "right" way to do image processing, so the Media Computation team built a simpler (but lower-performance) library in Jython to keep simple things simple.
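To make that concrete, here is a minimal sketch of the kind of novice-facing wrapper being described. The helper names (make_picture, get_pixels, get_red, set_red) and the sample filename are invented for illustration; this is not the actual Media Computation or PyGraphics API, though the underlying PIL/Pillow calls are real.

    # A rough sketch (not the actual Media Computation or PyGraphics API) of
    # the kind of wrapper that keeps simple things simple for novices.
    from PIL import Image

    def make_picture(filename):
        """Load an image file as a 'picture' object."""
        return Image.open(filename).convert("RGB")

    def get_pixels(picture):
        """Return a list of (x, y) coordinates, one per pixel."""
        width, height = picture.size
        return [(x, y) for y in range(height) for x in range(width)]

    def get_red(picture, pixel):
        return picture.getpixel(pixel)[0]

    def set_red(picture, pixel, value):
        r, g, b = picture.getpixel(pixel)
        picture.putpixel(pixel, (value, g, b))

    # With helpers like these, a novice's first loop is one visible step:
    picture = make_picture("sample.jpg")          # hypothetical input file
    for pixel in get_pixels(picture):
        set_red(picture, pixel, 255 - get_red(picture, pixel))  # invert red
    picture.save("inverted-red.jpg")

It is slower than vectorized image processing, which is exactly the trade-off described above: performance is sacrificed so that the pixel-by-pixel loop the learner writes matches the mental model being taught.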
We used a CPython version of this called PyGraphics, which included some simple audio manipulation functions as well.

Which brings us to the point of this story. We currently teach Python in a very traditional order: arithmetic, assignment, lists, loops, conditionals, and functions are introduced in more or less that order. We also teach it using very traditional examples: the values we push around are numbers and strings, and the I/O we do is mostly readline-in-a-loop. Given what the folks at Georgia Tech have discovered, and the speed of modern machines, I'd really like to switch that up and try an images-first approach (particularly given how easy the IPython Notebook makes it to display images alongside code). However, I don't want to have to maintain an image manipulation library, no matter how small, or require learners to download and install anything more than they already need to. But the point is, it's premature to worry about either issue until we know whether this approach actually works any better than what we're doing right now. If we tile images instead of cutting columns out of CSV files, do scientists learn more, faster, about how loops and conditionals and assignment and call stacks work? Do they remember more two weeks or two months later? Do they do more with what they've learned, and if so, does it actually help them do more science? I believe some of these questions can be answered, though the answers may not be easy to build, and as I've said before, if we're going to teach scientists, we damn well ought to act like scientists ourselves. Read More ›

Why This Is Hard (Part Deux)
Greg Wilson / 2012-10-27
I pointed a grad student at the IPython tutorial and made some notes as she started to work through it. (She has used the vanilla Python interpreter before, but is still a novice.) Here's what happened.

The first thing the tutorial mentions is tab completion, which she's used in other tools, so she tried that:

    >>> x = 'hello'
    >>> x.<TAB>        # shows methods of the string --- cool!

Does it work directly with strings?

    >>> 'abc'.<TAB>    # shows --- what?

It took me a moment to figure out it was showing us all the files in the current working directory whose names begin with a '.' character. Since these files are normally hidden, I don't think the grad student had ever seen them; I also don't think she'd have figured out what they were on her own.

    >>> x?             # gets help for the string referred to by x --- cool!

The information itself isn't very helpful for novices, but she was pleased it was there. Now what about this?

    >>> 'abc'?

Uh oh—it tells her "Object `` not found." Given that we've been using:

    >>> name = 'Perry'
    >>> len(name)

and:

    >>> len('Perry')

interchangeably, this confused her. She moved on, but I think it got filed as "weird arbitrary computer stuff", which is exactly what we're trying to clear up.

Now, coming from RStudio, one of the things she wants is to be able to edit while in the interpreter. And woo hoo, there's an '%edit' magic in IPython. But '%edit' brought up Vim, which may be the hardest editor in the world to get out of if you've never seen it before. There's nothing in the tutorial to tell her how to change the editor, and the word '%edit' doesn't link to any docs either. If I hadn't been there, she would have been toast. I'm not sure what to recommend, but unless you think ':wq' is intuitive or discoverable, this is a problem.

And here's something even harder: "%%edit x = 5" opened Vim with the line:

    get_ipython().magic(u'%edit x = 5')

When she quit with 'q!', Vim was immediately reopened with the same line. And again, and again, until she killed the window. I opened a fresh window and restarted IPython for her. We then typed:

    >>> %%edit x = 5

and got a traceback starting with the error message:

    WARNING: Argument given (x = 5) can't be found as a variable or as a filename.

Er, what? We then assigned 5 to x and tried the '%%edit x = 5' line again—same traceback. Hm. What if we try:

    >>> %edit

all by itself, quit from Vim, and then try:

    >>> %edit x = 5

Yup, that reproduces the "Vim forever!" problem.

At this point, we're five minutes in, and she has spent almost all of those five minutes wrestling with things that seem arbitrary to her, and that she wouldn't have been able to fix on her own. She has also asked what "cell magic" is, and I've punted by saying, "Oh, it's just multi-line stuff." I don't think that clarified anything for her, but luckily she was immediately distracted by Vim rising from its grave and forgot the question.

She then tried playing around with shell commands from inside Python at my request (since we'd like to stop teaching the shell), and again there was some confusion:

    >>> %cd tmp

works fine, but:

    >>> %ls

tells us that the magic function `ls` is not found. She can use the shell escape '!ls' instead, but she'll have to keep a cheat sheet to remember which commands take '%' and which take '!', and it'll be a while before she understands why some require one prefix and some the other. At least when it doesn't work, it visibly doesn't work. But then she tried this:

    >>> !cd ..

to get back to her home directory from the 'tmp' directory she changed into earlier.
It looks like it worked, but '!ls' shows that it hasn't. I understand what's going on—the 'cd' happened in a sub-shell that was immediately discarded, rather than in the interpreter itself—but she's completely confused at this point. (A short session illustrating the difference appears at the end of this post.)

None of this is intended as a criticism of IPython: in the hands of someone who has mastered the underlying concepts, it's a great power tool, and I'm not going to switch back to the generic Python interpreter any time soon. But as a novice, this grad student doesn't have a clear mental model of how the operating system, the shell, and the Python interpreter relate to each other, so anything that moves back and forth between levels leaves her bewildered. Explaining things categorically therefore doesn't work (yet), because she doesn't have the categories. Equally, more documentation isn't a solution, because looking something up (i.e., adding information to existing concepts) and forming or internalizing concepts are different cognitive processes.

What about putting safeties on the tool itself, i.e., taking features out of IPython to reduce the number of "mistakes" novices can make, or the number of times they'll say "huh?" The problem with that approach is that every one of those features is there for a reason: people really do want to run arbitrary shell commands from inside IPython, change its own notion of the current working directory, and so on. As Mark Chu-Carroll pointed out, some of this stuff really is irreducibly hard, because some of the things we're trying to do really are complex. Finding ways to "cook" it to make it more digestible is the heart of instructional design, and perhaps the greatest challenge we now face.
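For anyone who wants to see the sub-shell behavior for themselves, here is a short illustrative IPython session (the directory names and paths are made up for the example):

    In [1]: %cd tmp          # changes IPython's own working directory
    /home/user/tmp

    In [2]: !cd ..           # runs 'cd' in a throwaway sub-shell...

    In [3]: !pwd             # ...so the next sub-shell starts back in 'tmp'
    /home/user/tmp

    In [4]: import os; os.getcwd()   # the interpreter never left 'tmp' either
    Out[4]: '/home/user/tmp'

Each '!' command gets a brand-new shell that inherits the interpreter's working directory and is discarded when the command finishes, which is why only '%cd' has a lasting effect.

Read More ›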

Two Self-Assessments
Greg Wilson / 2012-10-26
A recurring problem with our workshops is the diversity of learners' backgrounds. Below are the results of two pre-workshop self-assessments (on different groups), one phrased in terms of content, the other in terms of specific tasks. The consistency is better than I expected: broadly speaking, it seems that people's impression of how much they know about X is more or less in line with their belief about whether they could do something that requires understanding X. The next step will be to follow up a few weeks after the workshop to see if they actually can do X.

Survey 1

Responses, in order: never heard of it / know what it is, might have used it / occasionally use it but don't really understand it / use it regularly and feel I understand it well / expert.

- bash shell scripting: 9% / 31% / 19% / 41% / 0%
- version control (so you can "rollback" changes): 47% / 44% / 6% / 3% / 0%
- make (e.g., automated analysis): 44% / 31% / 19% / 6% / 0%
- simple Python, including lists and dictionaries: 12% / 38% / 16% / 28% / 6%
- adding unit tests to Python scripts: 66% / 22% / 3% / 9% / 0%
- useful Python modules (NumPy and SciPy): 28% / 31% / 16% / 25% / 0%
- MDAnalysis (load a protein structure into Python): 41% / 31% / 6% / 19% / 3%

Survey 2

Responses, in order: could do easily / could probably struggle through / wouldn't know where to start.

- A tab-delimited file has two columns: the date, and the highest temperature on that day. Produce a graph showing the average highest temperature for each month. (69% / 21% / 8%)
- Write a short program to read a file containing columns of numbers separated by commas, average the non-negative values in the second and fifth columns, and print the results. (15% / 32% / 52%)
- In a directory with 1000 text files, create a list of all files that contain the word Drosophila, and redirect the output to a file called results.txt. (13% / 26% / 60%)
- A database has two tables, Scientist and Lab. The Scientist table's columns are the scientist's student ID, name, and email address; the Lab table's columns are lab names, lab IDs, and scientist IDs. Write an SQL statement that outputs a count of the number of scientists in each lab. (6% / 10% / 82%) (One possible answer is sketched after this list.)
- Check out a working copy of a project from version control, add a file called paper.txt, and commit the change. (6% / 13% / 80%)
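For readers wondering what the database task is looking for, here is one plausible answer, sketched with Python's built-in sqlite3 module. The table and column names are inferred from the question's wording, and the sample rows are invented; the join on scientist IDs is my assumption about the intended schema.

    # One plausible answer to the survey's SQL question, using Python's
    # built-in sqlite3 module.  Schema and data are illustrative only.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE Scientist (student_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE Lab       (lab_name TEXT, lab_id INTEGER, scientist_id INTEGER);
    INSERT INTO Scientist VALUES (1, 'Ada',   'ada@example.org');
    INSERT INTO Scientist VALUES (2, 'Grace', 'grace@example.org');
    INSERT INTO Scientist VALUES (3, 'Linus', 'linus@example.org');
    INSERT INTO Lab VALUES ('Genomics', 10, 1);
    INSERT INTO Lab VALUES ('Genomics', 10, 2);
    INSERT INTO Lab VALUES ('Imaging',  20, 3);
    """)

    # Count the scientists in each lab by joining on the scientist ID.
    query = """
    SELECT Lab.lab_name, COUNT(*)
    FROM Lab JOIN Scientist ON Lab.scientist_id = Scientist.student_id
    GROUP BY Lab.lab_name;
    """
    for lab, count in db.execute(query):
        print(lab, count)   # e.g., Genomics 2, Imaging 1

Read More ›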

Mozilla Web Literacies White Paper
Greg Wilson / 2012-10-26
The Mozilla Foundation (which has been our home for the last eight months) has posted a white paper on web literacies that outlines their vision for the web: ...the world's largest public resource, the operating system of the future, and (we believe) one of the greatest drivers of happiness and human flourishing the world has ever seen. The skills/competencies/literacies grid gives a condensed overview of how beginners and intermediates could explore, create, connect, and protect with and on the web. Over the coming months, we'll be working to find ways of connecting this with scientists. If you'd like to help, please get in touch. Read More ›

Counting to Five (or, A Plan for Online Tutorials and What's Wrong With It)
Greg Wilson / 2012-10-26
My daughter is five and a half years old, and some time in the last couple of weeks, she made a small cognitive breakthrough. It used to be that if I asked her, "What's five plus three?", she would carefully count five fingers, then count three, then go back and count them all together to get eight. She doesn't do that any more; instead, she holds up five fingers, then counts, "Six... seven... eight!" to get the answer. It may sound like a small thing, but it's not: the idea that five is always five, even if you didn't count it this time around, is such a big one that we forget it ever had to be learned.

A lot of my time these days goes into trying to count from five in online learning. While an ignorance of prior art might be considered an asset in Silicon Valley, I have enough failures and harder-than-it-should-have-beens behind me to make me want to build on what people have done before rather than reinvent things, even if the latter is more fun. What kinds of distance learning have worked in the past? How many of those ideas still work online? What new ideas work? What doesn't work, and why not? And how do we know? I don't believe for a moment that all the answers I want are already out there, but I'd have to be pretty arrogant to believe that none of them are, or that it'd be faster for me to rediscover things than read about them.

For example: between March and July we ran two dozen online tutorials for people who'd been through Software Carpentry workshops. We used BlueJeans (and later Vidyo) to share the presenter's desktop with the learners, and two-way audio (later supplemented with IRC-style text chat) both for questions directed at the presenter, and sideways Q&A between learners. The net effect was exciting and frustrating in equal measure: exciting because we could actually do it, and frustrating because the technology didn't work for some people, and the experience was never really satisfying for anyone. Even when learners were sitting together in small groups so that they could have high-bandwidth interactions with each other during the tutorial, the result was a pale imitation of being there in person.

At the same time, though, I was helping people one-to-one, and that seemed to work a lot better. These sessions typically started with a "can you please help me?" email, then turned into a Skype call and some desktop sharing. What I found was that diagnosing a novice's problem was a lot easier when I could see what they were doing (or when they could see what I was doing—we did it both ways). The reason was that almost by definition, novices don't know how to explain their actual problem: if they knew enough to say, "I think my EDITOR variable must be set incorrectly," they'd almost certainly be able to fix the problem themselves. Email back and forth with someone who's still trying to get from A to B is a lot like a game of Twenty Questions: can you run this command and send me its output, have you edited that file, and so on. A few seconds of interaction short-circuited a lot of that, which made it much more satisfying on both sides.

That experience suggests a different kind of online teaching than the MESS (Massively Enhanced Sage on the Stage) being used by Coursera, Udacity, the Khan Academy, and so on. Grossly misrepresented for rhetorical purposes, their model is:

1. Watch a recording of a sage explaining something in a single fixed way.
2. Try to do something that the sage decided would be good for you.
3. If it doesn't work, hope that the static hints embedded in the robograder trigger a moment of satori.
4. If not, try to translate your incomplete and imperfect understanding of what you're actually doing and why it isn't working into a coherent explanation.
5. Hope that someone reads it, that you happened to include enough of the right information for them to diagnose your actual problem, and that they are able to write an explanation that you can understand given your current state of knowledge.
6. Repeat.

Minus steps 1 and 2, this is the model that Stack Overflow and similar sites use as well. I know from conversations with workshop participants that only a handful (certainly less than 10%) ever post a question on SO or similar sites. I used to think the reason was that people don't want to look stupid, but I now believe that the cost of translating "this doesn't work" into a complete enough description for someone else to work with, and the number of long-latency round trips required to converge on the actual problem and its diagnosis, are contributing factors as well.

But what if we used 21st-century technology for Q&A instead of souped-up 1970s-vintage bulletin boards? What if there was a "help me now" button on Stack Overflow that was enabled whenever the person who'd asked the question was online? When clicked, it would connect the clicker and the person needing help via VoIP and desktop sharing (with some kind of "are you sure?" handshake at the outset) so that people could get real-time interactive help when they needed it. Based on what I've seen in the last five months, this would be a lot more effective than anything with the translation bottleneck and latency of a bulletin board.

This is where "counting from five" comes in. Somebody must have done this before, or done something similar enough to tell me what potholes there are on this road. Enter Scott Gray, who's been building, using, and running online educational systems for two decades, and is now the head of the O'Reilly School of Technology. We chatted today, and here's what he said:

GW: Did you see the Mozilla TowTruck demo?
GW: Just a prototype, but the idea is to share your browser session with a tutor/helper/more knowledgeable friend when/as you need a few minutes of help.
GW: We're already finding with scientists that "hey, can we share a desktop for a couple of minutes to figure this out?" is very effective, both on the learning side and on the engagement side.
SG: Isn't Stack Overflow a low-feature version of that?
GW: The "low-feature" adjective is the killer: novices don't understand what's going wrong well enough to summarize enough relevant detail to permit diagnosis—Catch-22.
GW: There are obviously exceptions (novices do post answerable questions on SO), but I think it's a much more stringent filter than people who've passed through it realize.
SG: Yeah... seems like adding the hookup would be a lot of help: it's what we do at work, but with Skype. First we use low-feature texting, like we are now.
GW: Do people screen-share with Skype? Has anyone done an analysis of what they actually show each other, and how much accidental/sideways knowledge transfer is taking place (i.e., how often person X notices something that person Y didn't explicitly communicate, but which turns out to be relevant)?
SG: My employees are spread out all over the place; we use either Skype or Google Hangouts constantly.
SG: For actually teaching courses I have seen it used and abused: as far back as the mid-nineties we used CU-SeeMe.
SG: They use Elluminate now. I watched them use it this past weekend, and I think that it can be massively improved.
SG: I think complete sharing with voice should be the last step in an escalating set of communication tools, all available in whatever learning or tutorial tool is being used.
SG: It's like right now: we're nearing the limit of what we can do with just text. We're communicating, but we're reaching a point where we'd be more efficient actually talking, and we didn't know that 20 minutes ago.
GW: Do you think the initial async/text exchange between helper and tutor will be productive often enough to make it worth doing, or will they be better off jumping to real-time/full info early?
GW: I.e., should we flip the order in which we do things when one or both of us knows a lot less about what we're talking about? :-)
SG: To your first question, absolutely.
GW: Absolutely the text exchange will be productive, or absolutely they should leapfrog early?
SG: The first.
GW: Do you have any data to back that up? Looking at Stack Overflow, and talking to scientists who don't post questions, I think there's huge survivor bias in the sample, but I don't have anything more substantial than post-workshop conversations to back that up.
SG: What goes on at GitHub? If everyone had to do sync, no one would. No one has that kind of time. So screen sharing is a last resort, not a first.
GW: Right—I don't think sync scales to the whole planet, but is it optimal for things like an online intro-to-programming class? A few thousand people, all with some declared stake in the game?
SG: No. I have 6000 students, all novices; we don't use any sync, ever.
GW: Would they if it was available? Would they use it with each other (the only scalable approach)?
GW: It works for the online gaming community, big time—lots of people helping lots of other people learn to be better dwarf axe throwers and whatever in WoW :-)
SG: Gaming and programming are completely different.
GW: OK—a flawed analogy is a bad point to start design from :-)
SG: Programming is a text-based endeavor: it's async by its very nature. So are literature and mathematics.
GW: Ah, OK, then that's where we disagree.
GW: If I'm trying to figure out what's wrong with a novice's program, I want to run it on various inputs, or tweak a line or two and see what happens.
GW: If I'm trying to figure out why they can't install X, I want to run commands to check settings, environment variables, etc., and that's a very interactive process.
GW: The program is text, but programming is a verb.
SG: Yep.
GW: So maybe it's a difference between helping them with their program, and helping them with their programming.
SG: But when we talk sync and async, we're talking human to human. Both can be done async.
SG: The problem is the ability to easily share asynchronous activities: think Google Docs, but for programming.
GW: Hm...

All right: if someone with (literally) ten times as much experience as me tells me I'm off track, I ought to listen carefully and think a little more, even if it means discarding what seemed to me to be a very promising idea. Another alternative is to try to find a really cheap way to cobble together some low-fidelity experiments to get a better feeling for how this would work in practice—the equivalent of a napkin sketch in UI design. If you have insights to share, please share 'em. Read More ›

Prime Numbers, Biologists, and Data Visualization
Greg Wilson / 2012-10-25
A couple of days ago, Titus Brown posted a question from Randy Olson: are there relevant, non-abstract problems (for biologists) that you can solve with minimal programming skills? I.e., what's the biology equivalent of traditional intro CS problems like "calculate the primes"? The answer has three parts:

1. Calculating primes, finding the longest line in a file, etc., are lousy introductory problems because they're completely valueless to most of the intended audience. You have to already believe you really, really want to learn how to program, and be good at delayed gratification, to get through them to the good stuff. (This isn't just true in computing: see the discussion in How Learning Works of the impact of motivation on learning.)

2. Guzdial, Ericson, et al.'s work at Georgia Tech showed pretty conclusively that a media-first approach to computing has better outcomes, i.e., if people are manipulating images (and audio and video) right from day 1, they'll retain more of what they learn, and more students will stay in the program. (But note: you probably can't use an industrial-strength image manipulation library like PIL for intro teaching, at least not directly: you need something that assumes less background knowledge, both in the domain and in skills.)

3. Robbins, Senseman, and Pate have been piloting a "visualization first" programming course for biologists (see this paper if you can—apologies for the paywall). While it's early days, it seems that this should have the benefits of "media first" for scientists. For example, a programming exercise could be "draw the curves in red if the end point is less than the starting point, and in blue otherwise" (a sketch of this appears below).

We haven't reorganized our intro material around this idea yet, partly because of inertia, but partly because of the installation headaches of getting visualization working on N platforms in a two-day workshop. I'm very keen to try it out, though, particularly if the IPython Notebook really does make simple visualization simple to do on all major platforms.
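As a rough illustration of what the curve-coloring exercise might look like in practice, here is a minimal sketch. The random-walk data and the choice of matplotlib are my assumptions, made only so the example is self-contained.

    # A sketch of the "color by trend" exercise described above: draw each
    # curve in red if its end point is below its starting point, blue otherwise.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    for _ in range(10):
        curve = np.cumsum(rng.normal(size=50))     # one random walk per curve
        color = 'red' if curve[-1] < curve[0] else 'blue'
        plt.plot(curve, color=color)

    plt.title("Red: ended lower than it started; blue: ended higher")
    plt.show()

The appeal is that the conditional the learner writes produces an immediately visible result, which is exactly the "visualization first" argument.

Read More ›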

Feedback from Newcastle
Mike Jackson / 2012-10-24
This week the Digital Institute at Newcastle University, with The Software Sustainability Institute, ran a second bootcamp at Newcastle University. This bootcamp was run for attendees from the North East of England, from whom there was demand after the first Newcastle bootcamp back in May. Feedback from the attendees was good, though some wanted more time or detail on specific subjects. The most popular comment was the desire to have the slides and instructor scripts available so that attendees could catch up if they fell behind the instructor. The other major challenge, again, was Cygwin. We may want to provide a VM for attendees who typically use Linux but have a Windows laptop. Here are the comments from the attendees, helpers, and instructors, good and bad...

Good
- Wide range of languages.
- More confidence in using the shell.
- Could follow along with everything.
- Good to be introduced to Cygwin.
- Really liked the post-it notes.
- Automated testing could be really useful.
- Useful to have Python and SQLite.
- Learned a lot about version control and SQLite.
- Can now keep track of how my thesis is progressing using Mercurial.
- Liked SQLite; will use it in my research.
- Demonstrators were extremely helpful.
- The bootcamp overall has a nice "plot".

Bad
- Helpful to have been given the slides to go through.
- Useful to put the code and the slides online before the bootcamp.
- Could have done with a crib sheet: a handout in case you are delayed, so you can follow that.
- Difficult to keep up with the pace of the exercises; had not always completed one before the next started.
- Sometimes the exercises did not have a progression in difficulty, e.g., the square overlap could be broken up more.
- Suggest showing the overlapping-lines solution before the attendees implement the function, so they come away thinking about TDD and not the solution to the problem. (Instructor)
- Python was a bit elementary; was hoping for more Python.
- Would have liked a bit longer on revision control.
- More shell examples, fewer shell commands.
- The instructor should pause before pressing RETURN when entering a command, as sometimes the command disappears off screen.
- Crib sheets are a bit long - need to shorten. (Instructor)
- Provide a VM for attendees who typically use Linux but have a Windows laptop.

Read More ›

Twenty Percent
Greg Wilson / 2012-10-23
I realized a couple of days ago that I'd never blogged about what Software Carpentry needs to accomplish in order to change the practice of science fundamentally and permanently. In a nutshell, we need to convert a fifth of scientists to our way of thinking. Once we do that, the odds are better than 50-50 that every time someone sends a paper out for review, at least one reviewer will ask hard questions about how the computational work was done. I get that number by assuming the following distribution of reviewers per paper:

    Number of Reviewers    Fraction of Papers
            2                     10%
            3                     40%
            4                     40%
            5                     10%

If 20% of scientists ask the right questions, the probability that any one reviewer does not is 0.8, so the probability that none of a paper's reviewers will ask them is:

    0.1×0.8² + 0.4×0.8³ + 0.4×0.8⁴ + 0.1×0.8⁵ ≈ 46.5%

It's a grossly simplistic model, but at least it gives us something to shoot for.
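For anyone who wants to check the arithmetic or play with the assumptions, a few lines of Python reproduce the number:

    # Reproduce the back-of-the-envelope calculation above: with 20% of
    # scientists "converted", what fraction of papers has no reviewer who asks?
    distribution = {2: 0.10, 3: 0.40, 4: 0.40, 5: 0.10}  # reviewers -> fraction of papers
    p_miss = 0.8                                         # chance one reviewer doesn't ask

    none_ask = sum(fraction * p_miss ** n for n, fraction in distribution.items())
    print(round(none_ask, 4))   # 0.4654, i.e., about 46.5%

Read More ›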

Key Points
Greg Wilson / 2012-10-23
On the flight back from Vancouver yesterday, I finally did what I should have done eight months ago and compiled the key points from our core lesson content. The results are presented below, broken down by lesson and topic; going forward, we're going to use something like this as a basis for defining what Software Carpentry is, and what workshop attendees can expect to learn.

The Shell

What and Why
- The shell is a program whose primary purpose is to read commands, run programs, and display results.

Files and Directories
- The file system is responsible for managing information on disk.
- Information is stored in files, which are stored in directories (folders). Directories can also store other directories, which forms a directory tree.
- / on its own is the root directory of the whole filesystem.
- A relative path specifies a location starting from the current location; an absolute path specifies a location from the root of the filesystem.
- Directory names in a path are separated with '/' on Unix, but '\' on Windows.
- '..' means "the directory above the current one"; '.' on its own means "the current directory".
- Most files' names are something.extension; the extension isn't required, and doesn't guarantee anything, but is normally used to indicate the type of data in the file.
- cd path changes the current working directory.
- ls path prints a listing of a specific file or directory; ls on its own lists the current working directory.
- pwd prints the user's current working directory (current default location in the filesystem).
- whoami shows the user's current identity.
- Most commands take options (flags) which begin with a '-'.

Creating Things
- Unix documentation uses '^A' to mean "control-A".
- The shell does not have a trash bin: once something is deleted, it's really gone.
- mkdir path creates a new directory.
- cp old new copies a file.
- mv old new moves (renames) a file or directory.
- nano is a very simple text editor—please use something else for real work.
- rm path removes (deletes) a file.
- rmdir path removes (deletes) an empty directory.

Pipes and Filters
- '*' is a wildcard pattern that matches zero or more characters in a pathname; '?' matches any single character.
- The shell matches wildcards before running commands.
- command > file redirects a command's output to a file.
- first | second is a pipeline: the output of the first command is used as the input to the second.
- The best way to use the shell is to use pipes to combine simple single-purpose programs (filters).
- cat displays the contents of its inputs.
- head displays the first few lines of its input; tail displays the last few.
- sort sorts its inputs.
- wc counts lines, words, and characters in its inputs.

Loops
- Use a for loop to repeat commands once for every thing in a list.
- Every for loop needs a variable to refer to the current "thing".
- Use $name to expand a variable (i.e., get its value).
- Do not use spaces, quotes, or wildcard characters such as '*' or '?' in filenames, as it complicates variable expansion.
- Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.
- Use the up-arrow key to scroll up through previous commands to edit and repeat them.
- Use history to display recent commands, and !number to repeat a command by number.
- Use ^C (control-C) to terminate a running command.

Shell Scripts
- Save commands in files (usually called shell scripts) for re-use.
- Use bash filename to run saved commands.
- $* refers to all of a shell script's command-line arguments; $1, $2, etc., refer to specific command-line arguments.
- Letting users decide what files to process is more flexible and more consistent with built-in Unix commands.

Finding Things
- Everything is stored as bytes, but the bytes in binary files do not represent characters.
- Use nested loops to run commands for every combination of two lists of things.
- Use '\' to break one logical line into several physical lines.
- Use parentheses '()' to keep things combined.
- Use $(command) to insert a command's output in place.
- find finds files with specific properties that match patterns.
- grep selects lines in files that match patterns.
- man command displays the manual page for a given command.

Version Control with Subversion
- Version control is a better way to manage shared files than email or shared folders.
- The master copy is stored in a repository. Nobody ever edits the master copy directly: instead, each person edits a local working copy.
- People share changes by committing them to the master or updating their local copy from the master.
- The version control system prevents people from overwriting each other's work by forcing them to merge concurrent changes before committing.
- It also keeps a complete history of changes made to the master so that old versions can be recovered reliably.
- Version control systems work best with text files, but can also handle binary files such as images and Word documents.

Basic Use
- Every repository is identified by a URL. Working copies of different repositories may not overlap.
- Each change to the master copy is identified by a unique revision number. Revisions identify snapshots of the entire repository, not changes to individual files.
- Each change should be commented to make the history more readable.
- Commits are transactions: either all changes are successfully committed, or none are.
- The basic workflow for version control is update-change-commit.
- svn add things tells Subversion to start managing particular files or directories.
- svn checkout url checks out a working copy of a repository.
- svn commit -m "message" things sends changes to the repository.
- svn diff compares the current state of a working copy to the state after the most recent update; svn diff -r HEAD compares it to the state of the master copy.
- svn log shows the history of a working copy.
- svn status shows the status of a working copy.
- svn update updates a working copy from the repository.

Merging Conflicts
- Conflicts must be resolved before a commit can be completed.
- Subversion puts markers in text files to show regions of conflict.
- For each conflicted file, Subversion creates auxiliary files containing the common parent, the master version, and the local version.
- svn resolve files tells Subversion that conflicts have been resolved.

Recovering Old Versions
- Old versions of files can be recovered by merging their old state with their current state.
- Recovering an old version of a file does not erase the intervening changes.
- Use branches to support parallel independent development.
- svn merge merges two revisions of a file.
- svn revert undoes local changes to files.

Setting up a Repository
- Repositories can be hosted locally, on local (departmental) servers, on hosting services, or on their owners' own domains.
- svnadmin create name creates a new repository.

Provenance
- $Keyword:$ in a file can be filled in with a property value each time the file is committed.
- Put version numbers in programs' output to establish provenance for data.
- svn propset svn:keywords property files tells Subversion to start filling in property values.

Basic Programming

Basic Operations
- Use '=' to assign a value to a variable. Assigning to one variable does not change the values associated with other variables.
- Use print to display values.
- Variables are created when values are assigned to them, and cannot be used until they have been created.
- Addition ('+'), subtraction ('-'), and multiplication ('*') work as usual in Python.
- Use meaningful, descriptive names for variables.

Creating Programs
- Store programs in files whose names end in .py and run them with python name.py.

Types
- The most commonly used data types in Python are integers (int), floating-point numbers (float), and strings (str).
- Strings can start and end with either single quote (') or double quote (").
- Division ('/') produces an int result when given int values: one or both arguments must be float to get a float result.
- "Adding" strings concatenates them; multiplying strings by numbers repeats them.
- Strings and numbers cannot be added because the behavior is ambiguous: convert one to the other type first.
- Variables do not have types, but values do.

Reading Files
- Data is either in memory, on disk, or far away.
- Most things in Python are objects, and have attached functions called methods.
- When lines are read from files, Python keeps their end-of-line characters. Use str.strip to remove leading and trailing whitespace (including end-of-line characters).
- Use file(name, mode) to open a file for reading ('r'), writing ('w'), or appending ('a'). Opening a file for writing erases any existing content.
- Use file.readline to read a line from a file, and file.close to close an open file.
- Use print >> file to print to a file.

Standard Input and Output
- The operating system automatically gives every program three open "files" called standard input, standard output, and standard error.
- Standard input gets data from the keyboard, from a file when redirected with '<', or from the previous stage in a pipeline with '|'.
- Standard output writes data to the screen, to a file when redirected with '>', or to the next stage in a pipeline with '|'.
- Standard error also writes data to the screen, and is not redirected by '>' or '|'.
- Use import library to import a library, and library.thing to refer to something imported from it.
- The sys library provides open "files" called sys.stdin and sys.stdout for standard input and output.

Repeating Things
- Use for variable in something: to loop over the parts of something.
- The body of a loop must be indented consistently.
- The parts of a string are its characters; the parts of a file are its lines.

Making Choices
- Use if test to do something only when a condition is true, and else to do something when a preceding if test is not true.
- The body of an if or else must be indented consistently.
- Combine tests using and and or.
- Use '<', '<=', '>=', and '>' to compare numbers or strings; use '==' to test for equality and '!=' to test for inequality.
- Use variable += expression as a shorthand for variable = variable + expression (and similarly for other arithmetic operations).

Flags
- The two Boolean values True and False can be assigned to variables like any other values.
- Programs often use Boolean values as flags to indicate whether something has happened yet or not.

Reading Data Files
- Use str.split() to split a string into pieces on whitespace.
- Values can be assigned to any number of variables at once.

Provenance Revisited
- Put version numbers in programs' output to establish provenance for data.

Lists
- Use [value, value, ...] to create a list of values.
- for loops process the elements of a list, in order.
- len(list) returns the length of a list.
- [] is an empty list with no values.

More About Lists
- Lists are mutable: they can be changed in place.
- Use list.append(value) to append something to the end of a list.
- Use list[index] to access a list element by location.
- The index of the first element of a list is 0; the index of the last element is len(list)-1. Negative indices count backward from the end of the list, so list[-1] is the last element.
- Trying to access an element with an out-of-bounds index is an error.
- range(number) produces the list of numbers [0, 1, ..., number-1], so range(len(list)) produces the list of legal indices for list.

Checking and Smoothing Data
- range(start, end) creates the list of numbers from start up to, but not including, end.
- range(start, end, stride) creates the list of numbers from start up to end in steps of stride.

Nesting Loops
- Use nested loops to do things for combinations of things.
- Make the range of the inner loop depend on the state of the outer loop to automatically adjust how much data is processed.
- Use min(...) and max(...) to find the minimum and maximum of any number of values.

Nesting Lists
- Use nested lists to store multi-dimensional data or values that have regular internal structure (such as XYZ coordinates).
- Use list_of_lists[first] to access an entire sub-list, and list_of_lists[first][second] to access a particular element of a sub-list.
- Use nested loops to process nested lists.

Aliasing
- Several variables can alias the same data. If that data is mutable (e.g., a list), a change made through one variable is visible through all other aliases (see the sketch below).
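A minimal illustration of that aliasing behavior (the variable names are mine):

    # Two names for one list: changing the data through one name is visible
    # through the other, because both names alias the same underlying list.
    first = [1, 2, 3]
    second = first          # no copy is made here
    second.append(4)
    print(first)            # [1, 2, 3, 4] -- the "other" variable sees the change
    print(first is second)  # True: both names refer to the same object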
Functions and Libraries

How Functions Work
- Define a function using def name(...). The body of a function must be indented.
- Use name(...) to call a function, and return to return a value from it.
- The values passed into a function are assigned to its parameters in left-to-right order.
- Function calls are recorded on a call stack. Every function call creates a new stack frame, and the variables in a stack frame are discarded when the function call completes.
- Grouping operations in functions makes code easier to understand and re-use.

Global Variables
- Every function always has access to variables defined in the global scope.
- Programmers often write constants' names in upper case to make their intention easier to recognize.
- Functions should not communicate by modifying global variables.

Multiple Arguments
- A function may take any number of arguments.
- Define default values for parameters to make functions more convenient to use. Defining default values only makes sense when there are sensible defaults.

Returning Values
- A function may return values at any point.
- A function should have zero or more return statements at its start to handle special cases, and then one at the end to handle the general case.
- "Accidentally" correct behavior is hard to understand.
- If a function ends without an explicit return, it returns None.

Aliasing
- Values are actually passed into functions by reference, which means that they are aliased.
- Aliasing means that changes made to a mutable object like a list inside a function are visible after the function call completes.

Libraries
- Any Python file can be imported as a library. The code in a file is executed when it is imported.
- Every Python file is a scope, just like every function.

Standard Libraries
- Use from library import something to import something under its own name, or from library import something as alias to import it under the name alias.
- from library import * imports everything in library under its own name, which is usually a bad idea.
- The math library defines common mathematical constants and functions.
- The system library sys defines constants and functions used in the interpreter itself. sys.argv is a list of all the command-line arguments used to run the program: sys.argv[0] is the program's name, and sys.argv[1:] is everything except the program's name.

Building Filters
- If a program isn't told what files to process, it should process standard input.
- Programs that explicitly test values' types are more brittle than ones that rely on those values' common properties.
- The variable __name__ is assigned the string '__main__' in a module when that module is the main program, and the module's name when it is imported by something else.
- If the first thing in a module or function is a string that isn't assigned to a variable, that string is used as the module or function's documentation. Use help(name) to display the documentation for something.

Functions as Objects
- A function is just another kind of data: defining a function creates a function object and assigns it to a variable, and functions can be assigned to other variables, put in lists, and passed as parameters.
- Writing higher-order functions helps eliminate redundancy in programs.
- Use filter to select values from a list, map to apply a function to each element of a list, and reduce to combine the elements of a list (see the sketch below).
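A small illustration of those three higher-order functions, written for present-day Python, where reduce lives in functools (in the Python 2 these lessons used, it was a built-in):

    # filter, map, and reduce on a list of numbers.
    from functools import reduce   # reduce was a built-in in Python 2

    values = [1, -2, 3, -4, 5]
    positives = list(filter(lambda x: x > 0, values))   # [1, 3, 5]
    squares   = list(map(lambda x: x * x, positives))   # [1, 9, 25]
    total     = reduce(lambda a, b: a + b, squares)     # 35
    print(positives, squares, total)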
Databases
- A relational database stores information in tables with fields and records.
- A database manager is a program that manipulates a database. The commands or queries given to a database manager are usually written in a specialized language called SQL.

Selecting
- SQL is case insensitive.
- The rows and columns of a database table aren't stored in any particular order.
- Use SELECT fields FROM table to get all the values for specific fields from a single table, or SELECT * FROM table to select everything from a table.

Removing Duplicates
- Use SELECT DISTINCT to eliminate duplicates from a query's output.

Calculating New Values
- Use expressions in place of field names to calculate per-record values.

Filtering
- Use WHERE test in a query to filter records based on logical tests.
- Use AND and OR to combine tests in filters, and IN to test whether a value is in a set.
- Build up queries a bit at a time, and test them against small data sets.

Sorting
- Use ORDER BY field ASC (or DESC) to order a query's results in ascending (or descending) order.

Aggregation
- Use aggregation functions like SUM and MAX to combine many query results into a single value, and the COUNT function to count the number of results.
- If some fields are aggregated, and others are not, the database manager displays an arbitrary result for the unaggregated field. Use GROUP BY to group values before aggregation.

Database Design
- Each field in a database table should store a single atomic value.
- No fact in a database should ever be duplicated.

Combining Data
- Use JOIN to create all possible combinations of records from two or more tables, and JOIN tables ON test to keep only those combinations that pass some test.
- Use table.field to specify a particular field of a particular table, and aliases to make queries more readable.
- Every record in a table should be uniquely identified by the value of its primary key.

Self Join
- Use a self join to combine a table with itself.

Missing Data
- Use NULL in place of missing information. Almost every operation involving NULL produces NULL as a result.
- Test for nulls using IS NULL and IS NOT NULL.
- Most aggregation functions skip nulls when combining values.

Nested Queries
- Use nested queries to create temporary sets of results for further querying.
- Use nested queries to subtract unwanted results from all results to leave desired results.

Creating and Modifying Tables
- Use CREATE TABLE name(...) to create a table, and DROP TABLE name to erase one.
- Specify field names and types when creating tables, along with PRIMARY KEY, NOT NULL, and other constraints.
- Use INSERT INTO table VALUES(...) to add records to a table, and DELETE FROM table WHERE test to erase records from it.
- Maintain referential integrity when creating or deleting information.

Transactions
- Place operations in a transaction to ensure that they appear to be atomic, consistent, isolated, and durable.

Programming With Databases
- Most applications that use databases embed SQL in a general-purpose programming language.
- Database libraries use connections and cursors to manage interactions.
- Programs can fetch all results at once, or a few results at a time.
- If queries are constructed dynamically using input from users, malicious users may be able to inject their own commands into the queries. Dynamically-constructed queries can use SQL's native formatting to safeguard against such attacks.

Number Crunching with NumPy
- High-level libraries are usually more efficient for numerical programming than hand-coded loops. Most such libraries use a data-parallel programming model.
- Arrays can be used as matrices, as physical grids, or to store general multi-dimensional data.

Basics
- NumPy is a high-level array library for Python: import numpy to import it into a program.
- Use numpy.array(values) to create an array. Initial values must be provided in a list (or a list of lists).
- NumPy arrays store homogeneous values whose type is identified by array.dtype. Use old.astype(newtype) to create a new array with a different type rather than assigning to dtype.
- numpy.zeros creates a new array filled with 0; numpy.ones creates a new array filled with 1; numpy.identity creates a new identity matrix; and numpy.empty creates an array but does not initialize its values (which means they are unpredictable).
- Assigning an array to a variable creates an alias rather than copying the array. Use array.copy to create a copy of an array.
- Put all array indices in a single set of square brackets, like array[i0, i1].
- array.shape is a tuple of the array's size in each dimension; array.size is the total number of elements in the array.

Storage
- Arrays are stored using descriptors and data blocks. Many operations create a new descriptor, but alias the original data block.
- Array elements are stored in row-major order.
- array.transpose creates a transposed alias for an array's data; array.ravel creates a one-dimensional alias; and array.reshape creates an arbitrarily-shaped alias.
- array.resize resizes an array's data in place, filling with zero as necessary.

Indexing
- Arrays can be sliced using start:end:stride along each axis.
- Values can be assigned to slices as well as read from them.
- Arrays can be used as subscripts to select items in arbitrary ways.
- Masks containing True and False can be used to select subsets of elements from arrays.
- Use '&' and '|' (or logical_and and logical_or) to combine tests when subscripting arrays.
- Use where, choose, or select to select elements or alternatives in a single step. (A short slicing-and-masking sketch follows this lesson's key points.)

Linear Algebra
- Addition, multiplication, and other arithmetic operations work on arrays element-by-element. Operations involving arrays and scalars combine the scalar with each element of the array.
- array.dot performs "real" matrix multiplication.
- array.sum calculates sums or partial sums of array elements, and array.mean calculates array averages.

Making Recommendations
- Getting data in the right format for processing often requires more code than actually processing it.
- Data with many gaps should be stored in sparse arrays.
- numpy.cov calculates variances and covariances.

The Game of Life
- Padding arrays with fixed elements is an easy way to implement boundary conditions.
- scipy.signal.convolve applies a weighted mask to each element of an array.
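Here is the promised slicing-and-masking sketch; the array contents are arbitrary:

    # Slicing and boolean masks on a small NumPy array.
    import numpy as np

    a = np.arange(12).reshape(3, 4)     # [[0..3], [4..7], [8..11]]
    print(a[::2, 1:3])                  # every other row, columns 1-2
    mask = (a > 2) & (a < 9)            # combine tests with '&', not 'and'
    print(a[mask])                      # [3 4 5 6 7 8]
    print(np.where(a % 2 == 0, a, -1))  # keep evens, replace odds with -1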
Quality

Defensive Programming
- Design programs to catch both internal errors and usage errors.
- Use assertions to check whether things that ought to be true in a program actually are.
- Assertions help people understand how programs work.
- Fail early, fail often.
- When bugs are fixed, add assertions to the program to prevent their reappearance.

Handling Errors
- Use raise to raise exceptions.
- Raise exceptions to report errors rather than trying to handle them inline.
- Use try and except to handle exceptions.
- Catch exceptions where something useful can be done about the underlying problem.
- An exception raised in a function may be caught anywhere in the active call stack.

Unit Testing
- Testing cannot prove that a program is correct, but is still worth doing.
- Use a unit testing library like Nose to test short pieces of code.
- Write each test as a function that creates a fixture, executes an operation, and checks the result using assertions.
- Every test should be able to run independently: tests should not depend on one another.
- Focus testing on boundary cases.
- Writing tests helps us design better code by clarifying our intentions.

Numbers
- Floating point numbers are approximations to actual values.
- Use tolerances rather than exact equality when comparing floating point values.
- Use integers to count and floating point numbers to measure.
- Most tests should be written in terms of relative error rather than absolute error.
- When testing scientific software, compare results to exact analytic solutions, experimental data, or results from simpler or previously-tested programs.

Coverage
- Use a coverage analyzer to see which parts of a program have been tested and which have not.

Debugging
- Use an interactive symbolic debugger instead of print statements to diagnose problems.
- Set breakpoints to halt the program at interesting points instead of stepping through execution.
- Try to get things right the first time.
- Make sure you know what the program is supposed to do before trying to debug it.
- Make sure the program is actually running the test case you think it is.
- Make the program fail reliably.
- Simplify the test case or the program in order to localize the problem.
- Change one thing at a time.
- Be humble.

Designing Testable Code
- Separating interface from implementation makes code easier to test and re-use.
- Replace some components with simplified versions of themselves in order to simplify testing of other components.
- Do not create arbitrary, variable, or random results, as they are extremely hard to test.
- Isolate interactions with the outside world when writing tests.

Sets and Dictionaries

Sets
- Use sets to store distinct values.
- Create sets using set() or {v1, v2, ...}.
- Sets are mutable, i.e., they can be updated in place like lists.
- A loop over a set produces each element once, in arbitrary order.
- Use sets to find unique things.

Storage
- Sets are stored in hash tables, which guarantee fast access for arbitrary keys.
- The values in sets must be immutable to prevent hash tables misplacing them.
- Use tuples to store multi-part elements in sets.

Dictionaries
- Use dictionaries to store key-value pairs with distinct keys.
- Create dictionaries using {k1: v1, k2: v2, ...}.
- Dictionaries are mutable, i.e., they can be updated in place.
- Dictionary keys must be immutable, but values can be anything.
- Use tuples to store multi-part keys in dictionaries.
- dict[key] refers to the dictionary entry with a particular key.
- key in dict tests whether a key is in a dictionary.
- len(dict) returns the number of entries in a dictionary.
- A loop over a dictionary produces each key once, in arbitrary order.
- dict.keys() creates a list of the keys in a dictionary.
- dict.values() creates a list of the values in a dictionary.

Simple Examples
- Use dictionaries to count things.
- Initialize values from actual data instead of trying to guess what values could "never" occur.

Phylogenetic Trees
- Problems that are described using matrices can often be solved more efficiently using dictionaries.
- When using tuples as multi-part dictionary keys, order the tuple entries to avoid accidental duplication.

Development

The Grid
- Get something simple working, then start to add features, rather than putting everything in the program at the start.
- Leave FIXME markers in programs as you are developing them to remind yourself what still needs to be done.

Aliasing
- Draw pictures of data structures to aid debugging.

Randomness
- Use a well-tested random number generation library to generate pseudorandom values.
- If a random number generation library is given the same seed, it will produce the same sequence of values.

Neighbors
- and and or stop evaluating arguments as soon as they have an answer.

Bugs
- Test programs with successively more complex cases.

Refactoring
- Refactor programs as necessary to make testing easier.
- Replace randomness with predictability to make testing easier.

Performance
- Scientists want faster programs both to handle bigger problems and to handle more problems with available resources.
- Before speeding a program up, ask, "Does it need to be faster?" and, "Is it correct?"
- Recording start and end times is a simple way to measure performance (see the sketch at the end of this summary).
- Analyze algorithms to predict how a program's performance will change with problem size.

Profiling
- Use a profiler to determine which parts of a program are responsible for most of its running time.

A New Beginning
- Better algorithms are better than better hardware.
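To illustrate the simplest of the performance techniques above, here is a minimal sketch of recording start and end times around a piece of work; the workload is arbitrary:

# A minimal sketch of measuring performance by recording start and end
# times. The workload (summing ten million integers) is arbitrary.
import time

start = time.time()
total = sum(xrange(10000000))
elapsed = time.time() - start
print 'elapsed time: %.3f seconds' % elapsed

Read More ›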

25 Questions
Greg Wilson / 2012-10-23
We've tried several times to define Software Carpentry's aims and content in terms of the questions that researchers ask (see for example our competence matrix and this post mapping eight questions to our current content). Our latest attempt is crowd-sourced with a small 'c': the people who are taking part in the first round of our online study group for instructors and would-be instructors put together their own lists of questions, along with the answers they'd expect from novice, intermediate, and expert scientific programmers. I've consolidated those lists into 25 questions, which I'd like to boil back down to no more than 10. If you'd like to take a crack at doing that, please post your suggestion as a comment, and I'll send you a Software Carpentry t-shirt if we use it.

1. Why is this program crashing?
2. How can I tell if my program's answers are right or wrong?
3. How can I tell if someone else's program is giving the right answers or not?
4. Why is this program giving me the wrong answer?
5. Why does this program work correctly on one machine but not on another?
6. How can I fix this program?
7. How can I make this program easier to understand?
8. How can I keep track of what I've done?
9. How can I figure out how someone else's program works?
10. How can I avoid putting bugs in my programs in the first place?
11. How can I avoid writing code that has been written before?
12. How can I make my code easier for other people to use?
13. How should I manage my programs?
14. How should I share my programs?
15. How can I analyze this data?
16. How should I manage my data?
17. How can I reformat this data?
18. How should I share my data?
19. How can I find things in my data?
20. How can I reproduce a result I produced some time ago?
21. How can I work with others?
22. How can I install this program or library?
23. How can I do things faster?
24. How can I make my programs faster?
25. How can I find out how to do something with my computer?

Read More ›

Getting Credit
Greg Wilson / 2012-10-22
A recurring theme in our discussion with scientists is how hard it is to get academic credit for building software, but there are some hopeful signs. As Carole Goble pointed out to me a couple of weeks ago, Nucleic Acids Research runs two special issues every year: one devoted to web servers, and one to databases. As the introduction to the former makes clear, the journal's definition of "web servers" is pretty broad: The 2012 Web Server Issue of Nucleic Acids Research is the 10th in a series of annual special issues dedicated to web-based software resources for analysis and visualization of molecular biology data...The present issue reports on 102 web servers. I'd be very interested to find out what kinds of software skills the contributors to these special issues have... Read More ›

Feedback from UC Berkeley
Matt Davis / 2012-10-22
We just finished up our UC Berkeley bootcamp with me, Katy Huff, and Justin Kitzes teaching. Here's the feedback:

Good:
- first day good with no programming experience
- good general intro to basics of Python and VC
- liked the intro/advanced Python split
- good git introduction
- IPython notebook helped me learn
- got to see real software engineering
- good coverage of Python + software engineering
- wanted to learn Python
- liked simple examples
- IPython/shell interaction awesome
- good job squeezing a lot of content in
- introduced scientific Python
- liked the showcase demos of matplotlib/IPython features
- combination of topics was good
- liked intro Python exercises
- understand the basics of version control

Bad:
- couldn't run the VM on underpowered computer
- don't see why Python is better than R
- would like a longer bootcamp with more focus on Python
- didn't want to learn Python
- would like to give anonymous feedback
- didn't catch some important details that held things up
- have better setup instructions, test scripts
- Python instruction was too slow/basic
- got left behind sometimes
- not sure how to integrate git into workflow, have practical demos, case studies
- hard to type/listen/watch all at once
- would be nice to have a data cleaning example, workflow example
- could have sped up introductory topics
- have more content summaries, workflow visualizations
- reading suggestions to follow up bootcamp
- wanted to see object oriented programming/calling C code

Unfortunately I missed a pre-workshop memo that most of the attendees would be experienced R users and I could have sped up/skipped some of the introductory programming material in favor of more scientific Python. The second day material on version control and software engineering still seemed to hit home, though, so we didn't totally waste their time. In future workshops I'll try to do a better job getting attendee profiles beforehand. One piece of feedback I was particularly gratified to hear was someone saying that they thought the IPython Notebook was a good pedagogical tool for exactly the reasons Ethan White and I outlined in a previous post. The student enjoyed focusing simply on the language without the distractions of switching between editors and shells. Read More ›

Excel Isn't Intrinsically Evil
Greg Wilson / 2012-10-22
Excel and other spreadsheets aren't intrinsically evil, but like any power tool, they can easily take off fingers when used carelessly. From Neil Saunders' blog: ...a text download of 450K methylation data from the Cancer Genome Atlas project reveals that Excel has had its evil way with the data at some point. Gene names such as MAR1, DEC1, OCT4 and SEPT9 are now reformatted as dates. As he goes on to say, "Despair at the quality of public data, fears about reproducibility in science. Must be Monday." Read More ›

Why Teaching People to Program Is Hard
Greg Wilson / 2012-10-21
Update: it's clear from comments that I explained myself poorly in this post. We don't ever teach by starting with a big example like the one below—we start with basic arithmetic, then assignment, then lists and loops, and so on. (See our Python lectures for details.) What I was trying to show was that by the time we reach a realistic example, students have to interleave those concepts in a very fine-grained way, i.e., if they'd been taking notes in the order in which we taught things, they'd have to flip back and forth through those notes constantly in order to make sense of things. That "cognitive assembly" is an extra burden on novices. Another way to think of it is this: coarse-grained interleaving like 'AAAABBBBCCCCDDDD' is easy to understand, and so is regular fine-grained interleaving like 'ABCDABCDABCDABCD', but we're asking learners to take the ABCD's we've shown them and put them together as 'ABADCCDABBDCCABD'.

Let me show you why it's hard to teach people how to program. Our starting point is a Python program that grabs annual average temperatures for a couple of countries from the World Bank's site and calculates and displays the ratio of one to the other. (The story we tell is that a climate scientist is trying to figure out whether global warming is happening faster in Canada than in Australia, or vice versa.) Here's the program:

01 import sys
02 import urllib2
03 import json
04
05 def kelvin(celsius):
06     '''Convert degrees C to degrees K.'''
07     return celsius + 273.15
08
09 def get_temps(country_code):
10     '''Get annual temperatures for a country.'''
11     url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/%s'
12     u = url % country_code
13     connection = urllib2.urlopen(u)
14     raw = connection.read()
15     structured = json.loads(raw)
16     connection.close()
17     result = {}
18     for entry in structured:
19         year, celsius = entry['year'], entry['data']
20         result[year] = kelvin(celsius)
21     return result
22
23 def main(first_country, second_country):
24     '''Show ratio of average temperatures for two countries over time.'''
25     first = get_temps(first_country)
26     second = get_temps(second_country)
27     assert len(first) == len(second), 'Length mis-match in results'
28     keys = first.keys()
29     keys.sort()
30     for k in keys:
31         print k, first[k] / second[k]
32
33 if __name__ == '__main__':
34     first_country = 'AUS'
35     second_country = 'CAN'
36     if len(sys.argv) > 1:
37         first_country = sys.argv[1]
38     if len(sys.argv) > 2:
39         second_country = sys.argv[2]
40     main(first_country, second_country)

Lesson 1 in any programming class covers basic data types (int, float, string), variables, assignment, the print statement, and basic arithmetic. Which lines of this program can we understand if those are the only concepts we have?
[The program is shown again, with the lines we can now understand highlighted.]

That's right: two lines out of 36 non-blank lines. There is one other statement in the program that does a simple assignment (line 11), but the string that's being assigned contains '%s', because it's used as a formatting template on the very next line, and we haven't covered string formatting yet. OK, let's move on to lesson 2: lists, indexing, and for loops. How much can we understand now?

[The program is shown again, with the lines understandable after lesson 2 highlighted.]

That didn't help much. There are plenty of places where we subscript, but the thing being subscripted is always either a dictionary or sys.argv, neither of which we've covered. We could change the order in which we teach things in order to get more coverage early on, but that would be cheating: I'm deliberately sticking to our usual order, and using an example that shows what a real scientist might want to use Python for in real life. Lesson 3 is typically all about functions, and since we're trying to teach people good programming practice, we'll introduce docstrings and assertions at the same time.
And if we're doing that, let's throw string formatting into the mix, which gives us:

[The program is shown again, with the lines understandable after lesson 3 highlighted.]

16 out of 36 lines after three lessons might feel like progress, but there's still only one lump of the program (the Celsius to Kelvin conversion function) that we understand in its entirety. Let's throw libraries into the mix as lesson 4 and see what happens once learners have "this.that" notation, sys.argv, and __name__ in their heads:

[The program is shown again, with the lines understandable after lesson 4 highlighted.]

And now conditionals as lesson 5:

[The program is shown again, with the lines understandable after lesson 5 highlighted.]
and dictionaries as lesson 6:

[The program is shown one last time, with every line highlighted.]

Six lessons, with a practical exercise after each, and our learners can finally do something they might find useful. Getting through all that takes four hours, i.e., if we start at 9:00, we're finished by 2:00 (assuming we take a break for lunch). That might not sound bad, considering how mature our learners are. But look at the striping—look how long it takes to assemble enough pieces for learners to completely understand any of this program's natural chunks. Over and over again, we have to say, "Trust us, this will prove useful later," and that kind of delayed gratification makes it harder for learners to put the pieces together correctly in their heads. So let's pick an example that comes together earlier, like averaging a column out of a table of numbers stored in a file:

01 import sys
02
03 def main(filename, column):
04     reader = open(filename, 'r')
05     total = 0.0
06     count = 0
07     for line in reader:
08         fields = line.strip().split(',')
09         assert column < len(fields), 'Not enough fields'
10         count += 1
11         total += float(fields[column])
12     assert count > 0, 'No data found'
13     print total / count
14
15 if __name__ == '__main__':
16     assert len(sys.argv) == 3, 'Filename and column number required'
17     column = int(sys.argv[2])
18     assert column >= 0, 'Non-negative column number required'
19     main(sys.argv[1], column)

We can get to payoff one lesson earlier in this case, but (and it's a very big "but") it's a bad example: scientists shouldn't parse CSV files themselves, they should use libraries that already know how to do it:

01 import sys
02 import numpy
03
04 assert len(sys.argv) == 3, 'Filename and column number required'
05 column = int(sys.argv[2])
06 values = numpy.loadtxt(sys.argv[1], delimiter=',')
07 print numpy.average(values, 0)[column]

The problem is, if we only show people these high-level tools, they don't learn how to build new tools of their own.
And that is why teaching people to program is hard. Later: it's clear from comments that I explained this poorly. I've put a clarifying note at the top of this post, and I'll take another run at it if and when I come up with a clearer approach. Read More ›

Feedback from Lawrence Berkeley National Lab
Matt Davis / 2012-10-20
Katy Huff, Justin Kitzes, and I wrapped up our LBL workshop yesterday. We had ~25 participants with a broad range of backgrounds and levels of experience. Here's our traditional table of good/bad feedback:

Good:
- can use python better
- covered lots of material
- can use git better now
- covered from basics to advanced programming
- helpers were knowledgeable and nice
- motivated version control well
- started with basics, had good help
- got exposed to a lot
- all the material is available on GitHub
- learned enough to get started
- have hope
- good overview for later
- format - work through code live
- testing cases
- self-consistent complete package
- inspirational
- great intro to vocabulary, how programmers think
- gained confidence from seeing experts
- seeing github, advice on resources
- had to do exercises, version control in beginner class
- instructors :-D, with real-world experience
- demo writing, troubleshooting, testing very valuable

Bad:
- don't know how to apply git to work
- covered too much material
- group felt too large for number of helpers
- felt unprepared for git instruction
- couldn't run VM, add system reqs
- python level was too easy
- need well defined goals for exercises
- group people by experience
- would have liked a more detailed description of course
- didn't get any emails on the waitlist
- getting info ahead of time
- whole thing too fast
- topics jumped around
- maybe too language (Python) dependent, no objects
- burnt out at end of first day (fast)
- 9-4:30 draining, more breaks between topics
- didn't expect to program at end, was right
- switch morning git with afternoon python on 2nd day
- not clear that mac users needed to install software ahead, why certain steps
- some hard to follow along, relies on previous success with commands
- more relevant examples to audience, day to day data/problems
- what was xcode for?
- falling behind in python - 2 tiered exercises
- documentation needs comments
- is there help for release management? advanced math?
- third day - integrative exercise, case study - start to finish
- follow up project, final suggested homework assignment

The good news is that they liked our material and they liked us (and seemed to think we knew what we were talking about). The bad news is a lot of them had a hard time keeping up and a couple complained it was too slow. These are typical complaints of our shotgun approach to teaching to a random sampling of scientists. We'd like to do more discipline specific bootcamps where we can tailor things better. Another typical complaint was that they aren't sure how to apply what they've learned to their research. We're still working on that one. Read More ›

I Screwed Up (or, Why Automation Isn't Always a Good Thing)
Greg Wilson / 2012-10-17
A couple of months ago, a group of astronomers asked us to run a workshop at Caltech in conjunction with their annual get-together: they'd cover travel costs for two instructors, and seats at the workshop would be reserved for them. Which would have been fine, except when I pushed the button to create the workshop, I also created an EventBrite signup page and a Software Carpentry bootcamp page with open enrollment. As a result, I had to send email to 15 people last night to say that there isn't actually room. I'm very sorry for my mistake, particularly as some would-be participants moved other commitments so that they could attend. I'll try to arrange another workshop in the area as quickly as possible, and put everyone who was bumped at the front of the line for it. I'll also try to put some safety checks into my workshop setup process (or better yet, not try to set things up at 11:00 pm after a long day at work). :-( Read More ›

Why We Teach Version Control
Greg Wilson / 2012-10-12
Read More ›

Rebuilding Redux
Greg Wilson / 2012-10-12
The time has come to replace our creaking combination of web tools with something that will let us do more for more people while spending less time switching between browser tabs. To recap an earlier post, we currently use:

- WordPress for our blog, web site pages, and comments;
- YouTube to host our videos (links to which are embedded in the WordPress pages);
- Subversion (administered via the Dreamhost control panel) to manage our slides, code examples, and other files;
- Mailman (administered the same way) to manage mailing lists;
- a Google Map to show the locations of past and future bootcamps;
- a Google Calendar to show their dates;
- Google Analytics to track web site traffic;
- EventBrite to handle event signup; and
- OpenBadges to keep track of who's accomplished what (currently in beta).

We also use Vidyo for video conferencing (primarily tutorials), and run Etherpad beside it so that people have a text channel for asking questions, posting links and code fragments, taking minutes, and so on. As I pointed out in that aforementioned post, this means admins need at least six different logins to get things done, and need to copy a lot of information from place to place. For example, whenever we commit to a new bootcamp, someone needs to:

- create a new event on EventBrite;
- create a new page for it on WordPress;
- reorder the WordPress pages so that the new event page is in the right place chronologically;
- make sure that page links to the right EventBrite event;
- update the overall schedule page in WordPress;
- put a new pin on the Google Map; and
- add the event to our calendar.

And then there are the mailing lists. In theory, they're nested (developers are a subset of potential tutors, tutors are a subset of interested parties, they're a subset of everyone who gets announcements), but since Mailman doesn't do nested lists, people have to be added to as many as five lists separately (the fifth being geographical, e.g., a list for everyone in the UK). I'm pretty sure those lists are inconsistent right now, but tracking down and fixing those inconsistencies would take time we don't have. In theory, the solution is to use some sort of content management system (CMS) to manage all the cross-referencing and maintain consistency. In theory, all an admin should have to do is fill in a form with a bootcamp's name, location, and date, and blammo—everything would be updated to reflect the change in the database. In practice, though, we haven't found a CMS that does the things we need: Plone and Drupal and what-not have some of the features we want, but not all of them. Neither do learning management systems (LMSes) like Moodle and Sakai: they implicitly assume multiple courses, with assignments, in one geographic location, so things like the geolocation of a set of events, or signup/registrations for them, aren't there out of the box. (They also add a couple of layers we don't want: as far as I can tell, there's no way for us to avoid having a course menu with exactly one course in it.) The other option would be to stop treating this as a content management problem and start treating it as a programming problem. (I have this hammer...) Titus Brown's blog uses Pelican, which compiles Jinja2 template files to create or update static pages, and Disqus to handle commenting. Since EventBrite and Google Whatever have web APIs, we could write some tools to pull content from a version control repository (for example, on GitHub), update pages that had actually changed, make sure that external services like EventBrite and Google are in sync, and so on.
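To make that second option concrete, here is a minimal sketch (not our actual tooling) of how one bootcamp page might be generated from a Jinja2 template plus a data record kept under version control; the template text and field names are invented for illustration:

# A minimal sketch of template-driven page generation in the spirit of
# Pelican: render one static HTML page per bootcamp from a data record.
# The template text and field names are invented for illustration.
from jinja2 import Template

PAGE = Template('''<h1>Bootcamp: {{ site }}</h1>
<p>Date: {{ date }}</p>
<p><a href="{{ eventbrite_url }}">Register on EventBrite</a></p>''')

bootcamp = {
    'site': 'Example University',
    'date': '2012-11-05',
    'eventbrite_url': 'http://example.eventbrite.com',
}

# A real tool would read this record from the repository and rewrite the
# page only if its content had actually changed.
with open('bootcamp-example.html', 'w') as writer:
    writer.write(PAGE.render(**bootcamp))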
I honestly don't know which option to pursue. I really like the idea that content is in a repository rather than in a database, since that will make it easier for people to contribute new material (merging rocks), but it would take me 2-3 weeks to build what's needed (most of which would be spent figuring out how, testing that I got it right, and then creating run-once tools to import existing content). On the other hand, tools like Plone and Drupal have large developer communities, who have written hundreds of extensions, and will write hundreds more in future, so if we opt for one of those, the odds are that the next time we need something like a dead link checker or Gravatar display, we won't have to build it ourselves. As is so often the case, we'll probably make our choice based on who volunteers to help us first. Read More ›

Purdue
Mike Hansen / 2012-10-10
Here's the feedback we received for our workshop at Purdue this week. We discussed the shell and Python on day 1, and then git and debugging/testing/documentation on day 2.

Good:
- Interactivity (technology and people)
- Day 2 topics (version control, testing and documentation)
- Hints about what's out there
- Overall breadth
- Hands-on
- VirtualBox and using a Virtual Machine
- Topics — git and shell
- Online materials, breadth
- Testing
- Python
- Good depth, overview
- Documenting the bootcamp online
- Interesting content
- Good for beginners
- Examples on day 1 (shell, Python)
- Organization
- Tips about what not to do

Bad:
- Too basic for some
- Jump up in difficulty on day 2
- VirtualBox problems
- Wanted Windows-specific examples
- Wanted C++ examples
- Wanted advantages/disadvantages of Python data structures
- Python subtleties not covered for beginners (when to use a colon)
- Too fast to follow sometimes (need to pause before executing commands)
- Wanted specific skills (i.e., Python)
- Wanted to know more about Valgrind
- Version control (too in depth — branches)
- git command-line was overwhelming (Github was good)
- More exercises, homework?
- Detailed schedule (so we can cherry-pick topics)
- Too fast sometimes
- Slides
- No printed documentation for notes
- What about SVN users?

Read More ›

Dark Matter, Public Health, and Scientific Computing
Greg Wilson / 2012-10-10
This is the text of a talk given at the 8th IEEE International Conference on eScience, October 10, 2012. The slides are also available. Back in March, Scott Hanselman wrote a blog post titled Dark Matter Developers: The Unseen 99% that crystallized something I'd been thinking about for a while. In it, he said: [We] hypothesize that there is another kind of developer than the ones we meet all the time. We call them Dark Matter Developers. They don't read a lot of blogs, they never write blogs, they don't go to user groups, they don't tweet or facebook, and you don't often see them at large conferences... [A]s one of the loud-online-pushing-things-forward 1%, I might think I need to find these Dark Matter Developers and explain to them how they need to get online! Join the community! Get a blog, start changing stuff, mix it up! But...those dark matter 99% have a lot to teach us about GETTING STUFF DONE... They aren't chasing the latest beta or pushing any limits, they are just producing. I'm not as optimistic as Scott, at least, not when it comes to scientific computing. I agree that 95% spend their time with their heads down, working hard, instead of talking about using GPU clouds to personalize collaborative management of reproducible peta-scale workflows, or some other permutation of currently-fashionable buzzwords. It isn't even because they don't know there's a better way. It's because for them, that better way is out of their reach. Let me back up a few years. In 1997, while I was on holiday in Venezuela, that country took delivery of its first CT scanner. It was the lead story on the evening news, complete with a few seconds of video showing a military convoy escorting the device to the university hospital. Why a military convoy? Because to get from the airport to the center of the city, the truck carrying the scanner had to pass through a slum where three quarters of a million people didn't have clean water, much less first-world health care. That image has stuck in my head ever since because it's the most accurate summary of the state of scientific computing that I know. While you are here talking about the CT scanners of computational science, the 95% (and yes, I do think it is 95%) are suffering from computational dysentery. If you think I'm exaggerating, ask yourself: How many graduate students write shell scripts to analyze data sets in batches instead of running those analyses manually? How many use version control to track what they've done and collaborate with colleagues? (In the largest computer science department in Canada, the answer is only 10%.) How many of them routinely and instinctively break large computational problems down into pieces small enough to be comprehensible, testable, and reusable? For bonus marks, how many of them know those are really all the same thing? Now, you could say this isn't your problem, but you'd be wrong: it's actually the biggest problem you have. Why? Because if people are chronically malnourished, giving them access to a CT scanner when they're in their twenties doesn't make a damn bit of difference to their well being. And in many ways, you're the biggest problem they have. Why? Because you're the only "real" programmers they know, and when they come to you and ask for clean water, your answer is, "Let's talk about brain scans. They're cool." If you set aside googling for things, the overwhelming majority of scientists don't use computers any more effectively today than they did twenty-five years ago. 
They're no more likely to know that routine tasks can be automated; they're no more likely to understand the difference between structured and unstructured data, and it takes them just as long to write a 300-line data analysis script as it did when people would actually get a little giddy at the thought of having a 16 megahertz Sun-3 workstation with 8 megabytes of RAM on their desk. Let's pause for a moment and fill in some details. First, is it true that only a few percent of research scientists are computationally competent? As I said, I don't have data to put in front of you, but I've been helping scientists of all kinds do computational work since 1986, not just at supercomputing centers. Working in those gives you a biased view of the world, just like working in the CT lab at a third-world hospital whose patients can all afford first-world health care gives you a biased view of how the general population is doing. And I've been teaching scientists at universities and government labs as a full-time job for most of the last two and a half years, and talking to a wide variety of people who are doing the same thing. One percent would be pessimistic hyperbole, but there's no way the actual number is more than five percent. Second, what do I actually mean by "computationally competent"? We've all heard of "computational thinking", but that phrase has been completely devalued by people jumping on a bandwagon without actually changing direction. When I say that someone is computationally competent, I mean the same thing I mean when I say they're statistically competent: they know enough to do routine tasks without breaking a sweat, where to look to find answers they can understand to harder problems, and when to go and find an expert to solve their problems for them. More specifically, I think a scientist is computationally competent if she knows how to build, use, validate, and share software to: manage and process data, tell if it's been processed correctly, find and fix problems when it hasn't been, keep track of what she's done, share work with others, and do all of these things efficiently. You can't do these things without understanding some fundamental concepts—that's what "computational thinking" would mean if it still meant anything. But mastering those concepts is intrinsically dependent on mastering the tools used to put them into practice: you cannot use tools effectively if you're working by rote, but equally, you cannot grasp abstractions without concrete examples to hang on to. Are you computationally competent? Let's find out. Please grab a pen and a piece of paper, or shut down Facebook and open an editor instead. I'm going to show an outline of the "driver's license" exam we put together for physicists who want to use the new DiRAC supercomputing facility. I won't ask you to actually answer the questions; instead, I'll show you what you need to do in order to get full marks. For each step, I'd like you to give yourself one point if you're sure you could do it, half a point if you think you might come up with a solution after some struggle, zero if you're sure you couldn't, and -1 if you don't understand what it says. Ready? Here goes.

Question 1: Check out a working copy of the examination materials from Subversion.

Question 2: Use find and grep together in a single command to create a list of all .dat files in the working copy, and redirect the output to create a file called all-dat-files.txt, then commit that file to the repository.
Question 3: Write a shell script that takes one or more numbers as command-line parameters and runs a legacy Python program once for each number.

Question 4: Edit a Makefile so that if any .dat file in the input directory changes, the program analyze.py is run to create a corresponding .out file.

Question 5: Write four tests using an xUnit-style unit testing framework for a function that calculates running totals. Explain why you think your four tests are the most likely to uncover bugs in the function.

Question 6: Explain when and how the function that calculates running totals might still produce wrong answers, even though it passes your tests.

Question 7: Do a code review of the legacy program used in Question 3 (which is about 50 lines long) and describe the four most important improvements you would make to it.

How many of you think you'd get 7 out of 7? How many would get at least 5? How many had positive scores? Now, how many think the median score among graduate students in science and engineering would be non-negative? And before we go on: the point of the exam isn't the specific tools. We could use Git instead of Subversion, or MATLAB instead of Python, and in fact, we're preparing variants of the exam to do exactly that. Ten years from now, the exam might allow for direct neural interfaces, but the core ideas of automating repetitive tasks and being able to tell good code from bad will, I think, remain the same. Now, do you think that someone could use that GPU provenance peta-cloud without knowing how to do the things this test assesses? More importantly, do you think that someone who doesn't have these skills, and doesn't understand the concepts they embody, will be able to debug that GPU provenance whatever when something goes wrong? Or think of new ways to use it to advance their research? Because the real point isn't to give scientists a handful of tools—the real point is to give them what they need to build tools for themselves. And if you're only helping the small minority of scientists lucky enough to have acquired the skills that mastering your shiny toy depends on, your potential user base is many times smaller than it could be. All right: now that we've diagnosed the problem, the cure seems obvious. All we have to do is get universities to put more computing in their undergrad programs. However, we've been banging that drum for at least twenty-five years now, with no real success. Yes, there are a few programs in physics and computing or bioinformatics, but having worked with a few of their graduates, I don't think those programs do any better than the "soak it up by osmosis in grad school" approach. The problem is that everyone's curriculum is already full to bursting. If we want to put more computing into a four-year undergrad program in chemistry, we have to drop—what? Thermodynamics, or quantum mechanics? And please don't pretend that we can just put a bit into every course. Number one, five minutes out of every lecture hour adds up to four courses over the span of a degree. Second, those five minutes will be the first thing dropped when the lecturer is running late. And third, are you familiar with the phrase "the blind leading the blind"? Ah, but we have an Internet! Everything scientists need to know is online, and there are now dozens of free online courses as well.
But neither forums nor a MESS (Massively Enhanced Sage on the Stage) are effective for most novices who are still trying to construct the conceptual categories they need to have before they can assimilate mere information. Somebody needs to get these people from A to B so that they can get themselves from B to M, Z, θ, and beyond. The only thing that works—at least, the only thing that has worked for us in fourteen years of experimentation—is to give graduate students a few days of intensive training in practical skills followed by a few weeks of slower-paced instruction. Let's break that down:

- Target graduate students because they have an immediate personal need (particularly if they're six months or a year into their research and have realized just how painful it's going to be to brute force their way to a solution), and because they have time (which faculty usually don't).
- Teach them for a few days of intensive training because that's what they can actually schedule. At the low end, Software Carpentry's workshops are two days long (three if the host adds a day of discipline-specific material at the end). At the high end, Titus Brown's Next Generation Sequencing course at Michigan State runs for two weeks, which means there's time for volleyball and beer. Anything less than two days, and you can't cover enough to make it worthwhile. Anything more than two weeks, and people can't put the rest of their lives aside to attend.
- Focus on practical skills so that they see benefits immediately. That way, when we come to them and say, "Here's something that's going to take a little longer to pay off," they're more likely to trust us enough to invest the required time.
- Follow up with a few weeks of slower-paced instruction, such as meeting once a week for an hour to work through a few problems. We've tried doing this with online video conferencing, and while that's better than nothing, it's like old dishwater compared to the hearty organic beer of sitting side by side.

What do we actually teach? It depends on the audience, but our core is:

- The Unix shell. We only cover a dozen basic commands; our real aim is to introduce people to pipes, loops, history, and the idea of scripting.
- Python. Here, our goal is to show them how to build components for use in pipelines (so that they'll see there's no magic), and when and why to break code into functions.
- Version control, for collaboration and reproducibility.
- Testing. We teach them to use tests to specify behavior and make refactoring safe as well as to check correctness.

And we usually include one other topic as well, like a quick intro to SQL or matrix programming, depending on the audience and how much time is available. All of this is "merely useful". It's certainly not publishable any longer, which means that by definition, it's not interesting to most computer scientists from a career point of view. However, two independent assessments have found that it's enough to set between a third and two thirds of scientists on the road that leads to those reproducible peta-scale GPU cloud workflows I mentioned earlier. Even if you take the lower of those two figures, that's a six-fold increase in the number of people who understand what you're trying to do, and are able to take advantage of it. If you think that's not going to help your project, you're either incredibly arrogant, hopelessly naive, independently wealthy, or a die-hard Lisp programmer.
Anatole France once wrote, "The law, in its majestic equality, forbids the rich and the poor alike to sleep under bridges, to beg in the streets, and to steal bread." Thanks to modern computers, every scientist can now devote her working life to wrestling with installation and configuration issues that she doesn't have the conceptual tools to deal with effectively. You can help. In fact, we can't succeed without your help. As Terry Pratchett said, "If you build a man a fire, you'll keep him warm for a night. If you set a man on fire, you'll keep him warm for the rest of his life." The first thing you can do is host a workshop. A growing number of our alumni have become instructors in their own right—there are even a few here in the audience today. They're all volunteers, so the only cost is a couple of plane tickets, a couple of hotel rooms, and a few pots of coffee. If you're willing to book a room and do some advertising, we can send people to you to get things started. This will particularly help those of you in support roles: the people who've been through workshops probably won't ask fewer questions, but they'll certainly ask better ones. The second thing you can do is teach a workshop yourself. All of our materials are open license, and we will help teach you how to use them, and how to teach more effectively in general. Finally, you can help shine some light on the "dark matter" of scientific computing. There's a lot of discussion now about requiring scientists to share their software. What I'd like even more is for scientists to share their computational practices. I'd like every paper I review to include a few lines telling me where the version control repository holding the code is, what percentage of the code is exercised by unit tests, whether the analyses we're being shown were automated or done by hand, and so on. I'm not suggesting that we should require people to meet any particular targets—not yet, anyway—but the first step in any public health campaign has to be finding out how many people are sick with what. To conclude, it isn't really a choice between increasing the productivity of the top 5% of scientists ten-fold or doubling the productivity of the other 95%. It's really a choice between seeing your best ideas left on the shelf because they're out of most scientists' reach, and raising up a generation of scientists who can all do the things we think are exciting. A few months shy of my fiftieth birthday, with a wonderful little girl at home who's going to inherit all the problems we didn't get around to solving, and my sister eight months dead from cancer, I know which matters more to me. If you'd like to help, please visit our web site or mail us at team@carpentries.org. We look forward to hearing from you. Read More ›

UCL Researchers to Get Help with Software Development
Ben Waugh / 2012-10-05
University College London (UCL) has hired a team leader to start a three-person team for Research Software Development. James Hetherington will spend a lot of his time in the next few months talking to those of us working in various areas of research, so we will get to provide some input into the team's activities. However, we do know that they will be offering advice and consultation on software development best practices (the fundamentals as well as the headline-grabbing HPC work), and hosting version control and issue tracking services. Read More ›

Convergent Evolution
Greg Wilson / 2012-10-05
Earlier this week, a Google search turned up a Software Carpentry workshop at the ICHEC in Dublin—which was surprising, because we didn't know they were running one. It turns out they'd reinvented the term independently, though their "carpentry" was a lot more advanced than ours—according to ICHEC's Simon Wong: Half of the workshop was on software development concepts and techniques but specifically aimed at deployment on the cloud, e.g. test-driven development, code management, SQL vs NoSQL databases. The rest of the workshop was more about getting the attendees familiarised with various Amazon Web Services (e.g. Elastic MapReduce). They're planning more workshops of the same kind, and we're hoping to mount one of ours there in the near future as well. Read More ›

Wanted: An Entry-Level Provenance Library
Greg Wilson / 2012-10-04
One of the reasons we keep teaching Subversion is that it allows us to show students a simple but useful trick. If you add the following to a text file:

$Revision: $

and then tell Subversion to set the "Revision" keyword on that file, the next time you commit it, Subversion will automatically update the text to:

$Revision: 423$

or whatever the revision number actually is. This is handy if you're mailing files around, and want people to be able to tell exactly which revision they have, but what makes it really useful is this:

1. Embed the revision number in a string:

version_string = "$Revision: 423$"

2. Extract it (I'll show the Python, but the trick works in any language):

version_number = int(version_string.strip("$").split()[1])
# version_number is now 423

3. Print this as a comment at the start of any output, along with parameters:

print '#', sys.argv[0], version_number
print '# ...alpha', alpha
print '# ...beta', beta
for result in all_results:
    print result

so that the program's output is:

# analyze.py 423
# ...alpha 0.5
# ...beta 1.7
22,43,17.5
22,44,18.5
...,...,...   # and so on

This is a quick and easy way to keep track of the provenance of the data: if done systematically, it ensures that every result contains a record of how it was produced. Of course, a real provenance system needs to do more than this: it needs to track the inputs to the program, so that if analyze.py was run on something preprocess.py produced, we can trace backward from analyze.py's output all the way to preprocess.py. There was an abortive effort a few years ago to standardize provenance information, but it got bogged down in XML schemas and ontologies and all the other details that standards committees love and working scientists find irrelevant. What the scientists we're trying to help actually need right now is something a lot simpler: a suite of inter-operable libraries for various languages that are no more complicated than the various xUnit libraries for testing, or the argparse and CLI libraries for parsing command-line arguments in Python and Java respectively. It's OK if those libraries don't capture all the information that anyone might conceivably want; what's most important is that they capture enough to be useful, with close to no effort on the scientist's part, so that we can get this ball rolling. If this sounds like something you'd be interested in helping with, please give us a shout. It would be a good contribution to the scientific programming community, and a good way to meet other believers in better scientific software.
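To give a rough sense of the level of simplicity we have in mind, here is a sketch of what such a library's interface might look like; the stamp function and its output format are invented for the sake of the example, not part of any existing package:

# A hypothetical sketch of an entry-level provenance helper, building on
# the Subversion keyword trick above. The function and output format are
# invented for illustration; no such library exists yet.
import sys

def stamp(stream, version_string, **parameters):
    '''Write a provenance header: program name, revision, and parameters.'''
    version_number = int(version_string.strip("$").split()[1])
    print >> stream, '#', sys.argv[0], version_number
    for name in sorted(parameters):
        print >> stream, '# ...%s %s' % (name, parameters[name])

if __name__ == '__main__':
    stamp(sys.stdout, "$Revision: 423$", alpha=0.5, beta=1.7)

Read More ›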

Transitioning to the IPython Notebook
Matt Davis / 2012-10-04
Here at Software Carpentry we've been teaching Python for a little while. We teach Python in our bootcamps and we have a library of instructional videos on Python topics including basics, dictionaries, regular expressions, and object oriented programming. As usual, though, we're not satisfied with the status quo. For both our bootcamps and our online material we're in the process of switching to the IPython Notebook. The Notebook lets the user write and execute Python code and see the results, all inside their favorite web browser.

Bootcamps. Software Carpentry bootcamp attendees are typically novice programmers who have yet to master many basic computational concepts. A priority when teaching Python to this audience is minimizing their mental load as they try to understand core concepts like loops, control flow, and functions. The Notebook offers several advantages for this audience while teaching Python in live, classroom settings:

- Code is written and run within the Notebook, eliminating any need to switch between an editor and the command line, or learn an IDE.
- As in the standard interactive Python prompt, code and output appear next to each other and it is easy to see the effect of changes to the code.
- Unlike the standard interactive prompt, the Notebook allows for easy editing and repeat running of multi-line code blocks. It also has some of the convenient features of code editors such as syntax highlighting, auto-completion, and automatic indenting.
- Notebooks have multiple ways of showing documentation while typing code.
- Notebooks have a consistent appearance across browsers and platforms so that students see instructors demonstrating concepts in the same environment they are using.
- Changing the magnification of notebooks for different projectors and rooms is as easy as changing the zoom for a browser page.

These features allow students to spend nearly all of a learning session in the Notebook writing Python with a minimum of other distractions. The notebooks in which students work can be easily saved and reopened for later review.

Online Material. Software Carpentry's online material serves a broad audience from those interested in learning basic Python to those looking to learn more advanced topics like regular expressions or object oriented programming. Our current video based lessons can be effective for some, but videos make it difficult to browse or scan the material and code cannot be easily copied for later exploration or use. The IPython Notebook offers a wonderful format for these archived lessons for a number of reasons:

- Notebooks can have rich annotations including rendered Markdown and equations, allowing us to explain code more expressively than is possible in plain code comments.
- Notebooks are easily viewable online as rendered HTML thanks to the nbviewer service.
- Notebooks can be downloaded and rerun by our students should they wish to experiment.
- Notebooks can include embedded video, so we can still use video to provide broad introductions to concepts or as an alternative method for learning the material.

We are just beginning the process of converting our Python material to the IPython Notebook, but we currently have examples based on our while loop lecture and for graphing in Python. In the future we hope to have all of our current material available as Notebooks on our GitHub repository.

University Classrooms. The same benefits for presenting online material for the Software Carpentry site also extend to providing lecture notes for university courses.
When demonstrating programming concepts in class it is often beneficial for the students to be able to focus fully on what is being done, rather than attempting to take detailed notes on the specific examples being demonstrated. Notebooks can be posted to the web to allow students to go back and review the details of the implementation later. These can either be the actual Notebooks used in the classroom, giving the student an exact record of everything that was shown, or a pre-prepared notebook with similar material. One of us (EPW) has had reasonably good responses from students to using Notebooks in this fashion in two university courses. As with workshops, one of the other major challenges for teaching beginning programmers in university courses is choosing a development environment that introduces as little additional cognitive load as possible. Based on Ethan's experiences this year he is planning on transitioning his introductory course from having students work in an Integrated Development Environment to having them work exclusively in Notebooks. Even in a simple IDE there is always substantial confusion about how the interactive prompt works and how it differs from code run from the editor. By switching to Notebooks the students will experience the benefits of both (as described above) without the additional load of trying to understand two different environments for running Python code. (This post co-written by Matt Davis and Ethan White.) Read More ›

Best Practices for Scientific Computing
Greg Wilson / 2012-10-03
The following pre-print is now available on arXiv: Best Practices for Scientific Computing, by D.A. Aruliah (University of Ontario Institute of Technology), C. Titus Brown (Michigan State University), Neil P. Chue Hong (Software Sustainability Institute), Matt Davis (Space Telescope Science Institute), Richard T. Guy (University of Toronto), Steven H.D. Haddock (Monterey Bay Aquarium Research Institute), Katy Huff (University of Wisconsin), Ian Mitchell (University of British Columbia), Mark Plumbley (Queen Mary University London), Ben Waugh (University College London), Ethan P. White (Utah State University), Greg Wilson (Software Carpentry), and Paul Wilson (University of Wisconsin).

Abstract: Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.

- Write programs for people, not computers. A program should not require its readers to hold more than a handful of facts in memory at once. Names should be consistent, distinctive, and meaningful. Code style and formatting should be consistent. All aspects of software development should be broken down into tasks roughly an hour long.
- Automate repetitive tasks. Rely on the computer to repeat tasks. Save recent commands in a file for re-use. Use a build tool to automate scientific workflows.
- Use the computer to record history. Software tools should be used to track computational work automatically.
- Make incremental changes. Work in small steps with frequent feedback and course correction.
- Use version control. Use a version control system. Everything that has been created manually should be put in version control.
- Don't repeat yourself (or others). Every piece of data must have a single authoritative representation in the system. Code should be modularized rather than copied and pasted. Re-use code instead of rewriting it.
- Plan for mistakes. Add assertions to programs to check their operation. Use an off-the-shelf unit testing library. Turn bugs into test cases. Use a symbolic debugger.
- Optimize software only after it works correctly. Use a profiler to identify bottlenecks. Write code in the highest-level language possible.
- Document the design and purpose of code rather than its mechanics. Document interfaces and reasons, not implementations. Refactor code instead of explaining how it works. Embed the documentation for a piece of software in that software.
- Conduct code reviews. Use code review and pair programming when bringing someone new up to speed and when tackling particularly tricky design, coding, and debugging problems. Use an issue tracking tool.

Read More ›
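To make "plan for mistakes" concrete, here is a minimal sketch (ours, not the paper's) of an assertion guarding a function and a bug captured as a regression test; the function and the bug are invented for illustration:

    # A sketch of "plan for mistakes": the assertion checks the program's
    # operation, and a bug, once fixed, becomes a permanent test case.
    def mean(values):
        assert len(values) > 0, 'cannot take the mean of an empty list'
        return sum(values) / float(len(values))

    def test_mean_handles_integer_input():
        # Hypothetical regression test: an earlier version used integer
        # division and silently truncated the result for integer input.
        assert mean([1, 2]) == 1.5

Run under any off-the-shelf test runner (nose, in the examples elsewhere on this site), the test keeps the old bug from coming back unnoticed.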

How to Help at a Bootcamp
Katy Huff / 2012-10-02
When we lead a bootcamp of more than, say, a dozen people, extra helpers are essential to making everything go smoothly. As our guide to running a bootcamp says:

- It's ideal to have one helper for every five attendees.
- Recruit helpers locally, from your own and neighbouring institutions.
- Helpers should have some knowledge of the material covered in the bootcamp. Ideally, a helper will already have attended a bootcamp and would like to become an instructor themselves.

So the question arises: what does a helper do? First of all, the helper shows up. Thanks! Second, the helper hangs out in the instruction room during the lecture, waiting for hands to go up. Finally, the helper enjoys the workshop, sees what it takes to lead one, and perhaps decides to become an instructor for a future workshop. The bulk of the helper's job is the second bit, and it is the reason for the requirement that helpers be comfortable with the workshop material (e.g., the shell and version control). The typical in-workshop need for helpers arises in this situation: we're all working along swimmingly and a student gets stuck. Maybe they didn't type a command properly or can't find the folder we're working in. Maybe they forgot to import some module before running a Python command and now don't understand the error they're getting. Who knows. In a minute or two, they'll be way behind unless they can raise their hand and get someone to reel them back in. Thus, helpers need to see the hand go up and be able to at least begin to debug the drama quietly at the computer of the stuck student. A helper with limited comfort with the shell or version control would be unhelpful in this situation. We could ask for helpers with full comfort with our whole software stack, but that's not reasonable. We'd never get any helpers that way. It's usually sufficient for a helper to be someone who can problem-solve on the fly in a terminal. They also should be able to follow along conceptually with the lecture so that they can redirect a student who has fallen a few minutes behind. Read More ›
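For instance (a made-up but typical screen), the helper might be waved over to see nothing more than:

    >>> mean([1, 2, 3])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'mean' is not defined

where the actual problem is a missing import (here, something like from numpy import mean), which an experienced eye spots in seconds but a novice can stare at for the rest of the lesson.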

What Would You Like in an Instructor's Guide?
Greg Wilson / 2012-10-01
One of our goals for the next three months is to create an instructor's guide for Software Carpentry to help people teach this material more effectively: something akin to the SSI's "How to run a Software Carpentry bootcamp", but larger, and focused on the content. For example, when we teach the Unix shell (which normally takes the morning of the first day), we actually don't introduce many commands: last time I actually checked, I only showed learners pwd, mkdir, cd, ls, rm, mv, cat, head, tail, wc, cut, and history. Instead, our real aims are to give them what they need to get through subsequent programming exercises in Python (i.e., creating data files, moving program files around, etc.), and to:

- familiarize them with the hierarchical folders-of-folders-of-files nature of the file system;
- show them how tab completion and history let them do more with less typing (and fewer mistakes);
- show them how they can take what they've done interactively and turn it into a reusable script; and
- introduce the notion of combining the pieces you have in a uniform way (the pipe-and-filter model) to create new tools, as a way of reducing cognitive load (there's a one-line example at the end of this post).

That last point is the big one, but it doesn't appear explicitly anywhere in our online video tutorials (largely because I didn't realize it was the big lesson until earlier this year; it's a classic case of expert blind spot). So my question is: what would you like in an instructor's guide? If you took part in one of our workshops, what would you pass on to someone else who was about to teach the material to help them do a better job? Please send us comments (either on this post, or by email), and thank you in advance. Read More ›
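The pipe-and-filter example promised above, using only the commands listed earlier plus sort (the file names are invented for illustration):

    # count the lines in each chapter file, then show the three shortest
    wc -l chapter-*.txt | sort -n | head -3

None of the pieces is new to the learners by this point, but combining them creates a tool nobody had to write.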

The Real Hard Work
Greg Wilson / 2012-09-30
I spent a couple of thought-provoking hours at Codecademy's office in New York on Thursday, during which my host said, "It's so different from Silicon Valley, where a lack of experience is considered an asset." The next day, while teaching at Columbia, I used one of my favorite sayings, "A week of hard work can sometimes save you an hour of thought," and someone in the audience piped up, "Or an hour of reading." The two comments helped crystallize something I've been thinking about for several months now: what's hard for creative people to do, and what implications that has. Most people don't like working long hours, or difficult problems. Creative people, on the other hand, thrive on both: they get an endorphin high from losing themselves in something gnarly for days at a time. What they find hard is boredom: if you give them something easy and repetitive, they'll complicate it to make it more enjoyable, find a way to get out of it, or invent a reason why it doesn't need to be done in the first place. And, being creative, they're very good at all three. In particular, given a choice between working around the clock for a month to build a beginner-friendly learn-to-program web site, or sitting down for three days and wading through a dozen papers and reports that describe what people have tried in the past, and how well it worked, most creative people choose to hack. Reading papers is dull, dull, dull, especially when the first nine don't actually say anything relevant (but you couldn't know that without reading them), and the gem in the tenth is buried in a sawdust pile of dry academic prose. So creative people create excuses:

- "We're not teaching computer scientists." In fact, a lot of research has looked at graphic designers, disadvantaged grade-school kids, and people from many other walks of life.
- "We're not teaching computer science, we're teaching programming." So are a lot of researchers. (Ironically, that's one of the reasons their work tends not to be valued by "real" computer scientists.)
- "The web changes everything!" No it doesn't: it doesn't change the way brains learn. And anyway, how do you know what it's changed if you don't know what was there before?
- "But today's schools/teachers/whatever are broken!" That's a gross exaggeration, but even if it weren't, shouldn't a doctor learn something about an ailment before trying to cure it?
- "We're too busy." This argument is partly valid: you can't evaluate other people's experience until you have some experience of your own. But looking back at my own projects, I've always kept hacking long past the point when I should have paused for a while to find out what other people had done.

I told the students at Columbia that one of the things that distinguishes serious programmers from amateurs and dilettantes is that serious programmers write tests. The politicians who get policies implemented are the ones who master their briefs, the lawyers who win cases are the ones who read the whole contract, and so on. That's the real hard work for people like us, and as Bernd Heinrich said of marathon runners, "The will to win is nothing without the will to prepare." So here's my simplified Audrey Test for tech types interested in education:

- Have you read the last two years of Mark Guzdial's blog? He does great work, he writes well, and he understands that doing is as important as knowing.
- Have you read How Learning Works, which condenses decades of research and experience into 300 easy-to-read pages?
If the answers are "yes", I'll believe that you're willing to do the real hard work required to help other people learn. Read More ›

Oslo and Columbia
Greg Wilson / 2012-09-30
Here's some feedback from participants in our Oslo workshop:

Good:
- Discovered a dishwasher
- Test-driven development
- Lots of anecdotes
- Power plugs and coffee!
- Jokes (no, really)
- Reminded me why I should use version control
- Inspirational
- Discovering the web page
- Not slaughtered by Vikings (yet)
- Good teaching skills
- Free!

Bad:
- Lunch at twelve, please
- Didn't cover SQL
- Need to show code and output side by side
- Couldn't keep up with typing
- Greg speaks too fast
- Not good context for some examples
- Matrix programming lacked exercises and examples
- More exercises please
- 34 registered, 15 attended
- Tell us more about installation
- Lack of key point summaries
- Too short

And from our Columbia University workshop in New York City (see also this great writeup from the course instructor, Dr. Rachel Schutt):

Good:
- Lot of new information
- Collaboration!
- Pair programming
- Programming can be elegant!
- Many practical hints on productivity
- New Python tricks
- Liked evidence-based anecdotes
- Clustered necessary things together
- Interaction with students
- Breaks were perfectly timed
- EPD installation (yay!)

Bad:
- Installation hell
- Too much focus on impractical details
- Struggled to get code off screen into my machine
- No conversation about base knowledge
- Too little on data scrubbing
- Not enough practical answers
- Got totally lost (no time to copy examples)
- No chance to write down URLs etc.
- Too early for Saturday
- Doesn't leave enough time for homework
- EPD was hell to uninstall
- How to apply to R
- Out-of-order examples

Read More ›

Workshop at the University of Newcastle in October
Greg Wilson / 2012-09-29
The University of Newcastle is running a Software Carpentry bootcamp on October 22-23, 2012. For more information, or to register, please see their web site. Read More ›

How to Run a Bootcamp (new and improved)
Greg Wilson / 2012-09-27
The folks at the Software Sustainability Institute have written a guide to running a Software Carpentry bootcamp that includes a lot more detail than our own. It's a great resource—if you have suggestions for improvements or additions, please send 'em to us. Read More ›

Computational Thinking and Ice Floating in Bathtubs
Greg Wilson / 2012-09-26
I've been thinking about what "computational thinking" actually means ever since a workshop at Microsoft in 2007, where I found that everyone's definition seemed to boil down to, "Whatever I've been pushing for the last decade, but with a new name." My ideas keep evolving, but Lewis Epstein's wonderful book Thinking Physics remains their foundation. In it, he presents dozens of problems in freshman physics that can be solved without calculation, so long as you have a solid grasp of basic Newtonian mechanics. For example: put a large piece of ice in a bathtub, then fill the bathtub exactly to the rim. When the ice melts, will the water overflow, will the water level go down, or will it stay the same? The answer is that it will stay the same, because the ice displaced its own weight in water, so when it melts, it exactly fills the "hole" it created. But that's not the real question; the real question is, imagine you put a large cake of ice in the tub, then put a small iron weight on the ice cake, and then filled the tub to the rim. This time, when the ice melts and the iron weight sinks to the bottom of the tub, what happens to the water level? Someone who can answer problems like these actually understands physics; someone who has only learned how to pattern-match solution techniques to word problems won't be able to reason them through. I don't know of an equivalent book on computing: things like the Levitins' Algorithmic Puzzles aren't really the same, since there are tricks to the solutions of puzzles, and our driver's license exam isn't really the same thing either. If you know of something that is, we'd welcome a pointer. Read More ›
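For what it's worth, here is the flavour of question I have in mind for computing, as a made-up example rather than a proposal. Before running this, predict the output, then explain it:

    a = [1, 2, 3]
    b = a          # does this copy the list, or make a second name for it?
    b.append(4)
    print(a)       # what gets printed, and why?

Someone who actually understands aliasing answers instantly; someone who has only pattern-matched syntax to exercises has to guess.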

Why This Stuff Is Hard To Teach
Greg Wilson / 2012-09-20
If we get funding to continue our work (we hope to find out in a month), one of the first things we want to do is put together an introduction to web programming for scientists. As I've remarked many times before, we won't try to teach people how to build web applications: all we can do in the time we have, starting from what they know, is teach them how to create security holes. What we would like to show them is how to pull data off the web and post data of their own for others to consume, but even that turns out to be a lot harder than it should be. Here's one example. I want to parse a well-formed HTML page, change a few things in it, and save the result to disk. That ought to be simple, but if the document contains special characters like non-breaking spaces, Greek letters, and so on, it turns out to be rather tricky. In fact, it's taken a couple of hours (admittedly, spread out over several weeks) to come up with a solution that (a) works and (b) doesn't make me feel unclean. Here's what it looks like (using a string IO object instead of a file so that you can see what we're parsing):

    import cStringIO
    import xml.etree.ElementTree as ET

    ENTITIES = {
        'hellip' : u'\u2026', # horizontal ellipsis
        'pi'     : u'\u03C0', # lower-case Greek letter pi
        'sigma'  : u'\u03C3'  # lower-case Greek letter sigma
    }

    parser = ET.XMLParser()
    parser.parser.UseForeignDTD(True)
    parser.entity.update(ENTITIES)

    text = '<html>&pi;&hellip;&sigma;</html>'
    original = cStringIO.StringIO(text)
    tree = ET.parse(original, parser=parser)
    print ET.tostring(tree.getroot())

The output from this program is:

    <html>&#960;&#8230;&#963;</html>

which, when loaded into a browser, is displayed as: π...σ

The problem is the breadth of knowledge someone has to have to put this together. My code is based on a response to this question on Stack Overflow, but along the way, I looked at, played with, and discarded four other non-solutions. It doesn't help that ElementTree's UseForeignDTD is undocumented, but that's not my real complaint: every XML library I've ever worked with in Java, C++, or Python had brick walls of its own just waiting for people to bang their heads against. I suspect it's going to take us several painful iterations to design an instructional sequence that works, and I'm not looking forward to the pain. Read More ›

Feedback and wrap-up from York
Chris Cannam / 2012-09-20
The SoundSoftware project arranged a Software Carpentry bootcamp for the week preceding DAFx 2012, the conference on Digital Audio Effects, at the University of York. This bootcamp featured two days of non-subject-specific Software Carpentry material, presented by Greg, and an additional day of material specific to audio and music researchers, presented by Adam and Becky of Codasign. Many thanks to Greg, Adam, and Becky for their hard work.

Subject-specific third day

We think this was well-received, though some attendees (and helpers!) were beginning to flag by the third day. With more practice it might be possible to start introducing more subject-flavoured material into the first two days as well. This seems like an experiment worth repeating. The third day closed with a short but lively discussion session about applying Software Carpentry methods in the context of the researchers' own work. Python code from the audio and music day can be found on GitHub or code.soundsoftware.ac.uk.

Co-location with a conference

We had hoped that co-locating the bootcamp with the DAFx conference would get us higher attendance and make things easier for participants, but at least in this case that doesn't seem to have happened. We had enough people (26) to fill the room in comfort, but we were below capacity, and only 5 participants reported that they were also attending the conference. (The bootcamp was advertised mostly during the summer break, which probably explains this to some extent. Our fault; we could have done better here.)

Software requirements

This bootcamp had a particularly tall list of software requirements and prerequisites, so as an alternative to installing the software natively on their own machines, we also provided a virtual machine image that attendees could run in VirtualBox. A handful of people took advantage of this and it seemed to work OK for them. In previous bootcamps we have been involved with, Windows users have had far more problems getting software installed than those on other platforms. This time around, we switched to MinGW (from Cygwin), which worked a little better; at the same time, we found that users with OS X 10.8 were having far more problems with Python versions and dependencies than we had experienced previously, so the spread of difficulties was more even than usual.

Good and bad

These are from the end of the second day (i.e., the non-subject-specific material).

Good:
- Emphasis on peripheral stuff like time management
- Reading material suggestions
- Virtual machine provided with stuff already installed
- Well-designed website that supports the workshop
- Varied approach
- Learning Python
- The venue
- Working in pairs and teams a lot
- Good thinking about software development
- Range of difficulty: good entry level, plus harder stuff
- Very interactive
- Cookies provided
- Day not too long, divided up into good chunks, not overloaded
- Personal insight in anecdotes
- Meeting other people
- Ending the day in the bar
- Free to attend

Bad:
- Speed of Python stuff: too fast to follow
- Practical examples left incomplete, not enough time
- Time constraints
- Didn't learn enough Python to go away and code in it instead of other languages I already know
- Lecture notes not enough
- Breakfast not provided
- Bit preachy / evangelical at times
- Would like more detail on unit testing
- Assumptions about programming ability: standards too high
- Use of coloured sticky notes irritating and distracting
- Lighting: don't get on with strip lighting
- Started too early (9am)
- Would have liked a bigger project using version control and Python together

Read More ›

What's In Your Stack?
Greg Wilson / 2012-09-18
As a long-delayed follow-up to a conversation with Travis Oliphant: based on our experience, researchers who are computational novices want more than numerical computing and visualization tools. They want a complete stack: an end-to-end, nachos-to-cheesecake [1] solution to their basic computing needs, including an editor, a debugger, a version control system, blogging tools, data management tools, some way to create presentations and papers, and so on. My personal stack includes:

- Emacs (despite the damage it does to my hands)
- Subversion (because it's still simpler for novices to use on day 1 than either Git or Mercurial)
- the PyCharm IDE (when I'm doing Django, because it can debug templates; to my shame, for day-to-day work, I use PDB or print statements)
- WordPress
- SQLite (but I don't have very much data)
- LaTeX (because I can diff/merge files) and LibreOffice Impress (even though I can't, but LaTeX-based presentation systems have all of PowerPoint's flaws and none of its advantages)

These are all as important to researchers' day-to-day lives as Python, NumPy, Matplotlib, or Pandas, and like most of you, I use them all together: I use Python scripts to generate LaTeX tables from SQLite databases for inclusion in papers that I put under version control, and so on. What we're seeing in workshops, though, is that most computational novices (i.e., most research scientists) either don't think to integrate these things, or really struggle to do so (and reinvent a lot of wheels along the way). My question is, how far do you think we (NumFOCUS, Software Carpentry, the community as a whole) should try to go? I personally think that showing people how to write readable code is in bounds, but showing them how to create a comprehensible paper or presentation is not; everyone else will have different dividing lines, and I'm very curious what those are. So: What's in your stack? Which parts of that do you think we (the people who show up at SciPy conferences) ought to be trying to teach, and which are someone else's job? (Bonus marks if you can clearly identify who that someone else is.)

[1] My preferred variation on "soup to nuts".

Read More ›
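As an illustration of the kind of glue I mean, here is a minimal sketch of a script that turns an SQLite query into the body of a LaTeX table; the table, columns, and data are all invented for illustration:

    # dump a two-column SQLite table as a LaTeX tabular
    import sqlite3

    conn = sqlite3.connect(':memory:')   # a real script would open results.db
    conn.execute('CREATE TABLE survey (species TEXT, number INTEGER)')
    conn.executemany('INSERT INTO survey VALUES (?, ?)',
                     [('hare', 12), ('lynx', 3)])
    print(r'\begin{tabular}{lr}')
    for species, number in conn.execute(
            'SELECT species, number FROM survey ORDER BY species'):
        print(r'%s & %d \\' % (species, number))
    print(r'\end{tabular}')
    conn.close()

A dozen lines, three tools, and no retyping of numbers into a paper: that integration is exactly what novices reinvent badly or never attempt.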

Post-Mortem on the NGS Course
Greg Wilson / 2012-09-18
Titus Brown has posted a post-mortem on his Next Generation Sequencing course, and the independent assessment that was done of it. As with Jorge Aranda's assessment of Software Carpentry, the results were very positive—what we and other like-minded people are doing really does make a difference. Read More ›

Systematic Curriculum Design
Greg Wilson / 2012-09-16
Executive summary: we'd appreciate your help organizing and motivating our material better. One of the good things about traveling is that it gives me time to think. One of the bad things about thinking is that every time I do, I wind up with more work than I had when I started. For example, to organize and motivate our content, I'm using eight questions that scientists frequently ask:

1. How can I manage this data?
2. How can I process it?
3. How can I tell if I've processed it correctly?
4. How can I find and fix bugs when I haven't?
5. How can I keep track of what I've done?
6. How can I find and use other people's work?
7. How can other people find and use mine?
8. How can I do all these things faster?

On the other side of the equation I have a syllabus for the core Software Carpentry material, which includes:

- the command-line shell (e.g., Bash)
- version control
- basic programming (variables, lists, loops, conditionals, and simple file I/O)
- functions and libraries
- databases (i.e., basic SQL queries)
- matrix programming (e.g., MATLAB or NumPy)
- quality assurance (defensive programming, testing, etc.)
- dictionaries (or hashes, if you're a Perl programmer)
- the development process (stepwise refinement, red-green-refactor, performance profiling)
- web programming (by which we mean using web APIs, not providing services yourself)

In order to figure out how well we're helping scientists, we need to map their needs onto our content. Here's what I've come up with (subjects with no answer for a question are omitted):

How can I manage this data?
- The Shell: Use directories and sub-directories with meaningful names. Use filenames that can easily be matched with wildcards. Use filename extensions that indicate the type of data in the file. Use text unless there's a powerful reason to use something else.
- Version Control: If it's megabytes or less, put it under version control.
- Basic Programming: Create and use data formats that are easy for programs to parse.
- Databases: Store it in a relational database. Store each atom of information in its own field. Make sure each record has a unique key. Make sure that information is never duplicated. Use foreign keys and joins to combine information from different tables.
- Number Crunching: Represent it as a matrix, because that's easy to process.
- Sets and Dictionaries: Store it in a set or dictionary so that elements can be looked up by value rather than by position.
- Web Programming: Format it as HTML (or XML, or some other widely-used format). Separate content from presentation (e.g., use CSS for styling).

How can I process it?
- The Shell: Use Unix commands that manipulate lines of text. Combine those commands using pipes and redirection. Use loops to perform the same operations on many files.
- Basic Programming: Write programs that use loops, file I/O, and string splitting to read data. Use floating-point numbers unless you are sure all values (including calculated values) will always be integers.
- Functions and Libraries: Define functions to do simple operations, then combine those for more complicated effects. Equivalently, describe what you would do in a language customized to your problem, then fill in the missing bits of code by creating functions.
- Databases: Write SQL queries to select, filter, aggregate, and sort data. Use a general-purpose programming language for everything else.
- Number Crunching: Use a linear algebra package like NumPy.
- Sets and Dictionaries: Use algorithms that don't depend on the order of items.
- Development: Use the right data structures.
- Web Programming: Use an HTTP library to fetch it. Use an XML or JSON library to parse it.

How can I tell if I've processed it correctly?
- Basic Programming: Test your programs with small data sets whose results can be checked by hand.
- Databases: Build queries in small steps. Run queries against small data sets whose output can be checked manually.
- Number Crunching: Compare a program's output to analytic results, experimental results, simplified test cases, and previous programs. Use tolerances when comparing results.
- Quality: Create simple data sets for which the right answer can be calculated by hand. Compare the results produced by the new program to results produced by older programs.
- Development: Make code testable by dividing it into functions, and then replacing some functions with others for testing purposes.

How can I find and fix bugs when I haven't?
- Quality: Write test cases that fail when the bug is present, but pass when the bug is fixed. Add assertions to programs to check their internal consistency. Use a debugger.
- Development: Write tests.

How can I keep track of what I've done?
- Version Control: Keep your work under version control. Check in whenever you've completed a significant change. Write meaningful check-in comments.
- Basic Programming: Put version control IDs in programs (and data files), and copy them forward to results.
- Functions and Libraries: Give functions meaningful names. Group related functions and related definitions into modules. Write docstrings to explain what functions and modules do and how to use them.
- Databases: Store queries in files (just like programs).
- Quality: Turn bug fixes into assertions and test cases. Use a coverage analyzer to see what code is and isn't being tested.
- Web Programming: Use meta headers in your HTML/XML data files.

How can I find and use other people's work?
- Version Control: Get it from their version control repositories.
- Functions and Libraries: Use the help function to read their documentation.
- Web Programming: Ask them to use well-formed URLs, and to format it according to well-defined machine-readable standards (e.g., XML or JSON).

How can other people find and use mine?
- Version Control: Put your work in a publicly-accessible version control repository.
- Functions and Libraries: Write docstrings to explain what functions and modules do and how to use them.
- Databases: Raise exceptions to signal errors so that other people can handle them as they think best.
- Web Programming: Put it on the web at a stable URL. Format it according to well-defined machine-readable standards (e.g., XML or JSON). Include meta-data.

How can I do all these things faster?
- The Shell: Put commands in shell scripts so that they can be re-used.
- Basic Programming: Use appropriate variable names so that people will waste less time trying to read programs.
- Functions and Libraries: Learn to recognize and use common design patterns.
- Number Crunching: Use a linear algebra package like NumPy.
- Quality: Design code for testing. Write test cases before writing new code.
- Sets and Dictionaries: Use sets and dictionaries for sparse, irregular, or unordered data.
- Development: Use a profiler to figure out why code is slow before trying to optimize it. Build code so that parts can be replaced easily.

In parallel with this, a group of us have been working on a paper describing best practices for computational science. The list we've converged on is:

- Write programs for people, not computers. Programs should not require their readers to hold more than a handful of facts in memory at once. Names should be consistent, distinctive, and meaningful. Code style and formatting should be consistent. All aspects of software development should be broken down into tasks roughly an hour long.
- Automate repetitive tasks. Rely on the computer to repeat tasks. Save recent commands in a file for re-use. Use a build tool to automate scientific workflows.
- Use the computer to record history. Software tools should be used to track computational work automatically.
- Make incremental changes. Work in small steps with frequent feedback and course correction.
- Use version control. Use a version control system. Everything that has been created manually should be put in version control.
- Don't repeat yourself (or others). Every piece of data must have a single authoritative representation in the system. Code should be modularized rather than copied and pasted. Re-use code instead of rewriting it.
- Plan for mistakes. Add assertions to programs to check their operation. Use an off-the-shelf unit testing library. Turn bugs into test cases. Use a symbolic debugger.
- Optimize software only after it works correctly. Use a profiler to identify bottlenecks. Write code in the highest-level language possible.
- Document the design and purpose of code rather than its mechanics. Document interfaces and reasons, not implementations. Refactor code instead of explaining how it works. Embed the documentation for a piece of software in that software.
- Conduct code reviews. Use code review and pair programming when bringing someone new up to speed and when tackling particularly tricky design, coding, and debugging problems. Use an issue tracking tool.

As you can see, this list only partially overlaps the answers in the mapping above. That makes me nervous: when two independent attacks on a problem yield two different answers, the odds are good that neither of them is right. I trust the "best practices" list more than I do the breakdown of our existing material, which leaves me with some awkward choices. Changing the motivating questions would feel like moving the goalposts so that I can declare victory with the content I have, but on the other hand, maybe there is a better way to carve up the space of things scientists want to do that will give a better mapping. Or are there connections between our content and those motivating questions that I'm just missing? Or do we really have the wrong content, i.e., are we teaching what we know, rather than what would actually be most useful to scientists?
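To make one of the answers above concrete, "use loops to perform the same operations on many files" looks like this in the shell (a sketch; the program and file names are invented):

    # run the same analysis on every data file in a directory
    for filename in data/*.dat
    do
        ./process.py $filename results/$(basename $filename .dat).out
    done

Every answer in the mapping is meant to be this small: one idea, demonstrable in a few lines.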
Stepping back for a moment, the real point of this exercise is to ensure that:

- we're teaching what's most useful to our learners;
- everything we teach makes sense, and is seen as useful, when it first appears; and
- learners see the connections between ideas, and between ideas and their application.

What we should really do is go one step further and figure out how to tell whether our learners can actually do the things embodied in our eight questions. We should then work backward from that assessment to figure out what demonstrable skills they need to acquire, then what understanding they need in order to become proficient with those skills, and then see how that maps onto our best practices. We've made a start toward this with the "driver's license" exam described in an earlier post; if you'd like to help us follow through, please get in touch. Read More ›

Number Crunching with Python: DC Python Workshop
Matt Davis / 2012-09-13
On November 11, 2012 I'll be giving a workshop in Washington, D.C. on number crunching with Python. The event is organized by DC Python and Data Science DC, who were kind enough to ask me to teach. The focus of the workshop will be on core scientific Python packages like NumPy, SciPy, matplotlib, IPython, and, with the help of Skipper Seabold, pandas. The workshop filled up fast but there is a wait list you can join in case people drop out or we find a bigger venue. Details are at http://meetup.dcpython.org/events/81931062/. Read More ›

The Software Is Open (even if the interviews aren't)
Greg Wilson / 2012-09-12
FLOSS for Science, a site devoted to free/libre/open source software for scientific computing, has just published two volumes of interviews with people in the community. The content doesn't appear to be open, but it's only $9.99 for the pair. Read More ›

Patterns Wanted
Greg Wilson / 2012-09-12
At some point or other, most programmers have encountered the idea of design patterns in software, and many (including myself) have been zealous about them, at least for a while. They haven't actually revolutionized either the practice of software development or the way we teach it, but becoming familiar with them is to programmers what learning the Beatles' greatest hits is to musicians. That presents us with a problem. We have deliberately chosen not to include object-oriented programming in the core of Software Carpentry: it's too big to fit into the time we have, and too far beyond what our learners bring with them. However, almost all discussion of design patterns is phrased in terms of classes and objects. It doesn't have to be (the ideas behind Proxy, Singleton, and Iterator are frequently used in procedural languages like C), but:

- patterns rose to prominence in the early 1990s partly because they helped procedural programmers make sense of OOP;
- most professional programmers use OOP, so that's the right way to talk to them; and
- some patterns really do only make sense in OO languages.

The biggest problem, though, is that most discussion of patterns is over our learners' heads, i.e., it addresses problems they haven't reached yet. The scientists we're helping are still trying to figure out what aliasing is, or why it's usually better for a function to take an open stream as an argument rather than a filename. The patterns they need are so simple that most programmers have forgotten that they need to be learned. There are a couple of exceptions, though. One is the "Roles of Variables" work that Sajaniemi and others did a few years ago. By looking at the kinds of programs people write in introductory courses, they classified variables as follows:

- A fixed value and an organizer contain the same data throughout the program; only the order of data elements may be changed.
- A most-recent holder and a stepper record data flow sources, either coming from outside or generated internally.
- The net effect of all items in a data flow is represented by a one-way flag, most-wanted holder, or gatherer, while a manipulation of a single element is recorded in a follower or temporary.
- Data may be stored in a container, which can be traversed with a walker.
- Finally, a data entity not covered by any of the previous roles is considered to have the role other.

Their classification scheme is not unambiguous (i.e., different experts can label a particular variable in different ways), but the same thing happens with design patterns, and that's OK: defensible differences are informative. The real benefit of this scheme is that it gives novices a way to organize and plan programs: once they've learned to recognize roles, they can start to create variables with roles in mind, which saves them from having to reinvent or rediscover the idioms that distinguish experts from novices. (A short example of these roles in action follows at the end of this post.) Another piece of work aimed at the same level is Michael de Raadt's dissertation on novice-level programming plans. He described 18 of them:

- Average
- Divisibility
- Cycle Position
- Number Decomposition
- Initialisation
- Triangular Swap
- Guarded Exceptions
- Counter Controlled Loop
- Primed Sentinel Controlled Loop
- Sum and Count
- Validation
- Min/Max
- Tallying
- Search Algorithm
- Bubble Sort Algorithm
- Command Line Arguments
- File Use
- Recursion (single- and multi-branching)

Both pieces of work are a great start, but if we want to teach people the craft of (scientific) programming, we need more. That's where you come in: what patterns have you used in your programs?
When do you use them? When don't you (i.e., what are their boundary or limiting cases)? And do you know of any other catalogs or summaries that we could link to? Read More ›
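Here is the promised example of Sajaniemi's roles, in a program small enough for a bootcamp, with the role labels in comments (the data file name is invented):

    # find the total and the largest of the numbers in a file
    total = 0                          # gatherer
    largest = None                     # most-wanted holder
    for line in open('values.txt'):    # line: most-recent holder
        value = float(line)            # temporary
        total = total + value
        if largest is None or value > largest:
            largest = value
    print(total, largest)

Labelling the roles makes the plan of the program explicit, which is exactly the scaffolding novices are missing.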

How Quickly Do Workshops Fill Up?
Greg Wilson / 2012-09-06
I had to compile some data on signups anyway, so here are graphs showing cumulative registration per workshop over time (counting from the day registration opened), one for past workshops and one for upcoming workshops (excluding two that just opened up). Doing a bit more analysis, this spring's workshops filled up a lot faster than the ones we're running this fall. I think the main reason is that the spring workshops were organized and advertised while school was on, while the upcoming series has been set up over the summer. We'll repeat the analysis in a few months to see if that hypothesis holds. Read More ›

Not Really Disjoint
Greg Wilson / 2012-09-04
The twinned discussions in bioinformatics about openness and software quality are heating up. A recent salvo on Gas Stations Without Pumps is titled "Accountable research software", and one statement in particular caught my eye: The rapid prototyping skills needed for research programming and the careful specification, error checking, and testing needed for software engineering are almost completely disjoint. I might agree that careful specification isn't needed for research programming, but error checking and testing definitely are. In fact, if we've learned anything from the agile movement in the last 15 years, it's that the more improvisatory your development process is, the more important careful craftsmanship is as well—unless, of course, you don't care whether your programs are producing correct answers or not. The sentence quoted above is commentary on a post by a different writer which is summarized as: ...most research software is built by rapid prototyping methods, rather than careful software development methods, because we usually have no idea what algorithms and data structures are going to work when we start writing the code. The point of the research is often to discover new methods for analyzing data, which means that there are a lot of false starts and dead ends in the process... The result, however, is that research code is often incredibly difficult to distribute or maintain. The first part is equally true of software developed by agile teams. What saves them from the second part is developers' willingness to refactor relentlessly, which depends in turn on management's willingness to allow time for that. Developers also have to have some idea of what good software looks like, i.e., of what they ought to be refactoring to. Given those things, I think reusability and reproducibility would be a lot more tractable. Read More ›

Free As In Pretty Much Whatever You Want
Greg Wilson / 2012-09-04
A couple of different people have asked us recently whether they can use our materials in their courses. The answer is an emphatic "yes": all of our slides, essays, posts, and what-not are covered by the Creative Commons — Attribution license, and all of our code is covered by the open source MIT License, so basically, you can do whatever you want, whenever you want, without seeking special permission. (And yes, this means you can charge for your courses, even though we don't charge for ours.) The only requirement (imposed by the CC-BY license) is that you cite us as the source, and of course we'd be grateful if you'd point us at your material and/or send us any changes you make. Read More ›

Final Results of Demographic Survey
Greg Wilson / 2012-09-04
192 people have now responded to the demographic survey we reported on two weeks ago. Results are shown in the charts below; I'll hold off posting a breakdown by discipline until I find a better (standard) classification scheme. Long story short, you are grad students, age 25-35, 28% female, with no clear preference for Linux, Mac OS X, or Windows. No idea what kind of music you listen to, though, or where you stand on the crucial "cave men versus astronauts" question; we'll include those in next year's survey. (Charts: age, gender, career stage, and preferred computing platform.) Read More ›

Lifted by the Audience
Greg Wilson / 2012-09-02
I spent Thursday and Friday recording most of the material we've been using in workshops for the past six months [1]. One thing that kept bugging me was how flat and uninspiring it was to talk to a camera: I had forgotten just how much a live audience energizes me. I also realized how much I rely on people's questions, and the expressions on their faces, to keep me on track: it took me almost 15 minutes to remember that I needed to explain that Python uses indentation to show nesting, whereas I always do that at the right time in a live class because someone either asks or looks puzzled. This has me wondering yet again about how to do online tutorials and support more effectively. The tutorials we've run over the summer have had anywhere from half a dozen to two dozen participants, but current multi-way "talking head" videoconferencing systems can't handle that many people [2]. I'd like to do one-to-one sessions instead, but that won't scale (unless dozens of you volunteer to help, which I think is unlikely). Looking around at various online education startups, no one else seems to have solved this either—if you know of someone who has, I'd welcome a pointer. [1] We did the shell, Python, SQL, and version control, but ran out of steam before we could do testing. [2] We've tried WebEx, BlueJeans, and Vidyo. Setting aside technical problems (lag, jitter, audio feedback, firewall issues, etc.), their "Hollywood Squares" interfaces don't give nearly the same presence or immediacy as being there physically. Read More ›

Please Help the Hunter Family
Greg Wilson / 2012-08-29
We are very sad to report that John Hunter, the principal author of the widely-used Python plotting package Matplotlib, is losing his battle with cancer. If you would like to help his family in this difficult time, please see Stefan van der Walt's post, or make a donation to the memorial fund set up to help with his children's education. Read More ›

Linking Forward From a Bibliography?
Greg Wilson / 2012-08-29
This web site's bibliography lists 116 papers related in some way to the practice of scientific computing. I'd like to know: what related papers I've missed, and when a new related paper appears. It seems to me that I ought to be able to upload the entire bibliography into some tool, then say, "Find everything that isn't in this list that cites things that are, minus everything in this other list over here that I've already looked at and decided aren't relevant." Going a step further, I ought to be able to prime some kind of robot to do this check automatically every week or so, and mail me when there's a hit so that I can either read it or add it to the exclusion list. But "ought" doesn't put dinner on the table. I've had a superficial look at CiteSeer, Google Scholar, and Mendeley, but what I want hasn't jumped out at me. If anyone has a pointer to a one-step solution, please post a comment. Read More ›
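In set terms, the query I want is simple; here is a sketch with a stubbed-out citation lookup (cited_by is invented, and a real version would query a citation index):

    def cited_by(paper):
        # stub: in real life, ask CiteSeer, Google Scholar, etc.
        return set()

    bibliography = {'paper-a', 'paper-b'}     # the 116 papers
    excluded = {'paper-x'}                    # already seen, not relevant

    candidates = set()
    for paper in bibliography:
        candidates = candidates | cited_by(paper)
    new_hits = candidates - bibliography - excluded
    print(new_hits)

The set arithmetic is trivial; the missing piece is a service that implements cited_by and runs the check on a schedule.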

A Problem With Badges
Greg Wilson / 2012-08-29
We issued the first learning badges for Software Carpentry a couple of weeks ago, and in doing so, uncovered a significant flaw in the design of existing badging systems. My intent was to give a bunch of people "Instructor" badges; by mistake, I set the pulldown to "Organizer", then clicked "Issue". As a result, two dozen people got the wrong kind of badge—and there doesn't appear to be any way for me to undo that. According to Dave Lester, who created the WPBadger plugin for WordPress that I'm using: I don't believe there's a way for an issuer to remove issued badges from a backpack, however if you remove an award from WPBadger and it results in a broken link the assumption would be that it no longer exists. Since there will be multiple backpacks, I'm not sure how an issuer could notify all of them to let them know if a badge has been revoked. One idea: perhaps the assertion file that WPBadger generates could be updated when a badge is revoked, with a standardized message so the backpack knows that the award is no longer valid. Now, issuing the wrong badges was my fault. But I'm not the only person who will ever make that mistake, and as badging is scaled up to involve thousands or millions of people, there will inevitably be cases where someone's work turned out to be plagiarized, or where a buggy piece of software was given proof of X but issued badge Y, and so on. In the real world, there needs to be a way to cross badges off. This isn't actually a new problem. During the first dot-com bubble, I spent several years working on a single sign-on/access control product called SelectAccess (initially at a startup, then at Hewlett-Packard after we were acquired). That work was my first exposure to digital certificates, and to the certificate revocation problem. In brief, issuing a certificate is easy: you create a blob of digital data, then sign it with your own certificate, which has been signed with another certificate, and so on back to a trusted root certificate (which is very carefully guarded). Revoking a certificate, on the other hand, is a major pain—in fact, it's so difficult that many systems don't handle it at all, and others do so poorly. Once a valid certificate has been created, its bits can be copied any number of times, to any number of places. There's no central record of where those copies are, so there's no way to go and delete or modify them all. The best the issuer can do is create a list of invalidated certificates, and hope that users are periodically updating their copies of that list and checking incoming certificates against it—a "solution" which scales very badly. One non-solution to this problem is to embed an expiration date in each certificate. However, that creates a window of vulnerability between the time the issuer decides to revoke the certificate, and the time sites actually stop accepting it. Reducing the lifetime of each certificate shrinks this window, but imposes the burden of re-issuing and re-distributing certificates every day, hour, or whatever—which once again fails to scale. It seems to me that our current badging systems have exactly the same design flaw. I don't have a solution, but I think that if we don't come up with one, badging will turn out to be something that works a lot better in theory than in practice. Read More ›
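To see why the window of vulnerability matters, here is a toy sketch of verification against a revocation list (invented for illustration; this is not WPBadger's or any real backpack's API):

    # Toy model: a signed credential can be copied freely, so the only
    # defence after issuing is a revocation list that every verifier
    # must keep up to date itself.
    issuer_revoked = {1027}                   # issuer knows badge 1027 is bad

    def accepts(badge_serial, verifiers_list):
        # a verifier trusts any badge not on ITS copy of the list
        return badge_serial not in verifiers_list

    up_to_date = set(issuer_revoked)          # a verifier that just synced
    stale = set()                             # a verifier that never syncs
    print(accepts(1027, up_to_date))          # False: revocation worked
    print(accepts(1027, stale))               # True: the bad badge still circulates

Everything between the issuer's decision and a verifier's next sync is a window in which the revoked credential is still accepted.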

An Interview with Titus Brown
Greg Wilson / 2012-08-27
The folks at Simply Statistics interviewed Prof. Titus Brown earlier this month; there's lots of good stuff in the post for people interested in doing computational science (or any kind of science) better. Read More ›

An Updated List of Upcoming Workshops
Greg Wilson / 2012-08-21
Here's an update to our upcoming workshop list:

- DAFX Conference (York, UK): 2012-09-13, Greg Wilson
- University of Oslo: 2012-09-17, Hans Petter Langtangen and Greg Wilson
- Purdue University: 2012-10-08, Michael Hansen, Anthony Scopatz, and Jeff Shelton
- Lawrence Berkeley National Laboratory: 2012-10-17, Matt Davis and Katy Huff
- University of British Columbia: 2012-10-18, Laura Trembley-Boyer, Ian Mitchell, Peter Rawsthorne, and Greg Wilson
- University of California Berkeley: 2012-10-20, Matt Davis and Katy Huff
- Caltech: 2012-10-23, Matt Davis
- Oxford: 2012-10-30, Stephen Crouch and Mike Jackson
- Scripps Institute: 2012-11-15, Titus Brown and Tracy Teal
- University of North Carolina: 2012-11-28, Jason Pell, Ethan White, and Greg Wilson
- University of Edinburgh: 2012-12-04, TBD
- University of Texas (Austin): 2012-12-10, Rosangela Canino-Koning, Katy Huff, and Anthony Scopatz
- University of Chicago: 2013-01-12, Katy Huff and Anthony Scopatz
- Technical University Munich: 2013-01-22, TBD
- Macquarie University: 2013-01-31, Greg Wilson
- Virginia Tech: 2013-01-31, Tommy Guy
- AMOS Conference (Melbourne): 2013-02-14, Greg Wilson

We're currently trying to settle on dates (and instructors) for: Berlin, Boston, Charlottesville (University of Virginia), Eugene (University of Oregon), Hamilton (McMaster), Helsinki, Montreal (McGill), New York City (Columbia University), Palo Alto (Stanford), Seattle (University of Washington), Tübingen, Washington, D.C. (George Mason University), and Waterloo. If you're interested in helping us teach, and in taking part in an online study group this fall on how to teach this stuff, please get in touch. Read More ›

What We Talk About When We Talk About Software Carpentry
Greg Wilson / 2012-08-20
Read More ›

Who Are You?
Greg Wilson / 2012-08-17
We asked participants in this year's workshops to tell us a bit about themselves. So far, 121 have done so, and their responses are summarized below.

Age: <20 0.0%; 20-25 18.3%; 26-30 31.7%; 31-35 25.0%; 36-40 9.2%; 41-45 4.2%; 46-50 4.2%; >50 7.5%

Gender: Female 26.9%; Male 73.1%

Occupation: Undergrad 2.6%; Grad student 55.7%; Post-doc 13.9%; Faculty 9.6%; Support 13.0%; Industry 5.2%

Preferred Platforms: Linux 54.2%; Mac OS 50.0%; Windows 35.8%

Platforms sum to more than 100% because multiple responses were allowed. I was surprised by the number of Linux responses, since we only ever had a handful of Linux laptops in the room, but looking at the responses in more detail, it appears that many of the people who use Mac laptops use Linux servers for production runs. We had a bit more trouble classifying respondents by discipline, but our best guess is:

- 11: bioinformatics
- 10: ecology
- 8: chemistry, astronomy
- 7: neuroscience, evolutionary biology
- 6: support, marine biology
- 5: planetary science, physics, biology
- 4: engineering, computer science
- 3: oceanography, nuclear engineering
- 2: programming, mathematics, geophysics, geography, genetics, electrical engineering, economics
- 1: veterinary medicine, social science, oncology, immunology, geology, genomics, fisheries and wildlife, epidemiology, education, civil engineering, business, biophysics, biomedical imaging, agriculture

Read More ›

Alpha Test of Driver's License Exam
Greg Wilson / 2012-08-16
As we announced back in June, we're working with the Software Sustainability Institute to create a "driver's license" exam for the DiRAC supercomputing facility. Mike Jackson at the SSI alpha tested our exam on four people last week; the exam itself, and his comments, are below. We would be very grateful for feedback from the community on the scope and level of the questions.

Exam

You have been given the URL of a version control repository, along with a username and a password for it. You have one hour to complete the following tasks. If at any point you would like clarification, please do not hesitate to ask. If at any point you are unable to complete a task, you may also ask for help so that you can proceed to the next task, but doing so will be considered equivalent to not completing that task.

Version Control. Check out a working copy of the version control repository. You will do all of your work in this working copy, and use version control to commit your solutions as you go. Solution:

    svn checkout $URL

Shell. Once you have a working copy, use the cd command to go into it. Use a single shell command to create a list of all files with names ending in .dat in or below this directory in a file called all-dat-files.txt. Solution:

    find . -name '*.dat' -print > all-dat-files.txt

Make. The analyze.py program takes exactly two arguments: the name of its input file and the name of its output file, in that order. For example, if inputs/a.dat changes, running make will execute the command:

    ./analyze.py inputs/a.dat outputs/a.out

Edit the file Makefile in the root directory of the working copy so that if any .dat file in the inputs directory changes, the program analyze.py is run to create a file named .out in the outputs directory. Solution:

    outputs/%.out : inputs/%.dat
            ./analyze.py $< $@

Version Control. Commit your changes to Makefile to the version control repository. Solution:

    svn commit -m "Building .out files for .dat files" Makefile

Note: examinees may use an editor instead of -m to provide a log comment.

Testing. The analyze.py program contains a function called running_total, which is supposed to calculate the total of each increasing sequence of numbers in a list:

    running_total([1, 2, 1, 8, 9, 2]) == [3, 18, 2]
    running_total([1, 3, 4, 2, 5, 4, 6, 9]) == [8, 7, 19]

In the file test_analyze.py, write the four (4) unit tests that you think are most important to run to test this function. Do not test for cases of invalid input (i.e., inputs that are strings, lists of lists, or anything else that isn't a flat list of numbers). Submit your work by committing your changes to version control. Solution:

    def test_empty():
        assert running_total([]) == []

    def test_equal():
        assert running_total([1, 1]) == [1, 1]

    def test_negative():
        assert running_total([1, 5, -5, -3]) == [6, -8]

    def test_float():
        assert running_total([1.0, 5.0, 2.0]) == [6.0, 2.0]

The test for [] must be in the set the examinee writes: the behavior is not explicitly stated in the spec, but is reasonable, and can be inferred from reading the source. The spec says 'increasing sequence', but neither of the examples has consecutive equal values, so the second test must be in the set the examinee writes as well. Examinees may write other tests than test_negative and test_float, so long as they are interesting and useful in the eyes of the assessor.
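For readers who want to try the testing question at home, here is one implementation of running_total consistent with the examples and the expected tests above (a sketch, not the actual exam code):

    def running_total(values):
        # sum each strictly increasing run of numbers in 'values'
        result = []
        for i, value in enumerate(values):
            if i > 0 and value > values[i - 1]:
                result[-1] += value     # still climbing: extend the current run
            else:
                result.append(value)    # start a new run
        return result

Note that, like the exam's version, it happily accumulates floating-point error, which is the point of the next question.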
Numerical Programming (and Version Control). Edit a file called answer.txt in the root directory of your working copy and write a brief explanation of one situation in which running_total would produce an incorrect answer. Submit your answer by committing the file answer.txt to version control. Solution: running_total will produce the wrong answer if a sequence includes consecutive floating-point values with very small and very large magnitudes. And:

    svn commit -m "Answer to numerical programming question" answer.txt

Note: while the examinee is working on previous questions, the examiner will edit answer.txt and check in changes, so that the examinee must resolve a conflict before being able to commit.

Code Review. The program power2.py takes a single non-negative integer as a command line argument and produces the powers of two that total to that number. For example:

    ./power2.py 27

produces:

    16 8 2 1

Edit this program to improve its structure and readability without changing its behavior. The file test_power2.py has tests for power2.py. You can check that your changes have not changed power2.py's behaviour by running:

    nosetests test_power2.py

Solution: this one requires subjective judgment by the examiner, but the starting code is awful enough to leave lots of room for improvement.

Shell. Write a shell script called do-many.sh that runs power2.py for many different numbers. For example:

    ./do-many.sh 27 9 35

must produce:

    16 8 2 1
    8 1
    32 2 1

as its output. You do not need to do error-checking on the command-line parameters, i.e., you may assume that they are all non-negative integers. Solution:

    for number in $*
    do
        ./power2.py $number
    done

Feedback

Introduction. A dry run of the test for a "driver's licence" for researchers wishing to use the DiRAC integrated supercomputing facility was done on Thursday 9th and Friday 10th August 2012. Four participants (P1-P4) took part. P1 and P2 are members of EPCC and have backgrounds in software development, HPC, and project management. P1 is a DiRAC project leader. P3 and P4 are members of the SSI, with backgrounds in software development. P3 could be viewed as equivalent to a "professor", P4 as a software development and sustainability consultant. It was expected that all four would be able to complete the test in an hour. Each participant was given one hour, followed by up to 30 minutes for discussion of how they did, the nature of the test, and suggestions as to how it could be improved.

Starting the Test. The participants logged into an account with all the tools available. The exam text and Subversion URL/username/password were in text files. Each participant had their own Subversion URL. After P2 and P3 did their check-out, I committed an update to power2.py. I forgot to do this for P1. P4 didn't use Subversion due to account access issues, so was given a ZIP file with the exam and source code. P1 and P2 asked if they could use the web, i.e., whether it was an "open book" test. I said it was, since the test assesses working practices, not recall. P1-P4 all used the web.

Version Control. P2 and P3 used Subversion without problems. P1 ran commit commands but forgot the add commands; they put this down to unfamiliarity with Subversion. P2 commented that they hadn't used Subversion for a long time but that it was sufficiently straightforward. P1 asked if alternatives to Subversion would be offered to examinees, e.g., Mercurial. I said Git would be offered. They also suggested CVS and warned that some won't know what version control or any of these tools are.
P2 also said it would be better if there was a choice of tools. Shell Everyone completed this with no problem or comment. Make No-one managed the expected solution. P1-4 all wrote solutions with hard-coded file paths; the wild-card was the problem. P3 had looked up the syntax, since they only ever use others' directory-based patterns, but it wasn't clear what to do. P1 and P4 assumed that once they'd implemented their target, running "make" with no file name would apply it to all the matching files. When shown that it had to be invoked as "make outputs/run1.out", P1 commented that they'd never seen make used that way before. P1 commented that Make is so complex that everyone knows it in a different way, and that it is a very hard tool to set questions on. P4 said they'd moved on from Make; it was a tool they'd come back and learn if/when needed, but was too arcane to retain. P3 questioned whether it was reasonable to expect an HPC user to know this. They'd be more likely to modify Make files/targets than create them from scratch. Version Control P1, P2 and P3 all committed their updates to the repository with a commit message. Testing P1 asked whether examinees would know Python. I explained that C/C++, FORTRAN and MATLAB would also be offered. P2 said examinees should be able to choose their tools. P3 also said that examinees need to know the libraries and tools used. P1 commented that some examinees won't know what a "unit test" is. P1 had never written Python unit tests and asked if there was something like JUnit. I described how to run "nosetests". They admitted to searching the web a lot and also to copying the test format from test_power2.py. P1 missed a test for [1,1] but argued that their test for [0] was a test for a non-increasing sequence. P2 asked whether "increasing" meant "strictly increasing". Related to this, P3 wrote a test asserting that running_total([0,0,0,0,0]) == [0]. This test fails, but P3 said that if they had time they would then have investigated the failure. P2 wrote 7 tests, including the 2 expected tests, but as stand-alone asserts rather than in functions, due to unfamiliarity with writing Python unit tests. They committed a version that had 4 of their tests, including the expected ones. P4 omitted the test for []. P3 commented that the nosetests documentation wasn't useful but the code on Software Carpentry was. P3 ran into problems due to a typo ("sssert" instead of "assert") which took me a while to spot too, though their tests did include the expected ones. Numerical Programming (and Version Control) P1 answered "The case where you use real numbers rather than integer values." and elaborated that it would give a read error for any floating point number. P2 commented that this question has a different context from the others and that the heading "numerical programming" is a bit cryptic. They commented that the examples plus the description of the previous question imply a focus on integers only. Examinees may be more likely to answer if it's explicitly stated that the function can handle floating point numbers. Similarly, P4 said it was "kind of a trick question". When informed of the answer, P3 did say they'd expect such accumulation errors. Code Review P1 made few changes, e.g. added a header comment, added other meaningful comments, removed bit shifts, and started to remove prints and reverse the range. When discussing the test, P1 listed numerous other refactorings, e.g.
changing variables (but they said that FORTRANers like/expect/traditionally use short variable names), removing the prints (but they argued that printing as you go is more optimal). P1 also said they wouldn't touch the loops, as that's the compiler's job. P1 felt the example was too "computer science-y" and most examinees won't be. They suggested an example based on linear algebra or a matrix-vector multiply. P2 only added a redundant comment and removed the shift but not the associated comment, which was now incorrect. They felt it was not immediately obvious what to do and suggested quantifying the number of possible improvements to give examinees a goal to aim for. They got their "aha" moment only when seeing an example solution or having possible refactorings suggested. They commented that they most likely would have rewritten the function from scratch if the question proposed that as an option. P2 did not commit their updates to date, so did not encounter the conflict in the repository. P3 did not get to this question in time. P4 made changes such as renaming variables. Intended changes included separating logically-grouped code fragments; using function names as comments; adding more comments, including a header comment on what the program does, how it runs and what's expected (with the intent to add more on valid inputs); a code analysis to see if some code is irrelevant; adding a Subversion pragma; and adding a debug mode. P4 commented that some examinees might optimise the code first and others clarify it; it would depend on the individual. Shell P2 wrote a shell script that worked, but it had not been added to the repository. P1, P3 and P4 didn't do this as they were aiming to complete the test in an hour. P3 and P4 said they'd have done it with a loop, shift, while, $* or a web search. Results The marking criteria are: Pass: the researcher passes all the exercises, and comments (e.g. alternative ways of doing things, other issues to consider) are given. Pass with condition: the researcher fails an exercise but is given a pass subject to undertaking some training activity (e.g. working through an online Software Carpentry tutorial) or working with a DiRAC developer. Fail: the researcher fails more than one of the test exercises. They will be given pointers to online resources that would help them to pass the exercises they failed, names of individuals at their home institution who may be able to help or mentor them (or, generally, be recommended to seek someone out), or advised to attend a Software Carpentry bootcamp. Under a strict interpretation all the participants failed, since they each failed more than one exercise: P1-4 failed Make since they did not derive the wild-card solution. P1 and P3 failed Testing since they did not provide a test for a strictly increasing sequence and [] respectively. P2-4 all failed to answer the Numerical Programming question. P1, P2 and P4 all failed to write the final shell script. Being more lenient, and taking into account that non-optimal solutions for Make were given, that possible code refactorings were enumerated, and that all participants understood the solutions when given, this would be a Pass with condition for each. General Comments On the Current Exam P1 felt the test was too difficult because there were too many assumptions, e.g. language, test framework and version control. P3 felt the test was difficult in the time available.
P3 felt that if an examinee knows these things then they're fairly competent, but cautioned that if examinees must get all the answers exactly right (e.g. for Make) then it would be very hard. P4 added that if the test was "closed book" then the number of passes would be very small. P3 and P4 felt the basic mix of shell, maintainable code, version control, build and test was good. P2 felt the test was quite well balanced. P2 and P4 felt that more detail in the questions would avoid blind alleys. On the Concept of an Exam P1 thought it would be useful as a self-assessment for (particularly early-stage) researchers to identify their training needs. P2-4 also thought it was a good idea in principle. On Making the Exam a Pre-Requisite for DiRAC Access P1's view was that it would not be reasonable to expect someone to sit this before getting access, due to the diversity of researchers in both their backgrounds and intended uses. They commented that a lot of users would be package users, more concerned with how to submit jobs. P1 commented that the criteria should be whether they are doing good science, not how good or bad their code is, and they would be very wary about making it a condition of access, since there are already a number of hoops researchers must go through to get access (e.g. a detailed Technical Assessment). P1 also raised the question of what happens if a researcher has their Technical Assessment accepted but refuses to sit the test, or fails it and is denied access. P2-4 felt that it was reasonable to expect the test to be sat. P2 said it could annoy some people, but if one decides that only skilled users can access a facility then there needs to be some way to ensure that. P3 and P4 agreed. P3 commented that if the user's objective is research then they may not want to bother, and that if what they do and how they do it works for them, then they're only wasting their own time if they're inefficient. P3 and P4 felt the pass/conditional pass/fail marking scheme was harsh. P3 commented that attending a bootcamp might not be enough; e.g. P3 had attended a bootcamp, but the shell content was skipped and Make wasn't covered at all. P4 said that if DiRAC has capacity for everyone who might ever be given access then the barrier could be reduced; the barrier could be increased as more people use DiRAC. P3 questioned whether those applying for access (e.g. a PI) would be the same as those who actually use it (e.g. an RA). They made the distinction between someone buying a car and someone actually driving it. Suggestions for Revision General Explicitly state that the examinee can use the web, man pages, etc. Version Control Allow use of CVS and Mercurial in addition to Subversion and Git. Shell Tell examinees to add their all-dat-files.txt file to the repository. Make Drop, or provide an easier example. Provide an Ant alternative. Testing State that the sequence is "strictly increasing". Move to after Code Review so examinees are familiar with the xUnit format and how to run tests in their chosen language. Provide them with a test function and the command-line invocation to compile/run tests. Numerical Programming (and Version Control) Drop. Code Review Quantify the number of possible refactorings, e.g. "Improve this code in at least 3 ways". Provide a more "scientific" example. Shell Move this exercise before Code Review so it doesn't get dropped through lack of time.
Marking Criteria Consider the responses examinees give when discussing their results with them, as this could raise a Fail to a Pass with condition, or a Pass with condition to a Pass. Read More ›

Interview about Software Carpentry (and Education)
Greg Wilson / 2012-08-14
Chris Gammell and Jeff Shelton interviewed me a couple of weeks ago for their Engineering Commons podcast series. The interview turned out to be mostly about education, online and otherwise—I hope readers will find it interesting. Read More ›

Applying Pedagogical Principles in This Course
Greg Wilson / 2012-08-14
I've talked a bit about instructional design and educational principles in this blog in the past; here's a concrete example of how we try to apply those ideas. As well as teaching the basics of the Bash shell and Python, we also (try to) teach people that every programming system must have: Individual things. A typical "thing" in the shell is a file; in Python, it's a number or string. Groups of things. In the shell, these are the lines in a file, or the files that match a pattern like *.dat. In Python, they're lists and dictionaries. Commands that operate on things, like addition or the cp command. Ways to repeat commands, either explicitly (like for loops in the shell and Python) or implicitly (like wc *.dat). Ways to make choices, like if statements in Python or file-matching patterns in the shell. (We're not going to try to teach people how to do conditionals in the shell.) Ways to create chunks, like functions and shell scripts. Ways to combine those chunks, like function composition and pipes. These ideas are connected: groups of things and ways to repeat commands tie together naturally, as do creating functions and combining them. Those connections help people learn: loops are easier to digest when you know that they exist to work on groups of values. They also help people solve problems: if someone knows how to do something in the shell, they might be able to reason their way to a Python solution by analogy. In fact, connections matter as much as facts. What distinguishes experts from novices is not just how many facts the former know, but the density of connections between those facts. The more densely connected someone's knowledge is, the easier (and faster) it is for them to bring the next set of facts they need to solve a problem into working memory. Another example of how we try to tie ideas together (which we haven't used in courses yet, but should) is the essay on counting things. It works through successively more complex scenarios, and uses the shell and SQL as well as Python, to give learners context for complexity. Our biggest goal in the next major revision of the course material is to do more of this: switching to the IPython Notebook will be cool, but putting a sound pedagogical basis under the course will make a much bigger difference. Read More ›
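To make the list above concrete, here is a small, purely illustrative Python sketch that touches all seven ideas (the .dat files are hypothetical):

import glob                        # groups of things: every file matching *.dat

def count_lines(filename):         # a chunk: one function wrapping one job
    '''Count the lines in a single file (an individual thing).'''
    with open(filename) as reader:
        return len(reader.readlines())

for name in glob.glob('*.dat'):    # repeat a command for each thing in the group
    if count_lines(name) > 0:      # make a choice, combining chunks to do it
        print(name)

Here glob.glob plays the same role that the shell's *.dat pattern plays on the command line, which is exactly the kind of cross-tool connection described above.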

A Question and Answer Matrix for Software Carpentry
Greg Wilson / 2012-08-14
Following up on yesterday's post about applying educational principles to this course, here's a not-yet-completed Q&A matrix for this course. The section headings are questions people ask (or equivalently, tasks they want to perform). The headings underneath are the major topics Software Carpentry covers, and below each of those is my attempt to relate those topics to the questions. "TBD" means "I haven't written it yet", while "N/A" means "I can't think of any relationship." This matrix is going to be the basis of our next big reorganization of material (which should start this fall), so we would be very grateful for your input: What have we missed? What's in the wrong place? Most importantly, can we reframe our key questions to divide things up more usefully or more logically, and if so, how? Thanks for your help! Q01: How can I manage this data? Q02: How can I process it? Q03: How can I tell if I've processed it correctly? Q04: How can I find and fix bugs when I haven't? Q05: How can I keep track of what I've done? Q06: How can I find and use other people's work? Q07: How can other people find and use mine? Q08: How can I do all these things faster? Q01: How can I manage this data? The Shell Use directories and sub-directories with meaningful names. Use filenames that can easily be matched with wildcards. Use filename extensions that indicate the type of data in the file. Use text unless there's a powerful reason to use something else. Version Control If it's megabytes or less, put it under version control. Basic Programming Create and use data formats that are easy for programs to parse. Functions and Libraries TBD Databases Store it in a relational database. Store each atom of information in its own field. Make sure each record has a unique key. Make sure that information is never duplicated. Use foreign keys and joins to combine information from different tables. Number Crunching Represent it as a matrix, because that's easy to process. Quality N/A Sets and Dictionaries TBD Development N/A Web Programming Format it as HTML (or XML, or some other widely-used format). Separate content from presentation (e.g., use CSS for styling). Q02: How can I process it? The Shell Use Unix commands that manipulate lines of text. Combine those commands using pipes and redirection. Use loops to perform the same operations on many files. Version Control N/A Basic Programming Write programs that use loops, file I/O, and string splitting to read data. Use floating-point numbers unless you are sure all values (including calculated values) will always be integers. Functions and Libraries TBD Databases Write SQL queries to select, filter, aggregate, and sort data. Use a general-purpose programming language for everything else. Number Crunching Use a linear algebra package like NumPy. Quality N/A Sets and Dictionaries TBD Development Use the right data structures. Web Programming Use an HTTP library to fetch it. Use an XML or JSON library to parse it. Q03: How can I tell if I've processed it correctly? The Shell N/A Version Control N/A Basic Programming Test your programs with small data sets whose results can be checked by hand. Functions and Libraries TBD Databases Build queries in small steps. Run queries against small data sets whose output can be checked manually. Number Crunching Compare a program's output to analytic results, experimental results, simplified test cases, and previous programs. Use tolerances when comparing results. 
Quality Create simple data sets for which the right answer can be calculated by hand. Compare the results produced by the new program to results produced by older programs. Sets and Dictionaries TBD Development Make code testable by dividing it into functions, and then replacing some functions with others for testing purposes (see the short sketch after this post). Web Programming N/A Q04: How can I find and fix bugs when I haven't? The Shell N/A Version Control N/A Basic Programming N/A Functions and Libraries TBD Databases N/A Number Crunching N/A Quality Write test cases that fail when the bug is present, but pass when the bug is fixed. Add assertions to programs to check their internal consistency. Use a debugger. Sets and Dictionaries TBD Development Write tests. Web Programming N/A Q05: How can I keep track of what I've done? The Shell N/A Version Control Keep your work under version control. Check in whenever you've completed a significant change. Write meaningful check-in comments. Basic Programming Put version control IDs in programs (and data files), and copy them forward to results. Functions and Libraries TBD Databases Store queries in files (just like programs). Number Crunching N/A Quality Turn bug fixes into assertions and test cases. Use a coverage analyzer to see what code is and isn't being tested. Sets and Dictionaries TBD Development N/A Web Programming Use meta headers in your HTML/XML data files. Q06: How can I find and use other people's work? The Shell N/A Version Control Get it from their version control repositories. Basic Programming N/A Functions and Libraries TBD Databases N/A Number Crunching N/A Quality N/A Sets and Dictionaries TBD Development N/A Web Programming Ask them to use well-formed URLs, and to format data according to well-defined machine-readable standards (e.g., XML or JSON). Q07: How can other people find and use mine? The Shell N/A Version Control Put your work in a publicly-accessible version control repository. Basic Programming N/A Functions and Libraries TBD Databases Raise exceptions to signal errors so that other people can handle them as they think best. Number Crunching N/A Quality N/A Sets and Dictionaries TBD Development N/A Web Programming Put it on the web at a stable URL. Format it according to well-defined machine-readable standards (e.g., XML or JSON). Include meta-data. Q08: How can I do all these things faster? The Shell Put commands in shell scripts so that they can be re-used. Version Control N/A Basic Programming Use appropriate variable names so that people will waste less time trying to read programs. Functions and Libraries TBD Databases N/A Number Crunching Use a linear algebra package like NumPy. Quality Design code for testing. Write test cases before writing new code. Sets and Dictionaries TBD Development Use a profiler to figure out why code is slow before trying to optimize it. Build code so that parts can be replaced easily. Web Programming N/A Read More ›
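Here is a minimal sketch of that last Development idea: dividing code into functions so that one of them can be swapped out in a test. The function names and the fake data are made up for illustration:

def get_readings(filename):
    '''Read one number per line from a data file.'''
    with open(filename) as reader:
        return [float(line) for line in reader]

def average_reading(filename, reader=get_readings):
    '''Average the values in a file; `reader` can be replaced in tests.'''
    values = reader(filename)
    return sum(values) / len(values)

def test_average_reading():
    '''No real file needed: swap in a fake reader instead.'''
    fake = lambda filename: [1.0, 2.0, 3.0]
    assert average_reading('ignored.dat', reader=fake) == 2.0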

We're Going to Be Busy
Greg Wilson / 2012-08-01
We're going to be busy—and these are just the ones we have confirmed. Sep 13-14, 2012: DAFX Conference (York, UK) Sep 17-18, 2012: University of Oslo Oct 8-9, 2012: Purdue University Oct 17-18, 2012: Lawrence Berkeley National Laboratory Oct 18-19, 2012: University of British Columbia Oct 20-21, 2012: University of California Berkeley Oct 23-24, 2012: Caltech Oct 29-30, 2012: Oxford Nov 15-16, 2012: Scripps Institute Nov 28-29, 2012: University of North Carolina Dec 4-5, 2012: University of Edinburgh Dec 10-11, 2012: University of Texas (Austin) Jan 19-20, 2013: University of Chicago Jan 22-23, 2013: Technical University Munich Feb 14-15, 2013: AMOS Conference (Melbourne) Read More ›

That Was Quick
Greg Wilson / 2012-08-01
Registration for our October 2012 workshop at UC Berkeley opened mid-afternoon yesterday. We were sold out and putting people on the waitlist in less than four hours. Read More ›

Record and Playback
Greg Wilson / 2012-07-30
The biggest bottleneck Software Carpentry faces right now is a shortage of experienced instructors. To help fix that, we are going to record a complete presentation of our core two-day material so that people who want to teach it themselves can see how we say things, as well as what we say [1, 2]. As soon as we say "record", though, we have to ask, what exactly are we recording? Audio and video of a presenter in front of a whiteboard? Sure—that helps humanize the presentation. But what about the presenter's desktop? Viewers definitely need to see it, but should they see an MP4 in which the text on the presenter's screen appears as colored pixels arranged in the shapes of characters, or should we record the characters directly? I think the latter is by far the best option, since: it's much more compact (compare the size of an MP4 of an hour's typing with the size of the text typed); it can be copied and pasted (when you freeze a movie and copy what's on your screen, what you get is an image rather than a chunk of program text you can run yourself); it's searchable (same reason as above); it's more accessible to people with visual disabilities; and it's more likely to be future-proof and device-proof. If I record a video, I'm specifying a display mode as well as content; if I record what I've typed, and present that to you, you (or someone mediating between us) can decide how to style it, whether to use a one- or two-column display, and so on. Enter the Unix script command. As its man page says, it records everything printed to a terminal in a file for later inspection. Suppose, for example, that I run the following commands at a shell prompt (with italics showing output):

$ script ~/log.txt
Script started, file is /home/gvw/log.txt
$ pwd
/home/gvw/swc
$ ls
3.0 4.0 5.0 LICENSE.txt book data links.html papers research scraps
$ cd papers
$ svn st
$ exit
Script done, file is /home/gvw/log.txt
$

When I'm done, the file ~/log.txt contains:

Script started on Mon Jul 30 11:21:24 2012
$ pwd^M
/home/Owner/swc^M
$ ls^M
3.0 4.0 5.0 LICENSE.txt book data links.html papers research scraps^M
$ cd pp^H^[[Kapet^H^[[Krs^M
$ svn st^M
$ exit^M
Script done on Mon Jul 30 11:21:42 2012

The ^M and ^H^[[K text is a literal transcript of what happens when the Enter and Backspace keys are pressed. In theory, this can be replayed to show people later exactly how something was done, keystroke by keystroke. All we need is timing, and script can deliver that:

Options:
     ...
     -t      Output timing data to standard error. This data contains two
             fields, separated by a space. The first field indicates how much
             time elapsed since the previous output. The second field
             indicates how many characters were output this time. This
             information can be used to replay typescripts with realistic
             typing and output delays.

So in theory, if we redirect script's standard error to a file, we can use it to replay text at the correct speed. But if we actually do that, any error messages produced by the commands we're typing wind up in that file as well, instead of in our log file. That's a problem... There's another problem too. script is designed to capture line printer sessions, not interactive cursor-based work. Its man page even warns about this: Certain interactive commands, such as vi(1), create garbage in the typescript file. Script works best with commands that do not manipulate the screen, the results are meant to emulate a hardcopy terminal.
This means that a recording of an interactive editing session, even one using something as simple as nano, is much harder to replay. And we do want to replay this kind of work, because (a) our chances of typing in a 20-line function interactively without mistakes are low, and (b) we want people to see that we don't actually enter code in print order, but instead create placeholder lines that are later filled in, indent things under if or else statements when we realize there are extra cases to handle, and so on. (Remember, we're trying to teach the "how" as well as the "what".) This leaves us with a few options: Abandon the idea of recording the text itself, and only record pixels. I'm going to cross this one off the list unilaterally. Figure out how to do what we want with the existing script command. Your help would be appreciated. Hack script (which is, after all, open source) to do what we want. If we go down this path, we'd appreciate help with it as well. Find another way to do what we want. By this point, you probably aren't surprised by me inviting pointers and proposals. No matter which of these options we pick, we're going to want to synchronize replay of interactive typing sessions with audio voiceovers in the browser. Luckily, Popcorn.js has been designed to do (almost) exactly that: it can tweak the content of a web page in sync with (for example) time marks in an audio file, so rewind/pause/fast forward would all do what we want. Before we can do that, though, we need to capture raw data; if you'd like to assist, please get in touch. [1] We have such a recording from a March 2012 workshop in Indiana, but our delivery has evolved a fair bit since then. [2] People who want to learn the material might find these recordings useful too, but both our past experience and a whole lot of educational research tells us that canned presentations aren't actually very effective for most novices. Read More ›
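For the curious, here is a rough sketch of the kind of replayer the -t option makes possible, assuming a session captured with something like script -t typescript.txt 2> timing.txt (the file names are illustrative):

import sys
import time

def replay(typescript='typescript.txt', timing='timing.txt', speed=1.0):
    '''Replay a recorded session: each line of the timing file holds a delay
    in seconds and a count of bytes to copy from the typescript file.'''
    with open(typescript, 'rb') as session, open(timing) as delays:
        session.readline()                    # skip the 'Script started ...' header
        for line in delays:
            delay, nbytes = line.split()
            time.sleep(float(delay) / speed)  # pause as long as the presenter did
            sys.stdout.write(session.read(int(nbytes)).decode('ascii', 'replace'))
            sys.stdout.flush()

This is essentially what the scriptreplay command shipped with util-linux does; the hard part described above, faithfully replaying cursor movement from interactive editors, is what neither this sketch nor script itself handles.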

Software Carpentry Needs You!
Greg Wilson / 2012-07-28
Software Carpentry needs you! If you found our workshops, tutorials, or online material useful, there are many ways you could help us grow: Send us a testimonial to add to this page, or write a longer one that we can post on our blog. (We show these to potential sponsors as evidence that we're doing good things, and to people who are thinking about organizing or attending workshops to give them an idea of the benefits.) Help organize and host a two-day workshop. All we need is a room big enough to hold 30-100 people, a reliable network, coffee, and travel expenses for a couple of instructors (who can usually give academic talks while they're in town as well). Based on the last six months, workshops work best when they're aimed at people from the same labs, departments, or disciplines, so we're also interested in running workshops in conjunction with conferences and other get-togethers of like-minded people. Help us teach! It's the best way to learn, and we can also give you a chance to learn how to deliver online tutorials (which is a pretty useful bullet point to have on your resume these days if you're thinking about an academic career). If you're interested, we'll get you to help out at a workshop or two, then co-teach with a more experienced instructor until you're ready to fly solo. Help us update our content. Some of our lectures have bugs in them, while others make less sense to novices than we thought they would when we created them, and there are always new things to add. Help run things. We have maps and mailing lists to update, workshop ads to write and publish, questions to answer, and a web site to update. You don't need mad hacking skills to do any of these things, but we do ask that anyone who takes this on commits to doing a couple of hours a week for at least 3-4 months at a stretch to amortize the training overhead. Write software! This post is probably more than you want to bite off, but there are lots of other things you could help build that would help us. It would be a great way to build your reputation, too. Do something we haven't thought of. If you can think of a way to help us that we haven't thought of ourselves, please let us know. Like every open source project, Software Carpentry will only thrive if people like you lend a hand. If you'd like to help out, please get in touch. Read More ›

IPython Notebook + Towtruck + Etherpad + Slide Drive = Win
Greg Wilson / 2012-07-22
Executive summary: the tool I want for teaching programming doesn't yet exist, but could be built by combining the IPython Notebook, Towtruck, Etherpad, and Slide Drive. This would be a better web-based tool for teaching programming than anything currently available, and would alleviate the most critical bottleneck programming-for-everyone efforts face: the shortage of competent tutors. A lot of people say harsh things about PowerPoint, but as I've discussed before, the alternatives are all unsatisfying as well. HTML5 slideshows don't allow authors to mix text and graphics free-hand the way people do on a whiteboard; instead, the two have to be segregated into blocks. Screencasts are unsatisfying too: they don't display first-class content [1], and, like slideshows, aren't interactively hackable. But all is not lost: the pieces we'd need to build a first-class online collaborative teaching and presentation tool now exist, so long as we're willing to compromise a bit. Here's what I think they are: 1. The IPython Notebook is a "living" lab notebook for computational science that allows users to mix HTML with snippets of runnable code. One way to think about it is that it's a tutorial whose code examples can be run and modified in place; another is that it's an interactive re-thinking of Donald Knuth's idea of literate programming. People are already using notebooks for teaching, and while setup continues to be a hassle, it's clearly better than traditional, static alternatives. 2. If you haven't seen Towtruck, you should check it out. Initially, it looks just like a two-pane in-the-browser editor showing CSS and HTML on one side and the rendered page on the other. What it adds is interactivity: at any point, I can share my session with you, so that everything either of us types is immediately visible to the other person. Based on what I've learned in the last four months running online tutorials for Software Carpentry, this is an excellent way to do one-to-one or one-to-few tutoring, especially when there's an audio channel to go with it (like Skype). It's more efficient than "mail me your code and I'll mail you back my thoughts" (and bulletin-board workalikes like Stack Overflow), and I believe it will prove to be much better at fostering the kind of ad hoc peer-to-peer instruction that's common in online gaming. 3. Etherpad is also a real-time collaboration tool, but is discussion-oriented rather than task-oriented. Most conference calls at Mozilla have an Etherpad session running at the same time so that people can take minutes (in the main document pane), ask questions (in the chat pane), and socialize (ditto). Like Towtruck, it doesn't ship pixels around as video, but instead sends playback sequences (basically, the instructions needed to make my browser reproduce what you see in yours and vice versa). 4. Finally, the Slide Drive project that David Seifried started, and which Jeremy Banks has been enhancing as part of GSoC 2012, has been exploring a promising alternative: create slides in LibreOffice Impress, export them as SVG using the tools Marco Cecchetti has been building, then connect those to a voiceover using Popcorn.js. There's a lot of duct tape holding everything together, but it's proof that voiceover replays can be integrated with first-class web content. The tool I want to teach programming would combine all four of these. Learners would start with an IPython Notebook containing explanatory text and hackable code snippets.
An audio voiceover would be sync'd with the content, highlighting paragraphs and triggering execution of snippets at the right times (though of course people could switch it off if they'd rather just skim). Whenever learners want help, they would be able to share their session with peers or tutors (classmates, strangers found on Stack Overflow, or paid tutors—lots of models are possible). The helper's browser would then be slaved to the learner's [2], and there'd be supplementary panes for note-taking and text chat (plus a button for launching Skype or some other VoIP tool). What wouldn't it include? Whiteboarding. I really like being able to mix words and pictures freely in my presentations, but HTML and web browsers (still) suck at that—as mentioned above, the best they let you do is embed an image in a block of text [3]. That said, embedded images have gotten us through the last twenty-odd years, and there's no reason to delay the good until better comes along. A debugger. Debuggers are great tools for helping novices understand what actually happens when their programs run—for example, the visualization and rewind capabilities in Philip Guo's Online Python Tutor are just jaw-dropping. I definitely want 'em, but again, what's already in the IPython Notebook is pretty cool on its own. Anything other than programming. Software Carpentry is not just a Python course; its core includes material on the shell, version control, and SQL, none of which fit comfortably into the model described so far. This interactive Git tutorial shows that it can be done, but once more, I'd be OK leaving that out of Version 1. Even without these features, this tool would help remedy the most critical bottleneck that Software Carpentry and other programming-for-everyone efforts have: the shortage of competent instructors. I didn't learn how to play Homeworld in a classroom; instead, a 14-year-old in Florida took pity on me after crushing me a few times and showed me how to maneuver my ships more efficiently [4]. This tool's Towtruck/Etherpad features would directly aid casual online mentoring, i.e., they'd help someone who knows something to tutor someone who doesn't over the web for a few minutes at a time, rather than just in blocks of a few hours on the weekend at the local library. Yes, Stack Overflow and other bulletin board systems support asynchronous over-the-web transfer of knowledge as well, but their write-it-all-down model is less effective than direct real-time collaboration [5]. It's less fun, too, and if we want to draw people into a community, we need to make it fun. Another way this tool would help is through record and playback. This is the web-native equivalent of screencasting: rather than sending you a painted-in-pixels video of what I did, I send you a transcript of how I did it that you can replay in your browser. This comes almost for free as a side effect of the Towtruck/Etherpad functionality, since they depend on one browser being able to tell another what to do. If those instructions are recorded, then synced with a voiceover or other multimedia using Popcorn (in the way that Slide Drive syncs things), it becomes possible for someone to record a how-to or a what-to-do so that others can play it back days, weeks, or months later, without needing to install something like Camtasia [6]. Is it going to happen? I don't know. 
The pieces may be there, but at a guess, one talented programmer-year (TPY) would be needed to create a usable prototype, which makes this roughly a hundred thousand dollar project. I do know that it would be a lot more interesting than most of the current crop of online education projects. [1] Stuff in screencasts that looks like text isn't really text: if you pause the screencast and try to select a few "lines", what you'll get is pixels arranged in patterns that resemble letters rather than the letters themselves. The same is true for diagrams: those things that look like boxes and arrows are just colored pixels too. This might not seem like a big deal, but if the content can't be selected and copied, it probably can't be indexed by search engines, and is probably opaque to accessibility aids as well. [2] For security reasons, we probably wouldn't let observers actually type things into the IPython Notebook—not unless we were sure the background Python process was really well sandboxed. [3] As Jeremy has been discovering, SVG-in-the-browser is not a way forward. Neither is plopping a canvas on top of your rendered page. [4] At least, he said he was a 14-year-old in Florida... [5] BBS-style systems may be more cost-effective, though, since good answers are findable months or years later—I think there's a role for both. [6] And if properly designed, those "recordings" would be forkable and mergeable, which would move us one step closer to the "GitHub for education" that keeps coming up. This in turn would make it easier for people to elaborate and comment on this kind of how-to presentation. Read More ›

Software Carpentry in Paris !
Nelle Varoquaux / 2012-07-21
A couple of weeks ago, we organized the first Software Carpentry bootcamp in Paris. We were lucky enough to be able to host it at INRIA's research center, Place d'Italie. The focus of this bootcamp was Python, with three sessions on Python: a beginner's introduction by Christophe Combelles, numerical computing with numpy by Konrad Hinsen, and the Python scientific ecosystem by Alexandre Gramfort, plus an introduction to DVCS with git by myself. 22 people joined us for 2 days of intense tutorials (24 places, 2 didn't show up, 15 people on the waiting list). Of those 22 people, 4 travelled from abroad (the UK, Belgium and Germany) to participate, and many travelled from distant cities in France (Rennes, Toulouse, Orléans, Nantes). 10 of the attendees filled in our survey. Software Carpentry being originally a North American project, and with our event benefiting neither from joint organisation with a conference, like the Italian bootcamp, nor from the Greg Wilson effect, we were a bit worried about publicity for the event. We announced it on INRIA's and IRILL's websites, and we flooded several universities with posters. We also posted on LinuxFR, a French website about free and open-source technologies and events, and on AFPy, the French-speaking Python community, and our two sponsors, Majerti and Logilab, tweeted about the event. The bootcamp filled up quite late, and many people subscribed to the mailing list only a couple of days before the event. Like the other bootcamps, we targeted mostly researchers. People worked in different fields: applied mathematics, bioinformatics, structural biology, finance, ecology, solid state physics, biophysics, linguistics, and at different levels: researchers, master's students, postdocs, PhD students, engineering students. The variety of fields made it hard for speakers to gauge the right level and subjects for the talks. One common interest among the attendees seemed to be finding a replacement for MATLAB (because of the licence price), so the Python part of the bootcamp was very attractive. The bootcamp was pretty intensive, with a lot of quite advanced exercises. I was worried that the level was too high, but it appears that most of the attendees found the level just right or hard but understood the concepts (none of them checked the "Too hard, I didn't understand anything" box!). Here are some suggestions on how to improve the bootcamp: Ask attendees beforehand which Python modules they would be interested in. Give a pre-bootcamp reading list to even out the levels of the participants. Don't use vim to edit files! It's confusing for Windows participants who don't have any experience using the shell. Make the bootcamp longer (5 days). Offer bootcamps at different levels. "I think the bootcamp was great, I learned a lot and especially found it useful that the lecturers pointed towards certain modules coping with the variety of interests and backgrounds of the participants. I will definitely delve deeper into Python now. Hopefully there will be more bootcamps in Europe, maybe even with different levels from beginners to more advanced users. Thanks for this great initiative!" Overall, attendees were very satisfied with the bootcamp and the facilities! Our sponsors greatly helped us make the event a success. They sponsored coffee breaks, croissants, orange juice, and meals for both days! Thanks once again to the speakers, our sponsors, and Feth Azreki, who came to help with the exercises. Read More ›

How Robust Is Your Programming Language?
Greg Wilson / 2012-07-21
One of the biggest problems in teaching novices how to program is that most programming systems are not robust. A car can go quite a long distance on a slightly-flat tire, and people can live for years with just one kidney or half a liver, but get one character out of place in a program, and boof—it's game over. And spotting that one character can be very hard, particularly if you're a novice and are learning mostly through copy-paste-and-tweak. Here's an example from a recent workshop. Learners had a bunch of data files that looked like this:

Date,Species,Count
2012-07-03,raccoon,3
2012-07-03,deer,1
2012-07-04,squirrel,5
2012-07-04,raccoon,1
2012-07-05,squirrel,7
2012-07-05,mouse,1

I'd shown them how to build a pipeline using cut and grep -v to extract the names of the animals that had been seen (and discard the title "Species"). Separately, I had also shown them how to use sort and uniq -c to count the number of distinct items in a list, and how to use a for loop to do something for each file in a set. Their capstone task was to put the three ideas together to count the number of distinct species in each data file separately. Here's what one student wrote: can you spot the bug?

#!/bin/bash/
for filename in *.dat
do
    cut -d , -f 2 $filename | grep -v Species | sort | uniq -c
done

Give up? It's the trailing '/' on the '#!' line at the start: it makes /bin/bash/ look like a directory, which of course can't be used to execute a script. But that "of course" took me a minute to spot, and that was after the learner had spent (at a guess) five or ten minutes tweaking things in the body of the script. Almost by definition, novices don't have a good mental model of how things work, but that's exactly what they need in order to diagnose and fix problems. Real (physical) tools mostly aren't like this: you don't have to be a perfect driver in order to drive a car, or Picasso in order to paint a wall, because the things you're using are fairly forgiving. I'd therefore like to throw out a challenge to programming language designers. Forget about parallelism or the esoteric corner cases of various type systems; instead, focus on robustness. How forgiving is your language? How well do programs written in it work when people make minor mistakes? Or to switch to industrial engineering terminology, what are your language's tolerances? And to help people along this path, I'd like to propose a metric. Consider the set of all variants of your program in which a single typing mistake has been made (like the trailing '/' in the example above). The Strong Robustness Measure is the percentage of those programs that correctly reproduce the output of the intended program. The Weak Robustness Measure is the percentage for which the exact location of the error, and the fix required, are reported in terms a novice can understand. (I realize that what a novice understands is ill-defined, but you get the idea.) At a guess, Python's SRM score is close to 0%; its WRM score is around 20-50%, but that's based solely on recall of personal experience. I suspect that supposedly "forgiving" languages like Perl and Ruby don't do any better on either measure, and that "strict" languages like Java and Haskell do markedly worse on the second (without improving the first). I also suspect that as long as most languages and tools have an SRM score of 0, programming will continue to be hard to learn... Read More ›
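To make the proposed metric concrete, here is a rough sketch of how one might estimate the SRM for a small Python program, restricted for simplicity to single-character substitutions rather than every possible typing mistake (the function name srm and its interface are made up):

import os
import string
import subprocess
import tempfile

def srm(source, expected, timeout=5):
    '''Estimate the Strong Robustness Measure: the percentage of
    single-character substitutions of `source` that still print `expected`.'''
    variants = survivors = 0
    for i in range(len(source)):
        for ch in string.ascii_letters + string.digits + string.punctuation:
            if ch == source[i]:
                continue
            mutant = source[:i] + ch + source[i + 1:]
            variants += 1
            with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
                f.write(mutant)
            try:
                result = subprocess.run(['python', f.name], capture_output=True,
                                        text=True, timeout=timeout)
                if result.stdout == expected:
                    survivors += 1
            except subprocess.TimeoutExpired:
                pass
            finally:
                os.remove(f.name)
    return 100.0 * survivors / variants

Even on a tiny script this is slow, since it launches one interpreter per mutant, but it should be enough to test the guesses above.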

Workshop wrap up from the Rutherford Appleton Laboratory
Tommy Guy / 2012-07-19
Last week, Software Carpentry hosted a bootcamp at the Cosener's House, a converted monastery owned by the Rutherford Appleton Laboratory in Oxfordshire, England. Fifteen participants took part, with experience ranging from "gap year" (about to enter university) to people with years of professional development experience. One of the things that stood out at this workshop was the more advanced background of many of the participants, which allowed us to go further with the material. The topics that seemed most popular were Software Engineering and Testing. From my perspective, the hardest part of the workshop was the exercise we designed for it, which was a scaled-down version of the recommendation engine we use as an example in the Matrix Computation portion of our material. We did not use numpy, so we placed a greater emphasis on preparing the data for further computation. While it took longer than I anticipated to explain and motivate the exercise, the payoff was that we could use the same problem on day 2 to introduce testing and databases. One of the difficulties in a 2-day workshop is avoiding the feeling of jumping topics every hour, and carrying this example through all of our Python work seemed to help tie the topics together. Special thanks to Alistair Mills, a project manager in the eScience department at the Lab, for hosting us and recruiting participants. Also, thanks to Stefano Cozzini from the Democritos National Simulation Center for flying in from Trieste to teach version control and software engineering. Read More ›

Wrapping Up in Halifax
Greg Wilson / 2012-07-17
Things went pretty well here in Halifax—thanks to the local helpers, and to Justin Ely from STScI for coming up to teach. Next stop, Scarborough! Good Bad Pedagogy Work more efficiently (shell) Support staff pair computing / working in groups on problems Version Control / logging Planning workflow methods provenance data processing website citations passing on good habits good teaching tools presentation style post-its stories scheduling transfer concepts to science "practical" code practices were very portable too fast/too short more on code organization more handouts / outline svn binary examples / how to create repository how do i organize repository? wanted more background on everything testing too confusing incomplete system requirements for examples stories greg works too fast / code disappears on run how to put all together? public relations/ getting the work out introduce everyone and helpers Some people were wait-listed and people didn't show up use teaching tools more contrived examples Read More ›

Wrapping Up in Boston
Greg Wilson / 2012-07-10
We just wrapped up a two-day workshop in Boston with learners from several universities. It seems to have gone pretty well; we look forward to coming back soon. Many thanks once again to Jessica McKellar, Geraldine van der Auwera, and Gus Muench for setting it up, and to Jessica for teaching yesterday afternoon. Good Bad learning shell databases running text editor from shell "I feel relatively functional" integrating different tools on the shell learning software testing unit testing instructor cares about learning how we learn emphasized conceptual aspects shell scripting having ways to learn more about each subject database + integration with shell Python programming nice length version control for the win! modularization for the win! Greg's anecdotes liked the level of teaching well integrated with web site liked specific recommendations (provenance) couldn't keep up not enough coffee in the room already familiar with some stuff Greg types too fast then covers up the window sometimes ducks questions not having room information ahead of time software installation list please setup on Windows afternoons too fast didn't know what level to expect coming in wanted more Python want final products how to speed up 9:00 is early for MIT people wanted to learn more about paths and shell stuff Greg's anecdotes explanation of Unix organization where to find help more assistants around one-page primer Read More ›

Independent Assessment of the Past Six Months
Greg Wilson / 2012-07-05
As many of you know, Dr. Jorge Aranda has been doing an independent assessment of Software Carpentry's effectiveness over the past six months. His brief was to tell us whether we were having an impact on scientists' lives, and if so, what kind, and how we could do better. He just submitted his final report (PDF), and we would welcome your thoughts. Executive Summary This report summarizes a six-month effort to assess the efficacy of the Software Carpentry program. Through a mixed-methods approach, including surveys, pre- and post-workshop interviews, workshop observations, and screencast analysis, this assessment concludes that the key premises for the usefulness of Software Carpentry instruction hold true: most scientists are self-taught programmers, they have fundamental weaknesses in their software development expertise, and these weaknesses affect their ability to answer their research questions. More importantly, this assessment concludes that Software Carpentry instruction helps scientists eliminate these weaknesses. The program increases participants' computational understanding, as measured by more than a two-fold (130%) improvement in test scores after the workshop. The program also enhances their habits and routines, and leads them to adopt tools and techniques that are considered standard practice in the software industry. As a result, participants express extremely high levels of satisfaction with their involvement in Software Carpentry (85% learned what they hoped to learn; 95% would recommend the workshop to others). While the outcome is largely positive, there are areas for improvement. Two of note are the spread in expertise among participants and the barriers that they face to change their software practice. The first problem leads to some participants feeling that the instruction is too advanced or too basic for them. The second problem damages the impact of Software Carpentry instruction, as participants learn valuable material, but find that for other reasons they are unable to adopt the relevant tools and techniques. Both problems, and other minor issues identified, can be addressed by the Software Carpentry team in subsequent versions of their workshop. Read More ›

Where We Are (June 2012 edition)
Greg Wilson / 2012-06-27
Last week, a double dozen friends and colleagues gathered physically and virtually to review what we've done and where we should go next. I'll postpone discussion of the biggest "what next" items to separate posts, but here's the status report itself. Successes We've reached a lot of people We have delivered 12 workshops to 376 learners in 4 countries (curriculum) Numbers will be 18 workshops, 520 learners, and 5 countries by the end of July Have actually lost count of the number of online tutorials we've run... What we're doing is helping 96% of participants happy with training Significant improvement in computational understanding We get mail Scientists want more 17 workshops in the pipeline 2-4 new contacts/month We can scale 20 co-instructors 7 have gone on away missions 5 have run online tutorials We can become self-sustaining Have raised $25K in other funding Shifting to "host pays costs" model for workshops Will try to count Software Carpentry training toward NSF requirements for training in Responsible Conduct of Research Working with the Software Sustainability Institute to create a driver's license for the DiRAC supercomputing facility Room for Improvement Very little new video content (because very little demand) But some interesting experiments using video for formative assessment No learner-led challenges Most grad students feel their research is challenging enough Having more mentors, from more disciplines, may help us help more people with their research problems Badging delayed pending a software release (mid-July) Initially badge instructors rather than learners (greater immediate value to recipient) Never-ending installation/setup/configuration difficulties Particularly for Windows users (20-40% of our learners) All technical solutions have proven to be unsatisfactory in different ways Some material is going stale and/or not as useful as we thought (e.g., Subversion, databases) Will update/replace Pacing continues to be a problem 20% find it too fast while 20% find it too slow More instructors will allow us to stream people appropriately Online tutorials have had mixed success "Only" 10-20% of learners still participating after 3 weeks Which is actually higher retention than many online courses Partly because of technical difficulties and time pressures But online instruction remains less effective than in-person instruction with today's technology Workshop participants are predominantly male (even when drawing from disciplines that are gender-balanced) Now in contact with WiSE/diversity groups in the US to try to arrange workshops for their members Still don't know what to teach scientists about the web Read More ›

Fortran Format Statements and Regular Expressions
Greg Wilson / 2012-06-27
One of our learners has a large number of Fortran format statements embedded in a program he's inherited, and would like to come up with equivalent regular expressions in order to parse the program's data with tools written in other languages. If you know of a tool to translate one into the other, or a Fortran-to-regexp dictionary, we'd welcome a pointer. Read More ›
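While we wait for pointers, here is a very rough sketch of what such a translation could look like for a few common edit descriptors (rIw, rFw.d, rEw.d, rAw, and nX); a real tool would also have to handle repetition groups, scale factors, tab descriptors, and the rest of the FORMAT zoo:

import re

# One alternative matches rIw / rFw.d / rEw.d / rAw; the other matches nX.
DESCRIPTOR = re.compile(r'(\d*)([IFEA])(\d+)(?:\.\d+)?|(\d*)X')

def format_to_regex(fmt):
    '''Translate e.g. "(I5,1X,F8.3)" into a regex matching one fixed-width record.'''
    pieces = []
    for repeat, code, width, spaces in DESCRIPTOR.findall(fmt.upper()):
        if code == 'I':
            field = r'([ +\-0-9]{%s})' % width
        elif code in ('F', 'E'):
            field = r'([ +\-0-9.EDed]{%s})' % width
        elif code == 'A':
            field = r'(.{%s})' % width
        else:                                   # nX: n blank columns
            pieces.append(' ' * int(spaces or '1'))
            continue
        pieces.append(field * int(repeat or '1'))
    return '^' + ''.join(pieces) + '$'

# For example, with a hypothetical record:
# re.match(format_to_regex('(I5,1X,F8.3)'), '   42 1234.567').groups()
# returns ('   42', '1234.567')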

A Supercomputing Driver's License
Greg Wilson / 2012-06-27
Software Carpentry and the Software Sustainability Institute are working together to develop a "driver's licence" for researchers who wish to use the DiRAC integrated supercomputing facility. The aim is to assess whether a researcher has the programming skills needed to use DiRAC productively, and if not, give them feedback on what further training they need. The hour-long test, which will include writing, coding, and interactive elements, will focus on fundamental skills including version control, unit testing, writing maintainable code, using Makefiles, and code review. Trials will begin this summer, and the completed test will be presented to DiRAC in September 2012. Read More ›

Two Posts on Scientific Workflows
Greg Wilson / 2012-06-26
Carly Strasser recently posted two articles on informal and formal scientific workflows, both of which are likely to be interesting to readers of this blog. Read More ›

Pessimism and Doom
Greg Wilson / 2012-06-26
According to recent research, an absence of optimism plays a large role in keeping people trapped in poverty: This hopelessness manifests itself in many ways. One is a sort of pathological conservatism, where people forgo even feasible things with potentially large benefits for fear of losing the little they already possess. The parallels with scientific computing practically jump off the page, don't they? Read More ›

Handling Variant Configuration Files
Greg Wilson / 2012-06-26
One of our learners came to us with a problem last week. The program she uses depends on some complex configuration files, which she'd like to store in version control. However, a couple of parameters change depending on the machine the program is running on. She doesn't want to check those changes into version control over and over again; what should she do? To make this more concrete, imagine that her configuration file is a Makefile containing instructions to rebuild a set of files. Initially, it looks like this:

summary.dat : left.dat right.dat
	summarize left.dat right.dat > summary.dat

That works fine on one machine, but on another, the program summarize has been installed as sum7. She could do this:

SUMMARIZE=summarize # on Linux
# SUMMARIZE=sum7 # on Mac OS X

summary.dat : left.dat right.dat
	${SUMMARIZE} left.dat right.dat > summary.dat

but then she'd have to edit the file to uncomment one line, and recomment the other, whenever she switched from her Mac laptop to her desktop Linux machine and vice versa. Here's what she can do instead:

ifeq ($(shell uname),Linux)
SUMMARIZE=summarize
else
SUMMARIZE=sum7
endif

summary.dat : left.dat right.dat
	${SUMMARIZE} left.dat right.dat > summary.dat

The trick is that Makefiles (and most other grown-up configuration files) allow conditionals and functions, just like programs. The function $(shell whatever) runs a shell command; ifeq then checks if that command's output is the string Linux, and the Makefile's variable SUMMARIZE is set accordingly. Another way to approach the problem is to put the machine-dependent parameters in separate files, and include one of those in the main file. For example, the Makefile could be written like this:

include settings.mk

summary.dat : left.dat right.dat
	${SUMMARIZE} left.dat right.dat > summary.dat

We would then put two files in version control—one for Linux:

# settings-linux.mk
SUMMARIZE=summarize

and one for Mac OS X:

# settings-macosx.mk
SUMMARIZE=sum7

Notice that neither of these is called settings.mk. The first time we check out on a machine, we manually copy either settings-linux.mk or settings-macosx.mk to create settings.mk. The include in the main Makefile then finds what it's looking for, and everything runs. If someone changes the settings for a particular platform, our next version control update will get the new platform-specific file, and we'll have to re-copy it to install it. That manual copying step is why I'm not a fan of this second approach. There are ways to have the copying done automatically, but they all basically come down to including a conditional in the main Makefile, and if we're going to do that, we might as well use that for setting the parameter values anyway. However, some configuration file formats don't support conditionals, so the "include if available and fail if not" trick is still worth knowing. Read More ›
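The same trick carries over to programs configured in code rather than in Makefiles. As a minimal sketch (the module and variable names are illustrative), a Python settings module can pick the machine-dependent value at import time:

# settings.py (hypothetical): choose machine-dependent parameters automatically.
import platform

if platform.system() == 'Linux':
    SUMMARIZE = 'summarize'
else:                          # assume the Mac OS X laptop otherwise
    SUMMARIZE = 'sum7'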

If You Want to Teach, Isn't It Only Fair to Learn a Few Things First?
Greg Wilson / 2012-06-25
Carla, a high school student, is doing a class project comparing climate change in the Northern and Southern hemispheres. She wants to see whether the gap between the average annual temperatures in Canada and Australia increased during the Twentieth Century. The raw data she needs is available online; all she needs to do (for some value of "all") is get it, load it into her program, do her calculations, and create an HTML page to display her results for other students to see. Here's a program that does what she wants:

    01 import xml.etree.ElementTree as ET
    02 import urllib2
    03
    04 def main(first_country, second_country):
    05     '''Show ratio of average temperatures for two countries over time.'''
    06     first = get_temps(first_country)
    07     second = get_temps(second_country)
    08     assert len(first) == len(second), 'Length mis-match in results'
    09     keys = first.keys()
    10     keys.sort()
    11     for k in keys:
    12         print k, first[k] / second[k]
    13
    14 def get_temps(country_code):
    15     '''Get annual temperatures for a country.'''
    16     doc = get_xml(country_code)
    17     result = {}
    18     for element in doc.findall('domain.web.V1WebCru'):
    19         year = int(find_one(element, 'year'))
    20         temp = float(find_one(element, 'data'))
    21         result[year] = kelvin(temp)
    22     return result
    23
    24 def get_xml(country_code):
    25     '''Get XML temperature data for a country.'''
    26     url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/%s.XML'
    27     u = url % country_code
    28     connection = urllib2.urlopen(u)
    29     raw = connection.read()
    30     connection.close()
    31     doc = ET.fromstring(raw)
    32     return doc
    33
    34 def find_one(node, pattern):
    35     '''Get text of exactly one child that matches an XPath pattern.'''
    36     all_results = node.findall(pattern)
    37     assert len(all_results) == 1, 'Got %d children instead of 1' % len(all_results)
    38     return all_results[0].text
    39
    40 def kelvin(celsius):
    41     '''Convert degrees C to degrees K.'''
    42     return celsius + 273.15
    43
    44 if __name__ == '__main__':
    45     main('AUS', 'CAN')

And here's an incomplete list of what she needs to understand in order to write the first two lines of this program:

- Many of the things you want in a program are in a language's libraries, not in the language itself.
- You have to import a library in order to use those things.
- You can (and sometimes should) rename a library while you're importing it.
- A library is a namespace. (To keep this article readable, I won't bother to list everything that the concept "namespace" depends on.)
- There's a library called ElementTree that you can use to handle XML data. Oh boy: now we have to explain what XML is. And what a "data format" is. And the difference between data and metadata.
- There's another library called urllib2 that you can use to read data from the web. Here, I'm going to cut myself a break, and assume that everybody knows what a URL is (which is cheating, because most people don't understand the structure of URLs, which they need to know in order to understand what this program is doing on line 26). But we'll still have to explain that "data" and "pages" are really the same things, and that the browser is "just" interpreting a particular kind of data (for some value of "just").

All that, and we're only at line 2. We still need to explain variables, functions, call stacks, types, type conversions (strings aren't numbers, even when they look like numbers), loops, lists, dictionaries, indexing, assertions, defensive programming, member variables (those pesky namespaces again), a bit of XPath, and voila!
Carla will be ready to tackle what most people would consider to be an entry-level exercise in open data. (She still won't have done any visualization, but you get my point.) Of course, we could cut some corners here. She could download the data as a CSV file, load it into Excel, define a couple of new columns to hold the Kelvin equivalents of the Celsius temperatures, and then plot a graph. It would take five minutes to show her how to do this, and 3-4 minutes for her to do it on her own the second time (half of which would be spent trying to figure out how to get rid of the default key in her Excel chart). If our aim is to teach quantitative thinking, and to show her that she can do the sorts of things that David MacKay did so well in Sustainable Energy Without the Hot Air, that's the easiest route by far. But our aim is to teach her programming so that she can start hacking the web; we're just using this problem to motivate her. And we're going to have to motivate her a lot to get her through all this—remember, Carla's bright, but she's no more excited about programming than she is about chemistry. This is what instructional design is all about. What do we want people to learn? How do those things depend on each other? How do we introduce them in ways that will keep learners interested and motivated rather than bored and discouraged? It's a highly skilled craft, just like software design, and just like software design, it's hard to explain to someone who hasn't done it themselves where the time you spent on it went: "thinking things through" isn't something you can easily put six hours against on a timesheet. There are two more similarities between instructional design and software design that are relevant to me right now. The first is that you don't actually have to do either: you can always just dive right in and start coding or teaching. The second is that when you do, the result is almost always a mess. There are spaghetti lessons, just like there's spaghetti code, and there's curriculum out there that's as hard to understand and maintain as any piece of legacy software. But wait a sec. Isn't the very notion of "design" old-fashioned? Aren't we supposed to be agile here? Aren't we supposed to iterate rapidly, correcting course constantly based on near-realtime feedback? Well, yes, we are, but: that's not at all the same thing as ignoring everything people have learned about a particular problem before, and it only works if there's a reliable feedback function that we're actually paying attention to. Unfortunately, a lot of the "teach everyone programming" projects that have sprung up in the past year or two fail both of these tests. Most of them don't know what we've learned as a community over the past thirty-odd years about teaching programming, and most of them aren't doing any kind of systematic follow-up assessment to determine whether the techniques they're using are effective or not. It's as if a bunch of bright, passionate, idealistic people decided hey, none of us have ever written a large program before, but if we hack ten hours a day, we can create a world-beating browser. And to complete the analogy, imagine that when those people were told, "You know, this has been done before—maybe you should take a look at some of the prior art," their answer was, "Pfah! That stuff's the problem we're trying to fix!" So here's my challenge to you.
If you want to teach programming—to scientists or anyone else—go and read a couple of reports on peer instruction (like this summary of ten years' use with physics, or its application to programming instruction). Find out what a concept inventory is, whether online code reviews help learners as much as their face-to-face equivalents, and whether program animation can help people learn more quickly (hint: using animations is much less effective than building animations). Read Mark Guzdial's discussion in Making Software of what makes programming hard to teach (and subscribe to his blog), and learn at least a little about why there aren't more women in computing by reading Whitecraft and Williams' chapter in the same book. Because after all, if you're going to ask Carla to learn about XML and URLs and lists and loops, isn't it only fair that you learn a few things yourself first? Read More ›

Feedback from Johns Hopkins
Matt Davis / 2012-06-20
Today we wrapped up our 2-day bootcamp at Johns Hopkins University in Baltimore. We had a pretty small group of about 12 each day (out of 20 signed up). In a really pleasant surprise we had students come from Brooklyn, NY and Virginia, plus two students who commuted up from Sigma Space in Lanham, MD. Overall feedback seemed quite positive:

Good:
- advanced bash (find, pipes)
- git and GitHub are cool
- all the bootcamp material is online (at GitHub)
- intro to Python
- free
- intro to shell
- saw some advanced Python
- testing/debugging
- branching in git
- bootcamp was local
- examples relevant to scientists
- watching how someone else works
- we covered everything we planned (from Matt)

Bad:
- show/use more IPython features (from someone who already uses it)
- shell went too slow, Python went too fast
- projector small/dim (+ fonts small)
- had to take time off work (about half said they would have come on the weekend)
- would like domain-specific Python examples
- would like a third day (majority would have come for another day)
- hard if you miss the first day
- want more NumPy, SciPy, etc.
- get left behind if you make a mistake
- don't see complexity when writing examples from scratch; could provide some pre-written code
- more step-by-step instructions online
- wanted to see professional workflow with IPython notebook (from Josh)
- could use pre-class surveys to tailor lessons

Big thanks to the other instructors Joshua, Sasha, and Mike! Read More ›

A Busy Week (And Schwag!)
Greg Wilson / 2012-06-18
We have a busy week coming up:

- Monday: Start of a two-day workshop at Johns Hopkins University; online tutorial for participants in the London workshop
- Tuesday: Online tutorial for participants in the Edmonton workshop
- Wednesday: Online tutorial for participants in the Newcastle workshop, another for participants in the Vancouver workshop, and one more for participants in the Michigan State workshop
- Thursday: Combined tutorial for participants in previous workshops
- Friday: All-day strategy meeting in Toronto

In amongst all of this, we're revising a paper on recommended practices for scientific computing, summarizing our assessment of the impact our workshops have had, filling in a few more blanks in our instructors' guide, and talking to four sites about running workshops in the fall. We also now have t-shirts, coffee mugs, stickers, and other schwag—please check out http://www.cafepress.com/swcarpentry and send us pictures of you showing your Software Carpentry pride :-) Read More ›

This Week's Tutorials
Greg Wilson / 2012-06-15
We ran five and a half online tutorials this week: one each for the workshop participants from UCL, Newcastle, UBC, and Edmonton, one for students combined from five previous workshops, and a make-up mini-tutorial for students from Plymouth who'd had A/V problems. The topics included:

- creating simple HTML pages (both by hand and programmatically)
- Python dictionaries (a.k.a. "hashes" or "maps" in other languages)
- the Pandas statistical package

People who would like to review should look at the first half of the Version 3 notes on HTML (slides 1-17), the Version 4 videos on sets and dictionaries, and—well, we don't have anything on Pandas yet, but its creator has a book that you can preview. Next week, we'll be combining the Edmonton and UBC groups into one, and trying to sort out participants' never-ending troubles with web conferencing software: Skype fell over every 7.5 minutes, Vidyo was better than BlueJeans but four groups still couldn't connect at all, and on and on it goes. (Honestly, given a choice between a flying car and a "just plain works" web conferencing system, I wouldn't have to think at all...) Read More ›

Pretty Well Sums It Up
Greg Wilson / 2012-06-15
Speaking at a conference in the UK today, Andrew Eland (of Google) said: Think of the things scientists won't build because they're not exposed to computing in school. +1 to that. Read More ›

All Entries for the Executable Paper Grand Challenge
Greg Wilson / 2012-06-14
I don't know how I missed this, but at last year's International Conference on Computational Science, two dozen different groups presented papers in response to Elsevier's Executable Paper challenge. PDFs of their work are all online, and links are included below. As I've said before, one of our goals is to give researchers the skills they need to do this kind of thing; looking through these papers, we've still got a ways to go.

- Ann Gabriel, Rebecca Capone: Executable Paper Grand Challenge Workshop
- Konrad Hinsen: A data and code model for reproducible research and executable papers
- Pieter Van Gorp, Steffen Mazanek: SHARE: a web portal for creating and sharing executable research papers
- Michael Kohlhase, Joseph Corneli, Catalin David, Deyan Ginev, Constantin Jucovschi, Andrea Kohlhase, Christoph Lange, Bogdan Matican, Stefan Mirea, Vyacheslav Zholudev: The Planetary System: Web 3.0 & Active Documents for STEM
- Piotr Nowakowski, Eryk Ciepiela, Daniel Harezlak, Joanna Kocot, Marek Kasztelnik, Tomasz Bartynski, Jan Meizner, Grzegorz Dyk, Maciej Malawski: The Collage Authoring Environment
- Friedrich Leisch, Manuel Eugster, Torsten Hothorn: Executable Papers for the R Community: The R2 Platform for Reproducible Research
- Wolfgang Müller, Isabel Rojas, Andreas Eberhart, Peter Haase, Michael Schmidt: A-R-E: The Author-Review-Execute Environment
- Matan Gavish, David Donoho: A Universal Identifier for Computational Results
- David Koop, Emanuele Santos, Phillip Mates, Huy T. Vo, Philippe Bonnet, Bela Bauer, Brigitte Surer, Matthias Troyer, Dean N. Williams, Joel E. Tohline, Juliana Freire, Cláudio T. Silva: A Provenance-Based Infrastructure to Support the Life Cycle of Executable Papers
- Grant R. Brammer, Ralph W. Crosby, Suzanne J. Matthews, Tiffani L. Williams: Paper Máché: Creating Dynamic Reproducible Science
- J. Siciarek, B. Wiszniewski: IODA — an Interactive Open Document Architecture
- Sandor M Veres, J. Patrik Adolfsson: A natural language programming solution for executable papers
- Antonio T.A. Gomes, Diego Paredes, Frédéric Valentin: Supporting the Perpetuation and Reproducibility of Numerical Method Publications
- Guillaume Jourjon, Thierry Rakotoarivelo, Christoph Dwertmann, Maximilian Ott: LabWiki: An Executable Paper Platform for Experiment-based Research
- Rudolf Strijkers, Reginald Cushing, Dmitry Vasyunin, Cees de Laat, Adam S.Z. Belloum, Robert Meijer: Toward Executable Scientific Publications
- Nicolas Limare, Jean-Michel Morel: The IPOL Initiative: Publishing and Testing Algorithms on Line for Reproducible Research in Image Processing
- Tomi Kauppinen, Giovana Mira de Espindola: Linked Open Science-Communicating, Sharing and Evaluating Data, Methods and Results for Executable Papers
- Kenton McHenry, Michal Ondrejcek, Luigi Marini, Rob Kooper, Peter Bajcsy: Towards a Universal Viewer for Digital Content
- Nicola Ferro, Allan Hanbury, Henning Müller, Giuseppe Santucci: Harnessing the Scientific Data Produced by the Experimental Evaluation of Search Engines and Information Access Systems
- Steven R. Brandt, Oleg Korobkin, Frank Löffler, Jian Tao, Erik Schnetter, Ian Hinder, Dennis Castleberry, Michael Thomas: The Prickly Pear Archive
- Rubens C. Machado, Leticia Rittner, Roberto A. Lotufo: Adessowiki — Collaborative platform for writing executable papers
- Jim Austin, Tom Jackson, Martyn Fletcher, Mark Jessop, Bojian Liang, Mike Weeks, Leslie Smith, Colin Ingram, Paul Watson: CARMEN: Code analysis, Repository and Modeling for e-Neuroscience
- Y.-A. Le Borgne, A. Campo: Open Review in computer science Elsevier grand challenge on executable papers

Read More ›

First Workshop on Maintainable Software Practices in e-Science
Greg Wilson / 2012-06-10
First Workshop on Maintainable Software Practices in e-Science
9 October 2012
Co-located with the 8th IEEE International Conference on eScience, Chicago

This workshop will focus on the issues relating to the development and maintenance of software that can endure past the limited periods of defined project durations and project funding, and go beyond software engineering best practice to address aspects of cultural, organisational and policy change. By bringing together all those with an interest in ensuring the longer term development and use of software for research, including researchers, developers, research computing specialists, software engineers, infrastructure providers, facilitators, and funders, the goal of this workshop is to understand which software practices can be successfully applied and which lead to long-term improvements in the development of software for e-Science. As part of the workshop we will also be running a panel on the topic of culture change in software management for research, featuring invited speakers from a variety of disciplines who have experienced or instigated these changes, talking about their real-life experiences of what worked and didn't work for them.

Topics of Interest

We invite the submission of work related to the topics below. Papers can be either short (4 pages) position/experience abstracts, or full (8 pages) research papers featuring original, unpublished work. Topics of interest include:

- software engineering and software product management best practice as applied to e-Science and computational science
- community development, collaborative development, and widening adoption
- licensing, funding, and business models for eScience and research software
- managing governance and organisational change during the software lifecycle
- measuring and analysing the impact of software and software processes
- software attribution, citation, and credit
- interaction between researchers, developers and stakeholders
- transferable software practices from industry

Read More ›

We Get Mail
Greg Wilson / 2012-06-08
Hi Greg and Adina! Two days ago I filled out the Software Carpentry survey which asked me, among other things, how much time I had saved using my new Software Carpentry skills. I answered that I had not really saved that many hours yet (but that I was approaching computational problems in a different way and began implementing version control!). However, that number of hours changed dramatically later that evening! In preparation for a talk I will give at a conference in a few weeks I was re-doing some analyses. I came to one which was very painful to me because it required using 3 different python scripts for several different files and repeating this for a range of numbers (so we're talking running 3 scripts about 20 times). I did not find this particular analysis very useful or informative, but it really makes my supervisor happy so I thought I should at least try to be a silver (or bronze) student and give it a whirl. I started to do this analysis the way I had previously done it, by using the script at the command line, figuring out where the files were that I wanted and then waiting for each manipulation to finish before starting the next one. BUT then I realized that you folks had taught me a different way! I made a bash script to use these 3 scripts together and then I even made it successfully loop through the whole range of values! I saved so much time (hours at least on this one analysis and lots of mental anguish), have a clear record of what was done and feel so excited to automate more of the complex analysis I'm doing. I feel like I am looking at these problems differently and it is starting to seem a bit clearer! So a huge THANK YOU! I am also going to discuss at my next lab meeting some of the concepts we discussed in the workshop! Sincerely, Julia (PhD candidate, University of British Columbia) Read More ›

But the Greatest of These Is...
Greg Wilson / 2012-06-08
In her EAGE keynote earlier today, Victoria Stodden talked about the central role of geophysics in the reproducible research movement. After discussing the problem, she identified five interlocking solutions:

1. Tools
2. Intellectual property barriers
3. Funding agency policy / federal regulation
4. Journal policy
5. Institutional expectations

I think these are all important, but I think they are all less important than something which doesn't appear anywhere in her slides:

Training

There are two reasons:

1. If people have the skills to do reproducible research, they can find ways around problems 1-5. (Proof: they're already doing so.)
2. If people don't have those skills, then providing tools, changing intellectual property rules and journal policies, and everything else won't matter.

The obvious response to the second claim is to say that if the tools make it easy, and the incentives are right, people will learn what they need to. But people have been saying that for the last twenty years (or possibly longer—I only started paying attention in the early 1990s), and it hasn't happened. I don't have data to back up this claim, but based on personal experience, and the experience of many other people, I believe that the average scientist is no more computationally literate today than he or she was in 1992. I think we need to accept that osmosis hasn't worked, doesn't work, and isn't likely to work, and that if we actually want to change the way people do science, our top priority has to be giving them the skills they need to implement those changes. Read More ›

Tutorial: NumPy, SciPy, and matplotlib
Matt Davis / 2012-06-07
Today I did a toy data analysis of some annual temperature data in Australia and Canada over the last ~100 years. The goal of the exercise was to demonstrate loading data, inspecting it, and fitting trends. My last tutorial didn't involve any real data so this week we wanted to change that. Like my previous tutorial I used the IPython HTML notebook to present. I hadn't been planning to use pylab mode or inline plots, but some issues with my matplotlib installation forced me in that direction. There's an awkward situation here because I actually recommend people not use the pylab interface to matplotlib, since its behind-the-scenes magic can cause problems (difficult-to-debug problems), but for doing demos the inline plots are really the way to go. The obvious upside is that the plots I made as part of the tutorial are embedded in the notebook for you to see now. The data was stored in a well-behaved CSV format, so it was simple to load with numpy.loadtxt. I used the matplotlib plot function for all the figures, even the one where I probably should have used scatter. I demonstrated fitting with scipy.stats.linregress, scipy.optimize.curve_fit, and scipy.interpolate.UnivariateSpline. The linregress function is useful for doing a quick linear fit, while curve_fit allows you to fit arbitrary functions to the data, since you give it a function you define. We just scratched the surface of three modules in SciPy today. Skimming the docs you can see there is a vast array of tools in there. And for a quick look at what matplotlib can do, take a look at the thumbnail gallery.
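To give a flavor of the difference between the two fitting functions, here is a minimal sketch (not the notebook from the tutorial; the file name temps.csv and the two-column layout are made up for illustration):

    import numpy as np
    from scipy import stats, optimize

    # hypothetical two-column file: year, mean annual temperature
    data = np.loadtxt('temps.csv', delimiter=',')
    years, temps = data[:, 0], data[:, 1]

    # linregress: a quick straight-line fit, plus fit statistics
    slope, intercept, r_value, p_value, std_err = stats.linregress(years, temps)

    # curve_fit: fit any function you can define yourself
    def line(x, a, b):
        return a * x + b

    popt, pcov = optimize.curve_fit(line, years, temps)

For a straight line the two give the same answer; curve_fit earns its keep once the model is something linregress can't express, like an exponential or a sinusoid. Read More ›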

Ten Simple Rules
Greg Wilson / 2012-06-07
The "Ten Simple Rules" series being run in PLoS Computational Biology has a lot of useful gems. Written by editor-in-chief Philip Bourne and others, the entire collection to date is available as a single PDF, but for those who prefer bite-sized reading, here are ten simple rules for: ...Starting a Company (PDF) ...Getting Involved in Your Scientific Community (PDF) ...Teaching Bioinformatics at the High School Level (PDF) ...Developing a Short Bioinformatics Training Course (PDF) ...Getting Help from Online Scientific Communities (PDF) ...Building and Maintaining a Scientific Reputation (PDF) ...Providing a Scientific Web Resource (PDF) ...Getting Ahead as a Computational Biologist in Academia (PDF) ...Editing Wikipedia (PDF) ...Organizing a Virtual Conference—Anywhere (PDF) ...Chairing a Scientific Session (PDF) ...Choosing between Industry and Academia (PDF) ...Combining Teaching and Research (PDF) ...Organizing a Scientific Meeting (PDF) ...Aspiring Scientists in a Low-Income Country (PDF) ...Graduate Students (PDF) ...Doing Your Best Research, According to Hamming (PDF) ...a Good Poster Presentation (PDF) ...Making Good Oral Presentations (PDF) ...a Successful Collaboration (PDF) ...Selecting a Postdoctoral Position (PDF) ...Reviewers (PDF) ...Getting Grants (PDF) ...Getting Published (PDF) So, what would your ten simple rules for good computational science be? Read More ›

What Skills Are Required to Implement Open Access?
Greg Wilson / 2012-06-04
Late last night, local time, the Access2Research petition at whitehouse.gov got 25,000 signatures, which means that the White House is committed to responding. As someone who no longer reviews for journals that aren't OA, I obviously think that's great news, but it once again raises the question: what concepts and skills do researchers need in order to implement open access? Jon Udell has written at length about some things (most recently here), but to sharpen up the discussion: if we added a third day to our standard two-day workshop, what could we put in it to help scientists use the web in their work? I've asked this question before, and even answered it, but now that I'm part-way through writing up curriculum for that answer, I'm no longer satisfied with it. Yes, I can explain the HTTP request/response cycle, the structure of HTML, and how to use a web service, but that's like explaining the syntax of Python rather than how to structure a program, and over the past few years, we've moved from the former to the latter because the latter is what really matters. So let me reframe the question. I think that half a day (i.e., the morning of day 3) is enough to teach the structure of HTML/XML, how to create/parse documents programmatically, the HTTP request/response cycle, the structure of URLs, and so on. There is absolutely, positively not enough time to teach them enough about security to start writing their own server-side applications: frameworks like Django do make some things simpler, but even experts can still create vulnerabilities for villains to exploit. If we add a third day to our standard two-day workshop, can we teach people enough about the web for them to use and hack the IPython Notebook? "And hack" is the important part: tools like these are still in their infancy, and every researcher's needs are slightly different, so I think our goal should be to accelerate evolution in this arena by teaching people how to extend the Notebook, connect it to other systems, etc. As always, the acid test must be whether they can debug the things they build using the skills they have—we know from both our own experience and the work of others that cookbook/template approaches fail in this regard. Is the intersection of these constraints non-empty? Is there something we actually can teach in the time we have, to the audience we have, that's useful and debuggable? And if so, would you like to help us build it? Read More ›

Software Carpentry: The E-Book Version?
Greg Wilson / 2012-06-04
If I could send email two years into the past, I'd tell myself to spend a lot less time making videos for Software Carpentry, and a lot more time exploring interactive formats. One that I've been following recently is the IPython Notebook, a "live" lab notebook that allows scientists to mix, share, and explore text, code, graphs, and the like. Another, which I've just come across through this post on Mark Guzdial's ever-informative blog, is an interactive version of "How to Think Like a Computer Scientist". As Brad Miller says in his description, this book is really a triumph of open source. Here are the open source components they've used and modified for the project:

- The text in the book is based on the original How to Think Like a Computer Scientist by Jeff Elkner, et al. You can find the non-interactive version here.
- The Python interpreter is by Scott Graham, and you can find information about it at Skulpt.org. This was a really key piece, and although Brad spent a ton of time creating a turtle graphics module for it, the book wouldn't have gotten off the ground without it. The pieces the book refers to as activecode all make use of Skulpt, along with the really nice Javascript editor Codemirror.
- What the book calls codelens is based on work by Philip Guo. Students can step through the code a line at a time, both forward and backward.
- Finally, the glue that holds the whole thing together is the excellent Sphinx publishing system. Sphinx lets you create new directives to use in writing, which made it easy to include the interactive features without getting in the way of the writing.

To get a feel for what it can do, try working through the turtle graphics chapter. It's pretty impressive... My questions are:

- Given a choice between a traditional (static) book, either printed or electronic, and something like this, which would you prefer us to build?
- How do we adapt this to teach things like the Unix shell, version control, databases, etc.? Like the IPython Notebook, this is a single-language system, but real-world computing is almost always multilingual.

Read More ›

Git tutorial links
Carlos Anderson / 2012-06-03
I promised to post some links related to the Git tutorial I taught on May 24, 2012. Most of my material was based on this excellent online book (free): http://git-scm.com/book/ch1-3.html. Here's another resource with help on specific and common Git commands: http://gitref.org/. Someone asked about the differences between Git and Mercurial, but because I haven't used Mercurial I wasn't able to answer. I found these links that I hope are useful: http://www.wikivs.com/wiki/Git_vs_Mercurial and http://www.rockstarprogrammer.org/post/2008/apr/06/differences-between-mercurial-and-git/. Also, here's a video of Linus Torvalds himself (creator of the Linux kernel and Git) promoting Git (and putting down everything that is not Git): Tech Talk: Linus Torvalds on git. Finally, someone asked how to set up your own repository on a remote server so you can push and pull from your local computer. This is explained in the Git online book (section 4.2). Briefly, have your project in a directory on the server, then go into that directory and run "git init" (if you haven't already). Go up one directory (your project directory's parent), and run "git clone --bare project project.git" (where "project" is the name of your project directory). This will create a directory called project.git containing a copy of your repository, minus the "working" files (everything will be hidden, so running the "ls" command won't show anything). Now from your local computer, you can clone this project by running "git clone user@server.univ.edu:projects/project.git", where "user@server.univ.edu" is your username and server address and "projects/project.git" is the location (relative to your home directory on the server) of your project. Read More ›

Introduction to NumPy Tutorial
Matt Davis / 2012-06-01
Today I did a tutorial moving quickly through the basic usage of NumPy, the essential library for doing numeric computing with Python. We covered:

- building arrays
- indexing
- array math
- NumPy's element-wise functions
- array attributes and methods
- random numbers
- masked arrays
- array comparison

I presented using the IPython HTML Notebook. I enjoyed it because I never had to switch between a terminal and a text editor, but I wonder what the people viewing thought. A nice feature of the notebook is that I can export it, both as a PDF and in the .ipynb format importable by IPython. The PDF is here, and the .ipynb file here. One thing we didn't cover was input/output. NumPy has functions for loading data from/saving to text files, and functions for saving to/loading from special NumPy binary files. The latter are useful for saving intermediate results. I've previously discussed numpy.loadtxt and numpy.genfromtxt on my personal blog. One question I got was for advice for people switching from Matlab. I've never really used Matlab so I can't answer that question, but I found a relevant page on scipy.org: NumPy for Matlab Users.
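For the binary save/load functions mentioned above, here is a minimal sketch (the file name checkpoint.npy is made up for illustration):

    import numpy as np

    x = np.random.random((1000, 3))   # stand-in for an intermediate result

    np.save('checkpoint.npy', x)      # write a binary .npy file
    y = np.load('checkpoint.npy')     # restore it later, losslessly

    assert (x == y).all()

Unlike a text file, the .npy round trip preserves the array's dtype, shape, and exact floating-point values. Read More ›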

Dictionaries are a Scientist's Friend
Dan McGlinn / 2012-05-31
[Code] Today I would like to share with other scientists the power of dictionaries. I recently learned about this data structure during a Software Carpentry bootcamp that Greg Wilson and Ethan White organized. Greg and Jason Pell (from Michigan State) visited Utah State University, where I am a postdoc studying patterns of biodiversity, for a two-day bootcamp. During that time we covered a whole slew of topics, but today I wanted to demonstrate how dictionaries helped to significantly speed up my Python code. Before we begin this demonstration, let's import two necessary libraries into our Python interpreter. We'll import division from __future__ so that Python carries out 'true' division instead of 'classic' division, and we'll import time from time so that we can measure how long our functions take to run.

    # import division so that we are using 'true' division
    from __future__ import division
    # import time so that we can clock how fast our functions run
    from time import time

A dictionary is a data structure that is used to look up a value given an associated key (i.e. its name). Check out the Software Carpentry lecture on dictionaries for a full and technical introduction. Let's do a quick warm-up with dictionaries in Python:

    # create an empty dictionary
    my_dict = dict()
    # add key : value pairs to the dictionary
    my_dict['a'] = 3
    my_dict['b'] = 4
    print my_dict              # {'a': 3, 'b': 4}
    # examine if specific keys are contained in your dictionary
    print 'a' in my_dict       # True
    print 'c' not in my_dict   # True
    # return a value in the dictionary for a given key
    print my_dict['a']         # 3
    print my_dict['b']         # 4

The concept of dictionaries is not restricted to the Python programming language; more generally, dictionaries are referred to as hash tables. For example, in the R programming language there is a package called hash that implements a dictionary type of data structure. For my own research purposes I have been struggling with how to efficiently compute very computationally intensive recursion equations. Specifically I have been attempting to code John Harte et al.'s Hypothesis of Equal Allocation Probabilities (HEAP) model. This is a model for how probable it would be to observe n individuals of a species in a sample of area A located within a larger area A0 that contains a total of n0 individuals. Harte et al.'s solution to this problem is a recursion (equation 4.15 in Harte 2011): Pr(n | A, n0, A0) is 1 / (n0 + 1) when A0 / A = 2, and otherwise is the sum over q from n to n0 of Pr(q | 2A, n0, A0) / (q + 1). This can be encoded in Python by a very simple function:

    def heap_prob(n, A, n0, A0):
        """
        Calculates the probability that n individuals are observed
        given A, n0, and A0 under the HEAP model
        Equation 4.15 in Harte 2011
        Inputs:
          n: integer, number of individuals in sample
          A: integer, area of the sample
          n0: integer, number of individuals total in A0
          A0: integer, the area of the area within which A is placed
        Returns: float, probability between 0 and 1
        """
        if A0 / A == 2:
            return 1 / (n0 + 1)
        else:
            A = A * 2
            prob_sum = 0
            for q in range(n, n0 + 1):
                prob_sum += heap_prob(q, A, n0, A0) / (q + 1)
            return prob_sum

This function does the job, but notice that as A0 increases the function will call itself thousands of times.
Additionally, it will compute some of the same values many times. For example, to calculate Pr(3 | 1, 5, 8) the following probabilities are computed:

    Pr(3 | 1, 5, 8) = Pr(5 | 4, 5, 8) * Pr(5 | 2, 5, 4) * Pr(3 | 1, 5, 2)
                    + Pr(5 | 4, 5, 8) * Pr(4 | 2, 5, 4) * Pr(3 | 1, 4, 2)
                    + Pr(4 | 4, 5, 8) * Pr(4 | 2, 4, 4) * Pr(3 | 1, 4, 2)
                    + Pr(3 | 4, 3, 8) * Pr(3 | 2, 3, 4) * Pr(3 | 1, 3, 2)

so as you can see, identical probability values are computed multiple times. The number of repeated identical values grows primarily as A0 increases relative to A (i.e., as the number of recursions increases). Here is where dictionaries enter the picture. If we could either 1) build a dictionary from scratch as we move through the computation or 2) supply a ready-made dictionary, we could recoup any speed losses that we experienced by having to compute the same probability value multiple times. In Python we can accomplish this with the following function:

    def heap_prob_dict(n, A, n0, A0, pdict={}):
        """
        Determines the HEAP probability for n given A, n0, and A0
        Uses equation 4.15 in Harte 2011
        Returns the probability that n individuals are observed in a
        quadrat of area A
        Note: this version uses a dictionary to speed computation
        """
        i = A0 / A
        if (n, n0, i) not in pdict:
            if i == 2:
                pdict[(n, n0, i)] = 1 / (n0 + 1)
            else:
                A = A * 2
                prob_sum = 0
                for q in range(n, n0 + 1):
                    prob_sum += heap_prob_dict(q, A, n0, A0, pdict) / (q + 1)
                pdict[(n, n0, i)] = prob_sum
        return pdict[(n, n0, i)]

Note that in this function we use a three-valued key (n, n0, i). This means that for each specific combination of these three values there is a specific probability that is stored in the dictionary. Now that we have our two functions that compute the same probability, let's see how much of a speed boost using dictionaries gives us. We'll define a new function to carry out these time tests:

    def time_trial(n, A, n0, A0):
        results1 = [0, 0]
        results2 = [0, 0]
        start = time()
        results1[0] = heap_prob(n, A, n0, A0)
        end = time()
        results1[1] = end - start
        start = time()
        results2[0] = heap_prob_dict(n, A, n0, A0)
        end = time()
        results2[1] = end - start
        return [results1, results2]

Let's actually see which function is faster:

    test_time = time_trial(0, 1, 100, 2 ** 5)
    print test_time
    # [[0.437, 2.964], [0.437, 0.016]]
    print test_time[0][1] / test_time[1][1]
    # 185.252384216
    # Note that both functions returned the same probability: 0.437,
    # but that the dictionary function appears to be about 2.25
    # orders of magnitude faster than the naive approach.

To quote Greg, this demonstrates that dictionaries are "more gooder". As we increase A0 relative to A, the number of recursions increases and the relative speedup offered by the dictionary approach will increase exponentially... even more more gooder?

    # Let's vary the number of recursions and examine how the ratio
    # of the time trials varies
    time_ratio = [0] * 6
    for i in range(0, 6):
        time_test = time_trial(0, 1, 105, 2 ** (i + 1))
        if time_test[1][1] == 0:
            time_ratio[i] = 'Inf'
        else:
            time_ratio[i] = time_test[0][1] / time_test[1][1]
    print time_ratio
    # ['Inf', 0.0, 0.667, 22.500, 508.431, 15542.385]
    # it appears that the first two trials were too fast to
    # meaningfully compare the two approaches, but notice how the
    # ratio increases exponentially as i increases.

That's all I have for today on dictionaries. If you have questions or suggestions please post them below.
I do have one additional technical note on dictionaries that contained a Python surprise for me, and could have resulted in a nasty bug if I had not caught it.

Technical Note: If you've been following very closely, you may ask why the three-valued key (n, n0, i) is needed in the function heap_prob_dict when the equation only varies two values, n and i, and not n0. In other words, for a given set of starting values all the keys in the dictionary will have the same n0, so why include it in the key tuple? There are two reasons: 1) we may want to supply heap_prob_dict with a larger dictionary that we have created ahead of time for many different values of n0, and 2) Python will store the dictionary pdict in memory even after the function has returned its result. Therefore, if you call the function again, even though you may not supply pdict explicitly, Python will pull it from memory, complete with the key-value pairs that it computed on the last run of the function. If two subsequent runs of the function are for different values of n0, the wrong answer will be returned if n0 is not part of the key. To demonstrate this, compare two successive time trials with the same starting parameters. The second trial should show a much larger apparent speedup from using dictionaries, because the dictionary is being recalled from memory:

    # We'll run time_trial twice with the same input values
    # and examine the results of the second trial
    test_time1 = time_trial(0, 1, 100, 2 ** 5)
    test_time2 = time_trial(0, 1, 100, 2 ** 5)
    print test_time2
    # [[0.437, 2.792], [0.437, 0.0]]
    # So the naive approach took approximately the same amount
    # of time, but the dictionary-based approach was essentially
    # instantaneous because the appropriate dictionary was already
    # in memory from the last call of time_trial.
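The behavior described above is Python's well-known mutable default argument gotcha: a default value like pdict={} is created once, when the function is defined, not on every call. A stripped-down sketch (the function name remember is made up for illustration):

    def remember(key, value, cache={}):
        # the default dict was built once, at definition time,
        # so every call that omits 'cache' shares the same object
        cache[key] = value
        return cache

    print remember('a', 1)   # {'a': 1}
    print remember('b', 2)   # {'a': 1, 'b': 2} -- the entry from
                             # the first call is still there

Read More ›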

SoundSoftware 2012: Workshop on Software and Data for Audio and Music Research
Luis Figueira / 2012-05-30
The SoundSoftware project — a UK-based project aiming at software sustainability in the Audio and Music Research community — is organising a one-day workshop on "Software and Data for Audio and Music Research". The aim of this workshop is to discuss issues such as robust software development for audio and music research, reproducible research, management of research data, and open access. Although the SoundSoftware project concentrates on the audio and music research community, many of these issues have wider interest, and this workshop could be a way to link with other communities for which proper handling of research software and data is important. The workshop is taking place at Queen Mary, University of London, on Monday, 18 June 2012. Registration is free and ends on 31 May 2012. You can see the full details of the workshop (including registration) here: http://soundsoftware.ac.uk/soundsoftware2012 Read More ›

A Poster for the Software Carpentry Workshop at INRIA
Greg Wilson / 2012-05-30
Read More ›

How to Run a Bootcamp
Greg Wilson / 2012-05-29
With so many people setting up and running Software Carpentry workshops, I thought it was time to put together a more complete how-to. If I've forgotten anything, please let me know. (Later: please also see the PLoS paper "Ten Simple Rules for Developing a Short Bioinformatics Training Course".)

Assumptions
- Bootcamp is 2-3 days long
- For 20-50 learners...
- ...who are typically grad students in science and engineering
- There is a 10:1 ratio of learners to helpers (or better)

Goals
- Learners leave with a basic set of skills...
- ...and a big-picture view of how to apply them
- A handful of learners/helpers are ready to run workshops on their own
- We have feedback on how to improve content/format

Venue
- Prefer flat seating to banked seating (makes it easier for helpers to reach people)
- Power, network, and air conditioning (40 people plus laptops generate a lot of heat)
- Prefer places where people can drink coffee/eat snacks while they're working
- Note: make sure the venue is accessible to people with disabilities

Dates
- Weekends, weekdays, and splits have all worked (adjust to local needs)
- Start of term works better at universities than end of term
- Start of second/subsequent term works better for grad students than start of first term: there are too many other things going on at the start of a new academic year, and by second term, people know whether they need this or not
- Mid-term breaks (reading weeks) have not worked well: very high no-show rate
- Can also schedule bootcamps right before/right after major conferences; this usually means extra accommodation expenses for participants, but it can be easier to teach when all learners are from the same community

Times
- We typically run 9:00-4:30, but this will vary to meet local needs
- Note: remember to take childcare needs into account

Instructors and Helpers
- Recruit instructors before announcing the event (we're happy to help)
- Recruit helpers locally: typically grad students who already know this stuff, who might be thinking about becoming instructors themselves
- Remember: one person can't talk eight hours straight for two or three days (not coherently, anyway)

Registration
- Your institution/venue may insist you use their system; otherwise, we use EventBrite
- We're happy to host registration (just ask us for access)
- Make sure to allow a waiting list
- Note: it's usually best not to charge even a nominal amount for registration; as soon as any money is changing hands, academic institutions will often charge for space

Advertising
- Create a page with bootcamp details, usually separate from the registration page (to give more control over content and style); we're happy to host this, or to link to your page if you'd rather host it yourself
- Tweet and blog about the event
- Send mail to departmental mailing lists, disciplinary mailing lists, and specific people (e.g., lab directors)
- Note: remember to include links to the advertising page and the registration page!
- Note: notify participants if the event is being broadcast or recorded; a signed photo release will be required from each participant if pictures are being taken

Catering
- Decide whether to provide coffee/snacks at breaks and/or lunch
- Typically budget $5 per person per snack, $12-15 per person for lunch... which adds up quickly
- Note: remember to take dietary restrictions into account (vegetarian/kosher/halal, nut/dairy allergies)

The Week Before
- Request confirmation from participants
- Notify people on the wait list if they're going to be able to attend
- May invite 10 or so people from the wait list to show up: if there are still empty seats at mid-morning on the first day, they're welcome to stay; if not, they're welcome to come back for the next one. Note: make this clear to wait-listed people you invite before they come
- Send software/network setup instructions, including instructions on how to check that things have installed properly and a contact address for people who are having trouble
- Send pre-workshop questionnaire (if any)

The Day Before
- Are the building and room going to be unlocked on the day of the event?
- Do you know where the washrooms are?
- Is there any noisy construction/cleaning going on?
- Is the projector working? Do you have a spare bulb?
- Is the network working?
- Do you have enough power cords?
- Do you have a contact number for maintenance/tech support?
- Have you double-checked with catering (if you're having snacks/lunch brought in)?
- Have you emailed a reminder to participants?
- Have you set up any accounts/web sites/repositories you will need?
- Do you have sticky notes in two different colors to hand out? (Use these instead of clickers to answer yes/no questions, signal need for help/completion of exercises, etc.)

On The Day
- Give people a few minutes to plug in, get on the network, etc.
- Put network connection instructions on handouts, on the projected screen, etc.
- Tell people what Twitter hash tag you're using (if any)
- Circulate the attendance sheet/photo release form
- Hand out multi-colored sticky notes

Follow-up
- Collect attendance sheets/photo release forms
- Create a mailing list for contacting participants after the event
- Send post-workshop questionnaire (if any)
- Write a blog post summarizing the event

Read More ›

What to Read If You're Teaching Software Carpentry
Greg Wilson / 2012-05-27
One of our goals—in fact, our biggest goal—is to grow the pool of people who can teach Software Carpentry workshops, so that we can reach more researchers in future. We've learned that having scientists teach scientists works better than having professional programmers (or computer science grad students) teach scientists, but that raises a question: what do we teach the average neuropsychologist or geophysicist about software engineering to prepare her for teaching our stuff? Ideally, we'd like them all to have read a double dozen books, and to have ten or more years of experience in industry, but that's not going to happen. Instead, we've settled on a very short list:

- Robert Glass: Facts and Fallacies of Software Engineering
- Karl Fogel: Producing Open Source Software

We'd like people to read a lot more than this, of course, but I think that if someone has digested these two books, they're ready to teach the big-picture stuff we want to get across. I also think that would-be instructors should read one or the other of:

- How People Learn
- How Learning Works

Both books are group efforts, which is why I haven't listed authors. The first dates from 2000, but all the material is still relevant; the second is more recent, but less applied. Both have lots to chew on, and they complement the software-specific material nicely. Read More ›

Spot the Workshops
Greg Wilson / 2012-05-24
Read More ›

No CT Without PL
Greg Wilson / 2012-05-24
In a blog post earlier today, Mark Guzdial argues that computational thinking requires learning with a programming language. Unlike many such claims and counter-claims, his is based on a wealth of research, most recently an excellent dissertation by Juha Sorva. I strongly agree with Mark's position: our real goal in Software Carpentry is to teach computational thinking, but the only way to do that successfully is to teach some basic programming skills and use those to convey larger ideas. If we're successful in getting follow-on funding at the end of our current grant, we're going to work hard to re-cast everything in this way. Read More ›

Feedback from the University of British Columbia
Greg Wilson / 2012-05-24
Aaaand that's a wrap in Vancouver, ladies and gentlemen: our workshop at the University of British Columbia seems to have gone well:

Good:
- clarity
- data management
- sticky notes
- workflow
- overview/coverage
- resources
- coffee/snacks
- exercises
- how to use programming correctly
- learning terminal
- testing
- learning science Python
- helpers
- one step further
- version control
- modules on website
- anecdotes
- general philosophy (organic)

Bad:
- not enough Python
- too fast (esp 2nd day)
- hands-on (second day)
- needed warmup primer
- Greg's jokes
- resources (overwhelming)
- text editor
- software install issues
- too short
- crossover confusion
- too much on data management
- communication prior to course (software install)
- screen real estate
- more version control
- why better than MATLAB
- mystify setup

That's my last road trip until mid-July—it's good to have ended this round on a high note. My thanks again to Ian Mitchell, Adina Chuang Howe, and everyone else who helped or attended. Read More ›

Responsible Conduct
Greg Wilson / 2012-05-23
Titus Brown, Ethan White, and I have been talking about what Responsible Conduct of Research (RCR) standards would look like in computational science. Titus has posted a four-point summary; we would welcome your input over on his blog. Read More ›

Alone and Misunderstood
Greg Wilson / 2012-05-23
Jeffrey Mirel and Simona Goldin's recent article in The Atlantic titled "Alone in the Classroom" initially struck a chord with me, particularly when they said, "A recent study by Scholastic and the Gates Foundation found that teachers spend only about 3 percent of their teaching day collaborating with colleagues. The majority of American teachers plan, teach, and examine their practice alone." But then Mirel and Goldin blew it by saying:

    So what would it take structurally to enable teachers to work collaboratively for improved learning outcomes? Perhaps the most important change is in school curricula. One of the key differences between public education in the U.S. and elsewhere is the lack of a common curriculum. In other countries common curricula unite the work of teachers, school leaders, teacher educators, students, and parents. With a common curriculum there is agreement about what students are expected to learn, what teachers are to teach, what teacher educators are to instill in potential teachers, and what tests of student learning should measure. A common curriculum for the nearly 100,000 K-12 schools in the U.S. could be a major step towards productive teacher collaboration.

No—a common curriculum won't improve teaching, and the things that will don't need a common curriculum. To understand why, have a look at another article in The Atlantic from last December titled "What Americans Keep Ignoring About Finland's School Success", which is based in part on Pasi Sahlberg's book Finnish Lessons. Sahlberg and others believe that the keys to Finland's success are (a) the fact that teachers are respected as professionals in a way they no longer are in North America, and (b) the fact that when they say, "No child will be left behind," they actually mean it:

    It is possible to create equality. And perhaps even more important—as a challenge to the American way of thinking about education reform—Finland's experience shows that it is possible to achieve excellence by focusing not on competition, but on cooperation, and not on choice, but on equity. The problem facing education in America isn't the ethnic diversity of the population but the economic inequality of society, and this is precisely the problem that Finnish education reform addressed. More equity at home might just be what America needs to be more competitive abroad.

The question of how much commonality in the curriculum should be enforced vs. how much freedom instructors should have to adapt to local needs is clearly relevant to Software Carpentry, particularly when we start thinking about standards for responsible conduct of computational research. More important, I think, are the questions of professionalism and equity: how do we develop a cadre of instructors who know what to teach, and how to teach it, and how do we level the computational playing field so that everyone doing research has a fair shot at acquiring and using these skills? Read More ›

Citing Versions
Greg Wilson / 2012-05-22
We got mail yesterday from a workshop participant saying, "My question is how does one show in a research paper that the underlying data and the software is version controlled?" Cameron Neylon's answer, slightly edited, was:

    My approach in an ideal world would be to have all of my data (or links to it) under version control along with the code. When the version to be used for the publication is clear I would give it a tag (I'm a Git user, but there is similar functionality in all version control systems) and then push that to an online repository. You can then give a link or reference to the appropriate repository version online. If you don't want to put your main repository online then you could just put up the version from the publication. Of course this is not so easy if you are doing it in retrospect. Your data may be in other places in systems that aren't under proper version control. If the data is small enough I would grab a copy and put it in the repository version you are using for publication. If it's big and stored remotely then you are a bit limited. In that case I would try to refer to a specific version if it is possible, or if you can't do that then you can try to get a checksum. But basically the main thing is to create and refer to a specific version of your repository and make sure it is available in a useful form to people who want to check it out.

Read More ›

Being More Systematic About Publicity
Greg Wilson / 2012-05-21
Several people have suggested that we need to be more systematic about publicizing workshops and other events: blogging and tweeting reaches people who already know about us, but doesn't reach those who don't. If you know of mailing lists and/or news aggregators aimed at researchers who might be interested in what we do, please mail pointers to team@carpentries.org. Read More ›

An Exercise With Matplotlib and Numpy
Mike Hansen / 2012-05-21
For this tutorial, we'll be plotting some weather data from a site called Weather Underground. You can download temperature readings and weather events for your local area in a comma-separated file. I've put weather data for Bloomington, IN in a file called weather.csv. Each row is one day, and there are columns for min/mean/max temperature, dew point, wind speed, etc. We'll be plotting temperature and weather event data (e.g., rain, snow).

0. Installing matplotlib

I covered installing matplotlib in a previous tutorial. The matplotlib site also has installation instructions. I'll assume for the rest of the tutorial that you have matplotlib installed and working. If you can type this code at a Python shell:

    from matplotlib import pyplot

and not receive any errors, then you're good to go.

1. Numpy Crash Course

The numpy module is how you do matrix-y stuff in Python. I'll give a quick example of why we'll need it. Imagine you were to type the following code into a Python shell:

    x = [1, 2, 3, 4]
    print x * 5

What does this print? Why, this of course:

    [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

By default, the * operator in Python copies the contents of a list however many times you specify. So x * 5 copied the contents of x five times and stuck them all together. When we're doing matrix math in Python, it would be nicer if x * 5 produced [5, 10, 15, 20]. We could do this manually with a loop:

    for i in range(len(x)):
        x[i] = x[i] * 5

We could even get fancy with Python's list comprehension syntax:

    x = [x_i * 5 for x_i in x]

For a list with only four elements, this won't be so bad. For larger lists, however, it will be quite slow. Using numpy avoids the performance hit by doing the heavy lifting in C instead of in Python. Here's how we'd do the previous example with numpy:

    import numpy as np
    x = np.array([1, 2, 3, 4])
    x = x * 5
    print x

This prints

    array([ 5, 10, 15, 20])

which is what we would expect. The array(...) lets you know that x is a numpy array. Onward!

2. Reading the Data

As with many programming problems, our first step is to read the data into memory. I've started a script called plot_data.py with a few import statements and some utility functions. I'll explain these functions in detail as we go forward.

    import numpy as np
    import matplotlib.pyplot as pyplot
    from datetime import datetime
    import os

    event_types = ['Rain', 'Thunderstorm', 'Snow', 'Fog']
    num_events = len(event_types)

    def event2int(event):
        return event_types.index(event)

    def date2int(date_str):
        date = datetime.strptime(date_str, '%Y-%m-%d')
        return date.toordinal()

    def r_squared(actual, ideal):
        actual_mean = np.mean(actual)
        ideal_dev = np.sum([(val - actual_mean)**2 for val in ideal])
        actual_dev = np.sum([(val - actual_mean)**2 for val in actual])
        return ideal_dev / actual_dev

In past tutorials, we've either manually parsed our data file(s) or used Python's csv reader. Because of our focus on numpy here, we're going to use the loadtxt function. By passing in the right options, we can get loadtxt to parse our weather.csv file directly into a numpy array.
For a first pass, I've written the following function to read in the weather data:

    def read_weather(file_name):
        data = np.loadtxt(file_name, delimiter=',', skiprows=1,
                          converters={0: date2int},
                          usecols=(0,1,2,3,21))
        return data

    #--------------------------------------------------

    data = read_weather('data/weather.csv')
    print data

The first two parameters, delimiter and skiprows, tell loadtxt to split fields based on commas and skip the first row of the file (which contains column names). numpy doesn't handle dates, so I've used the converters parameter to have loadtxt convert column 0 (a date string) into an integer using my date2int function. The last parameter, usecols, tells loadtxt to ignore all columns in the file except the first, second, third, fourth, and twenty-second column (the date, temperature, and weather event columns). Unfortunately, running this code produces the following error:

    $ python plot_data.py
    Traceback (most recent call last):
      File "plot_data-2.py", line 34, in <module>
        data = read_weather("data/weather.csv")
      File "plot_data-2.py", line 28, in read_weather
        usecols=(0,1,2,3,21))
      File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 796, in loadtxt
        items = [conv(val) for (conv, val) in zip(converters, vals)]
    ValueError: could not convert string to float: Rain

The final line tells us that numpy can't convert the string "Rain" into a floating point number. This is from the weather events column in our data, which contains text like "Rain" or "Snow-Fog". We could try and write a converter for this column too, but I've chosen to simply have numpy bring the column in as a string (which we'll manually parse later). To do this, we pass in a special object for the dtype parameter of loadtxt. This object can be constructed by giving a dictionary to the numpy.dtype function. The code below provides names and data types for all of the columns we'll be using:

    def read_weather(file_name):
        dtypes = np.dtype({ 'names' : ('timestamp', 'max temp', 'mean temp', 'min temp', 'events'),
                            'formats' : [np.int, np.float, np.float, np.float, 'S100'] })

        data = np.loadtxt(file_name, delimiter=',', skiprows=1,
                          converters={0: date2int},
                          usecols=(0,1,2,3,21), dtype=dtypes)
        return data

The last column format is given as "S100", which means "string up to 100 characters in length." numpy needs to know the maximum size of the column for efficiency, so I gave myself plenty of room with 100 characters. Running this new code produces the following output:

    $ python plot_data.py
    [(734503, 53.0, 43.0, 32.0, 'Rain') (734504, 32.0, 25.0, 18.0, 'Snow')
     (734505, 27.0, 20.0, 12.0, '') (734506, 42.0, 34.0, 26.0, '')
     (734509, 52.0, 40.0, 28.0, '') (734510, 47.0, 36.0, 24.0, '')
     (734511, 51.0, 38.0, 24.0, '') (734512, 57.0, 43.0, 28.0, '')
     (734513, 45.0, 43.0, 40.0, 'Rain') (734514, 43.0, 29.0, 15.0, 'Fog-Snow')
     (734515, 19.0, 17.0, 15.0, 'Snow') (734516, 27.0, 18.0, 9.0, 'Snow')
     ...

We're finally in business. Each row in our data set consists of a timestamp (the date converted to an integer), the maximum, mean, and minimum temperature, and the weather events that occurred that day. We're going to start by plotting the mean temperature versus the day of the year.
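Before pulling the columns out, it's worth a quick look at how the timestamp round trip works. Here's a sketch of mine (not from the original post): date2int uses toordinal(), which counts days from January 1st of year 1, and datetime.fromordinal() reverses it exactly.

    from datetime import datetime

    d = datetime.strptime('2012-01-01', '%Y-%m-%d')
    n = d.toordinal()               # days since 01 Jan of year 1
    print n                         # 734503, matching the first timestamp above
    print datetime.fromordinal(n)   # 2012-01-01 00:00:00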
Since we've given names to each of our columns, we can pull them out easily:

    def read_weather(file_name):
        dtypes = np.dtype({ 'names' : ('timestamp', 'max temp', 'mean temp', 'min temp', 'events'),
                            'formats' : [np.int, np.float, np.float, np.float, 'S100'] })

        data = np.loadtxt(file_name, delimiter=',', skiprows=1,
                          converters={0: date2int},
                          usecols=(0,1,2,3,21), dtype=dtypes)
        return data

    #--------------------------------------------------

    data = read_weather('data/weather.csv')

    min_temps = data['min temp']
    mean_temps = data['mean temp']
    max_temps = data['max temp']
    dates = [datetime.fromordinal(d) for d in data['timestamp']]
    events = data['events']

    for date, temp in zip(dates, mean_temps):
        print '{0:%b %d}: {1}'.format(date, temp)

Each column can be extracted individually from the data array by using data['column name']. I've used the datetime.fromordinal function on the timestamp column to convert the integers back into datetime objects. Using the handy built-in zip function, I've printed out pairs of dates and mean temperatures. I use advanced string formatting to print the month, day, and temperature (see the datetime documentation for date formatting information). The program now gives the following output:

    Jan 01: 43.0
    Jan 02: 25.0
    Jan 03: 20.0
    Jan 04: 34.0
    ...
    May 11: 59.0
    May 12: 62.0
    May 14: 69.0

Everything looks good, so let's get started plotting.

3. Temperature Plot

We're going to start with a simple line plot that has the day of the year on the x-axis and the mean temperature for that day on the y-axis. Our plotting function, called temp_plot, will take in dates and times, and give us back a matplotlib figure object. Here's the code:

    def temp_plot(dates, mean_temps):
        year_start = datetime(2012, 1, 1)
        days = [(d - year_start).days + 1 for d in dates]

        fig = pyplot.figure()
        pyplot.title('Temperatures in Bloomington 2012')
        pyplot.ylabel('Mean Temperature (F)')
        pyplot.xlabel('Day of Year')
        pyplot.plot(days, mean_temps, marker='o')
        return fig

We start by computing the day of the year for each date. The datetime module lets us subtract dates from each other, producing a timedelta object. We subtract January 1st of 2012 from each date, adding 1 so that our count will start from 1 instead of 0. The days field on a timedelta object gives the total number of days (in this case, from January 1st). Next, we create a new matplotlib figure. In between calls to pyplot.figure, matplotlib's plotting functions will draw new plots on top of old ones. We'll use this fact to add a trend line to our plot shortly. After adding a title and some axis labels to our figure, we call pyplot.plot with our days (x values) and mean_temps arrays (y values). I've also passed in 'o' to the optional marker parameter so that small circles will be plotted for each data point. In the main body of the program, we use the os module to create a "plots" directory (checking if it exists first). Next, we call our temp_plot function and then use savefig to save the figure out to a png file:

    data = read_weather('data/weather.csv')

    min_temps = data['min temp']
    mean_temps = data['mean temp']
    max_temps = data['max temp']
    dates = [datetime.fromordinal(d) for d in data['timestamp']]
    events = data['events']

    if not os.path.exists('plots'):
        os.mkdir('plots')

    fig = temp_plot(dates, mean_temps)
    fig.savefig('plots/day_vs_temp.png')

Running $ python plot_data.py should create a "plots" folder and put a file inside called "day_vs_temp.png" that looks like this: Not bad!
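As a quick check of the day-of-year arithmetic above (my own sketch, not from the original post): subtracting two datetimes yields a timedelta, and since 2012 is a leap year, March 1st should come out as day 61.

    from datetime import datetime

    year_start = datetime(2012, 1, 1)
    d = datetime(2012, 3, 1)
    delta = d - year_start      # a timedelta object
    print delta.days + 1        # 61 -- 31 (Jan) + 29 (Feb) + 1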
Let's add a trend line to the plot based on a simple linear model of the data.

3.1 Adding a trend line

By using numpy's polyfit function, adding a trend line is a snap. This function takes our x and y values (days and mean_temps), and gives us back a slope and intercept (the final parameter is the degree of the fitted polynomial — we pass 1 for a linear fit).

    slope, intercept = np.polyfit(days, mean_temps, 1)

Using the slope and intercept, we can plot a trend line by computing "ideal" temperatures for each day according to the old y = mx + b formula. With our variables below, this will be ideal_temps = (slope * days) + intercept. Note that I've changed the days = ... line to days = np.array(...) so that we can do mathematical operations directly on the array.

    def temp_plot(dates, mean_temps):
        year_start = datetime(2012, 1, 1)
        days = np.array([(d - year_start).days + 1 for d in dates])

        fig = pyplot.figure()
        pyplot.title('Temperatures in Bloomington 2012')
        pyplot.ylabel('Mean Temperature (F)')
        pyplot.xlabel('Day of Year')
        pyplot.plot(days, mean_temps, marker='o')

        slope, intercept = np.polyfit(days, mean_temps, 1)
        ideal_temps = intercept + (slope * days)
        r_sq = r_squared(mean_temps, ideal_temps)

        fit_label = 'Linear fit ({0:.2f})'.format(slope)
        pyplot.plot(days, ideal_temps, color='red', linestyle='--', label=fit_label)
        pyplot.annotate('r^2 = {0:.2f}'.format(r_sq), (0.05, 0.9), xycoords='axes fraction')
        pyplot.legend(loc='lower right')
        return fig

To make the plot a little more useful, I've annotated the plot with the R-squared value of the fit. pyplot.annotate lets you put text on the figure in a variety of ways. Here, I've set the xycoords parameter to "axes fraction" so that annotate interprets my coordinates (0.05, 0.9) as fractions between 0 and 1 relative to the figure axes. The (0.05, 0.9) means to place the text horizontally 5% from the y-axis (left) and 90% from the x-axis (bottom). The final call to pyplot.legend places a legend on the figure. You must include a label parameter on at least one plot object for this to work (I've included it on the trend line plot call). By default, the legend will show up in the upper-right corner of the figure. This will get in the way on our current plot, so I moved the legend to the lower-right with the loc parameter. With the changes above, here's the new plot: Notice that the string formatting ({0:.2f}) has rounded the R-squared value and slope label for us to two decimal places.

3.2 Adding "error" bars

Since we also have the min and max temperatures in our data, let's add "error" bars to our plot to show the temperature range on each day. We'll modify temp_plot to take in two additional parameters (min_temps and max_temps), and plot the temperature range if they both have values (i.e., are not None). Adding error bars requires us to use the pyplot.errorbar function instead of pyplot.plot. It takes additional parameters (xerr and yerr) for the x and y errors. We will use yerr, and pass in an array with two rows: the first for the error below each data point, and the second for the error above. This array is easily computed by subtracting the min temperatures from the mean (and the mean from the max), and then stacking the two arrays together row-wise with numpy.vstack.
    def temp_plot(dates, mean_temps, min_temps = None, max_temps = None):
        year_start = datetime(2012, 1, 1)
        days = np.array([(d - year_start).days + 1 for d in dates])

        fig = pyplot.figure()
        pyplot.title('Temperatures in Bloomington 2012')
        pyplot.ylabel('Mean Temperature (F)')
        pyplot.xlabel('Day of Year')

        if (max_temps is None or min_temps is None):
            # Normal plot without error bars
            pyplot.plot(days, mean_temps, marker='o')
        else:
            # Compute min/max temperature difference from the mean
            temp_err = np.row_stack((mean_temps - min_temps,
                                     max_temps - mean_temps))

            # Make line plot with error bars to show temperature range
            pyplot.errorbar(days, mean_temps, marker='o', yerr=temp_err)
            pyplot.title('Temperatures in Bloomington 2012 (max/min)')

        slope, intercept = np.polyfit(days, mean_temps, 1)
        ideal_temps = intercept + (slope * days)
        r_sq = r_squared(mean_temps, ideal_temps)

        fit_label = 'Linear fit ({0:.2f})'.format(slope)
        pyplot.plot(days, ideal_temps, color='red', linestyle='--', label=fit_label)
        pyplot.annotate('r^2 = {0:.2f}'.format(r_sq), (0.05, 0.9), xycoords='axes fraction')
        pyplot.legend(loc='lower right')
        return fig

    #--------------------------------------------------

    data = read_weather('data/weather.csv')

    min_temps = data['min temp']
    mean_temps = data['mean temp']
    max_temps = data['max temp']
    dates = [datetime.fromordinal(d) for d in data['timestamp']]
    events = data['events']

    if not os.path.exists('plots'):
        os.mkdir('plots')

    # Plot without error bars
    fig = temp_plot(dates, mean_temps)
    fig.savefig('plots/day_vs_temp.png')

    # Plot with error bars
    fig = temp_plot(dates, mean_temps, min_temps, max_temps)
    fig.savefig('plots/day_vs_temp-all.png')

The new plot is saved to a file named day_vs_temp-all.png and looks like this: If you need to compute standard error for your errorbar plot, you can use scipy.stats.sem from the scipy module. For our next plot, we'll do a multi-part histogram of the weather events for each month.

4. Event Histogram

Histograms in matplotlib are generated using the pyplot.hist function. This function takes an array of data, which can itself contain arrays (for a multi-part histogram). We want to count events per month, so we'll need to create an array for each type of event. Inside these arrays will be observations like [1, 1, 2, 3, 3] for "January", "January", "February", "March", "March". When pyplot.hist receives our data, it will attempt to "bin" the month observations automatically. By default, it will break observations into 10 bins. We want a bin for each month instead, and we want the bins aligned properly to the month numbers (1 = January, 2 = February, etc.). The bins parameter to pyplot.hist takes either a number (representing the desired number of bins) or a sequence (representing the desired bin edges). In the code below, we pass range(1, 5 + 2) to ensure that our bins start at 1 (for January) and go through 5 (for May).
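As an aside, here is the bin-edge idea in isolation: a small sketch of mine (not from the original post) using np.histogram, which bins data the same way pyplot.hist does.

    import numpy as np

    # Observations are month numbers; edges 1..6 give one bin per month,
    # January through May.
    months = [1, 1, 2, 3, 3, 5]
    counts, edges = np.histogram(months, bins=range(1, 5 + 2))
    print counts    # [2 1 2 0 1] -- two January events, one February, ...
    print edges     # [1 2 3 4 5 6]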
    def hist_events(dates, events):
        event_months = []
        for i in range(num_events):
            event_months.append([])

        # Build up lists of months where events occurred
        for date, event_str in zip(dates, events):
            if len(event_str) == 0:
                continue    # Skip blank events
            month = date.month

            # Multiple events in a day are separated by '-'
            for event in event_str.split('-'):
                event_code = event2int(event)
                event_months[event_code].append(month)

        # Plot histogram
        fig = pyplot.figure()
        pyplot.title('Weather Events in Bloomington 2012')
        pyplot.xlabel('Month')
        pyplot.ylabel('Event Count')

        bins = np.arange(1, 5 + 2)
        pyplot.hist(event_months, bins=bins, label=event_types)
        pyplot.legend()
        return fig

The main body of the program is updated to call hist_events and save the resulting figure to plots/event_histogram.png.

    data = read_weather('data/weather.csv')

    min_temps = data['min temp']
    mean_temps = data['mean temp']
    max_temps = data['max temp']
    dates = [datetime.fromordinal(d) for d in data['timestamp']]
    events = data['events']

    if not os.path.exists('plots'):
        os.mkdir('plots')

    fig = temp_plot(dates, mean_temps)
    fig.savefig('plots/day_vs_temp.png')

    fig = temp_plot(dates, mean_temps, min_temps, max_temps)
    fig.savefig('plots/day_vs_temp-all.png')

    fig = hist_events(dates, events)
    fig.savefig(os.path.join('plots', 'event_histogram.png'))

When we run $ python plot_data.py, the new plot looks like this: Each collection of bars represents a month, and the individual bars represent the number of Rain, Thunderstorm, etc. events observed for that month. The figure's legend was populated by passing event_types in for the label parameter of pyplot.hist. The plot looks good, but it would be nice to properly label the months. We could do this manually with pyplot.xticks as follows:

    pyplot.xticks((1.5, 2.5, 3.5, 4.5, 5.5),
                  ('January', 'February', 'March', 'April', 'May'))

This will label each bin in the center (hence the .5 added to each number) with the proper month name. If our data grows to include more months, however, we'll have to manually extend the number of bins and our labels. Let's change hist_events to keep track of the range of months in the data. Additionally, we'll use Python's calendar module to automatically get the month names. At the top of the program, we'll import the calendar module:

    import calendar

and then redefine hist_events as follows:

    def hist_events(dates, events):
        event_months = []
        for i in range(num_events):
            event_months.append([])

        # Build up lists of months where events occurred
        min_month = 13
        max_month = 0
        for date, event_str in zip(dates, events):
            if len(event_str) == 0:
                continue    # Skip blank events
            month = date.month
            min_month = min(month, min_month)
            max_month = max(month, max_month)

            # Multiple events in a day are separated by '-'
            for event in event_str.split('-'):
                event_code = event2int(event)
                event_months[event_code].append(month)

        # Plot histogram
        fig = pyplot.figure()
        pyplot.title('Weather Events in Bloomington 2012')
        pyplot.xlabel('Month')
        pyplot.ylabel('Event Count')
        pyplot.axes().yaxis.grid()

        num_months = max_month - min_month + 1
        bins = np.arange(min_month, max_month + 2)  # Bin edges (one more than bins)
        pyplot.hist(event_months, bins=bins, label=event_types)

        # Align month labels to bin centers
        month_names = calendar.month_name[min_month:max_month+1]
        pyplot.xticks(bins[:-1] + 0.5, month_names)

        pyplot.legend()
        return fig

During the process of building our observation arrays, we now track the minimum and maximum months observed.
This allows us to automatically create our bin edges, and lets us grab month names from the calendar module by indexing into the calendar.month_name list. Note that the bins variable was created using numpy.arange, which is a shortcut for bins = numpy.array(range(min_month, max_month + 2)). Making bins a numpy array lets us call pyplot.xticks with bins[:-1] + 0.5 (the left edge of each bin plus half a bin width), centering month_names on each bin. As a bonus, I've also added a horizontal grid using the axis.grid function. You can add both a horizontal and vertical grid at the same time by calling pyplot.grid. Here's the updated plot: Looks ready for publication! Read More ›

What's Wrong With All This?
Greg Wilson / 2012-05-20
Titus Brown doesn't like this web site. He's OK with the content (I think), but he finds it awkward to use, and while I don't feel as strongly as he does, I accept that we have outgrown WordPress. The question is, what should we use instead? We need a lot more than just a blog and some static web pages, but learning management systems like Moodle weren't built with our ad hoc model in mind (they're really teaching administration systems), and newer tools like P2PU feel like a step backward. I started thinking about requirements for a replacement back in April, but got distracted. Here's a longer look.

Who are we?

- A learner learns new tools and skills.
- A tutor passes on their knowledge.
- A workshop host organizes and runs a bootcamp.
- An author creates content (lessons, blog posts, exercises, etc.).
- An admin manages the web site.
- Innocent bystanders watch and comment from the sidelines :-)

An individual might assume any of these roles at different times or in different contexts. For example, workshop hosts are often tutors, a tutor for one topic may be a learner for another, authors are often admins and vice versa, etc.

What do we do?

- A workshop is a live event, typically running all day for two days. A workshop is made up of several lessons, which may use the content we have online (or at least improvise around it), but which usually remix the order.
- A course is a slower-paced event, typically running for a few hours once a week for several weeks. Courses use our online material (or don't) just like workshops.
- A tutorial is an ad hoc real-time session with one tutor and several learners. Tutorials can be online or live.
- A help session is an ad hoc session between one tutor and one learner. Help sessions can be online or live.
- A content jam is a live get-together to create or update content. We haven't actually had one of these yet, but I'm hopeful...

What do we use to interact?

- Skype and desktop sharing for real-time online events. We've been using BlueJeans for one-to-many tutorials; it works pretty well, but doesn't seem to allow people to use Skype text chat while in a session, and we've never been able to make recording work. It's also very expensive, but cheaper alternatives (WebEx, Google+ hangouts) haven't scaled as well.
- Our WordPress blog. We manually echo posts to Twitter.
- Twitter (tweets aren't currently archived on our site, but should be).
- Web pages in our WordPress site. This includes the online course material, ads for workshops, and a few bits of advertising.
- Comments on the WordPress material. (People have suggested adding forums as well, but I don't believe there would be enough traffic, and we all have too many places to pay attention to already.)
- Videos (hosted at YouTube, embedded in the WordPress site).
- Point-to-point email. (This is usually from and to people's personal accounts, so it isn't archived.)
- Our own mailing lists: one for workshop organizers and content developers, and others for various regions and workshops. These are archived, but since we use MailMan, they're not integrated with WordPress. (I've experimented with various mailing list plugins for WordPress, and haven't been impressed by any of them.) We manage these lists through the Dreamhost control panel.
- Subversion. We have a publicly-readable repository for the course material and a members-only repository for administrative stuff like grant proposals. We also set up one repository for each workshop group, which we keep live for a couple of months.
  We also manage repos through the Dreamhost control panel, but there's no way to automatically keep their membership and permissions in sync with the group mailing lists.
- EventBrite for event registration. We link to EventBrite sign-up pages for events from the corresponding WordPress pages, but the linkage is done manually. EventBrite also gives us a mailing list for each event; we should use these to contact workshop participants immediately before and after workshops rather than our MailMan lists.
- Google Calendar and Google Maps to show when and where upcoming workshops are. Our calendar and map are linked into a page on the WordPress site, but updates have to be done manually. In particular, we have to remember to add events to both the calendar and the map, and when an event is over, we have to change the map as well as moving the event's page to the "past" section of the site.
- Doodle to schedule tutorials.

One thing we don't have yet is badges. We'd like to issue these to people who have taken part in workshops and the follow-up tutorials (i.e., our "graduates"), and also to instructors and content creators. The Open Badges team is working on a WordPress plugin to do this, which we hope to deploy in June.

How do we interact?

- Synchronously, i.e., taking part in or delivering workshops, courses, tutorials, help sessions, and content jams, both live and online.
- Scheduling events using Doodle.
- Registering for (and unregistering from) events using EventBrite.
- Advertising events using MailMan lists, the blog, and Twitter.
- Updating people on changes to workshops and courses using MailMan lists and EventBrite lists.
- Writing blog posts.
- Writing pages.
- Commenting on blog posts and pages.
- Tweeting.
- Creating or updating content in the main Subversion repository (and then updating the web site if needed).
- Creating and uploading videos, and then linking to them in a blog post or from a page.
- Discussing things on the "dev" list. There's almost never discussion on the per-workshop lists: I feel like there should be (or should be forums or something), but help sites need critical mass, and I doubt we'll ever have it, so I'd rather put energy into teaching people how to use existing online Q&A sites well.
- Giving feedback about events. Right now, we collect good and bad points from people at the end of every workshop, then post them to the blog. We really need to collect feedback on tutorials, and to follow up with people months or years later.

What's wrong with all this?

- Speed and design: the existing web site is slooooow, and no one would call the existing site beautiful...
- Identity: scheduling is separate from registration is separate from the mailing lists and from repositories. Badging will only make that more complicated. Mozilla Persona (formerly BrowserID, and not the same thing as OpenID—are you confused yet?) isn't a complete solution: it handles authentication, but not authorization, and "who can do what?" is an authorization issue. OAuth is supposed to take care of the latter, but it's a looong way from meeting our needs.
- Integration: connecting our blog to Twitter would be easy—I just haven't bothered to set it up. But tweets should be archived on the web site (both the ones we make and mentions of us), the mailing list archives should be integrated into the site, and so on. Again, there's a lot more to this than just managing identities.
- Features: I'd like a live table of registration stats (how many people have signed up for all upcoming events, and how many tickets remain) on the web site, but EventBrite doesn't have embeddable HTML for that. I'd also like a person-by-list table showing who's on which mailing list, and who has access to which repository, but Dreamhost and MailMan don't offer that. And I'd like the colors of map pins to change automatically once a workshop is over, but—you get the picture. All of these things can be fixed with the right glue code, but I have bigger yaks to shave.
- Conversation: the most important missing element is regular back-and-forth with the people we're trying to help. Again, I think that our goal should be to get them onto existing Q&A sites like Stack Overflow; in particular, we should help them feel confident enough to hang out there, so they don't become part of the dark matter of computational science.

What do I want?

I've written before about the idea of a GitHub for education, but that wouldn't address all of the issues laid out above. (Event registration, for example, doesn't feel like a GitHub kind of thing; nor does scheduling tutorials.) If we had a truly programmable web, I could hire a summer student to assemble what I want, but that's not a yak, it's a herd of angry mammoths: managing identities and permissions for MailMan, EventBrite, Subversion, and the blog in a single place would require a lot of hacking (or a time machine—if I could go back to 1999 and persuade the startup I was part of to open source SelectAccess, we'd be done by now). So that leaves me looking for an off-the-shelf solution which I don't think exists. If I'm wrong, I'd welcome a pointer—and if there's something we should be doing that isn't in the discussion above, I'd welcome a pointer to that too. Read More ›

Space at Upcoming Events
Greg Wilson / 2012-05-19
Here's how registration is going for upcoming events:

    University of British Columbia        May 22-23    39/40
    Johns Hopkins University              June 18-19    7/20
    Paris                                 June 28-29    9/25
    Boston                                July 9-10    23/40
    University of Waterloo                July 12-13    1/40
    Halifax                               July 16-17    8/40
    University of Toronto (Scarborough)   July 19-20   14/40

If you'd like to join us, there's still plenty of space—and if you have friends who could use some training in basic software skills, please point them our way. Read More ›

The Most Important Scientific Result Published in the Last Year
Greg Wilson / 2012-05-18
J.M. Wicherts, M. Bakker, and D. Molenaar: "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results". PLoS ONE, 6(11): e26828, 2011, doi:10.1371/journal.pone.0026828.

Background: The widespread reluctance to share published research data is often hypothesized to be due to the authors' fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically.

Methods and Findings: We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.

Conclusions: Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies.

(See also this discussion by Robert Trivers.) Read More ›

Feedback from Alberta
Greg Wilson / 2012-05-18
Our two-day workshop at the University of Alberta wound up a couple of hours ago. We had quite a few no-shows this time (which was annoying, given how many people were waitlisted), but those who did come seemed to get a lot out of it:

Good:
- Room
- Mix of talking & doing
- Stickies
- Version control
- Hands on
- Link on online video
- Python
- Clear speaking
- Computer in lab (using linux)
- Automatic versioning
- Programming in windows in Cygwin
- Philosophy
- Discussion of productivity
- Good reading suggestions
- Functional programming
- Overall workflow
- I feel more competent (morale boost)
- Researched anecdotes, backed with data
- Website
- TDD
- Instructor's body language
- Helpers
- Coffee

Bad:
- Hard
- Need more projectors
- Having to keep stickies
- No testing
- Not enough depth
- Not convinced about version control
- Too fast on day 1, too slow on day 2
- Need levels
- Came late
- Not enough Python
- No lunch
- No time for notes
- More version control
- Too short break
- No shows
- Pace (a little fast)
- Supervisor wasn't here (need to convince her)
- Where is the code (dropbox?)
- Bad chairs
- Windows alienation
- Mailing list
- Making DB (no info)

Many thanks to Rose, Neil, and Paul for making it possible. Read More ›

Halifax in July
Greg Wilson / 2012-05-17
We have just added another workshop to the summer's list, this one at Saint Mary's University in Halifax, Nova Scotia, on July 16-17. Please let friends and colleagues know—I look forward to meeting them. Read More ›

And One More: Johns Hopkins in June
Greg Wilson / 2012-05-16
We're pleased to announce that we will be running a two-day bootcamp at Johns Hopkins University in Baltimore on June 18-19, 2012. We only have space for 20 participants, so please register early. Read More ›

Two Bootcamps in Ontario in July
Greg Wilson / 2012-05-15
We are pleased to announce that we will be running two bootcamps in Ontario in July: one at the University of Waterloo on July 12-13, and another at the University of Toronto (Scarborough) on July 19-20. If you'd like to take part, please sign up, and please let friends and colleagues know about them as well. Read More ›

Fooling the Internet
Greg Wilson / 2012-05-15
A recent article in The Atlantic titled, "How the Professor Who Fooled Wikipedia Got Caught By Reddit" describes how GMU's Prof. T. Mills Kelly has had students fake history online, and how their most recent effort unraveled. There's lots to think about here regarding what scientists should know about using the web, trusting it, and making it their own... Read More ›

Feedback from Newcastle upon Tyne
Chris Cannam / 2012-05-15
This week's Newcastle bootcamp, organised by the Digital Institute at Newcastle University with the Software Sustainability Institute and SoundSoftware, was the first Software Carpentry bootcamp run entirely locally in the UK. For the organisers it was a slightly nervous experience, hoping we could get the material to hold together in presentation without Greg's experience at hand. Feedback from the learners was generally good on the material, the venue and the structure. The most common complaint was that it was hard to follow along at times, and I think there are several areas where we'll be able to improve the "flow" for future events. Notably, this was the first bootcamp I've attended at which nobody found the room too crowded or the wrong temperature. Result, Newcastle! Here are the good and bad feedback points. Some points were close duplicates, and I've put the additional ones in brackets (e.g. Python was cited three times).

Good:
- Python (+ Choice of Python as easy scripting language) (+ Gives me confidence to start using Python)
- Use of coloured sticky notes (+ coloured notes as an unobtrusive way to request help)
- The "Bringing it together" section
- Good mix of content
- Version control (+ integration with Bitbucket) (+ version control tips e.g. archive, bisect) (+ use of recipes as version control material)
- Coding along with the presenters
- Lots of helpers
- Good temperature in room, open window
- Arrangement of room into groups for collaborative work
- Self-guided exercises spaced out through the presentations
- Easy to ask the helpers for help
- Use of open source software
- Test-driven development
- Online lecture content to back up teaching
- Lots of breaks
- Good course description
- Inclusion of general advice for coding (as opposed to specific syntax)
- SQL

Bad:
- Felt like we ran out of time at end of first day
- Would have liked more about testing
- Cygwin
- Sometimes problem material got in the way of the subject (more time worrying about overlapping rectangles than how to program a test)
- No handouts, and screens difficult to read as forgotten my glasses
- Should have introduced Python lists and other structures earlier (presenters forgot to do this before using them in an exercise!)
- Not enough window real-estate
- Couldn't always follow material before it disappeared off screen
- Presenters sometimes forgot we were not necessarily interested in software engineering
- Pace too intense for non-expert programmers
- Interrupted by fire alarm
- Coloured notes would have worked better in the other order (that is, holding up "not OK" first — didn't always dare if everyone else had just held up "OK")
- More use of microphones
- Went a bit fast
- Half the class was facing back wall!
- Would have liked some harder exercises
- More consistency of laptop presentation (i.e. always same laptop with same window layout)
- Shell scripting section a little easy
- Didn't always notice when a presenter had started typing, they should read it out
- More pointers to additional material online please
- Some exercises had too much literal typing (from a presenter)
- Would like to have improved the presentation of functions

Read More ›

Solution to Indented List Problem
Greg Wilson / 2012-05-14
Last week's homework was to convert a two-level bullet-point list like this:

    * A
    * B
     * 1
     * 2
    * C
     * 3

into an HTML list like this:

    <ul>
    <li>A</li>
    <li>B
      <ul>
      <li>1</li>
      <li>2</li>
      </ul>
    </li>
    <li>C
      <ul>
      <li>3</li>
      </ul>
    </li>
    </ul>

so it would display like this:

    A
    B
      1
      2
    C
      3

My solution is shown in the video below; the code follows.

Converting an Indented List to HTML

    import sys

    def do_inner(lines, current):
        need_to_start = True
        need_to_close = False
        while (current < len(lines)) and \
              lines[current].startswith(' * '):
            if need_to_start:
                print ' <ul>'
                need_to_start = False
            text = lines[current].lstrip(' * ').rstrip()
            print ' <li>' + text + '</li>'
            need_to_close = True
            current += 1
        if need_to_close:
            print ' </ul>'
        return current

    def do_outer(lines):
        print '<ul>'
        current = 0
        while current < len(lines):
            assert lines[current].startswith('* ')
            text = lines[current].lstrip('* ').rstrip()
            print '<li>' + text
            current = do_inner(lines, current+1)
            print '</li>'
        print '</ul>'

    lines = sys.stdin.readlines()
    do_outer(lines)

Read More ›

Feedback from Michigan State
Greg Wilson / 2012-05-12
Our workshop at Michigan State University this week was three days long instead of two, and included two topics (Git and the IPython notebook) that we haven't tried before. Feedback was generally positive, but we've got lots to work on for next time as well.

Good:
- Using history
- Ending with general theory
- Pen and paper database design
- Version control was useful
- Good practice in software
- Concise and modular programming in Python
- Console segment
- Smooth flow between Bash and Python
- Challenging and flowed nicely
- Further reading material
- Desktop setup
- Instructor teaching style
- Permission to spend less time coding
- iPython notebook looks great
- Paired programming model
- Git script (tutorials used were available)
- Legal issues (opensource)
- Good for beginners
- Free course (and food!)
- Variety
- Practical (reality-based)
- Overview of DB options
- Testing
- Better ways to do things
- Somewhat static seating created helpful partners

Bad:
- Typing speed is too fast
- Class time chunks too long
- Why iPython
- Need more 'why'
- Curriculum
- Advanced Git bounced
- Too much switching screens
- Some things failed
- Beverages included only caffeine
- Need snacks at breaks
- Lacked connection between course material and applicability
- Tuesday way too long
- Wanted a cheat sheet
- Not enough exercises
- How to create DB
- Anti-Windows bigotry
- Next day install at end of day
- Some concepts skipped
- Don't know where to start (registration etc.)
- Inappropriate room size
- Breadth
- PPT for CS

Read More ›

Run My Code
Greg Wilson / 2012-05-11
RunMyCode is a web site and service intended to support reproducible research (initially in computational economics). Authors create companion web sites for papers that include the software they used; other people can then re-run their models, and (crucially) play with parameters, using cloud-based instances of those environments. They only support MATLAB, R, and SAS right now, but are hoping to add more tools soon. It's a cool idea, and we'd welcome your impressions. Read More ›

Fish and Bugs
Greg Wilson / 2012-05-10
The May/June 2012 issue of Washington Monthly has an article by Alison Fairbrother titled "A Fish Story". Near the top, it says, "In 2009, a routine methodological upgrade at NOAA—and the subsequent discovery of a few lines of faulty computer code—forced the start of a profound shift in the ASMFC's estimates of menhaden stocks." A few pages later, we get more details: In 2009, the Menhaden Technical Committee updated its methodology for estimating the menhaden population—something it does every five years—and then ran the menhaden catch data through a new computer model. The results weren't much different: although the numbers of menhaden were declining, the estimated number of eggs produced by spawning female menhaden was at the target level, so according to the reference point, menhaden weren't being overfished. Shortly thereafter, a colleague of Jim Uphoff's, a biologist named Alexei Sharov, got hold of the computer model that had been updated by NOAA scientists. Going through the code line by line, Sharov, one of Maryland's representatives on the Technical Committee, found a fundamental miscalculation buried inside the model. Uphoff, meanwhile, studied the methodology of the code and discovered that NOAA had both underestimated the amount of fish killed by the industry and overestimated the spawning potential. Sharov brought these two mistakes to his peers on the committee, and it was agreed that corrections needed to be made. Several months later, after the model had finished running a second time, the science finally caught up with what Jim Price and the anglers had been saying for decades: even using the lax reference points developed by the ASMFC, menhaden had been subject to overfishing in thirty-two of the past fifty-four years. When the assessment was then peer reviewed by a group of international scientists, the reviewers deemed that the reference point currently in use for menhaden—8 percent of maximum spawning potential—was not sufficiently safe or precautionary. Furthermore, the number of menhaden swimming in the Atlantic had declined by 88 percent since 1983—to a level so low that it caused George Lapointe, former commissioner of Maine's Department of Marine Resources, to have what he called an "oh shit moment." If anyone knows more about the "fundamental miscalculation", I'd be grateful for a summary. Read More ›

Bootcamp in Boston, July 9-10
Greg Wilson / 2012-05-09
We are pleased to announce that we will be running a bootcamp on July 9 and 10 in Boston—please see its page for details (some of which we're still working out). We have room for 40 participants, so please register early. (And if you can, register with friends: we are finding that people get a lot more out of this training if they're learning with their labmates and other collaborators.) Read More ›

The Architecture of Open Source Applications: Volume 2
Greg Wilson / 2012-05-08
We are very pleased to announce that The Architecture of Open Source Applications: Volume 2 is now available from Lulu. A PDF version will go on sale in the next few days, and an e-book will become available as soon as we can produce it. Many thanks to everyone who contributed, and to the indefatigable Amy Brown for pulling it all together. As always, all royalties will go directly to Amnesty International, so if you buy a copy, you'll be helping to make the world a better place. Read More ›

An Exercise With Functions and Plotting
Mike Hansen / 2012-05-06
Let's say you have a text file called workout.csv that contains information about your workouts for the month of March:

    # date, kind of workout, distance (miles), time (min)
    "2012, Mar-01", run, 2, 25
    "2012, Mar-03", bike, 10, 55
    "2012, Mar-06", bike, 5, 20
    "2012, Mar-09", run, 3, 42
    "2012, Mar-10", skateboarding, 2, 10

    # Broke my leg :(

    "2012, Mar-11", Wii, 0, 60
    "2012, Mar-12", Wii, 0, 60
    "2012, Mar-13", Wii, 0, 60
    "2012, Mar-14", Wii, 0, 60

It's a comma-separated value (CSV) file, but contains comments and blank lines. The first line (a comment) describes the fields in this file, which are (from left to right) the date of your workout, the kind of workout, how many miles you traveled, and how many minutes you spent. Our goal will be to read this data into Python and plot a graph with the day of the month on the x-axis and the time worked out on the y-axis. Let's get started.

1. Reading

To begin, let's read in the data file with Python's csv module. The code is fairly straightforward:

    import csv

    # Read in all rows from the csv file
    reader = file("workout.csv", "r")
    csv_reader = csv.reader(reader)

    # Print out rows
    for row in csv_reader:
        print row

Saving this code to a file called plot_workouts.py and running python plot_workouts.py on the command-line produces the following output:

    ['# date', ' kind of workout', ' distance (miles)', ' time (min)']
    ['2012, Mar-01', ' run', ' 2', ' 25']
    ['2012, Mar-03', ' bike', ' 10', ' 55']
    ['2012, Mar-06', ' bike', ' 5', ' 20']
    ['2012, Mar-09', ' run', ' 3', ' 42']
    ['2012, Mar-10', ' skateboarding', ' 2', ' 10']
    []
    ['# Broke my leg :(']
    []
    ['2012, Mar-11', ' Wii', ' 0', ' 60']
    ['2012, Mar-12', ' Wii', ' 0', ' 60']
    ['2012, Mar-13', ' Wii', ' 0', ' 60']
    ['2012, Mar-14', ' Wii', ' 0', ' 60']

Unfortunately, as we can see, Python's CSV reader doesn't filter out comments or blank lines. We'll need to do it manually. However, this is a common task that we might want to do again and again across programs. Let's write a function named filter_lines that will filter the lines in a file before the CSV reader does its thing.

    def filter_lines(reader):
        lines = []
        for line in reader:
            if len(line.strip()) > 0 and not line.startswith("#"):
                lines.append(line)
        return lines

This function will take a file reader and return a list of lines (excluding blank lines and comments). Let's make filter_lines a bit more readable by introducing a second function called keep_line:

    def keep_line(line):
        return len(line.strip()) > 0 and not line.startswith("#")

    def filter_lines(reader):
        lines = []
        for line in reader:
            if keep_line(line):
                lines.append(line)
        return lines

This new code is easier to read. We can see that keep_line takes in a line and will return True when the line is not blank and not a comment.
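A quick sanity check of keep_line at the Python shell (my own sketch, not part of the original post) confirms that behavior:

    >>> def keep_line(line):
    ...     return len(line.strip()) > 0 and not line.startswith("#")
    ...
    >>> keep_line('"2012, Mar-01", run, 2, 25\n')
    True
    >>> keep_line("\n")              # blank line
    False
    >>> keep_line("# a comment\n")   # comment line
    False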
Here's the complete code so far:

    import csv

    #------------------------------------------------------------
    def keep_line(line):
        return len(line.strip()) > 0 and not line.startswith("#")

    #------------------------------------------------------------
    def filter_lines(reader):
        lines = []
        for line in reader:
            if keep_line(line):
                lines.append(line)
        return lines

    #------------------------------------------------------------
    reader = file("workout.csv", "r")
    lines = filter_lines(reader)
    csv_reader = csv.reader(lines)

    for row in csv_reader:
        print row

Running this now produces the following output:

    ['2012, Mar-01', ' run', ' 2', ' 25']
    ['2012, Mar-03', ' bike', ' 10', ' 55']
    ['2012, Mar-06', ' bike', ' 5', ' 20']
    ['2012, Mar-09', ' run', ' 3', ' 42']
    ['2012, Mar-10', ' skateboarding', ' 2', ' 10']
    ['2012, Mar-11', ' Wii', ' 0', ' 60']
    ['2012, Mar-12', ' Wii', ' 0', ' 60']
    ['2012, Mar-13', ' Wii', ' 0', ' 60']
    ['2012, Mar-14', ' Wii', ' 0', ' 60']

Hooray! Our blank lines and comments are gone. Before moving on to parsing the data (converting it from text to dates, integers, etc.), let's take a moment to think about how Python is actually using our filter_lines and keep_line functions. For that, we need to understand the call stack.

2. The Call Stack

Python tracks which functions are currently being executed with a data structure named the call stack. When Python encounters a function call, like lines = filter_lines(reader), it "pushes" information about where to come back to and then jumps to the function's code. When a return statement is found (or when the function ends), Python "pops" information off the call stack to remember where it was. This can be difficult to visualize. Below is a diagram of our program before and after the call to filter_lines. Python starts out in the "global" function whose code is just the main body of your program. When we call filter_lines with reader as a parameter, Python copies a reference to workout.csv into a new variable reader, makes a note that it should return to the global function, and jumps to the code for filter_lines. Each time we call keep_line inside filter_lines, Python saves its place on the call stack, copies a reference to line, jumps to keep_line, and jumps back to filter_lines when it's done. It's important to remember that the reader in the global function and the reader in filter_lines are two different variables. However, they point to the same file in memory, so reading from the file inside of filter_lines changes the file position of reader in the global function. Python copies things by reference instead of by value, which is very fast (it only needs to point the new variable at the right thing in memory). It can lead to confusion, however, if you don't expect a function to make changes to a parameter (e.g., trying to read data from reader after calling filter_lines produces nothing since we're at the end of the file). If you really need to, making copies is easy. A list named my_list, for example, can be copied simply by slicing the whole thing: my_list[:]. With a picture of the call stack in our heads, let's move on to parsing our workout data.

3. Parsing

Our workout data is stored as text. In order to process and plot it, we need to convert each field to its appropriate type (e.g., a date, an integer, etc.). Converting from text to integers or floating point numbers is easy; we can just call the int() or float() function. Our first field is a date, however, which requires a bit more work. Parsing dates can get hairy very quickly.
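To see just how hairy, here is a sketch of mine (not from the original post) that parses "2012, Mar-01" by hand, splitting the string apart and looking the month up in a table. It works, but every new date format means more fiddly code like this:

    # Hand-rolled date parsing (hypothetical helper, for illustration only)
    MONTHS = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4,  'May': 5,  'Jun': 6,
              'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}

    def parse_date_by_hand(date_str):
        # "2012, Mar-01" -> (2012, 3, 1)
        year, rest = date_str.split(', ')
        month, day = rest.split('-')
        return int(year), MONTHS[month], int(day)

    print parse_date_by_hand("2012, Mar-01")    # (2012, 3, 1)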
Luckily, the hard work has already been done for us! Python includes a module called datetime that handles parsing for dates and times (go figure ;)).

3.1 The datetime module

The datetime module contains a class also called datetime. This class has a lot of useful functions for date manipulation as well as a function called strptime for parsing (think "string parse time" for strptime). datetime.strptime takes two parameters: (1) a date string like "2012, Mar-01", and (2) a format string that describes how to parse the date string. We make a format string by replacing the pieces of our date string with special format codes (which start with a '%'). For example, %Y stands for the "year with century as a decimal number", so we need to replace the year in our date string (2012) with %Y as such: "%Y, Mar-01". Next, we use the %b (abbreviated month name) and %d (day of the month) format codes to replace the remaining pieces ("Mar" and "01", respectively). Our final format string is "%Y, %b-%d". Note that we include the comma, space, and dash. Let's write a function called parse_workouts that will take in a list of CSV rows and produce a list of workouts (one for each row). Each workout will itself be a list with the parsed date, workout kind, distance, and time. The complete code is below.

    import csv
    from datetime import datetime

    #------------------------------------------------------------
    def keep_line(line):
        return len(line.strip()) > 0 and not line.startswith("#")

    #------------------------------------------------------------
    def filter_lines(reader):
        lines = []
        for line in reader:
            if keep_line(line):
                lines.append(line)
        return lines

    #------------------------------------------------------------
    def parse_workouts(rows):
        workouts = []
        for row in rows:
            date = datetime.strptime(row[0], "%Y, %b-%d")
            kind = row[1].strip()
            distance = int(row[2])
            time = int(row[3])
            workouts.append([date, kind, distance, time])
        return workouts

    #------------------------------------------------------------
    reader = file("workout.csv", "r")
    lines = filter_lines(reader)
    csv_reader = csv.reader(lines)
    workouts = parse_workouts(csv_reader)

    for w in workouts:
        print w

At the top, we import the datetime class from the datetime module using Python's from module import class import form. Our parse_workouts function loops over each CSV row, parses the individual fields, and packages them up as a single workout list. At the end, we print out our workouts. Running this code produces the following output:

    [datetime.datetime(2012, 3, 1, 0, 0), 'run', 2, 25]
    [datetime.datetime(2012, 3, 3, 0, 0), 'bike', 10, 55]
    [datetime.datetime(2012, 3, 6, 0, 0), 'bike', 5, 20]
    [datetime.datetime(2012, 3, 9, 0, 0), 'run', 3, 42]
    [datetime.datetime(2012, 3, 10, 0, 0), 'skateboarding', 2, 10]
    [datetime.datetime(2012, 3, 11, 0, 0), 'Wii', 0, 60]
    [datetime.datetime(2012, 3, 12, 0, 0), 'Wii', 0, 60]
    [datetime.datetime(2012, 3, 13, 0, 0), 'Wii', 0, 60]
    [datetime.datetime(2012, 3, 14, 0, 0), 'Wii', 0, 60]

Each workout is a list whose first element is a datetime.datetime object. Python prints datetime objects as datetime.datetime(year, month, day, hour, minute). The second element is the kind of workout (a string). The third and fourth elements are the workout distance and time, respectively (both integers). Everything is looking good, so let's prepare for plotting. We want to plot the day of the month on the x-axis and the time we worked out on the y-axis.
We'll write two functions, one to extract the day of the month from each workout, and another to extract the time from each workout.

    import csv
    from datetime import datetime

    #------------------------------------------------------------
    def keep_line(line):
        return len(line.strip()) > 0 and not line.startswith("#")

    #------------------------------------------------------------
    def filter_lines(reader):
        lines = []
        for line in reader:
            if keep_line(line):
                lines.append(line)
        return lines

    #------------------------------------------------------------
    def parse_workouts(rows):
        workouts = []
        for row in rows:
            date = datetime.strptime(row[0], "%Y, %b-%d")
            kind = row[1].strip()
            distance = int(row[2])
            time = int(row[3])
            workouts.append([date, kind, distance, time])
        return workouts

    #------------------------------------------------------------
    def extract_days(workouts):
        days = []
        for w in workouts:
            date = w[0]
            days.append(date.day)
        return days

    #------------------------------------------------------------
    def extract_times(workouts):
        times = []
        for w in workouts:
            times.append(w[3])
        return times

    #------------------------------------------------------------
    reader = file("workout.csv", "r")
    lines = filter_lines(reader)
    csv_reader = csv.reader(lines)
    workouts = parse_workouts(csv_reader)
    days = extract_days(workouts)
    times = extract_times(workouts)

    print "Days:", days
    print "Times:", times

In the extract_days function, we loop through each workout and append the day field of each datetime object onto a list. See the datetime documentation for a complete list of fields. extract_times is similar to extract_days, but grabs the fourth element of each workout list (the workout time) instead. Running the new code produces a list of days and workout times:

    Days: [1, 3, 6, 9, 10, 11, 12, 13, 14]
    Times: [25, 55, 20, 42, 10, 60, 60, 60, 60]

We're now ready to start plotting.

4. Plotting

There are many plotting libraries available for Python. For this tutorial, we'll stick with one of the most popular, whose interface is modeled on MATLAB's: matplotlib.

4.1 Installing matplotlib

matplotlib does not come with the standard Python installation. In addition, it depends on another library called numpy, which is also not included. The installing matplotlib page provides detailed instructions for installing matplotlib on Windows, Mac OS X, and Linux. Don't forget to download and install numpy as well. In order to choose the correct downloads, you need to know which version of Python you're running. At the command-line, run python --version (mine says Python 2.7.2+). The first two numbers (2.7 for me) will give you an idea of which matplotlib file to choose. On Windows, I downloaded the file named "matplotlib-1.1.0.win32-py2.7.exe" because I have Python 2.7 and a 32-bit installation of Python. The numpy downloads are named similarly; I downloaded "numpy-1.6.1-win32-superpack-python2.7.exe". Once everything is installed, you can check that it's working by running python and typing in the following code:

    from matplotlib import pyplot

If no errors are printed, then you should be set.

4.2 Using matplotlib

There are many, many functions in matplotlib. Our program will use the pyplot.plot function, which makes line and scatter plots. This function takes a list of x values, a list of y values, and some options like the line thickness and color. For now, we'll create a function called plot that will create a new figure, plot workout days vs. times, and then save the figure to a file.
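Before the full version, here is the smallest useful pyplot program (a sketch of mine, not from the original post); the plot function below is just this pattern plus a figure object and labels:

    from matplotlib import pyplot

    # Plot three points and write the figure to disk
    pyplot.plot([1, 2, 3], [10, 20, 15])
    pyplot.savefig("hello_plot.png")    # hypothetical output filename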
    import csv
    from datetime import datetime
    from matplotlib import pyplot

    #------------------------------------------------------------
    def keep_line(line):
        return len(line.strip()) > 0 and not line.startswith("#")

    #------------------------------------------------------------
    def filter_lines(reader):
        lines = []
        for line in reader:
            if keep_line(line):
                lines.append(line)
        return lines

    #------------------------------------------------------------
    def parse_workouts(rows):
        workouts = []
        for row in rows:
            date = datetime.strptime(row[0], "%Y, %b-%d")
            kind = row[1].strip()
            distance = int(row[2])
            time = int(row[3])
            workouts.append([date, kind, distance, time])
        return workouts

    #------------------------------------------------------------
    def extract_days(workouts):
        days = []
        for w in workouts:
            date = w[0]
            days.append(date.day)
        return days

    #------------------------------------------------------------
    def extract_times(workouts):
        times = []
        for w in workouts:
            times.append(w[3])
        return times

    #------------------------------------------------------------
    def plot(days, times, filename):
        fig = pyplot.figure()
        pyplot.plot(days, times)
        pyplot.savefig(filename)

    #------------------------------------------------------------
    reader = file("workout.csv", "r")
    lines = filter_lines(reader)
    csv_reader = csv.reader(lines)
    workouts = parse_workouts(csv_reader)
    days = extract_days(workouts)
    times = extract_times(workouts)
    plot(days, times, "workout_times.png")

Running this code will create a new file named workout_times.png that looks like this: As you can see, matplotlib takes a "no frills" approach by default. We can spruce up our figure by adding a title, axes labels, a grid, and a "tick" on the x-axis for each day (instead of every other day).

    def plot(days, times, filename):
        fig = pyplot.figure()
        pyplot.title("Times I worked out in March")
        pyplot.xlabel("Day")
        pyplot.ylabel("Time (min)")
        pyplot.xticks(range(1, max(days)+1))
        pyplot.grid()
        pyplot.plot(days, times, color="red", linewidth=2)
        pyplot.savefig(filename)

A complete description of these pyplot functions is beyond the scope of this tutorial. A future tutorial will explore them in detail. For now, we suggest using the matplotlib gallery to get an idea of what each function does. With the changes above, workout_times.png is looking a lot nicer: That's all, folks!
The complete code looks like this:

    import csv
    from datetime import datetime
    from matplotlib import pyplot

    #------------------------------------------------------------
    def keep_line(line):
        return len(line.strip()) > 0 and not line.startswith("#")

    #------------------------------------------------------------
    def filter_lines(reader):
        lines = []
        for line in reader:
            if keep_line(line):
                lines.append(line)
        return lines

    #------------------------------------------------------------
    def parse_workouts(rows):
        workouts = []
        for row in rows:
            date = datetime.strptime(row[0], "%Y, %b-%d")
            kind = row[1].strip()
            distance = int(row[2])
            time = int(row[3])
            workouts.append([date, kind, distance, time])
        return workouts

    #------------------------------------------------------------
    def extract_days(workouts):
        days = []
        for w in workouts:
            date = w[0]
            days.append(date.day)
        return days

    #------------------------------------------------------------
    def extract_times(workouts):
        times = []
        for w in workouts:
            times.append(w[3])
        return times

    #------------------------------------------------------------
    def plot(days, times, filename):
        fig = pyplot.figure()
        pyplot.title("Times I worked out in March")
        pyplot.xlabel("Day")
        pyplot.ylabel("Time (min)")
        pyplot.xticks(range(1, max(days)+1))
        pyplot.grid()
        pyplot.plot(days, times, color="red", linewidth=2)
        pyplot.savefig(filename)

    #------------------------------------------------------------
    reader = file("workout.csv", "r")
    lines = filter_lines(reader)
    csv_reader = csv.reader(lines)
    workouts = parse_workouts(csv_reader)
    days = extract_days(workouts)
    times = extract_times(workouts)
    plot(days, times, "workout_times.png")

Read More ›

UCL Bootcamp: Version Control Wrap-Up
Chris Cannam / 2012-05-04
For the bootcamp at UCL, we tried using Mercurial (with EasyMercurial) instead of Subversion in the version control segment. You can see the plan for the segment on this EasyMercurial project page. Briefly, we opened with a few plain slides about the purpose of version control, followed by a hands-on example in three parts (working by yourself, working by yourself with an online remote repository, and working with others). We started at the beginning and got as far as "hg bisect", but did not cover branching. I was presenting the segment, so I'm not well placed to judge how effective it was as a learning experience. But I did make some notes.

Command line or GUI?

I started with the EasyMercurial GUI, made some (but in hindsight probably not enough) attempt to show how GUI operations corresponded to command-line operations, and dropped back to the command line for more esoteric ideas at the end (such as bisect). However, more of the attendees were familiar with the command line than we had expected, so it might have been simpler to use that for the basics. Getting from nowhere to "...and now we're using version control! see how easy that was" is a shorter process at the command line. Using a GUI introduces distracting complications, such as the need to switch windows—aggravated in this case by my finding I wasn't practised enough at doing the necessary switching cleanly when using projector resolutions and huge fonts. On the other hand, EasyMercurial has a better view of the history and clearer merging, and I certainly wouldn't want to introduce divergent simultaneous commits and merge conflict resolution without those. (In my daily work I also use the GUI to manage these activities, even though I'm instinctively a command-line user.) It seems counter-intuitive to say that the command line is better for basic stuff and the GUI better for advanced stuff, but perhaps that is the way it is. It would be interesting to have more feedback from others who were present.

Peer-to-Peer or Master Repository?

A common requirement is to share code between "my computer" and "the lab compute server" (implied for example in the "frequently asked question" at the end of the Software Carpentry version control lecture). We approached this by using a master repository stored on an external server, in this case Bitbucket. But since we have a distributed version-control system, we could have taken a peer-to-peer approach, pushing and pulling between peer repositories on the two machines directly; a rough sketch of the commands appears at the end of this post. We could have done this for sharing changes between paired neighbours in the bootcamp, as well. Using a master repository has advantages such as redundancy, availability, and backups. It's probably better practice in a real lab, and it's also easier to use a known remote server in a setting like this where everyone has one machine in front of them with a dynamic IP address using an unknown network topology. On the other hand, there are situations in which it's useful to know that a peer-to-peer arrangement is possible, and I do wonder whether we should have demonstrated it in some way.

One We Made Earlier?

We didn't have any pre-prepared working material in the segment, just a few introductory slides followed by building a repository from scratch (with recipes in it). One suggestion was that we could provide a canned repository for people to start from to get them more quickly into "working together" mode.
My fear is that this would be to miss one of the biggest strengths of a distributed version control system: how simple it is to start using version control on your own project, without technical support from anyone, and then to push to a server with all your history intact whenever you feel like it.

Delivery and Technical Bits

The biggest practical problem we had was that the room was fairly big with no microphone, and I simply don't speak clearly and loudly enough to be heard by everyone at the back. I need to work on this! There were a few technical problems with software installation, some of which we should be able to solve before next time around (e.g. issues with the EasyMercurial package on some versions of Ubuntu or with installing from source on Fedora). There weren't all that many of these problems, but it doesn't take very many to bog down a workshop. On the upside, we didn't have any network problems during this segment (a common cause of trouble). I started out feeling a bit rushed, and although I had a plan and I largely followed it, I did forget to include a couple of planned excursions aimed at explaining more about which system we were going to learn and why, as well as some material about how often to commit, what material to include in a project under version control and so on. Generally I failed to provide as much context as I had intended to. However, the technical material did appear to be about the right amount for the time (and attention) available. One thing we did that proved a bit tricky to set up and awkward to do, but that I think seemed worthwhile and that I would like to try again, was to use two projectors and thus show "both sides" of a collaborative working process with two people making edits live at the same time. Again I'd like to hear what others think.

Onwards

I'll be running the version control segment at the Newcastle bootcamp as well, where I'll be trying to improve on some of the things that didn't go so well at UCL. This does mean it'd be great to have any more feedback sooner rather than later! My thanks to Ben Waugh for the organisation at UCL, Chris Harte for presentation suggestions beforehand, and Luis Figueira for running the second laptop during the session.
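As a postscript to the peer-to-peer question above: here is a rough sketch of what that demonstration might look like at the Mercurial command line. The hostnames and paths are invented; the only real requirement is SSH access between the two machines.

# Clone straight from the lab server's copy (peer to peer, no hosting service).
hg clone ssh://you@labserver//home/you/project
cd project
# ...edit, then record your changes locally...
hg commit -m "Try a new smoothing parameter"
# Pull a colleague's changes directly from their machine and merge them.
hg pull ssh://colleague@desktop//home/colleague/project
hg merge
hg commit -m "Merge in colleague's changes"
# Push everything back to the lab server whenever you like.
hg push ssh://you@labserver//home/you/project

Read More ›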

The Good and the Bad of It
Neil Chue Hong / 2012-05-02
We've come to the end of a hot, slightly sweaty two days of learning in UCL and have just done a straw poll of participants on the good and bad points of this bootcamp. Something that was trialled at UCL was asking participants to sign up in small (3-4 people) teams drawn from their local research group. In total, we had a little over 40 learners and 10 helpers. We also taught the version control portion using EasyMercurial (thanks to Chris Cannam from SoundSoftware). Here's what we learned...

Good

- Beginner friendly
- Easy learning curve
- Now know how to use SQL
- Doesn't matter what programming language I used in the first place
- Version control for the win
- Taught functional programming rather than object-oriented
- Material backed up by anecdotes and evidence
- Good at giving thoughts rather than skills
- Can go back and share with other members of the group
- Killed my fear of version control
- Give tips on good programming practices
- Anything new was given with the why as well as the what
- Introduction to systematic testing
- Best balance of typing, testing and listening
- Good ratio of helpers to students
- Greg's digressions keep your mind active
- Learned feedback technique of using coloured sticky notes
- Test driven development
- Quoted Alan Turing (programs are just a type of data)
- Working in pairs
- Live coding (realtime typing)
- Being forced to use python
- Psychological aspects, and how it relates to programming
- Always good to reinforce the learning of things you've heard before
- Interaction with participants at workshops — both in workshop and in pub
- Website is a good resource
- Insight (for a sysadmin) of how the researchers I support work

Bad

- Can't afford to buy all the books referenced
- Too early a start [9am start both days in central London]
- Too much to take in
- Not enough functional programming
- Lower level than group is used to
- Need coffee at the start of the morning [which we didn't have on day 2]
- Problems with the wireless network
- Need biscuits in the afternoon
- No handouts [though SWC does this because best practice suggests you shouldn't give out handouts at the start]
- Lack of air conditioning
- Too short (not enough days) [SWC experience shows that 5 days F2F didn't work, will have online followups]
- Not clear when you should be furiously catching up on typing
- Greg's too fast
- Helpers not physically interspersed amongst tutors
- Room too small
- Not enough desks / table arrangements
- Couldn't find nose on list of required software
- No documentation on how to get nose running on a windows system
- No toilets / running water on the same floor [we broke the toilets and the lift!]
- Microphone would have been good
- Recommending Cygwin rather than VMware or Virtualbox [but people want to come out with a working setup...]
- Implied endorsement of bittorrent for illegal activities (plus smartass comments)

What does this tell us? Probably that the teaching environment has a much bigger impact than you might expect, and that we should do all we can to fix it upstream. Also that Greg talks and types too fast! We're going to have plenty of chances to see how these good and bad points change as we go on to the next workshops — collecting the evidence so that we can understand how to teach people better. Read More ›

Better Across the Pond?
Neil Chue Hong / 2012-04-30
I'm sitting in a packed room helping out at the UCL bootcamp, the first of a series to be run in the United Kingdom (Newcastle is next, and there are plans to run additional workshops in Oxford, RAL, Glasgow, Edinburgh and Bristol). I'm thinking about two things: why would you design a 60-seater lecture theatre with only one window for ventilation, and why is Software Carpentry so popular in the UK? Here are the statistics: the waiting list for the UCL bootcamp is 3x bigger than the capacity of the room; the (larger) Newcastle workshop is oversubscribed twice over; our main issue is making sure that tutors at bootcamps aren't double-booked. So why is Software Carpentry seeing such levels of interest? One thing I do know is that it isn't just the Greg Wilson effect — Greg is only teaching at this workshop; the other bootcamps are being led by people from the Software Sustainability Institute, SoundSoftware and the local institutions. Is there another explanation? I'm going to suggest three possible factors and would like to hear if there's evidence to support or contradict them, particularly if there are similar factors in other countries:

- The BBC Micro / Sinclair Spectrum effect
- Doctoral Training Centres
- Climategate

The BBC Micro / Sinclair Spectrum effect

Thirty years ago, two computers changed the course of the UK IT industry: the BBC Micro and the Sinclair ZX Spectrum. These machines, and the associated TV programmes and magazines, encouraged a generation to experiment with programming. Fast forward to 2012, and not only does the generation which grew up with PEEKs and POKEs form the technical backbone of the video games industry and companies like ARM, but that generation is bemoaning the lack of something similar in schools that has led to successive generations not having the basic understanding of software programming.

Doctoral Training Centres

Doctoral Training Centres were initially funded by the UK research councils to increase the research capacity in interdisciplinary research activities such as the life sciences interface and complexity science that are difficult to locate within a traditional University's departmental organisation. Increasingly, students at the centres receive training on specialist transferable skills which are applicable to their area, and many centres are teaching software development. This means that those researchers undertaking PhDs within these centres have had additional training over the rest. There are over 50 DTCs across the country and across many disciplines.

Climategate

When a server at the Climatic Research Unit was hacked and thousands of emails and computer files were spread across the internet, the impact was not just in the exposure of poor working practices. They showed the world the struggles of a scientist called "Harry" as he attempted to wrestle with difficult data analysis code. Now, even though there is no evidence that the code was producing incorrect results, the fact that it was difficult to prove its validity caused a crisis. In the UK, Climategate wasn't just confined to the scientific press: national newspapers such as the Guardian, Telegraph, and even the somewhat sensationalist Daily Mail picked up the story. Suddenly scientists were all over the news, and it wasn't pretty.
So my theory is this: in the UK we have a generation of researchers who didn't get the same direct experience of programming at school that their research leaders did, who can see the next generation of PhD students snapping at their heels, and who definitely don't want to be part of the next coding scandal. They have realised that it's good to go back to basics: to learn not just what or how to program, but why we program. And that is the ethos of Software Carpentry. Read More ›

Stop Me If You've Heard This One
Greg Wilson / 2012-04-28
I used to tell this joke: An engineer says, "Theory approximates reality." A mathematician says, "Reality approximates theory." A sociologist says, "Would you like fries with that?" Skip forward ten years. It was the early 2000s, just after the first dot-com bubble burst, and I started noticing that all of a sudden, programmers were taking design and graphic designers seriously. Overnight, it seemed, companies had started paying designers competitively and giving them real authority. Somehow, nerds like me who had made jokes about people with "soft" skills (and boy, isn't that term revealing) had come to realize just how valuable and difficult those skills were. Skip forward a few more years to 2010. After a lot of rear-guard defensive denial on my part, Jorge Aranda, Marian Petre, and others had finally convinced me that "soft" (qualitative) research techniques weren't just another way to explore how software development teams worked—in many cases, they were the best way. When I left the University of Toronto to work full-time on Software Carpentry, though, I didn't transfer that understanding to education. Instead, I only read things based on "hard" data, i.e., statistical results from controlled experiments. In retrospect, it's little wonder I was so frustrated by how little it helped me... Skip forward one more time to 2012. The web is abuzz with techies and business people explaining how they're going to fix education. What most of them actually mean is, "Here's how we're going to make money from education," but that's not what this post is about. What it's about is that they don't value educators professionally the way they value designers [1]. Just take a look at the ed-tech startups Audrey Watters has profiled in the last twelve months: how many have anyone in house who has spent even two full days boning up on the psychology of learning, the evidential basis for different instructional techniques, or the reasons why previous attempts to technologize classrooms have failed? I think that if we (and by "we", I mean programmers) really want to help people, we need to meet educators halfway. We need to learn as much about education as we now do about graphic design, business, marketing, and intellectual property law. I realize it's difficult—there aren't "serious amateur" books about education for techies like there are about graphic design [2]—but throwing questions from the Audrey Test into Google is a start. I'd certainly be a lot further ahead if I'd done that two years ago, and I suspect most ed-tech startups will be further ahead two years from now if they get started today. [1] This isn't an industry vs. academia thing: the software engineering researchers who described statistical work in Making Software didn't feel the need to defend the value of their methods in the way that the people doing qualitative work did. [2] At least, I haven't found any. Read More ›

Solution to Sets and Dictionaries Exercise
Greg Wilson / 2012-04-26
Last week, I posted an exercise on working with sets and dictionaries that also included a fair bit of file I/O and string manipulation. My solution is below, in four parts, along with the code produced in each. If someone would like to re-do the file parsing using regular expressions, I'd be happy to post that as well.

Part 1

import sys

#--------------------
def parse_pair(pair):
    '''
    Parse an atom-count pair.
    If the count is missing, assume that the count value is 1.
    '''
    if '*' not in pair:
        return pair, 1
    atom, count = pair.split('*')
    count = int(count)
    return atom, count

#--------------------
def parse_molecule(text):
    '''
    Get a single molecule description from a text string.
    '''
    name, formula_text = text.split(':')
    name = name.strip()
    pairs = formula_text.strip().split('.')
    formula = {}
    for p in pairs:
        atom, count = parse_pair(p)
        assert atom not in formula, \
            'Already seen atom %s in text %s' % (atom, text)
        formula[atom] = count
    return name, formula

#--------------------
def read_molecules(reader):
    '''
    Read molecules from a molecule file, returning a dictionary
    of {name : formula} pairs.
    '''
    result = {}
    for line in reader:
        line = line.strip()
        if (not line) or line.startswith('#'):
            continue
        name, formula = parse_molecule(line)
        assert name not in result, \
            'Already seen %s!' % name
        result[name] = formula
    return result

#--------------------
if __name__ == '__main__':   # guard so Parts 3 and 4 can import this file safely
    print read_molecules(sys.stdin)

Part 2

def merge(left, right):
    result = {}
    for key in left:
        # Only in left.
        if key not in right:
            result[key] = left[key]
        # In both, so check that values are the same.
        else:
            if left[key] == right[key]:
                result[key] = left[key]
    for key in right:
        # Only in right.
        if key not in left:
            result[key] = right[key]
    return result

Part 3

import sys
from nano import read_molecules

#--------------------
def can_produce(formulas, atom):
    '''
    Return the set of molecules that contain the given atom.
    '''
    result = set()
    for molecule in formulas:
        if atom in formulas[molecule]:
            result.add(molecule)
    return result

#--------------------
if __name__ == '__main__':
    data = read_molecules(sys.stdin)
    atom = sys.argv[1]
    print can_produce(data, atom)

Part 4

import sys
from nano import read_molecules
from merge import merge
from produce import can_produce

#--------------------
def get_data(filenames):
    if len(filenames) == 0:
        data = read_molecules(sys.stdin)
    else:
        data = {}
        for f in filenames:
            reader = open(f, 'r')
            more_data = read_molecules(reader)
            reader.close()
            data = merge(data, more_data)
    return data

#--------------------
if __name__ == '__main__':
    assert len(sys.argv) >= 2, 'Usage: final.py atom [files...]'
    atom_name = sys.argv[1]
    filenames = sys.argv[2:]
    data = get_data(filenames)
    makeable = can_produce(data, atom_name)
    makeable = list(makeable)
    makeable.sort()
    for m in makeable:
        print m
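In the meantime, here is one way the regular-expression version of parse_pair could look. This is a sketch rather than the "official" solution, but the pattern follows the grammar given in the original exercise (one upper-case letter, an optional lower-case letter, and an optional '*' followed by a count):

import re

# ([A-Z][a-z]?) captures the atomic symbol; (?:\*(\d+))? captures
# the optional count that follows a '*'.
PAIR_PATTERN = re.compile(r'^([A-Z][a-z]?)(?:\*(\d+))?$')

def parse_pair(pair):
    '''Parse an atom-count pair with a regular expression.'''
    match = PAIR_PATTERN.match(pair)
    assert match is not None, 'Badly formatted pair: %s' % pair
    atom, count = match.groups()
    if count is None:
        count = 1        # no '*count' suffix means a count of 1
    else:
        count = int(count)
    return atom, count

Drop this into Part 1 in place of the split-based parse_pair and the rest of the code works unchanged. Read More ›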

An Exercise With Sets and Dictionaries
Greg Wilson / 2012-04-20
You are working for a nanotechnology company that prides itself on manufacturing some of the finest molecules in the world. Your job is to rewrite parts of their ordering system, which keeps track of what molecules they can actually make. Before trying this exercise, please review:

- Introduction
- Storage
- Dictionaries
- Examples
- Nanotech Example

Submit your work by mailing Greg: your final program, the input file(s) you used to test it, and a shell script that runs all of your tests.

1. Reading

Your company stores information about molecules in files that contain formulas and names, one per line, like this:

# Molecular formulas and names
# $Revision: 4738$
chlorine : Cl*2
silver nitrate: Ag.N.O*3
sodium chloride :Na.Cl

More specifically:

- Lines may be blank (in which case they are ignored).
- Lines starting with '#' are comments (which are also ignored).
- Each line of actual data has a molecule name, a colon, and a molecular formula. There may or may not be spaces around the colon.
- Each formula has one or more atom-count values, separated by '.'
- Each atom-count consists of an atomic symbol (which is either a single upper-case letter, or an upper-case letter followed by a lower-case letter) which may be followed by '*' and an integer greater than 1. If there is no count (i.e., if the '*' and integer are missing), the count is 1.

Write a function called read_molecules which takes a handle to an open file as its only argument, reads everything from that file, and returns a dictionary containing all the formulas in that file. (Here, "a handle to an open file" means either sys.stdin, or the result of using open(filename, 'r') or file(filename, 'r') to open a file.) The result dictionary's keys should be the names of the molecules with leading and trailing whitespace removed. Its values should themselves be dictionaries of atomic symbols and counts. For example, if the data shown above is contained in the file molecules.mol, then this Python:

reader = file('molecules.mol', 'r')
data = read_molecules(reader)
reader.close()
print data

should produce something like:

{
 'chlorine'        : {'Cl' : 2},
 'silver nitrate'  : {'Ag' : 1, 'N' : 1, 'O' : 3},
 'sodium chloride' : {'Na' : 1, 'Cl' : 1}
}

Note: if your tutorial group has already covered regular expressions, use them for this part of the exercise. If you have not yet met regular expressions, use string splitting instead.

2. Merging

Write a function called merge_molecules that takes two dictionaries like the one shown above and produces a third dictionary that contains the contents of both according to the following rules:

- If a molecule appears in one input dictionary or the other, it also appears in the result.
- If a molecule appears in both input dictionaries with the same formula, one copy of it appears in the result.
- If a molecule appears in both input dictionaries with different formulas, it is not copied to the output dictionary at all. (This kind of "silent failure" is actually a really bad practice, but we won't see what we should do until we discuss exceptions.)

Your function must not modify either of its input arguments: the original dictionaries must be left as they were.

3. What Can We Make?

Write a function called can_produce that takes a dictionary of molecular formulas (like the one shown above) and the atomic symbol of one kind of atom, and returns a set containing the names of all the molecules we might be able to make.
For example:

reader = file('molecules.mol', 'r')
data = read_molecules(reader)
reader.close()
print can_produce(data, 'Cl')

should print something like:

set(['chlorine', 'sodium chloride'])

4. Putting the Pieces Together

Write a program called produce.py that uses these three functions to tell us the molecules we could make using a particular kind of atom based on the contents of several molecular formula files. For example:

$ python produce.py Cl < molecules.mol

prints:

chlorine
sodium chloride

while:

$ python produce.py Na salts.mol organics.mol alloys.mol

reads and merges all the formulas in the three files salts.mol, organics.mol, and alloys.mol, and prints a list of all the molecules from those files that contain sodium. Read More ›

Three Years Later
Hanah Chapman / 2012-04-19
It's not putting it too strongly to say that Software Carpentry changed my life. That's where Software Carpentry came in. I had just about given up on learning to program, resigning myself to a lifetime of GUI-clicking and begging for help whenever more was needed. Then I got an email from a friend, forwarding some information about the course. Hallelujah! Finally, a course for beginners like me, which didn't assume any prior knowledge but also didn't talk down. The things I learned have opened countless doors for me. I can write my own code and understand code written by others, and I'm not afraid to learn new programming languages as needed. I even took a postdoc in a highly computational lab, which I would never have had the nerve to do if it weren't for Software Carpentry. Given my background and interests, I'll probably never be one of the really expert programmers in my field, but I don't need to be. I know more than enough to function independently, and more importantly, I know how to ask the experts for help, and I can understand their answers. Acquaintances will tell you that I've become a bit messianic about promoting training in computer programming to psychologists. I think too many people in my field have given up on the idea of learning, like I did. To them I say, if I can do it, so can you! So thank you to all the talented people who have put their energy into the course. The program has made a huge difference to my career, and I would recommend it in a heartbeat to anyone who needs this kind of training. Read More ›

Where Next?
Greg Wilson / 2012-04-18
Do you know someone who'd be helped by a Software Carpentry bootcamp? Or can you introduce us to someone who'd help us organize one in a new venue? If so, please introduce us: we'd like to start planning for the fall, and we're always keen to make new friends. Read More ›

Behind the Scenes (or, the Ethics of Cultivating Discontent)
Greg Wilson / 2012-04-18
A lot goes on behind the scenes here at software-carpentry.org:

- The site itself is WordPress with a partly-customized theme. We use the blog for topics like this and pages (over a hundred of them) for lecture topics.
- We used to use Trac to manage work items, but nobody kept it up-to-date; these days, we use a WordPress to-do list plugin for the same purpose, with as little result.
- Our videos are hosted on YouTube—we used to store them locally, but performance improved a lot when we offloaded.
- We manage our mailing lists and version control repositories through the Dreamhost control panel, which actually delegates mailing list management to Mailman.
- The calendar and map are hosted by Google.
- We do event registration through EventBrite.
- We currently use BlueJeans and Skype for web conferencing, but they've been plagued with both technical and social difficulties: people need to have the right Skype client for their OS, and there are the usual problems with unmuted microphones, unintelligible audio, feedback loops, and so on. Forget flying cars: I'll believe the future has arrived when we can make this work...

This analysis leaves me feeling a bit conflicted. When I think about what we should teach researchers about the web, I have three requirements:

1. They should be able to build solutions to problems they actually have.
2. They shouldn't create egregious security holes.
3. They should be able to debug things on their own when they go wrong.

Since people can only debug things they understand [1], #3 depends on them understanding how the web works. One test of that is whether they recognize that they shouldn't have to log in and out of different sites in order to move information around manually. But if we don't have a solution to that problem (yet), are we really doing them a favor by pointing out that it actually doesn't have to hurt this much?

[1] Tweaking code more or less randomly until it appears to work doesn't count as "debugging" in my book. Read More ›

In Search of Prior Arguments
Greg Wilson / 2012-04-17
A faculty member whose research involves building some fairly complex scientific software would like to make all his work open source. He is repeatedly having to justify this choice to funding agencies and his dean, whose objections include:

- concern for sensitive information being released (anything involving pollution has the potential to become a political football)
- concern for misuse by naive users undermining the reputation of the tool
- concern for missing value in licensing the IP

I know lots of other people have had to overcome these and other objections; what I'm looking for is a published (and preferably peer-reviewed) refutation of them that he and others can cite, preferably one that is specific to open source in science (rather than in general). Pointers would be very welcome. Read More ›

Halfway Home
Greg Wilson / 2012-04-17
We're half-way through our current round of work, so it's time to start thinking about what we've accomplished, what we've learned, and what we'd like to do next. Here's what I think we now know:

1. Our training makes scientists more productive.
2. We can prove it.
3. Our methods scale.
4. We can become self-sustaining in 2-3 years.

In more detail:

1. Our training makes scientists more productive. Feedback from learners has been overwhelmingly positive: they believe that what we're teaching is relevant and useful, and they're going to incorporate it into their work. Where that isn't the case, it's usually because of a mis-match between their level and the level of the material we're teaching. As we scale up, we'll be able to address this by running separate workshops for people with different backgrounds.

2. We can prove it. By June, we will be able to show that:

- pre- and post-workshop questionnaires,
- in-depth interviews a few months after the end of training, and
- students recording videos of themselves solving simple tasks

give us both qualitative and quantitative insight into the impact we're having, which in turn will allow us to back up our claim of improving productivity with more than just anecdotes.

3. Our methods scale. By "our methods", I mean:

- short on-site bootcamps...
- ...followed by a few weeks of hour-long online tutorials...
- ...and the assessment methods discussed above.

By "scale", I mean:

- Katy Huff, Tommy Guy, Matt Davis, Joshua Smith, Jason Pell, Rosangela Canon-Koning, Adina Chuang Howe, and Chris Cannam are all traveling from A to B to teach in this round of workshops;
- Matt Davis and Steve Haddock have already run some of the online tutorials;
- lots of other people (grad students, profs, Mozillians, and assorted volunteers) are co-teaching locally;
- the Chicago, Newcastle, and Paris workshops have run or are going to run without me; and
- the assessments that Jorge Aranda is developing can be conducted by other people.

In short, I am no longer the bottleneck I was two years ago. The key seems to be "attend one, help someone teach one, lead one yourself". Moving people around from site to site builds horizontal (peer-to-peer) connections, increases the value of the workshop in the eyes of both learners and hosts (someone from far away must be smarter than someone you know from the neighborhood :-), and is a much better way to transfer knowledge than any number of "how to" guides.

4. We can become self-sustaining in 2-3 years. The "attend/assist/lead" model is producing people who can and will organize local support groups similar to the Hacker Within. By 2014-15 we expect to have at least a dozen faculty partners who are able to contribute a few thousand dollars a year to instructors' travel costs, help update the web site, lobby to get students official recognition for taking part, and so on. Based on past experience with open source projects, that should be enough for Software Carpentry to take on a life of its own. Read More ›

GitHub for Education
Greg Wilson / 2012-04-17
In my experience, most teachers don't develop courses from scratch. Instead, they take whatever material is at hand, modify it to meet their needs, and then—well, that's usually where they stop. Unlike open source software developers, they usually don't give it back to the community in any explicit way. Instead, the next person who needs a starting point has to stumble over it in a Google search, and the original creator may never know that someone improved upon what they did. Back in December (of 2011), I wondered whether the fork, merge, and share model that underpins so much of modern open source software development could be applied to education. It turns out that lots of other people have been thinking along these lines (and that some of them have actually done something about it):

- Joseph Reagle: fork-merge-share (describing, among other things, a geek-oriented solution to one practical obstacle I mentioned)
- AJ Juliani: Why Every Educator Should Read Hacker News
- Aaron Shaw: A Modest Academic Fantasy
- Ethan Watrall: I want "forking a class" to become part of our vernacular (Twitter)
- Brian Croxall: Forking Your Syllabus (ProfHacker)
- Lincoln Mullen: How to Fork a Syllabus on GitHub (ProfHacker)
- Katherine Harris: Acknowledgments on Syllabi (tracking sources)
- Michael Feldstein: When It Comes to Content, Say "Yes" to Wrappers But "No" to Containers

"GitHub for Education" is a handy shorthand for this idea, but "the idea" isn't necessarily, "Let's put educational materials in GitHub", but rather, "Let's facilitate a culture of spontaneous-but-structured collaboration and improvement." (You can see why we say "GitHub for education" instead.) I'd like to start experimenting with this, so my question is, if the source materials for Software Carpentry were on GitHub instead of in our own Subversion repository, would you actually start contributing patches? Read More ›

Utah State University Wrap-Up
Greg Wilson / 2012-04-16
Our bootcamp at Utah State University finished earlier today—many thanks once again to Ethan White and his friends for hosting us. Here's what the students thought:

Good

- keeping history
- break work into pieces
- integrating different things
- good for different levels of knowledge
- liked the pace
- good tech support from helpers
- download data from the web
- liked the escalation
- using up-to-date languages
- emphasis on commonality of patterns
- liked flexibility
- talked about formatting/style
- everything consistent
- Greg's stories
- hearing from multiple people
- organized around simple, general structural things
- liked structure around data processing

Bad

- machine/software setup
- hard to keep up with Greg's typing
- wind tunnel
- no beer
- assumed knowledge we didn't have
- less lecture, more time on examples/practicals
- rushed
- wide range of applicability
- wanted more complicated examples
- some things glossed over
- don't understand strengths/weaknesses of languages
- would have liked time to make better examples
- some places were "show" not "do"
- didn't cover testing
- Greg's stories
- couldn't tell if people were with us or not
- hard to keep track of what tool we were in
- coordination across people

Next stop: London! Read More ›

Data Munging with Regular Expressions
Greg Wilson / 2012-04-15
Indiana U's Mike Hansen has written a blog post explaining how he used regular expressions to rename and fix some MATLAB files. Even if you don't use MATLAB, you should find lots of useful stuff in it. Thanks, Mike. Read More ›

We're Neutral (but Not Really)
Greg Wilson / 2012-04-14
From Wikipedia: Open science is the umbrella term of the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open notebook science, and generally making it easier to publish and communicate scientific knowledge. The term reproducible research...refers to the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. necessary for reproduction of the results and building upon the research. The two ideas aren't necessarily connected: putting research out in the open doesn't automatically make it reproducible, and someone can do the work required for 100% computational reproducibility without sharing it with the world. However, advocates of one often advocate the other as well, and they do seem like a natural pair. Computational competence is like that too. It's possible to do reproducible research without knowing how to program, what version control is, etc., but those skills make it a lot easier. Similarly, it's possible to adopt open notebook practices without understanding what's going on behind the curtain, but only if everything works properly every time: as soon as something doesn't, you need to understand the HTTP request cycle, what an API is, or how to write simple database queries (or have a labmate who does). Everything Software Carpentry teaches can be used to support opaque science, but I believe that lowering the practical barriers to adopting open, reproducible practices will speed their wider adoption. And in my opinion, that's a good thing... Later: in response to an emailed comment, yes, I do see an analogy with the invention of printing (and later, of the web). Gutenberg and others didn't set out to foment religious and political dissent, but by putting the means of mass communication in the hands of the masses, they made it easier for those with opinions to make those opinions known. Teaching researchers how to build things themselves doesn't necessarily mean the end of price gouging by big publishers, or more trustworthy computational science, but a world in which 50% of people can do something is going to be very different from one in which only 1% of them can do it. Read More ›

Video Update
Greg Wilson / 2012-04-12
Back in February, we asked people to make short screencasts of themselves solving a simple programming problem. The submissions convinced us that it's a good idea, but so many people ran into so many problems that we're taking a step back and trying to write better instructions, select a simpler (more stable, more reliable) set of tools, etc. Some people's screen recording software stopped recording at 10 minutes without giving any signal; others produced files that play properly on some platforms, but have no audio on others; and screen resolution problems have made text unreadable in several cases. I didn't think this would be so hard in the early 21st Century, but then, I say that a lot... We'll pick this up again as soon as we can; in the meantime, our thanks (again) to everyone who has contributed so far. Read More ›

Solution to Data Merging with Dictionaries
Greg Wilson / 2012-04-12
This week's tutorial problem was to merge the data from a set of input files to show how often different species were observed on different dates. The shell pipeline, Python code, and two sample input files follow the video.

shell command:

grep -h -v '#' *.txt | sort | uniq -c

merge.py:

import sys

# Read and merge data.
results = {}
filenames = sys.argv[1:]
for f in filenames:
    reader = file(f, 'r')
    for line in reader:
        if line.startswith('#'):
            pass
        else:
            date, species = line.split()
            key = (date, species)
            if key not in results:
                results[key] = 1
            else:
                results[key] += 1
    reader.close()

# Format output.
all_combos = results.keys()
all_combos.sort()
for key in all_combos:
    count = results[key]
    print count, key[0], key[1]

cousteau.txt:

# Jacques Cousteau
2012-03-27 marlin
2012-03-29 tuna
2012-03-29 tuna
2012-03-29 turtle

haddock.txt:

# Steve Haddock
2012-03-28 squid
2012-03-28 marlin
2012-03-28 marlin
2012-03-29 eel
2012-03-29 squid
2012-03-29 turtle
2012-03-29 turtle
2012-03-30 squid
2012-03-31 turtle
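For reference, running the program over the two sample files with python merge.py cousteau.txt haddock.txt should print:

1 2012-03-27 marlin
2 2012-03-28 marlin
1 2012-03-28 squid
1 2012-03-29 eel
1 2012-03-29 squid
2 2012-03-29 tuna
3 2012-03-29 turtle
1 2012-03-30 squid
1 2012-03-31 turtle

(The shell pipeline produces the same counts, just in uniq -c's layout, with the count right-justified in front of each line.) Read More ›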

Straw Man for Web Programming
Greg Wilson / 2012-04-10
Last week, I asked what we should teach researchers about the web. I think that I have an answer, and that the easiest way to describe it is by describing what we want learners to be able to build when we're done. So, imagine you are studying changes in rainfall due to climate change in North America. As part of that work, you're comparing results from your simulation with historical data from Environment Canada. Since your calculations may be useful to other scientists, you want to share them on the web. You are therefore going to build a command-line tool so that:

rainy path-to-index.html start-date end-date location

adds a new entry to your online results page by:

1. reading data from the Environment Canada database,
2. comparing those historical values to your predictions, and
3. adding an entry to index.html showing the results.

In order to do this, you will need to understand:

- how HTTP GET with query parameters works;
- how to pull things out of XML data (I believe that's what Environment Canada will give us—no sign of JSON on their site); and
- how to create HTML programmatically.

More fundamentally, they'll see the fetch-remix-publish cycle that underpins so much of the web. Their tool won't be interactive—we won't try to turn it into a CGI script, because doing so would open up too many cans of worms—but I think we can actually do the above in half a day if people are already familiar with something like Python.
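To make the shape of the thing concrete, here is a rough sketch of the fetch-and-parse half using only Python's standard library. To be clear, the URL, query parameter names, and XML tags below are placeholders I've made up, not Environment Canada's actual interface; the point is only to show the GET-with-query-parameters and XML-extraction steps:

import urllib
import urllib2
import xml.etree.ElementTree as etree

def fetch_rainfall(start_date, end_date, location):
    '''Fetch daily rainfall values as XML and return them as floats.'''
    query = urllib.urlencode({'start' : start_date,
                              'end'   : end_date,
                              'loc'   : location})
    url = 'http://data.example.org/rainfall?' + query   # placeholder URL
    doc = etree.parse(urllib2.urlopen(url))
    return [float(node.text) for node in doc.findall('.//daily/value')]

The other half, generating HTML, can be as simple as string formatting into a template and rewriting index.html, which is exactly the kind of thing learners can debug on their own. Thoughts? Read More ›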

A Future Student
Greg Wilson / 2012-04-09
I don't know what we'll be teaching in 20 years, but I hope Tristan Davis will be there to learn. Read More ›

Titus Brown Finds a Theme
Greg Wilson / 2012-04-06
Titus Brown just posted "Big Data Biology — why efficiency matters", in which he explains the academic, practical, and algorithmic reasons why efficient computation is good for science. His opinions are all grounded in his extensive personal experience working in the overlap between biology and computing, and, combined with a couple of things Michelle Levesque showed off at our Oakland workshop last week, have me thinking that we ought to include half an hour on performance analysis and tuning in the Software Carpentry core. Damn you, Titus—I thought I had this curriculum figured out :-) Titus's post has reminded me of something I've realized about big data. (Caveat: I've never done "big data" myself, just watched other people wrestle with it.) Titus and others don't think about individual bytes and records any more often than chemical engineers think about individual atoms. Instead, they think in terms like "percentage yield for such and such parameters" and "cost-yield tradeoff for such and such a process". Yes, they can relate their rules back to their respective atoms, but that's like saying (to switch analogies for a moment) that a physicist studying fluid mechanics can relate the Navier-Stokes equations back to quantum mechanics. Read More ›

On Crossing Australia (or, Further Thoughts on What to Teach Researchers about the Web)
Greg Wilson / 2012-04-06
A while back, I blogged about Bret Victor's "Inventing on Principle" talk at CUSEC'12, which is an inspiring vision of what programming could be. Ned Gulley's thoughts on it are more nuanced (and more interesting):

Victor's real power is his ability to rapidly create and deploy these tools. In a twinkling he can size up a task that is worth studying, put a box around it and spin a tool. He does this so effortlessly, with such mesmerizing legerdemain, that we lose sight of this meta-skill. What Victor was really doing in his talk was illustrating the power of tool spinning, the rapid creation of customized, context-sensitive, insight-generating tools... Don't use the thing Bret made. Do the thing that Bret does.

That last sentence crystallizes what I've been groping toward as I think about what we should teach researchers about the web. I want to give them the power to do whatever they want to, as effortlessly as possible, because (a) I can't anticipate what they'll actually need, and (b) what they need is changing all the time. The problem is that there's a wide gulf between simple things that are easy to do (e.g., tweaking CSS) and/or playing in a sandbox that we create (e.g., pointing an in-browser visualization tool at their own data) and being able to throw together something that we didn't anticipate (e.g., build even a small web app using jQuery and Django, or any comparable set of tools). It's sort of like Australia: lush and green in Sydney and Perth, but there's a loooooong trek through an inhospitable desert to get from hither to yon. By comparison, desktop programming (e.g., the media-based curriculum that Mark Guzdial and others developed) is a lot more like hiking from Halifax to Toronto: there are certainly difficult patches, but at no point do you find yourself stuck in the middle of nowhere [1]. That desert in web programming—the long, unrewarding gulf between the simple and the powerful—is what makes this hard.

"But wait," I hear you say, "What is this 'gulf' you speak of? It only takes a few minutes to show someone how to write a simple CGI script, or to tweak some PHP to modify a WordPress plugin." Well, yes, but that's like saying that it only takes a few minutes to show someone how to start a car and get it out on the road. It's what we have to teach people so that they can survive what happens next that takes time. As I said in my earlier post, all we can teach people about server-side programming in a few hours is how to create security holes. You or I would know to scrub input before using it in a database query; we could tell novices to do that, but they wouldn't have the context to understand what that really meant, or how to do it properly. The best we could hope for is that they'd memorize a few rules, some of which some of them would then mis-apply. I was actually a bit surprised that Mark Guzdial was surprised by me including security in my list. If I give a teenager the keys to my car [2] and let him take it out on the road without suitable instruction, I think I'm morally liable for the resulting catastrophe. Similarly, as long as we were only teaching people how to build things that ran in the safety and privacy of their own machines, the worst they could do was delete their home directory (been there, done that—twice). But the web has changed this, just as it's changed everything else. So how do we close this gap?
Partly, I think, by making things easier to drive: as Audrey Watters reminded us in summing up the research she did for Mozilla, tools like HyperCard put most of the power of "real" tools in the hands of end-user programmers: educators, graphic artists, businesspeople, and everyone else who doesn't think of themselves as a software developer but wants to make something happen. Yahoo! Pipes, If This Then That, and things like them aren't nearly as powerful, yet, but I think they're more interesting than any number of "live coding in the browser" tools that still expect you to be typing Javascript into a text box. Which brings me back to Mark Guzdial's post, and to his discussion of Juha Sorva's wonderful work on UUhistle (a program visualization tool):

One of the arguments that [Juha]'s making is that the ability to understand computing in a transferable way requires the development of a mental model—an executable understanding of how the pieces of a program fit together in order to achieve some function. For example, you can't debug without a mental model of how the program works...

Juha's dissertation is making the argument...that you can't develop a mental model of computing without learning to program. To which I would add, "And most people won't learn how to program unless each step lets them do something they care about, safely, that they couldn't do before."

[1] OK, it's not the smoothest analogy I've ever come up with, but you get the picture.

[2] I don't actually drive (haven't in 20 years), but I've never let the facts get in the way of making a point :-) Read More ›

Lessons Learned at the University Of Chicago
Katy Huff / 2012-04-05
Software Carpentry brought a bootcamp to the University of Chicago in collaboration with the FLASH Center at the University of Chicago's Computational Institute and The Hacker Within. The instructors were Milad Fatenejad, Katy Huff, Anthony Scopatz, and Joshua R. Smith. Space constraints at U. Chicago meant that only 50 empty seats could be secured for two contiguous days on campus. But the room was lovely! Thus, though the first day of enrollment brought 125 requests for tickets, only 50 could be invited. Despite the valiant efforts of Anthony Scopatz and the FLASH center administrators, no extra space could be found. The ubiquitous Anthony Scopatz, organizer extraordinaire, insisted a few days beforehand that accepted students unable to attend step aside to allow tickets to be granted to students on the wait list. Unfortunately, there were 20 no-shows nonetheless. The discouraging lesson from this is that maximal attendance is not guaranteed even when demand is not the constraint. Encouraging lessons were learned too. The bootcamp was taught on virtual machines, a favorite tactic of Hacker Within bootcamps, which nearly eradicated technical difficulties. Since all students were following along in identical linux environments customized for the bootcamp, initial set up took less than half an hour, and there were no interruptions thereafter. Feedback from the (mostly post-doc and grad student) attendees included comments echoing common themes:

- "The interactive portions were the best part."
- "Git is something I had never seen before and looks like it will be very useful."
- "It would have been better if it were on a weekend." (grad students...)
- "I expected to get started on Python and learn formal things about programming. The bootcamp provided both."

Excellent reviews, if we do say so ourselves! Read More ›

Solution to Data Checking Problem
Greg Wilson / 2012-04-04
I finally had a chance this morning to record my solution to the final exercise I set the learners from the Space Telescope Science Institute. It demonstrates how to build a little program in Python that checks the consistency of some experimental data; along the way, it uses file I/O, functions, assert statements, lists of lists, and on-the-fly unpacking. As always, feedback on the content and format would be greatly appreciated. (And if anyone would like to post their own solution, either as plain code or as a video, please let me know so that I can include it here.)

import sys

#--------------------
def get_reader(args):
    '''Select either standard input or a file to read from.'''
    if len(args) == 1:
        reader = sys.stdin
    elif len(args) == 2:
        reader = file(sys.argv[1], 'r')
    else:
        print >> sys.stderr, "Usage: check.py [filename]"
        sys.exit(1)   # without this, 'reader' would be unbound below
    return reader

#--------------------
def get_data(reader):
    '''Read a helmet test data file. We expect the date on the
    first line, a title line second, and then one triple for
    each experiment.'''
    reader.readline() # should be the date
    reader.readline() # should be the title line
    # Each other line must be a (degrees, seconds, brittleness) record.
    data = []
    for line in reader:
        fields = line.split()
        assert len(fields) == 3, "Bad line: %s" % line
        deg = float(fields[0])
        sec = float(fields[1])
        brit = float(fields[2])
        record = [deg, sec, brit]
        data.append(record)
    return data

#--------------------
def report_error(left, right):
    '''Report inconsistent data.'''
    print left, 'is inconsistent with', right

#--------------------
def check(data):
    '''Check that records obey consistency rules.'''
    for record_1 in data:
        deg_1, sec_1, brit_1 = record_1
        for record_2 in data:
            deg_2, sec_2, brit_2 = record_2
            if (deg_1 == deg_2) and \
               (sec_1 > sec_2) and \
               (brit_1 < brit_2):
                report_error(record_1, record_2)
            if (deg_1 > deg_2) and \
               (sec_1 == sec_2) and \
               (brit_1 < brit_2):
                report_error(record_1, record_2)

#--------------------
reader = get_reader(sys.argv)
data = get_data(reader)
check(data)
reader.close()
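To see it in action, here is a small made-up input file (the real data isn't shown in the exercise, so these numbers are invented), saved as helmet.dat; the columns follow the code above: temperature in degrees, heating time in seconds, and measured brittleness:

2012-04-01
Helmet brittleness trials
200 10 3.5
200 12 2.5
250 10 4.0

Running python check.py helmet.dat flags the second record, because it was heated longer at the same temperature yet came out less brittle:

[200.0, 12.0, 2.5] is inconsistent with [200.0, 10.0, 3.5]

Read More ›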

Upcoming Events for Webmaking Instructors
Greg Wilson / 2012-04-03
Software Carpentry has had several homes over the years. Right now, it's part of a larger effort by the Mozilla Foundation to teach people—all kinds of people—how to make the web their own. And as part of that, Mozilla is running events over the next couple of months in Boston, San Francisco, Toronto, and London to bring together people who are trying to teach this stuff. The details are on Michelle Levesque's blog; we hope you'll be able to join us. Read More ›

Solution to the First Image Processing Homework
Greg Wilson / 2012-04-03
We had technical issues during yesterday's online tutorial (again), so I have recorded the solution to the image processing exercise we gave the students in Indiana and Toronto. I'd be very grateful for feedback (not just from students): is this a useful way to present ideas? Or should we put our effort into debugging [name of web conferencing software goes here]?

My final code is:

import sys
from PIL import Image

#--------------------
def on_boundary(data, xsize, ysize, func):
    for x in range(xsize):
        for y in (0, ysize-1):
            r, g, b = data[x, y]
            data[x, y] = func(r, g, b)
    for y in range(ysize):
        for x in (0, xsize-1):
            r, g, b = data[x, y]
            data[x, y] = func(r, g, b)

#--------------------
def for_each_pixel(data, xsize, ysize, func):
    for x in range(xsize):
        for y in range(ysize):
            r, g, b = data[x, y]
            data[x, y] = func(r, g, b)

#--------------------
def halve_red(r, g, b):
    return r/2, g, b

#--------------------
def double_green_blue(r, g, b):
    return r, 2*g, 2*b

#--------------------
def set_black(r, g, b):
    return 0, 0, 0

#--------------------
assert len(sys.argv) == 4, \
    "Usage: program operation infile outfile"
operation_name = sys.argv[1]
input_filename = sys.argv[2]
output_filename = sys.argv[3]
picture = Image.open(input_filename)
xsize, ysize = picture.size
data = picture.load()
if operation_name == 'halve_red':
    for_each_pixel(data, xsize, ysize, halve_red)
elif operation_name == 'double_green_blue':
    for_each_pixel(data, xsize, ysize, double_green_blue)
elif operation_name == 'erase':
    for_each_pixel(data, xsize, ysize, set_black)
else:
    assert False, \
        "Unknown operation: " + operation_name
picture.save(output_filename)
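To run it (I'm calling the script filter.py here, though the post doesn't name it), give it an operation, an input image, and an output filename; the image needs to be in RGB mode for the three-way pixel unpacking to work:

$ python filter.py halve_red photo.png dimmer_red.png
$ python filter.py double_green_blue photo.png brighter.png
$ python filter.py erase photo.png blank.png

Later: Indiana U's Michael Hansen has posted an alternative solution that goes into more detail. Thanks! Read More ›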

A Four-Day Curriculum
Greg Wilson / 2012-04-03
In response to my weekend post about what we teach in two days, Steve Haddock sent me this link to a four-day course based on the excellent book Practical Computing for Biologists that he co-wrote with Casey Dunn. It's worth reading and pondering in full, but in brief, their curriculum (in half-day chunks) is:

1. Installing software and working with text editors.
2. Using a command-line shell to create pipelines.
3. Python.
4. More Python.
5. Scientific Python (and a bit on graphics).
6. More on Python graphics, bioinformatics, and generating web pages with Sphinx.
7. Relational databases.
8. Images.

The delight is in the details, of course, but the three things that jump out are (a) having half a day at the start to get things set up, (b) the domain-specific material (scientific Python, bioinformatics, making graphs, etc.), and (c) the absence of version control. What doesn't show up here, but was clear from Steve's email, is that they also spend much more time doing hands-on exercises than we do in a two-day course. If we cut out databases, so that we were only covering the shell, (very) basic Python, and version control, we could do more hands-on work, but I don't know if narrower-but-deeper is a good trade-off or not. As I said two years ago in a slightly different context, the problem is never figuring out what should be in a course; the problem is always to figure out what should be left out. Read More ›

What to Teach Researchers About the Web
Greg Wilson / 2012-04-01
One reason I'm reflecting on what I've learned in the last two years is a question that is back on the top of my work pile: what should we teach researchers about the web? Partly, it's a priority because I'm currently embedded in Mozilla; their mandate is to defend and extend the open web, and their educational efforts are all aimed at that, so I ought to be doing something too. The real reason, though, is that a lot of things have brought this into sharper focus recently:

- Audrey Watters' investigation of what and how to teach people about webmaking (summarized in this short talk and the Audrey Test).
- Mark Guzdial's commentary on getting the level right (and everything else he's been writing for the last year).
- Jon Udell's "Awakened Grains of Sand" and "Tags for Democracy" posts (and everything else he has been writing for the last year too).
- Michelle Levesque's thoughts on what Mozilla should teach.

Here's what (I think) I've figured out so far:

- People want to solve real problems with real tools. Styling HTML5 pages with CSS and making them interactive with Javascript aren't core needs for researchers.
- All we can teach people about server-side programming in a few hours is how to create security holes, even if we use modern frameworks.
- People must be able to debug what they build. If they can't, they won't be able to apply their knowledge to similar problems on their own.
- Jon Udell has summed up the big ideas they ought to know. In concrete terms, we want them to understand how to construct (and deconstruct) URLs; how an HTTP request/response is processed; pass by value vs pass by reference, push vs. pull, structured vs. unstructured data; and how a few common security problems arise. (A tiny example of the URL idea appears at the end of this post.)

So what can we teach people that meets these goals, and respects our constraints?

- Visualize this: plug an interactive Javascript visualization engine into a web page, show them how to put their data somewhere accessible, and voila: interactive data exploration on the web. This would be fun, but it would fail our debuggability/reproducibility requirement.
- OpenDAP is a framework for sharing the kind of grid-based data that's common in the earth sciences. Setting up a server would be out of reach, but formatting query URLs to pull data from public servers would be within reach, and we could easily run such a server on our site to provide a stable target. My concerns are (a) it's only showing learners half of the equation, and (b) it's not directly relevant to people in genomics and other fields.
- Kynetx (as described in Phil Windley's book The Live Web) is a framework for handling event streams. It's very cool, but it's still very young, and I don't know any scientists who are using it.
- Read dynamic, write static: download data from several sites, merge it, and produce some static HTML pages that other people can then download and merge. This is a common pattern in real life (especially when run periodically by cron), and with a little bit more work, we can show people that they only need to download things that have changed. On the downside, it's not really dynamic or interactive, and I want people to see that the web is more than just a bunch of pipes that deliver documents.
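As that taste of the "construct and deconstruct URLs" idea, even the standard library will take a query URL apart for you. A small sketch (the URL is invented, but urlparse and parse_qs are standard Python 2 library calls):

import urlparse

url = 'http://data.example.org/rainfall?loc=YYZ&start=2010-01-01&end=2010-12-31'
parts = urlparse.urlparse(url)
print parts.netloc                     # data.example.org
print parts.path                       # /rainfall
print urlparse.parse_qs(parts.query)   # {'loc': ['YYZ'], ...}

Read More ›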

Sending Email Back in Time
Greg Wilson / 2012-04-01
We're about to release the second volume of The Architecture of Open Source Applications, which has indirectly prompted a bit of soul-searching on my part. When we invited people to contribute, we asked them to give us the one-hour whiteboard talk they'd give to a new developer being brought onto the project. We also asked them to sum up what they'd learned. "Imagine you could send a brief email back in time," we told them. "What would you say to your younger self?" Which of course raises the question: what would I tell the me of 2010 about Software Carpentry?

- Write less, experiment more. I translated all of the Version 3 content into short videos in a ten-month rush. I should have selected a smaller core, put that online, then tried out more ways of using it.
- Take some online classes. I signed up for one online course (on online education) through a local university. It was truly awful, so I dropped out. I should have signed up for half a dozen others, of different kinds, to find out first-hand what other people are doing. (My first question these days for anyone involved in online education is, "How many classes have you done yourself?" The most common answer by far is, "None.")
- Teach with, not to. In April 2011, when work on Version 4 wrapped up, I only knew of a couple of groups using Software Carpentry material independently, and we weren't coordinated at all. Things are much better this time around: the workshops in Trieste, Chicago, Newcastle, and Paris have run or are being run without me on site, and I'm confident that workshops will run in future at STScI, Indiana, MBARI, and the Bay area without me as well. The key isn't just to recruit co-instructors; it's to move them around to meet each other to build a real peer network.

So how about you? If you're reading this, you've probably either used this material, taught it, or both. If you could send email back in time to April 2010, what would you tell yourself? Read More ›

Wrapping Up in Oakland
Greg Wilson / 2012-03-30
We've wrapped up the workshop in Oakland for folks from NERSC, Berkeley, and Stanford. More later (when I'm home and have slept), but here's the students' feedback. Many thanks to Shreyas Cholia for organizing the workshop, to Michelle Levesque for helping to teach it, and to Elango Cheran and Jorge Aranda for helping out.

Good:
- software strategy
- Python intro
- Subversion good!
- anecdotes (no, really)
- speed of typing
- easy to follow
- connected whole course to pipeline model
- liked emphasis on programming hygiene
- liked worked examples
- incorporated pedagogy
- agnostic about languages
- programming philosophy
- live coding (not PowerPoint)
- liked learning about SVN
- learned that some tools exist
- humor
- online resources
- similar environments
- engagement on first day
- whole atmosphere of class
- relating dev to science
- having people around
- learning terminology/theory
- comprehensive set of tools
- big picture view
- four topics picked were good
- free!

Bad:
- room was too warm
- Greg didn't say SVN has branching
- need more use cases from audience
- too short
- Michelle types too fast
- first three hours slow
- first three hours too fast
- Python: too much basic software discussion
- needed more time to talk to neighbors
- mention the next step
- more depth
- enough rope to hang themselves
- too few interactive examples
- room was too small, too
- more MATLAB
- advanced students could come later
- more examples to work on our own
- engagement on second day
- more instruction about setup/install
- hands-on with HDF5 etc.
- too fast
- too diverse levels
- already knew lots of this
- didn't talk about Silicon Valley
- wanted a before+after test
- healthier snacks

Read More ›

What We Teach in Two Days
Greg Wilson / 2012-03-30
This week's workshops at MBARI and NERSC both had more lecturing and less hands-on practical work than either I or the students would have liked, but when we're trying to squeeze so many things into two days, that's probably unavoidable. We hope that the online tutorials we're going to run over the next few weeks will make up for that by giving learners a chance to practice their skills at leisure. On the other hand, I'm quite pleased with the topics and sequence: I think we did a pretty good job of explaining how to do scientific data processing, and how the pieces fit together. Here's what we covered:

The morning of day 1 is the Unix shell. After ls, cd, mkdir, rm, mv, and an editor [1], we introduce text filters like head, tail, wc, sort, uniq, and cut so that we can teach pipes and redirection. We then spend the middle of the morning on the Unix philosophy of "little pieces loosely joined", and wrap up by showing them how to save commands in files (to re-execute), and how to use for-loops to run their data pipelines once for each source file. We also talk about repeating commands with up-arrow or !123, and about using history | tail -whatever > the-steps-i-used.txt to keep a record of how they produced results.

The afternoon of day 1 is a quick (and unfortunately shallow) introduction to Python. "Open a file, for-loop over the lines, convert them from strings to floats, add 'em up, and print the total" is the first hour's goal; once they've got that, we cover if statements, command-line arguments, and standard input and output, so that by the end of the afternoon they are building little tools of their own that play nicely in a Unix pipeline. (For example, Michelle Levesque had the NERSC students implement very simple versions of head and cut; one such tool is sketched below.) We close off by showing them how to factor repeated code into functions, and how to put those functions into files of their own so that they can be re-used in several different tools. We tell them (but don't actually show them) that all of these ideas apply equally well to R, MATLAB, Perl, or whatever else they want to use; we also point out things like using sensible variable names, breaking code into digestible lumps, and other transferable bits of programming hygiene.

The morning of day 2 is version control [2]. We start with the introduction that's on the web site, which is the only time we use slides (everything else is live coding), then walk them through the update-merge-edit-commit cycle. We also show them how to use svn status, svn log, svn blame, and svn revert, but do not actually show them how to merge things (either across branches or from old revisions to new ones): based on past experience, that's a step too far for an introductory lecture. What we do instead is show them how to use keyword expansion to put the revision numbers of files into the files themselves, so that they can start tracking data provenance with just a few extra lines of code in their pipelines. This is the capstone of the "how to program" part of the bootcamp.

The afternoon of day 2 introduces the basics of SQL: filtering, aggregation (but not group by), simple joins, NULL if there's time (and my voice hasn't run out), insert and delete, and then how to put SQL in a Python program. Again, we emphasize that the ideas transfer to other languages, and how database queries can and should be thought of as just another stage in a pipeline.
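To give a flavor of the little tools learners are writing by the end of day 1, here's a minimal sketch of a stripped-down head (my reconstruction for this archive, not the exact exercise code); run as python head.py 5 < data.txt, it behaves like head -5:

import sys

# Print the first N lines of standard input, like Unix 'head'.
count = int(sys.argv[1])              # how many lines to keep
for i, line in enumerate(sys.stdin):
    if i >= count:
        break
    sys.stdout.write(line)            # each line already ends with '\n'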
The big idea that ties all of this together isn't actually the Unix philosophy; it's that programming is a human activity:

- Short-term memory can only hold so much at a time, so build things to fit into it.
- We're most productive when we're not being interrupted (or interrupting ourselves), so use tools that support an interactive do-and-see flow.
- People are fallible, so make defense in depth a habit (i.e., check your data, figure out how to test things before you write them, run regression tests, etc.).

So that's what we do. I think it works well—I'd enjoy hearing everyone else's thoughts.

[1] If learners already use a plain-text editor, we encourage them to keep using that; otherwise, we show them Nano, not because anyone should actually use it for programming, but because it's so simple that we don't really have to explain anything more than "control-X to exit".

[2] Unless Dreamhost has screwed up creation of a temporary Subversion repository for students to use, in which case some last-minute juggling is required. Read More ›

Maintaining Momentum
Greg Wilson / 2012-03-30
For a variety of reasons (which is my way of saying "I don't know why" :-), Software Carpentry has proven really popular in the UK. We have close to 200 people signed up for 40 seats at our London workshop, and over 160 for 40 seats at Newcastle, with more coming in all the time. We also now have groups from three other universities, and a couple along disciplinary lines, asking if we can run workshops for them. However, the funding that has supported this round of work will run out at the end of June, and the earliest we could receive more support from the same source is mid-October. Our challenge now is therefore to find ways of supporting these groups in the short term so that momentum isn't lost. Here are some ideas; we'd welcome more.

- We're already planning to use the London workshop in part as a "train the trainers" event, i.e., to have 10-15 of the participants be people who already know how to do this stuff, and are attending to learn how to teach it. More volunteers, particularly open source developers who live and breathe this stuff, would help.
- We could try webcasting workshops so that more people could sit in. I don't actually think this will work very well—past experience shows that being in the room makes an enormous difference to the learning experience—but I thought I'd add it for completeness.
- Several people have asked us to write an instructor's guide explaining what we teach, in what order, and why. I'll do a one-page version as soon as I can, based on the MBARI and NERSC bootcamps, but realistically, this is 2-3 weeks of work, which pushes it over the end-of-June horizon.
- We could recruit more people to run the online follow-up tutorials we're running after each workshop, so that I can spend more time on meta stuff (like the instructor's guide). Personally, I think grad students who are thinking about academic careers should jump at the opportunity to learn how to teach online, because (a) that's what they're likely to have to do for the next thirty years, and (b) having some experience in doing it will make them look pretty shiny when they're applying for jobs. Any takers?
- We can push harder to try to get universities to offer Software Carpentry as a fully-resourced course (i.e., to pay someone to teach it during term, either for credit or otherwise). This is one of our long-term goals anyway, but runs into a chicken-and-egg problem: we have to demonstrate value in order to get resources, but need resources to demonstrate value.

Other ideas? What could we do in the next two and a half months to ensure that things don't go on hold for the six months after that? Read More ›

Wrapping Up MBARI Workshop
Greg Wilson / 2012-03-28
Steve "Jellyfish" Haddock and Greg Wilson taught a two-day workshop at MBARI this week. It seemed to go well: feedback is below. Good Bad used specific, concrete examples hands on good breadth knowing the tools exist seeing the Power of Pipes seeing how to use version control for traceability insistence that prior experience wasn't necessary the Power of Python (!!!) liked the Power of Databases saw background of databases now understand how much more efficient is possible stories are entertaining starting from scratch helpful databases may not be completely useless liked having things online after day 1 great motivational speaker emphasis on commonality/interoperability liked provenance web site linked coding/learning introduced things to use *with* Python content was fairly conceptual clean code from the beginning drawing parallels between Python and MATLAB including version numbers (provenance) forced to learn my Mac being allowed to talk to neighbors "programming is meaningless" is good prepared pedagogy, not material I feel like I'm a better programmer great going over Unix stuff some things were too fast (esp. day 2) instructor wanders more practice time instructor types too fast wanted two screens at once switching between Python and shell confusing more time on switching between screens follow instructor or take notes but not both handout please! not enough hands-on practice not knowing how to get data into database can't keep track with multiple Python files didn't explain regular expressions lack of hands-on more data manipulation not enough time shorter, more frequent breaks download folder of examples can't do it all in two days confused with version control accelerated toward the end don't know how to use database in research advance materials show up early for setup say "meaningless" first sys.argv etc. not covered Read More ›

Bootcamp in Paris June 28-29, 2012
Greg Wilson / 2012-03-28
Nelle Varoquaux and others are organizing a Software Carpentry bootcamp at INRIA Paris, which will run June 28-29, 2012. We'll post more details as we have them, and look forward to seeing you there. Read More ›

Object-Oriented Programming in Fortran 2003
Greg Wilson / 2012-03-23
Damian Rouson is teaching a class in Berkeley March 26-28, 2012, and again April 10-12, on object-oriented programming in Fortran 2003. The March class is full, but there are still seats available for April. Fortran is still the workhorse of scientific computing, and recent standards like Fortran 2003 have a lot of powerful features—it's well worth checking out. Read More ›

The Dark Matter of Computational Science
Greg Wilson / 2012-03-18
Scott Hanselman's recent post "Dark Matter Developers" got me thinking once again about what Software Carpentry is about. He says: [We] hypothesize that there is another kind of developer than the ones we meet all the time. We call them Dark Matter Developers. They don't read a lot of blogs, they never write blogs, they don't go to user groups, they don't tweet or facebook, and you don't often see them at large conferences... Where are [they]? Probably getting work done. The same is true of computing in the sciences. For everyone who shows up at Supercomputing or blogs about porting the eigenvalue package they wrote in Haskell to a GPU cluster, there are a hundred (or more) who are busy actually doing organic chemistry or neuropsychology. My question is, how do we reach them? Some ideas can be found in this recent talk by Jessica McKellar and Asheesh Laroia about how they increased both the size and diversity of the Boston Python user group: Diversity in practice: How the Boston Python User Group grew to 1700 people and over 15% women Read More ›

And While We're Stuck Here With 21 Seconds Worth of Music to Fill...
Greg Wilson / 2012-03-18
Scholarship in the Age of the Internetatron Read More ›

Wrapping Up the STScI Course
Greg Wilson / 2012-03-16
The online portion of our work with learners at the Space Telescope Science Institute wound up today. 6 of the 14 people who took part in the on-site workshop submitted "graduation exercise" videos, and three more sent apologies (time pressure, technical difficulties, etc.), which I think is a pretty good completion rate. As always, we finished up by asking everyone to give us one good and one bad thing about the class.

Good:
- Test-driven development
- Got some experience teaching in a new setting (from my on-site helper)
- Exercises incorporated a lot of ideas (saw things in context)
- Liked emphasis on "philosophical" parts: how to share work with colleagues, etc.
- Liked the way the exercises were structured (e.g., the hints)
- Useful to see how instructor worked through a problem from scratch
- Changed the way I think about coding
- Now breaking code into smaller chunks
- Course has made me more aware of how I program
- Now believe that good code should not need to be commented
- The stuff on SVN was very useful

Bad:
- Needed to take personal time to do class
- Disappointed by the narrowness of topics
- Frustrated trying to get started (didn't have relevant background)
- Course repeated a lot of things I'd seen before
- N-body problem exercise was a biiiig jump
- Would have liked assignments earlier
- Time set for this meeting was inconvenient
- Wanted more tips and tricks, psychology of programming, etc.

Some of the apparent contradictions in the "Bad" list reflect the diversity of learners' backgrounds; once again, I think this was the biggest challenge we faced. Overall, though, it was an interesting contrast to my experience running a class through P2PU. I've taught the content of Software Carpentry dozens of times over 14 years, so I had something immediate to draw on for the folks at STScI. On the other hand, I've never completed a course online myself, so I didn't have relevant personal experience to bring to bear at P2PU. I hope that by engaging local helpers in future bootcamp follow-ups, we'll help to grow a pool of people who can do that kind of teaching better. Read More ›

Thank You, Enthought
Greg Wilson / 2012-03-16
We are pleased to announce that Enthought has generously given us a grant to support some of the travel costs associated with our bootcamps. Along with hosting an earlier version of our material for several years and preparing a rock-solid Python distribution for scientific computing, this is one more good deed we'd like to thank them for. Enthought, Inc. is delighted to provide funding to Software Carpentry to support educating scientists in basic computing skills. Enthought was founded with a mission to improve the way scientific computing is accomplished, which we do through our support of open source initiatives, applications we build for customers, and through a goal we share with Software Carpentry: teaching scientists computing skills that improve their productivity and advance their contributions to science. Read More ›

First Homework for Indiana Students (and a few from Ontario)
Greg Wilson / 2012-03-15
We ran our first online tutorial yesterday for students from our recent bootcamp in Indiana (plus a few from Toronto). Despite some technical difficulties with web conferencing, we managed to get through some basic image processing. The learners all now have homework for next week (shown below); considering that most of them have little prior programming experience, and only a half day of Python, it's pretty cool that they're able to do things like this.

1. Install the Python Imaging Library. Everything we'll need is in the documentation for the core module.

2. The picture.py program attached will either double the blue and green in an image, or cut the red in half, depending on how you run it:

$ python picture.py double input.jpg output.jpg
$ python picture.py half input.jpg output.jpg

The actual work is done in the functions half_red and double_blue_green. There's a lot of redundant code in there—in fact, other than their names, those functions only differ by one half of one line. Have a look at our lessons to review how functions work in Python, and then at the first half of the lesson on object-oriented programming (you don't need to know about variable numbers of arguments for this exercise), and then try to rewrite picture.py so that the loop over the X and Y dimensions of the picture only appears once, in one function called adjust_colors. That function may not contain any if statements; instead, it should take four parameters:

- the X size of the image
- the Y size of the image
- the image data
- one other thing that you have to figure out

3. Write a new program called average.py that takes a single image filename as a command-line argument, and prints out three integers on a single line showing the average red, green, and blue values of the image's pixels. For example:

$ python average.py maddie.jpg
112 83 85

4. Write another new program called exaggerate.py that takes two filenames as command-line arguments. The first is the name of an existing image; the second is the name of a new image that is to be created. The program sets the red, green, and blue values of each pixel to either 0 (the minimum possible) or 255 (the maximum possible) according to whether they are less than or greater than the average value for that color over the whole image. For example, using the average red-green-blue values from Question #3, if a pixel's original value is (10, 100, 200), its new value is (0, 255, 255), because there's less red than average, but more green and more blue. If your starting image is maddie-coy-flower.jpg (attached), your final image should be the exaggerated.jpg image (also attached).
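For scale, here's a minimal sketch of what a solution to Question #3 might look like (one possible answer, not the official solution; it converts to RGB first, since some image formats carry a fourth channel):

import sys
from PIL import Image

# average.py: print the average red, green, and blue values of an image.
image = Image.open(sys.argv[1]).convert('RGB')   # normalize to 3 channels
width, height = image.size
data = image.load()

total_r = total_g = total_b = 0
for x in range(width):
    for y in range(height):
        r, g, b = data[x, y]
        total_r += r
        total_g += g
        total_b += b

count = width * height
print total_r / count, total_g / count, total_b / count

Read More ›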

Where Next for the Next-Gen Course (and Software Carpentry)?
Greg Wilson / 2012-03-14
Titus Brown's next-generation sequencing course has been a great success: so great, in fact, that he's overwhelmed with applications for this year's run. That has made him think aloud (or rather, ablog) about where to take the course next. His options include:

- Maintain the status quo (but the effort/reward ratio for him isn't sustainable).
- Run it for pay (which would put it out of reach for most learners).
- Flip the course, using videos for training and live sessions for tutorials (see below for discussion).
- Franchise, i.e., train other people to deliver the course wherever they are (see below as well).
- Merge with Software Carpentry.

I'm all in favor of the last option: I would really like to see Software Carpentry become a hub through which other people offer, find, and/or run "software for scientists" training (both live and online). In practice, this would mean:

- We handle advertising, signup, and other administrative and technical tasks (like getting web conferencing to work—why is this still so hard in the 21st Century?).
- We match instructors with groups of learners.
- We help train new instructors by pairing them with people who've already taught material before they have to fly solo. (We're doing this right now, and we'll know in six months how well it works.)
- We coordinate assessment, both so that we can improve what we're doing, and so that we can show potential funders what impact we're having.

What do you think? Are you teaching something that we could fold into a larger effort? Would you be interested in helping to teach if we were handling the organizational details, and it was a chance to learn content and method from a more experienced instructor? And most importantly (at least for me, right now), do you have any thoughts about how we could better organize the collaborative elements of this? As many people have found in the past (and are now rediscovering), canned notes and recorded lectures are a lot less effective than peer-to-peer learning, which is inherently social. I think our experiments running online tutorials using desktop sharing work as well as they do because learners can hear and respond to each other's questions: instead of being one-to-many, it's many-to-many (albeit with a long tail distribution—I admit I do most of the talking). What if anything could we do to make more of that happen? What have you actually used that worked well? Read More ›

How We're Doing
Greg Wilson / 2012-03-14
The first online session for our Indiana University students will run tomorrow afternoon (assuming we can get web conferencing to work—why is this still so hard in 2012?). Meanwhile, our other workshops are filling up quickly:

Site                                      Start     Ticketed  Space
Monterey Bay Aquarium Research Institute  March 26        17     40
Lawrence Berkeley National Laboratory     March 28        40     40
University of Chicago                     April 2        116    150
Utah State University                     April 14        26     50
University College London                 April 30        49     50
Michigan State University                 May 7            7     40
Newcastle University                      May 14           9     40
University of British Columbia            May 22          40     40

and we are now discussing more workshops at other sites: Purdue, Oklahoma, Paris, Melbourne, and possibly ones in South America and Africa as well. If you'd like us to help organize one where you are, please get in touch. Later: that's the good news. The news from other fronts is unfortunately not as upbeat. We're stymied issuing badges to people who've completed basic training while we wait for upgrades to the Open Badges site; videos of people solving simple problems have only been trickling in (we have half a dozen, we're aiming for 20-30); and we seem to be chasing our tail with regards to a web-native slideshow tool. I hope the end-of-month status report will be more positive. Read More ›

Ask the CompuScienceGeek?
Greg Wilson / 2012-03-14
Titus Brown recently blogged a question about how to organize the files used in a computational science pipeline: by project, by paper, separately, or something else. Rather than answering that, I fastened onto a throw-away comment: I think we need an "ask the compusciencegeek" service... My earlier idea of having a reserved area on Stack Overflow for novice questions didn't lead anywhere, but I've enjoyed the Software Sustainability Institute's "Ask Steve" column. Should we start something similar here? Or direct people at Steve, and syndicate his answers here? What questions do you have about using computers in your research that we could answer? Read More ›

The Trieste Workshop, One Week Later
Tommy Guy / 2012-03-12
Katy Huff and I are back from Trieste, Italy, where we were instructors in the Advanced School on Scientific Software Development at the ICTP. This was a different sort of workshop in many ways. First of all, it was two weeks long and the students were from all over the globe: countries represented include Russia, Bangladesh, China, India, Pakistan, Albania, Iran, Palestine, Ghana, Nigeria, Chad, South Africa, Serbia, Romania, Ukraine, Argentina, Ecuador, and Colombia. The level of effort they showed was astounding: between our exercises and the projects they brought from home, many students were in the computer labs until 11:00 at night. There are a few things that we've learned.

"All in" works. Two things really set our students apart. First, they had all traveled a long distance to be at the workshop, which set the tone for the week. Second, the workshop required that each student bring a project from their own research so they could apply some of the ideas of the workshop right away. This was the most successful element of the workshop because it lowers the barrier to entry for new technology even further. But it requires a different sort of instruction. For instance, instruction didn't stop when the lecture ended. It carried through meals and often into the night. Students were in the lab until 11:00 or later, and that provided good time to help with one-on-one questions. While this style of teaching is more rewarding for the instructors (I thought!), it also requires a larger commitment in terms of time and energy than a two-day workshop. We didn't have the 5-to-1 student-to-teacher ratio that we try to have for most workshops. I think we could have benefited from a few more instructors, particularly when we broke into individual projects. Fortunately, several of the students started pitching in to help introduce tools like Paraview that could help other students.

More flexibility. Another advantage of the longer, project-oriented format was that each person came away with a new tool that uniquely fit their needs. For several people, this was Python and matplotlib. We had more than one person change their original project idea based on the material in the first week. For several people, valgrind seemed to be a tool they would continue to use. Several people also started using doxygen. The cool thing about this style of workshop is that valgrind and doxygen weren't even on the syllabus: they came out of recognized needs and were introduced either through special sessions or through one-on-one instruction.

The big takeaway. I hope to teach another workshop like this whenever one is available, but there are other lessons that we could apply to our two-day workshops. First of all, the idea of a student project makes a lot of sense. It would be interesting to add a day to the workshops that is devoted to applying ideas in each student's project. Another idea, which we've kicked around before, is a few "choose your own adventure" sessions to allow time to introduce tools to meet the needs of the specific audience.

Special thanks to Graziano Giuliani, Antun Balaz, and Stefano Cozzini for organizing this workshop. This was one of the best experiences I've ever had, and I hope to do it again soon! Read More ›

The IPython Notebook
Greg Wilson / 2012-03-12
Titus Brown has created a short video showing what the IPython Notebook can do. The short answer is, a lot: it's an interactive notes-plus-graphics-plus-live-coding tool that runs in the browser. (The video doesn't really get going 'til the 2:50 mark, so you may want to skip ahead to then.) The question isn't whether we want to use this in Software Carpentry, but when. Installing it is (in my experience) still painful, but it holds tremendous potential. If we can get to the point where 90% of our intended audience can "just run it", and the other 10% need only a couple of minutes' help, we'll begin the switch. Now, if only we could figure out how to integrate the shell, version control, databases, spreadsheets, and everything else into this... :-) IPython Notebook Intro Screencast See also this longer screencast from Fernando Perez (its principal author): IPython in-depth: high-productivity interactive and parallel python Read More ›

What's the Model, Kenneth?
Greg Wilson / 2012-03-09
Over on rwxweb [1], Michelle Levesque has posted and dissected a diagram showing how various webbish skills depend on one another. It's an invaluable design aid, but it violates the third of Jon Udell's seven principles: it's presented as a bitmapped image, rather than as something a machine can easily parse and digest. I can't, for example, read the source and generate a two-column table showing "X depends on Y", or something like that. I wrote a bit back in January about what sort of data model we could use to represent things like that, but that's the wrong approach as well. What I really want is to discover that someone has already thought this through and created such a model, so that if I want to alter or extend Michelle's model, I can change the authoritative machine-readable representation [2], and she can merge (or at least diff) those changes. All I've found so far, though, are bits of database schema buried in things like Moodle and SCORM; there doesn't seem to be anything that a programmer would recognize as a model or format per se. Of course, what I really, really want is to figure out how to teach this idea and others like it to scientists and engineers in the 20-odd hours we have for this course... [1] I hope I've capitalized that correctly. [2] Of course, I want to make those changes using some sort of GUI rather than editing a blob of XML or JSON or [made-up name of random text format goes here]. If the world was working as it should, I'd be able to use any of several GUIs, depending on how I personally prefer to manipulate models of this kind. Read More ›

Our Indiana U Workshop Went Well
Greg Wilson / 2012-03-09
Our two-day bootcamp at Indiana U. on March 7 and 8 went well—while less than half of the people who registered actually showed up (more on that in a moment), everyone who did attend the two days seemed to get a lot out of it. Special thanks are due to Prof. Andy Lumsdaine (host), Jennifer Richards (organizer), and Ben Martin, Randy Heiland, Joe Cottam, Mike Hansen, DongInn Kim, and Purdue's own Jeff Shelton for helping out. At the end, we asked everyone to give us one good and one bad thing about the bootcamp.

Good:
- Shell scripting
- Data provenance
- Tips on efficiency and coding
- The five-step cycle (update, write tests, make them pass, refactor, commit)
- Where to find more info
- Got started with basic Unix
- Good structure
- Comfortable environment
- Good examples
- Help team was awesome
- Website is useful
- Saw lots of new tools
- Examples were talked through
- Energetic instructor
- Good breadth
- Test-driven development
- Version control
- Enjoyed the stories (even though they weren't all true)

Bad:
- Didn't get to web stuff
- Didn't get to look at code examples long enough
- Went too fast through Python
- Less support for Windows users
- Not enough exercises
- Not enough time to practice
- Didn't see how to create databases (only how to query them)
- Too much breadth
- Didn't see how to create a repository
- No reference guide/cheat sheet
- Didn't explain keyboard shortcuts
- Printed instruction would have helped (ESL)
- Free lunch is nice, but what about free breakfast?
- Some parts too advanced
- Too much typing: should be more pre-cooked data files for download
- Time was too limited
- Didn't show IPython
- Too many "drive-by" topics
- No capstone exercise to bring everything together
- Dove into databases too quickly

I agree with most of what's in the "Bad" list (especially the comment about breakfast). What I want to fix first, though, is the high no-show rate that plagues any free event. One suggestion is to charge people $20 or so when they register, then refund it when they show up, so that only no-shows wind up paying anything. What would your reaction to that be? We'll start online tutorials with our Indiana students next week—like their predecessors from the Space Telescope Science Institute, they'll meet once a week to work through a few simple problems that are directly relevant to their research. Next up: the Monterey Bay Aquarium Research Institute and Lawrence Berkeley National Laboratory. Read More ›

Software Carpentry Meetup at PyCon
Greg Wilson / 2012-03-07
I'm not able to be at PyCon this year, but if Software Carpentry fans would like to get together, Shreyas Cholia is organizing a lunchtime meetup on Friday. Please drop him a line if you'd like to connect. Read More ›

I Resemble That Remark
Greg Wilson / 2012-03-07
Titus Brown's recent post "Top 12 Reasons You Know You are a Big Data Biologist" may resonate with some readers of this blog. I particularly like the idea (in the comments) of using links in email to drive garbage collection of data sets. Read More ›

Programs as Experimental Apparatus
Greg Wilson / 2012-03-05
Suppose you have two photographs of a patch of the night sky, and you want to know how different they are. The simplest way would be to see if any of the pixels' values differ, but that's pretty much guaranteed to return a "yes". A better measure is to see how many pixels differ by more than some threshold, but that raises two questions: how to measure the differences between pixels, and what threshold to use. To answer the first question, most images encode the red, green, and blue values of pixels separately, so we can add up the absolute values of the differences between those color values:

d1 = abs(R - r) + abs(G - g) + abs(B - b)

We could equally well add the color values for each pixel to get a total, then look at the differences between those:

d2 = abs((R + G + B) - (r + g + b))

Does it matter which we choose? And either way, how big a difference should count as different? Since we're scientists, we can answer these questions experimentally. Here's a Python program that reads in an image, scales it down to half its original size, scales it back up, then calculates both difference measures. Its output is a histogram of how many pixels differ by how much according to the two measures, and how different the measures are from each other:

import sys
from PIL import Image

# Load the original image.
original = Image.open(sys.argv[1])
width, height = original.size
original_data = original.load()

# Create a duplicate by shrinking then re-enlarging the original.
duplicate = original.resize((width/2, height/2)).resize((width, height))
duplicate_data = duplicate.load()

# Count how many pixels differ by how much using two measures:
#   | (R+G+B) - (r+g+b) |
#   | R-r | + | G-g | + | B-b |
overall = [0] * (3 * 255 + 1)
individual = [0] * (3 * 255 + 1)
for x in xrange(width):
    for y in xrange(height):
        o_r, o_g, o_b = original_data[x, y]
        d_r, d_g, d_b = duplicate_data[x, y]
        diff_o = abs((o_r + o_g + o_b) - (d_r + d_g + d_b))
        overall[diff_o] += 1
        diff_i = abs(o_r - d_r) + abs(o_g - d_g) + abs(o_b - d_b)
        individual[diff_i] += 1

# Display histogram.
num = width * height
print '%10s %10s %10s %10s %10s %10s' % \
      ('diff', 'overall', '%age', 'individual', '%age', 'difference')
for i in range(len(overall)):
    pct_o = 100.0 * float(overall[i]) / num
    pct_i = 100.0 * float(individual[i]) / num
    diff = abs(overall[i] - individual[i])
    pct_d = 100.0 * float(diff) / num
    print '%10d %10d %10.1f %10d %10.1f %10.1f' % \
          (i, overall[i], pct_o, individual[i], pct_i, pct_d)

Its output looks like this:

      diff    overall        (%) individual        (%) difference
         0     130716       42.6     128472       41.8        0.7
         1       6574        2.1       1746        0.6        1.6
         2       9432        3.1       8430        2.7        0.3
         3      63792       20.8      66992       21.8        1.0
         4       4887        1.6       7037        2.3        0.7
         5       4880        1.6       5692        1.9        0.3
         6      29115        9.5      29622        9.6        0.2
         7       3131        1.0       3485        1.1        0.1
         8       2888        0.9       3175        1.0        0.1
         9      13714        4.5      13907        4.5        0.1
        10       1981        0.6       2097        0.7        0.0
        11       1849        0.6       1955        0.6        0.0
        12       7597        2.5       7665        2.5        0.0
        13       1432        0.5       1467        0.5        0.0
        14       1311        0.4       1341        0.4        0.0
        15       4608        1.5       4641        1.5        0.0

The good news is that there isn't much difference between the counts for the two measures. However, it's hard to get a sense of what else is in this data. Time to visualize—let's plot the percentage of pixels that differ according to d1. The result isn't surprising: if our downsize-then-upsize procedure didn't lose any information, we'd expect no differences at all. Since rescaling is lossy, though, we see that a lot of pixels differ by small values, and only a few by large values. But there's something else in our data that could easily be missed.
Look at the first dozen entries in the table above; do you see a pattern? Let's plot the scores for multiples of three separately from the scores for differences that aren't multiples of three: If we do the same thing for the whole data set, we get: A moment's thought produces a hypothesis: since we have three color channels (red, green, and blue), it's possible that the rescaling algorithm is introducing a systematic bias by perturbing each channel by one as it sizes up or down. (If all three channels shift by one, the per-channel measure reports a difference of exactly 3, which would explain why multiples of three dominate the histogram.) Looking at the curves for differences up to 15, that bias seems to be responsible for most of the overall difference. If we really want to measure the differences between images, we're going to have to find a way to eliminate this. I'm not going to go into how we might do that now. What I want to point out is that this is not a new problem. Think about the telescope that took the picture we started with. Glass lenses are subject to chromatic aberration; telescope designers must either reshape lenses to minimize it, combine layers of different kinds of glass to correct for it, or tell the astronomer to compensate mathematically somehow. Equally, we can implement a different resizing algorithm to remove this systematic bias, or correct for it. The important thing is to think of a program as a piece of experimental apparatus, and treat it accordingly. That is part of what we want to teach in this course. Read More ›

Open Education Week
Greg Wilson / 2012-03-05
Today marks the start of Open Education Week. To celebrate, please post a link here to your favorite online educational resource. Read More ›

Help Us Write Assessment Questions
Greg Wilson / 2012-03-05
We have been asking people the questions below before the start of a bootcamp in order to get a handle on how much they already know. As part of evaluating the current round of work, we'd like to expand the list a little, and touch on a few more topics. Our rough categorization of how much someone should know about various things is in the competence matrix; ideally, we want another 3-4 questions on each topic. Our criteria are:

- The command or concept should make sense on its own.
- It should be fairly representative of the topic (that is, not an obscure edge case).

Assessing impact may be the most important thing we do in this round of Software Carpentry, so please send us suggestions by email, or add comments to this post.

Do you understand the following Unix shell commands well enough to explain them to someone else?

ls data/*.txt
find ~ -name '*.py' -print

Do you understand the following Subversion commands well enough to explain them to someone else?

svn update
svn diff -r 1723

Do you understand the following Python statements well enough to explain them to someone else?

{'east' : 5, 'west' : 11}
__init__(self, *args)

Do you understand the following testing concepts well enough to explain them to someone else?

fixture
mock object

Do you understand the following Make terms and commands well enough to explain them to someone else?

dependency
cp $< $@

Do you understand the following SQL terms and commands well enough to explain them to someone else?

select * from data where data.left < data.right;
inner join

Read More ›

Happy People
Greg Wilson / 2012-03-05
Our workshop at the International Center for Theoretical Physics in Trieste has ended, and judging from the participants' smiles, it went well. Thanks once again to Tommy Guy and Katy Huff for teaching, to Stefano Cozzini for inviting them, and to everyone who took part for all their hard work. Read More ›

Performance Curves, Curriculum Design, and Trust
Greg Wilson / 2012-03-04
Suppose you have a processing pipeline with three stages, each of which takes one second to run. What's its overall performance? As Roger Hockney pointed out in the 1980s, that question isn't well formed. What we really need to ask is, how does its performance change as a function of the size of its input? It takes 3 seconds to process one piece of data, 4 to process two, 5 to process three, and so on. Inverting those numbers, its rate is 1/3 result per second for one piece of data, 2/4 = 1/2 result/sec for two, 3/5 for three, etc. If we draw this curve, we get a line that rises quickly at first and then flattens out toward a limit. Any pipeline's curve can be characterized by two values: r∞, which is its performance on an infinitely large data set, and n1/2, which is how much data we have to provide to get half of that theoretical peak performance. Deep pipelines tend to have high r∞ (which is good), but also high n1/2 (which is bad); shallow pipelines are the reverse. When I wrote Practical Parallel Programming twenty (!) years ago, I said that the more interesting measure for any machine was actually p1/2, which is how many programming hours it takes to reach half of a machine's theoretical peak performance. It was meant sarcastically: on most machines, the answer was and is "infinity", since most programmers think they're doing well if they can ever achieve 20-25% of the performance that the manufacturer quotes for a piece of hardware. But the idea has stuck with me, and I think it underpins a lot of Software Carpentry. Our goal is to increase researchers' r∞, i.e., to help them produce new science faster. Our challenge is to minimize p1/2, so that researchers see benefits early. In fact, our real challenge is that learners' performance over time actually dips before it rises. That dip is due to Glass's Law: every innovation initially slows you down. If the dip is too deep, or if it takes too long to recover from it, most people go back to doing things the way they're used to, because that's the safest bet [1]. But I've noticed something interesting in this round of Software Carpentry: if learners are working in a group with their peers, they seem to be willing to trust us more (or for longer) than otherwise. I don't think this is a case of not wanting to be the first to stop clapping; I think instead that with a group of half a dozen or more, the odds are good that someone is getting something out of the material at any particular moment, which gives everyone else a reason to carry on. It's still early days, so I reserve the right to change what I think about this, but I'd welcome feedback... [1] This is why we don't tackle object-oriented programming, distributed version control, or parallelism in our core curriculum: it takes too long for our learners to see the benefits. And in the case of parallelism, the payoff beyond "run all these jobs on whatever hardware is available" is usually negligible.
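To make the curve concrete (this generalization is mine, stated in Hockney's notation): for a pipeline of d one-second stages fed n pieces of data,

t(n) = d + (n - 1) seconds
r(n) = n / (d + n - 1) results per second
r∞ = limit of r(n) as n grows without bound = 1 result per second
n1/2 = d - 1

For the three-stage pipeline above, r(n) = n/(n + 2), so r∞ is one result per second and n1/2 is 2, which matches the 1/3, 2/4, 3/5 sequence in the text; note how n1/2 = d - 1 grows with pipeline depth. Read More ›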

ULP (or, This is tricky and perhaps profound)
Greg Wilson / 2012-03-01
Bruce Dawson's "Comparing Floating Point Numbers (2012 edition)" is, as he says of its subject, tricky, and perhaps profound—worthwhile reading for anyone pushing numbers around. Read More ›

Toronto Bootcamp February 2012: How We Did
Greg Wilson / 2012-03-01
Last week's bootcamp at the University of Toronto was not the most successful one we've ever done: quite a few registrants didn't show up, and based on feedback from the instructors, we tried to cover too much, too fast. That said, the learners who stuck with us all the way through to Friday afternoon came away knowing a lot of useful things (or at least knowing that those things existed). We'll gear back a bit for next week's workshop at Indiana U. and see how that goes. Read More ›

Inscight from Trieste
Greg Wilson / 2012-03-01
In the latest Inscight podcast, the inimitable Katy Huff talks to several of the participants in the bootcamp she and Tommy Guy are leading at the International Centre for Theoretical Physics. It's great to hear so many people from all over the world learning so much. Read More ›

Worth Reading, Worth Watching
Greg Wilson / 2012-02-29
The PLoS Computational Biology article "Ten Simple Rules for Getting Help from Online Scientific Communities" has a lot of good advice. Mark Guzdial's talk "Helping Everyone Create with Computing" is full of good ideas—see in particular the discussion of computational scientists starting around 06:15. Read More ›

Reproducibility Redux
Greg Wilson / 2012-02-28
A recent editorial in Nature, and a longer article by Darrell Ince, Les Hatton, and John Graham-Cumming titled "The case for open computer programs", are just two signs of the growing pressure to raise standards around computational work in science. We can debate the exact meaning and value of reproducibility, but there's no arguing with the fact that if scientists want to do better work, they'll need better skills. That's the real long-term goal of Software Carpentry. Read More ›

Frustration (continued)
Greg Wilson / 2012-02-27
It's been a frustrating couple of days. To recap, I want to convert our material from PowerPoint to HTML5 to make it easier for people to fork and merge, to make things easier to re-style, because it's an open format, and so on. David Seifried has welded an HTML5 audio player to Caleb Troughton's deck.js to create a display tool, which I'm very pleased with, but the content is killing me. Seriously. Here are three slides taken from our first episode on the Unix shell. How can I translate those into HTML? The shell session transcripts are straightforward enough—a <pre> here, a <span> there—but what about the explanatory comment in blue in the third slide? Or the filesystem diagram in the second? Or the stuff (I can't think of a better term) in the first? As I see it, the options are:

1. Give up and do simple bullet-point text with the occasional inset image file, as we did with Version 3. On the upside, it would be easy to write. On the downside, it's second-rate educationally. Good instructors don't cover blackboards with bullet points: they stir diagrams and text together, because that's what's most effective.

2. Create one SVG (or HTML5 canvas element) per slide. The upside is free-form positioning; the downside is that both are painful to work with, which discourages creativity and collaboration. What I mean by "painful to work with" is that a lot of careful manual editing would be needed to do things like add elements incrementally in sync with a transcript. The result would also be largely unintelligible to search engines, and good luck copying and pasting it.

Back to the drawing board... Read More ›

Badges (Finalized)
Greg Wilson / 2012-02-27
We have finalized our first set of Software Carpentry badges—with luck, we'll list the first set of recipients later this week:

- for people who complete the core curriculum
- for helping to teach (either workshops or online)
- for organizing workshops
- for creating (or fixing) content

Read More ›

Trieste, Italy Workshop - Week 1
Katy Huff / 2012-02-24
Last Sunday, Tommy Guy and Katy Huff flew to Trieste, a small city in northeastern Italy, to assist in teaching an Advanced School on Scientific Software Development at the International Center for Theoretical Physics (ICTP). Stefano Cozzini, Graziano Giuliani, and Antun Balaz invited us to help them teach this two-week workshop, which is one in a series of workshops focused on high performance computing for scientific applications. The ICTP mission is to extend access to advanced scientific tools and education to scientists from developing countries. Undergraduate students, graduate students, and post-graduate scientists gathered in Italy from nations all over South America, Asia, the Middle East, and Africa. An intense first day introduced version control and the Unix shell in less than four hours, but students stayed alert. Approximately 50 students followed along, performing exercises on the Linux machines in the ICTP computer lab. Reviews of the workshop were written up nightly in student blogs. For good reason, students found that these "afternoon exercises were a bit hard to follow due to fast pace and great amount of information." However, most found that the exercise-driven lecture style we used was an effective way to introduce such a density of information. One student explained that the hands-on instruction helped him "to better understand backend story of the git functions/commands with an example." We even convinced some students to take the plunge with version control, some of whom bravely admitted that "the only version control I had was to regularly send an email to myself with the latest version..." The students were increasingly able to keep up during the following days as Tommy and Katy covered the Python programming language, focusing on data structures and packages useful for science. Some students have found themselves enamored with Python as a result, and a number have even decided to re-implement their research codes in the language. Today, Steve Crouch from the Software Sustainability Institute in the UK gave a series of excellent talks on software development practices that further motivated the version control, testing, and debugging exercises that the students performed over this week. Read More ›

Fourth (or Sixth) Online Tutorial
Greg Wilson / 2012-02-24
For the past four weeks, I've been meeting online with learners from the Space Telescope Science Institute to work through some Python topics and exercises. (We split the group in two for a couple of those sessions to accommodate different interests and levels.) I think it's been going pretty well: about half the people who took part in the two-day workshop in January are still with us, and we've just covered basic image processing with PIL. I'm looking forward to finding out whether the model scales to more groups with more people in the coming four months. Read More ›

Should We Relocate Our Repository?
Greg Wilson / 2012-02-23
Software Carpentry has been an open project since 2004: MIT License for the code, Creative Commons for everything else. All our stuff has lived in a public Subversion repository since then as well—it's at http://svn.software-carpentry.org/swc if you want to check it out. Today's question is, should we move that repo off our server and onto GitHub, BitBucket, or some other repo-hosting service? We want to make it easy for people to remix our content; as I said back in December, that means making it easy for them to fork, merge, and share. Would you be more likely to do this if our slides, learning plans, diagrams, and code samples were on some well-known hosting site rather than in Subversion on our own machine? Read More ›

What Deep Thoughts Look Like
Greg Wilson / 2012-02-22
Before writing yesterday's post about assessment, I should have explained what I mean by "fundamental concepts". I'll start with Lewis Epstein's wonderful book Thinking Physics. Here's a typical problem from the book: put a block of ice in a bathtub, then fill the bathtub to the brim with water, so that the block is floating freely. When the ice melts, will the water level go up (causing a spill), go down, or stay the same? Hm... well, the ice displaces its own weight of water, so when it melts, it exactly fills the "hole" it made, so the water level stays the same. Now let's try something a bit more complicated. Put the same block of ice in the bathtub, but put an iron weight on top of it, and then fill the tub to the brim. Now what happens when the ice melts? Does the water level go up, go down, or stay the same?

Epstein would say that if you can answer questions like that, then you can think physically. It isn't about calculation—as with Dan Meyer's "Joulies", it's about understanding principles and following them through to conclusions. The real aim of Software Carpentry is to teach scientists to think like that about computing. We want people to understand the principles:

- model-view separation
- human-readable vs. machine-readable data
- copying vs. aliasing
- state machines
- different models of computation (imperative, functional, reactive, declarative)
- interface vs. implementation
- the complementarity of algorithms and data structures
- code as data (and data as code)

The Unix shell, Python, SQL, regular expressions, and what not are how we hook people ("Hey look, something useful") and how we get these principles across (as with most big ideas, any direct description is either incomprehensible or banal). However, these principles aren't natural laws in the way that F=ma and the Second Law of Thermodynamics are—if you compare them to the principles I listed a month ago, there's overlap, but the lists aren't the same. So:

- What is the best way (or a good, stable way) to carve up this intellectual space?
- How do we tell what a particular person actually understands?

"Write this program" is not an answer to the second problem—as many studies have shown, people can solve routine problems by rote without really understanding what they're doing. (This is the starting point for Eric Mazur's work on peer instruction, and the reason so many of us are so skeptical about things like the Khan Academy.) Can we just teach the tools (for some value of "just") and let the big ideas sort themselves out? The answer is clearly "yes", because that's what I did from 1998 to 2007. Does it work? I think the answer is "only partially": some people generalize from specifics to principles correctly on their own, but many either don't do it at all, do it incompletely, or do it incorrectly. And does it matter? I think so: I think that if we want scientists (or anyone else) to use computing on their own, for their own ends, they need to be able to step past what we've shown them with good odds of success, and that certainly requires understanding "why" as well as "what".
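To show what one of those principles looks like at the scale of a two-minute demonstration, here's a sketch of copying vs. aliasing in Python (my illustration, not an excerpt from the lessons):

first = [1, 2, 3]
second = first          # aliasing: no copy is made; two names, one list
second.append(4)
print first             # [1, 2, 3, 4] -- 'first' changed too

third = first[:]        # copying: slicing builds a new, independent list
third.append(5)
print first             # still [1, 2, 3, 4]

The point of asking "what does the last print show?" is exactly Epstein's: anyone can memorize the slicing syntax, but predicting the output requires understanding what names and values actually are. Read More ›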

Watch Me: Trial Run
Greg Wilson / 2012-02-22
A dozen people have come forward since I asked last week for volunteers to make short screencasts showing how they program. I just sent them a sample problem to work on to test things out (see below the fold); the videos they create won't be made public, but I hope it gives readers an idea of the scale of problems we're going to be looking at. If you have suggestions for interesting problems of a similar size, please add them as comments on this post.

Hello, and thank you once again for volunteering to help Software Carpentry by recording a screencast to show people how you program. To test out your system, I'd like you to record yourself solving the problem described below using any tool or tools you like, on whatever kind of computer you prefer. Use whatever recording tool you like (a demo version of Camtasia, QuickTime, xvidcap, ...), and save in whatever video format is easiest. Please:

- do use a headset mike if you have one, but if you don't, please don't worry about it for now—this is just a test
- do use full-screen recording—the real videos will have to be constrained (probably to 800×600 or 1024×768), but for now, let's keep it simple
- do talk a lot while you're coding—stream of consciousness like "OK, so let's open up the editor again and try swapping those values in the other order..." is what we're after
- don't worry about editing your video to cut out "ums" and "errs", typing mistakes, and so on—we'll do that for you in the real screencasts, and again, this is just a test

Write a command-line program called 'total' to add up the numbers in a data file. The input file's name is given to the program as its sole command-line argument; its only output is the sum of the numbers in the file (if the file does not contain any numbers, the output is 0.0). The file may contain any number of lines (including none at all); each line may contain at most one floating-point number, and may also contain leading or trailing whitespace. (Lines containing only whitespace, or nothing at all, are allowed, and should be ignored.) For example, if the file 'numbers.txt' contains:

22
31.3
+5.0e1
-1

then invoking the program as:

$ total numbers.txt

should print 102.3 on a line by itself.
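For a sense of the expected scale, here's one way the problem might be solved (a minimal sketch, not a model answer that was sent to the volunteers):

import sys

# total: add up the numbers in the file named on the command line.
total = 0.0
with open(sys.argv[1]) as reader:
    for line in reader:
        line = line.strip()
        if line:                  # skip blank and whitespace-only lines
            total += float(line)
print total

Read More ›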

Granules of Research
Greg Wilson / 2012-02-22
Cameron Neylon recently posted an article titled "Github for science? Shouldn't we perhaps build TCP/IP first?" His argument is that the web's a good way to move text around, because it was built by programmers, and programmers work with text. It's not (yet) well suited to moving science around, because we don't (yet) have something as granular and portable as text for scientific ideas. Yes, any particular piece of research can be represented as text, but so can any image, or any audio stream, or anything else—it's the structure that adds meaning, and we haven't (yet) agreed on structures. Circling back to today's first post, part of what we're trying to do is give scientists the background to understand and take part in conversations like these... Read More ›

Why *Not* Use Python
Greg Wilson / 2012-02-21
When we started Software Carpentry back in the late 1990s, we used Perl as a teaching language instead of Python. At the time, it was a no-brainer: Perl had many more users, better documentation, and more libraries. We switched because we found ourselves explaining the same inconsistencies over and over again (as I've said many times since, every page of the O'Reilly Pocket Guide to Perl used one of the words "except", "unless", or "however" at least once). Python had fewer "buts": we saw right away that students were learning concepts more quickly, and they seemed to retain more as well. But Python isn't perfect, and I was reminded very forcefully of its biggest flaw on Saturday, when I spent half a day teaching kids aged 8-14 how to program as part of a Mozilla Hack Jam in Toronto. About three quarters of the kids were able to start drawing pictures with Python's turtle graphics library right away. The other quarter, though, stumbled over (and were sometimes blocked completely by) the same old installation headaches that plague grownups trying to use Python to do science. One would-be learner showed up with a brand-new MacBook Air running OS X 10.7. Half an hour and four downloads later, he still couldn't get a turtle to draw a straight line. We tried 32 and 64-bit DMGs for Python 2.7.2 and Python 3.2.2, without luck; the only advice Google found for us started, "Install the latest version of XCode...", at which point we gave up. Several others, who had Windows 7 machines, were able to install, but then we discovered that Python still doesn't put itself on the search PATH. "Oh," said one of my helpers, "That's easy, you just go into System... then Advanced... then edit this environment variable..." It's a good thing he was looking at the computer as he said this, instead of at the faces of the kids he was trying to help—if he'd been doing the latter, he would have realized how inappropriate "easy" and "just" were. People used to talk about "grand challenges" in scientific computing. Mostly, they meant the kind of big science that shows up on magazine covers. For me, though, the only "grand challenge" in scientific computing that matters is making stuff work the first time for everyone. It might not be as sexy as protein folding, global climate change, or predictive models of fender crumpling, but it would help a lot more people—and not just scientists. Later: another stumbling block when doing things with turtle graphics is Python's "counted loop" idiom:

for i in range(3):
    do something

If I want people to draw squares, hexagons, and what-not, I either wave my hands ("Trust me, this is just what you do") or explain functions and lists when what I really want to do is explain loops. It's not as big a thing as the installation headaches, but first-class ranges:

for i in [0:3]:
    do something

would make things noticeably easier in this one particular case. Is it important enough to merit changing the language? Probably not on its own, but if there are other reasons to do it—or to go all out and add a cross-product operator:

for (i, j) in [0:3] @ [0:5]:
    do something

As a bonus, we could then overload @ for matrix multiplication :-) Read More ›
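For concreteness, here is the kind of beginner exercise in question, drawing a square with the standard turtle module (our illustration, not code from the workshop): even this requires explaining range before you can explain repetition.

import turtle

for i in range(4):        # the "counted loop" idiom discussed above
    turtle.forward(100)   # draw one side of the square
    turtle.right(90)      # turn 90 degrees to start the next side

turtle.done()             # keep the window open until it is closed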

Hello from Trieste!
Tommy Guy / 2012-02-21
Day 1 of the Trieste bootcamp was a success! Katy and I covered the Bash shell and Git. It was encouraging to see students in the lab after dinner working on their shell exercises. In general, the students are very enthusiastic. Later we'll try to list their home countries. So far, I've met people from Colombia, Mexico, Argentina, South Africa, Cameroon, Nigeria, Romania, Italy, Pakistan, Iran, India, and China. Their fields range from astronomy to nuclear physics to climate science. Today, we are starting Python. You can follow our material on our GitHub page's wiki. Pay particular attention to Katy's lectures on Git and GitHub, where she introduced Git for personal use in hour one, then introduced collaborative use through GitHub in hour two. Here's a picture (more to come!) Read More ›

Badges (Mark 1)
Greg Wilson / 2012-02-21
One of our key deliverables for the Sloan Foundation-funded work is a badging program built on top of Mozilla's Open Badges Initiative. Riffing on our new logo, Carri Han has designed three badges for us: one for people who have mastered our core content, one for people who have organized and run workshops, and one for people who have created content. Please let us know what you think. Read More ›

Assessment Redux
Greg Wilson / 2012-02-21
The single biggest challenge Software Carpentry faces right now is how to tell what impact it's having. This is only partly to satisfy funders—as I said back in December, if we don't know how to tell if we succeeded, we're going to fail. It would be (relatively) easy to put together a multiple-choice quiz to see how much people have learned about basic shell commands, the syntax of Python, and so on, but that would only address the shallowest aspects of learning. We're trying to impart some fundamental principles, and what we need is questions that will tell us whether people have internalized them. (As many studies have shown, it's possible to get a decent score on a quiz without actually understanding the subject matter.) For example, consider this question about Subversion: Emmy wants to see what has changed in her working copy since revision 120. The command she should run is:

svn log -r 120
svn diff -r 120
svn revert -r 120
None of the above

It addresses Q05 ("How can I keep track of what I've done?") fairly directly, but not R02 ("Use a version control system"), R03 ("Automate repetitive tasks"), or any of the basic principles. Open-ended questions might get at the latter, but it's hard to come up with ones that don't lead the witness: asking, "When would you use a version control system?" isn't going to give us much insight into what people actually think. We could combine a few multiple-choice questions with a few open-ended ones, but realistically, if it takes more than 10-15 minutes for people to answer, many (most?) won't. If anyone can see a way to square this circle, I'd welcome ideas. Read More ›

A Flash (well, MP4) from the Past
Greg Wilson / 2012-02-19
In July 2009, we held a one-day symposium on open science at the University of Toronto. I recently uploaded video from those talks to YouTube; the audio is a bit shaky, but I hope they're useful despite that. The talks are linked below.

Titus Brown: Choosing Infrastructure and Testing Tools for Scientific Software Projects
Cameron Neylon: A Web Native Research Record: Applying the Best of the Web to the Lab Notebook
Michael Nielsen: Doing Science in the Open: How Online Tools are Changing Scientific Discovery
David Rich: Using 'Desktop' Languages for Big Problems
Victoria Stodden: How Computational Science is Changing the Scientific Method
Jon Udell: Collaborative Curation of Public Events
Greg Wilson: Opening Remarks

Read More ›

How They See Us, Part N
Greg Wilson / 2012-02-16
This week's Ed-Tech Podcast from Steve Hargadon and Audrey Watters discusses Software Carpentry a bit around the 23:00 mark [1]. In answer to Hargadon's point about home schooling, and whether the way people learn programming even fits the notion of a class, we have a couple of answers. First, most of the people we're trying to help don't know enough (yet) to know what to type into Google, how to recognize when they've stumbled upon an answer to their problem, or what to tag a question with on Stack Overflow. Some can climb that hill themselves; a handful can't, but most won't (see below), so one of our goals is to help them get from A to B so that they can get from B to Z. In addition, while the scientists and engineers we're trying to help might think that computing is interesting, their real passion is quantum chemistry, neurology, or climate change; in practical terms, computing is a tax they have to pay in order to do the research they actually want to do [2]. From that perspective, "wander around and stumble upon" feels like a high-risk strategy, so they (mostly) vote with their feet and don't do it. Second, even those who do wander and stumble tend to find very different things. As a result, there's no common core of skills or assumptions that one researcher can reasonably expect her peers to be familiar with. In contrast, most researchers can expect colleagues to know at least a few basic things about statistics, and to share some cultural values about when a correlation is significant and so on. In choosing what to include in our core, we're also (implicitly) making a statement about what that core is, and what's reasonable to expect others to share. [1] What's really interesting, though, is the discussion in the first few minutes about Silicon Valley's ed-tech amnesia. [2] Regarding Hargadon's comment about "willingness to hack", I think that every researcher I've ever met has that in spades—they're just investing that energy in something other than programming. And yes, lists of "things programmers need to know" make me yawn too—but only if I already know enough about the topic to forge ahead on my own. I'm really grateful for "must read" lists whenever I dive into a new area... Read More ›

Watch Me: Volunteers Wanted
Greg Wilson / 2012-02-15
Back in 2007, Jon Udell observed that screencasts facilitate accidental knowledge transfer in a way that more traditional media don't. As I said yesterday, we'd therefore like to start recording short screencasts of programmers thinking aloud as they solve small problems using their preferred tools. The aim is to show learners how to program—what order to write things in, how to debug, when and how much to test, and so on. Everything will be covered by the same Creative Commons license as our other material, and made freely available for remixing and other use. If you'd like to help, please: Volunteer to be recorded by mailing us. We'll help you install a screen recorder (if you don't have one already—you might be surprised to find that you do), give you a small problem, and edit the video you produce so that you don't have to. Volunteer to edit video for us, so that we can put our energy into organizing people :-). Volunteer to work the floor at PyCon in March. We can't attend (workshops to run, etc.), but it would be great if we could get a dozen or more "here's how I do it" recordings done during the conference. Remember, as an open source project, Software Carpentry depends on your help to survive and thrive. If you have wanted to help, but have worried that creating and recording lectures would be too much work, this is a way for you to help that will take half an hour or less. We look forward to hearing from you. Read More ›

Slide Drive
Greg Wilson / 2012-02-15
Speaking of new kinds of content (which I've been doing a lot), David Seifried has built a working prototype of a new slideshow tool that combines deck.js with an HTML5 audio player. You can check out a demo or grab the source from https://github.com/dseif/slide-drive. Slides are pure HTML like this:

<section popcorn-slideshow="24">
  <h2>Solution</h2>
  <p>Short intensive workshops</p>
  <div>
    Our solution combines short, intensive workshops...
  </div>
  <div popcorn-slideshow="27">
    <p>Plus self-paced online instruction</p>
    <div>
      ...with self-paced online instruction.
    </div>
  </div>
</section>

which combines slide pages and transcripts in a single file suitable for diffing and merging. (Images are still in external files, but I can live with that.) You can pause the slideshow at any point to select and copy the content (something you definitely can't do with a video), and we'll add support for translations into other languages and so on. Many thanks to David for pulling this together; please let us know what you think. Read More ›

And Speaking of New...
Greg Wilson / 2012-02-15
...check out Bret Victor's talk at CUSEC 2012—jump in around the 7:00 mark and watch for a couple of minutes. You'll want to go back and watch the whole thing... Read More ›

Analyzing Next-Generation Sequencing Data
Greg Wilson / 2012-02-15
Analyzing Next-Generation Sequencing Data
http://bioinformatics.msu.edu/ngs-summer-course-2012
June 4th — June 15th, 2012, Kellogg Biological Station, MSU
Course sponsor: NIH.
Instructors: Dr. C. Titus Brown, Dr. Ian Dworkin, and Dr. Istvan Albert.
Board of advisors: Dr. Kevin White, Dr. Paul Sternberg, Dr. Rich Lenski, Dr. Robin Buell, Dr. Jim Tiedje, Dr. Lincoln Stein.

Applications are being accepted through March 1st (midnight Pacific)!

Course Description: This intensive two-week summer course will introduce attendees with a strong biology background to the practice of analyzing short-read sequencing data from Roche 454, Illumina GA2, ABI SOLiD, Pacific Biosciences, and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq. No prior programming experience is required, although familiarity with some programming concepts is helpful, and bravery in the face of the unknown is necessary. Two years or more of graduate school in a biological science is strongly suggested. Faculty, postdocs, and research staff are more than welcome!

Students will gain practical experience in:
Python and bash shell scripting
cloud computing/Amazon EC2
basic software installation on UNIX
installing and running maq, bowtie, and velvet
querying mappings and evaluating assemblies

Materials from last year's course are available at http://ged.msu.edu/angus/tutorials-2011 under a Creative Commons/use+reuse license. You can read a blog post about last year's course at http://ivory.idyll.org/blog/jun-11/ngs-2011. Read More ›

Stack Underflow?
Greg Wilson / 2012-02-14
Circling back to an earlier post, one of the challenges free-range learners face is where to get help after their workshop or short course is over. Google isn't much help, since they don't yet know enough to know what to look for, and Q&A sites like Stack Overflow can be pretty intimidating: most novices don't want to bother people and/or don't want to look clueless in public (and getting "That's a stupid thing to ask, newbie" doesn't help). One possible solution would be to set up an area on Stack Overflow specifically for novices. Anyone could ask a question there, but only people with a reasonably high score would be allowed to respond, since their score would be evidence that they're both knowledgeable and civil (it's hard to build up karma on SO if you aren't both). As well as giving novices a place to turn for help, this would help them transition into the larger community; while it's possible that questions would simply go unanswered, we think it's more likely that at least some of the "hardcore helpful" on SO would dive in (particularly if there were bonus points for answering questions in this area). It would also help more experienced users by improving the signal-to-noise ratio in the regular areas. Finally, it would make it easier for us to figure out what problems novices are having most often, and what sorts of explanations they find most useful. What do you think? Is "Stack Underflow" worth trying? Later: in response to some comments by email and on Twitter, this "novices area" would not be specific to Software Carpentry (any more than a question like "what's standard input?" is). It could be implemented within Stack Overflow using a reserved tag such as "novice", but by itself, that wouldn't provide the safety that would come from restricting who could answer novices' questions. It also wouldn't provide the fringe benefit of improving the general signal-to-noise ratio... Read More ›

New Kinds of Content
Greg Wilson / 2012-02-14
Mark Guzdial, whose blog on CS education is always interesting, recently posted about using worksheets to help people learn to write programs. As he says, research going back 30 years shows that reading and writing skills develop independently; there's also a ton of research showing that partially-worked examples are a very effective (possibly the most effective) way to teach people new skills. Which immediately suggests two questions: Should we provide worksheet-style examples on this site? If so, would you be willing to help us create them? It would be very easy to say "yes" to the first question: after all, who doesn't want more content? But looking at our site's statistics, it seems that most people are surfers (like Xanthe) rather than divers, and people surfing for solutions to specific problems probably wouldn't work through the examples. Also, given my travel schedule for the next four months, there won't be any point saying "yes" to #1 unless a few people say "yes" to #2. We can't promise to make you rich, but you'd certainly be popular :-) We'd also like your help creating another kind of content. I used to teach with slides; these days, like many other people, I just plug in my laptop and program live. It's not just because it's more responsive—people also tell me that they learn a lot "by accident" from watching how I program. In that spirit, I'd like to record a bunch of programmers thinking aloud as they solve small problems on their own machines, using their favorite tools [1]. Emacs on Linux, XCode on Mac, the MATLAB IDE on Windows—each has its own quirks and joys, and we can all learn something from each of them. Again, the two key questions are, "Should we?" and "Will you help?" Please drop us a line if you'd like to. [1] Or aggregate screencasts like this that other people have already done, and whose licenses are CC-compatible. If you have favorites, please add links in the comments. Read More ›

Our New Look
Greg Wilson / 2012-02-13
We have a new logo, which is also available in stacked form for coffee mugs (thanks, Steve). We also have some merchandise, and the first batch of laptop stickers is on its way :-) Read More ›

How Many Legs Does Science Have?
Greg Wilson / 2012-02-13
Back in 2010, Moshe Vardi wrote an opinion piece titled "Science Has Only Two Legs", in which he argued that computational science is just another form of experimental science, and that programs should be held to the same standards as other pieces of experimental apparatus. It's an interesting counterpoint to recent excitement around "big data" (as evidenced in works like The Fourth Paradigm), and, I think, directly relevant to what we're trying to teach. What are your views? Read More ›

Formatting Revisited
Greg Wilson / 2012-02-13
David Seifried has been working for the past week to combine deck.js, a Javascript-plus-CSS slideshow package, with the AudioJS audio player, so that we can re-do slides as pure HTML5 (instead of using PNGs exported from PowerPoint). At the same time, I'm trying to turn the course notes into long-form prose (what used to be called a "book") for people who prefer reading at leisure. How should all this content be managed? My previous post on 21st Century teaching formats described what I'd eventually like, but the tool I want doesn't exist yet, so what can be done now? We will have:

metadata, such as keywords and topic guides;
slides containing vector diagrams, raster images, point-form text, and code samples;
audio narration synced with the slides;
transcripts of the narration; and
prose (the "book" stuff), which may include the same code samples and figures.

I know from experience that the transcripts of the audio will be a starting point for the book-form material, but the latter will be longer. We'll therefore have four parallel streams of data: slides, audio, narration (as text), and the book. That suggests something like this (using the topic/concept distinction I discussed a couple of weeks ago):

<section class="topic">
  <section class="metadata">
    keywords, notes to instructors, etc.
  </section>
  <audio src="..." />
  <section class="concept">
    <section class="slide" popcorn-slideshow="start time in seconds">
      Slide (or several slides if we're using progressive highlighting on a single logical slide).
    </section>
    <section class="transcript">
      Text transcript of the slide (or group of closely-related slides).
    </section>
    <section class="book">
      Long-form prose discussion of the same concept.
    </section>
  </section>
  <section class="concept">
    ...as above...
  </section>
</section>

Diagrams and images will be stored in external files and href'd in—I've played with putting the SVG directly in the document, but in practice, people are going to use different tools to edit the two anyway. I'd like to use inclusions for code fragments, so that they don't have to be duplicated in the slide and book sections, but there's no standard way to do text inclusion in HTML (which is odd when you think about it, given that other media are usually included by reference instead of by value). The advantages of this format that I see are:

Anyone can edit it without special tools.
It's mergeable: provided people stick to a few rules about indentation and the like, it'll be a simple text merge (which is a lot easier than merging PowerPoint slides or the like).
We can re-skin it using CSS and a bit of Javascript. (For example, the default web view won't show the transcript or book form, just the slides and audio.)
It's accessible to people with visual handicaps (since related content is side-by-side and complete).
We can compile it to produce web-only or print-only versions using XSLT or the like if we want to.

Things I don't like:

I really would like to store code snippets in external files and href them as if they were diagrams or images. We can do that with a simple script, but then what you're editing and what you're looking at in your previewer (browser) will be separated by a compilation step, which in my experience always results in headaches.
Different authors' HTML editing tools will indent things differently, so we'll need to provide some sort of normalizer for them to run before doing an update or commit.
It's not a big deal, but again, experience teaches that it will lead to a constant background annoyance level ("Whoops, sorry, I forgot to clean up before I committed that change"). We could use a wiki-like syntax for notes, and rely on something like Sphinx to convert that to what we need. This is the route the authors of these SciPy lectures have taken, and while it's intriguing, I don't see how to support the parallel streams we want without some serious hackage. It would also tie any processing tools we build to an idiosyncratic format (reStructuredText); HTML5 might be more typing, but it can also be crunched by almost any language people are likely to use straight out of the box. Thoughts? Read More ›
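To make the inclusion idea concrete, here is a minimal sketch of the kind of "simple script" mentioned above (an illustration only: the <pre data-include="..."> marker is an invented convention, not part of any standard):

import re
import sys

# Invented convention: <pre data-include="file"></pre> marks an inclusion.
PATTERN = re.compile(r'<pre data-include="([^"]+)"></pre>')

def expand(match):
    # Splice the named file's contents into the page.
    with open(match.group(1)) as reader:
        return '<pre>' + reader.read() + '</pre>'

with open(sys.argv[1]) as reader:
    sys.stdout.write(PATTERN.sub(expand, reader.read()))

The catch is exactly the one described above: authors would edit the page with the markers, but preview the compiled page with the inclusions expanded.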

Advertising Flyer
Greg Wilson / 2012-02-13
I've put together a one-page flyer to advertise the course (each site would customize the "Local info" space at the bottom). Please let us know what you think... Read More ›

Pre-Workshop Questionnaire
Greg Wilson / 2012-02-12
I've been asking people to fill in the short questionnaire below before our workshops in order to give us a better idea of what they want and what they already know. How do you score, and how could we improve the questions?

1. Roughly what fraction of your working time do you spend creating, modifying, or testing software?
2. What operating system will you be using in this class? Linux, Mac OS X, or Windows?
3. Do you understand the following Unix shell commands well enough to explain them to someone else?
   ls data/*.txt
   find ~ -name '*.py' -print
4. Do you understand the following Subversion commands well enough to explain them to someone else?
   svn update
   svn diff -r 1723
5. Do you understand the following Python statements well enough to explain them to someone else?
   {'east' : 5, 'west' : 11}
   __init__(self, *args)
6. Do you understand the following testing concepts well enough to explain them to someone else? fixture; mock object
7. Do you understand the following Make terms and commands well enough to explain them to someone else? dependency; cp $< $@
8. Do you understand the following SQL terms and commands well enough to explain them to someone else?
   select * from data where data.left < data.right;
   inner join
9. What do you hope to gain from this workshop? What could we do to make it most useful for you?

Read More ›

Audrey Watters on Software Carpentry
Greg Wilson / 2012-02-10
Audrey Watters is a prolific, insightful writer on all things related to technology and education. I recently asked her to take a look at this course, and tell us what sorts of things might be worth trying. Her first post on the subject asks some questions about educating end-user programmers in general; your thoughts would be welcome too. Read More ›

Advanced Scientific Programming in Python
Greg Wilson / 2012-02-10
Advanced Scientific Programming in Python: a Summer School by the G-Node and the Institute of Experimental and Applied Physics, Christian-Albrechts-Universität zu Kiel.

Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only a few scientists actually use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques, incorporating theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game. We use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to become a standard tool for the programming scientist. This school is targeted at Master or PhD students and Post-docs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. Basic knowledge of Python is assumed. Participants without any prior experience with Python should work through the proposed introductory materials before the course.

Date and Location: September 2-7, 2012. Kiel, Germany.

Preliminary Program:
Day 0 (Sun Sept 2) — Best Programming Practices: Best Practices, Development Methodologies and the Zen of Python; Version control with git; Object-oriented programming & design patterns
Day 1 (Mon Sept 3) — Software Carpentry: Test-driven development, unit testing & quality assurance; Debugging, profiling and benchmarking techniques; Best practices in data visualization; Programming in teams
Day 2 (Tue Sept 4) — Scientific Tools for Python: Advanced NumPy; The Quest for Speed (intro): Interfacing to C with Cython; Advanced Python I: idioms, useful built-in data structures, generators
Day 3 (Wed Sept 5) — The Quest for Speed: Writing parallel applications in Python; Programming project
Day 4 (Thu Sept 6) — Efficient Memory Management: When parallelization does not help: the starving CPUs problem; Advanced Python II: decorators and context managers; Programming project
Day 5 (Fri Sept 7) — Practical Software Development: Programming project; The Pelita Tournament

Every evening we will have the tutors' consultation hour: tutors will answer your questions and give suggestions for your own projects.

Applications: You can apply on-line at http://python.g-node.org. Applications must be submitted before 23:59 UTC, May 1, 2012. Notifications of acceptance will be sent by June 1, 2012. No fee is charged, but participants should take care of travel, living, and accommodation expenses. Candidates will be selected on the basis of their profile. Places are limited: the acceptance rate last time was around 20%. Prerequisites: You are supposed to know the basics of Python to participate in the lectures. You are encouraged to go through the introductory material available on the website.
Faculty:
Francesc Alted, Continuum Analytics Inc., USA
Pietro Berkes, Enthought Inc., UK
Valentin Haenel, Blue Brain Project, Ecole Polytechnique Federale de Lausanne, Switzerland
Zbigniew Jedrzejewski-Szmek, Faculty of Physics, University of Warsaw, Poland
Eilif Muller, Blue Brain Project, Ecole Polytechnique Federale de Lausanne, Switzerland
Emanuele Olivetti, NeuroInformatics Laboratory, Fondazione Bruno Kessler and University of Trento, Italy
Rike-Benjamin Schuppner, Technologit GbR, Germany
Bartosz Telenczuk, Unite de Neurosciences Information et Complexite, Centre National de la Recherche Scientifique, France
Stefan van der Walt, Helen Wills Neuroscience Institute, University of California Berkeley, USA
Bastian Venthur, Berlin Institute of Technology and Bernstein Focus Neurotechnology, Germany
Niko Wilbert, TNG Technology Consulting GmbH, Germany
Tiziano Zito, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany

Organized by Christian T. Steigies and Christian Drews of the Institute of Experimental and Applied Physics, Christian-Albrechts-Universität zu Kiel, and by Zbigniew Jedrzejewski-Szmek and Tiziano Zito for the German Neuroinformatics Node of the INCF.
Website: http://python.g-node.org
Contact: python-info@g-node.org
Read More ›

Multiple Pitches
Greg Wilson / 2012-02-09
Based on the feedback we've had on my first attempt [no longer online] at "Software Carpentry in 90 Seconds", it seems as though we should actually create three videos: one aimed at potential participants, i.e., students, workshop organizers, and content creators (since we're hoping these will largely be the same people); one for potential sponsors, who are both funding bodies and the senior scientists who set strategic direction for departments and disciplines; and some "talking head" testimonials from former participants. Does this sound like the right breakdown? Read More ›

Comparing Software Carpentry to CS Principles
Greg Wilson / 2012-02-09
A lot of new educational initiatives in computing have sprung up in the past couple of years, ranging from Mozilla's Hackasaurus to the UK rethinking its grade-school curriculum. One of the biggest is the "Computer Science: Principles" project in the United States, which, with backing from the National Science Foundation, is trying to re-define the high school Advanced Placement course. They have some well-grounded ideas about what should be taught and why; how does Software Carpentry compare? To make the long story below short, Software Carpentry is tackling a subset of the ideas and issues of CS Principles. I originally wrote, "the purely technical subset", but that's not quite right: we do look at communication and collaboration in software teams, at producing information for human consumption, and at the aesthetic aspects of programming, but only insofar as these affect correctness and productivity. One reason is that our learners have much less time to devote to this than high school students have for a year-long course. Another is that we're not tasked with shaping young citizens in the way that public schools are, so communication skills and considerations of societal impact aren't our concern. I don't think we cover any big ideas that CS Principles doesn't, which is actually encouraging: if we can hang on five or ten years, we might be able to assume that most of our learners are already familiar with the big ideas, so that we can concentrate entirely on "how".

Computational Thinking Practices

Connecting Computing to society, innovation, and everyday life: not a focus for SC, although we try very hard to connect things to scientific research and researchers' everyday needs.
Developing computational artifacts (which includes writing programs), and applying computing techniques to creatively solving problems: yes.
Analyzing problems and artifacts by applying aesthetic, mathematical, pragmatic, and other criteria: I'd like to see more in Software Carpentry on things like code review, but I don't think we need anything more on algorithm analysis or the like.
Communicating: not really our focus, but we hope that teaching people how to generate web pages, produce and consume RSS feeds, and so on will make communicating easier and more effective.
Working in Teams: also not directly our focus, but again, we hope that teaching people things like version control will help them collaborate.

Big Ideas

Ours are described in two recent posts. CS Principles lists these:
Computing is a creative activity: we say "computing is a human activity", which isn't quite the same.
Abstraction reduces information and detail to facilitate focus on relevant concepts: we say "programming is about creating and composing abstractions", which isn't quite the same.
Data and information facilitate the creation of knowledge: we don't have an explicit equivalent.
Algorithms are used to develop and express solutions to computational problems: well, yeah.
Programming enables problem solving, human expression, and creation of knowledge: I'm not sure this is a "big idea" in the same sense as the others; we don't belabor it, since we think that most scientists already get this.
The Internet pervades modern computing: ditto.
Computing has global impacts: ditto.

Learning Objectives

The student can use computing tools and techniques to create artifacts: agreed.
The student can analyze computational artifacts: we don't do this, but should.
The student can use computing tools and techniques for creative expression: I think this is just a stronger restatement of objective #1.
The student can use programming as a creative tool: ditto.
The student can describe the combination of abstractions used to represent data: this is the realization of our "it's all just data" big idea, so yes.
The student can explain how binary sequences are used to represent digital data: I think this is just a particular instance of #5 above.
The student can develop an abstraction: we fold this into #1—in order to develop artifacts (programs), Software Carpentry learners are going to have to develop and understand abstractions.
The student can use multiple levels of abstraction in computation: as above.
The student can use models and simulations to raise and answer questions: this is a big goal of our course.
The student can use computers to process information to gain insight and knowledge: again, this is a big goal of our course.
The student can communicate how computer programs are used to process information to gain insight and knowledge: we'd like our learners to be able to explain what they've learned to others, not least so that they can run workshops of their own, but it's not a priority.
The student can use computing to facilitate exploration and the discovery of connections in information: as with #9 and #10.
The student can use large datasets to explore and discover information and knowledge: as above.
The student can analyze the considerations involved in the computational manipulation of information: the discussion makes it clear that "considerations" means "tradeoffs", so yes, this is something we'd like our learners to be able to do.
The student can develop an algorithm: if you think that every program is the embodiment of an algorithm, then yes, of course, but if you mean "can derive quicksort on their own", then no.
The student can express an algorithm in a language: yes, of course.
The student can appropriately connect problems and potential algorithmic solutions: specifically, this means that learners can identify problems that can be solved in a reasonable time; explain why heuristic approaches are necessary to solve some problems in a reasonable time; and explain how some problems cannot be solved using any algorithm. We have explicitly not included this in Software Carpentry, as our experience has been that there's nothing substantive we can teach in the hour or two we'd be able to devote to it that would actually make a difference. Given more time, though...
The student can evaluate algorithms analytically and empirically: I agree it's important, but it's not really our focus—we are definitely not trying to cram an entire CS degree into a short course.
The student can explain how programs implement algorithms, e.g., explain how instructions are processed: this should be in Software Carpentry, but isn't right now.
The student can use abstraction to manage complexity in programs: again, I think learners have to do this to build the artifacts we ask them to build.
The student can evaluate a program for correctness: we should do more of this.
The student can develop a correct program: I hope so.
The student can employ appropriate mathematical and logical concepts in programming: not really—we don't touch on things like pre- and post-conditions, invariants, and the like because we're teaching the craft of programming rather than the science of computing.
The student can explain the abstractions in the Internet and how the Internet functions: not currently part of our core, because we don't think we can teach people enough about web programming in a short course to let them do anything except create security holes. I'd be very happy to be persuaded otherwise...
The student can explain characteristics of the Internet and the systems built on it: as above.
The student can analyze how characteristics of the Internet and systems built on it influence their use: as above.
The student can connect the concern of cybersecurity with the Internet and systems built on it: as above.
The student can analyze how computing affects communication, interaction, and cognition: out of scope—though I hope we're preparing learners to hear what people like Cameron Neylon, Michael Nielsen, and Jon Udell are saying about this.
The student can connect computing with innovations in other fields: not us.
The student can analyze the beneficial and harmful effects of computing: as above.
The student can connect computing within economic, social, and cultural contexts: as above. Read More ›

Why We Don't Teach Parallel Computing in Software Carpentry
Greg Wilson / 2012-02-07
Konrad Hinsen recently wrote a blog post that explains why teaching parallel computing with Python is hard. To make a long story short, Python's multiprocessing module can fail on simple problems in a whole bunch of ways that require fairly advanced understanding to diagnose and repair—and that's even before you try doing things on Windows. Read More ›
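Konrad's post has the details; as a flavor of the problem, here is a tiny illustration of our own (not his) showing how quickly the advanced details surface:

from multiprocessing import Pool

def square(x):
    return x * x

# Trap 1: this guard is required because worker processes re-import the
# script; omit it and the program misbehaves, most visibly on Windows.
if __name__ == '__main__':
    pool = Pool(2)
    print(pool.map(square, range(5)))   # works: [0, 1, 4, 9, 16]
    # Trap 2: pool.map(lambda x: x * x, range(5)) fails with a pickling
    # error, because the function must be serialized and sent to the
    # workers, and lambdas can't be pickled. Novices find this baffling.
    pool.close()
    pool.join()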

We're Going to Be Busy
Greg Wilson / 2012-02-03
Here are the bootcamps we have lined up for the next few months:

International Center for Theoretical Physics, Trieste, February 18 — March 2
University of Toronto, February 23-24
Indiana University, March 7-8
Monterey Bay Aquarium Research Institute, March 26-27
Lawrence Berkeley National Laboratory, March 28-29
University of Chicago, April 2-3
Utah State University, April 14-15
University College London, April 30 — May 1
Michigan State University, May 7-9
Newcastle University, May 14-15
University of Alberta, May 16-17
University of British Columbia, May 22-23

We could use help with all of them, both in person as they're running, and online as participants follow up with self-paced material. If you'd like to help out, please get in touch. Read More ›

First Online Tutorial
Greg Wilson / 2012-02-03
We held our first online tutorial with the folks at the Space Telescope Science Institute via Skype, and I think it worked well. Our setup was:

The students got together in a meeting room. Each student brought their own laptop.
One extra laptop was connected to the projector; its webcam was pointed at the room, so that I could see the students, and its microphone (mostly) picked up their voices.
I shared my desktop, so that instead of seeing me, the students could see what I was viewing and typing.
I (mostly) used a full-screen terminal window, white on black, with an 18-point font, switching back and forth between my editor and running my evolving program on the command line.

That was pretty much it, and as I said, I think it worked as well as live coding in the classroom works as a lecturing technique (which is pretty well). There were a few times when I wanted to see what was on their screens, and going forward, we're going to have to find a way to do that. Overall, though, I think that using Skype for connecting, and native desktop tools for everything else, works better for small groups than things like Elluminate (now part of Blackboard) that try to do it all in one. I know it won't scale to dozens of people, but this will certainly get us through the next six months. If anyone has tips to share, they'd be very welcome. Read More ›

Where To Host Q+A and Discussion?
Greg Wilson / 2012-02-02
People have questions and want answers, or ideas and complaints they want to share. Right now, the only ways for them to do this on our site are: Mail us. Add a comment to a page or blog post. Um... that's it. We experimented with forums last year, but they never reached critical mass (and have since filled up with spam). What should we do going forward? Should we try to resurrect those forums? Set up a Q&A mailing list? Or direct everyone to the computational science area at StackExchange? The pros and cons as I see it are: Forums: hosted here, hence under our control (and could potentially be tightly integrated with the learning content), but really, the last thing the Internet needs is another place to look for information. Mailing list: if there isn't much traffic, it's not useful; if there's lots, people will mostly unsubscribe or tune out. StackExchange: control (or lack of it) is an issue, but it's well engineered, and once scientists get used to looking there, they might start using its siblings (like the original Stack Overflow). I'm obviously inclined to #3—what are your thoughts? Later: as an experiment, I've asked a question on Stack Exchange about core computing skills for scientists. Please feel free to answer it (and vote it up :-). Still later: I've posted a proposal for a novices' area for Stack Exchange. Comments would be very welcome. Read More ›

Software Carpentry in a Minute and a Half
Greg Wilson / 2012-02-02
I've recorded a first draft of the quick introduction I mentioned yesterday. Feedback would be very welcome. Read More ›

Re-doing the Three-Minute Pitch
Greg Wilson / 2012-02-01
It's time to revise Software Carpentry's three-minute pitch. Here's what I think I need to say; as always, comments would be welcome.

Opening slide: large logo, the title "Computing Skills for Scientists and Engineers", and a small block at the bottom with the date and license.
Our mission is to help scientists and engineers be more productive...
...by teaching them basic computing skills.
The problem is that scientists and engineers spend 40% or more of their time wrestling with software...
...but more than 95% are largely self-taught...
...so they spend hours doing what should take minutes...
...reinvent a lot of wheels...
...and still don't know whether their results are reliable or not.
Our solution combines short, intensive workshops...
...with self-paced online instruction.
The benefit is more confidence that computational results are correct, and significant increases in productivity...
...a day a week is common...
...and 10X isn't rare.
Our workshops cover the core skills a researcher needs to know in order to be productive in a small team:
using version control to manage and share information
basic Python programming
how (and how much) to test programs
working with relational databases
using the shell to do more in less time
Basically, everything you should know before you tackle things with "cloud" or "peta" in their name.
Our online instruction goes into these topics in more detail, and continues with:
program design and construction
matrix programming
using spreadsheets in a disciplined way
data management
development lifecycles
Our content is all available online...
...under a Creative Commons license...
...so you are free to re-use and re-mix it.
Our work is supported by the Sloan Foundation and the Mozilla Foundation, and has been supported in the past by:
Michigan State University
Indiana University
Microsoft
Queen Mary University London
MITACS
SHARCNET
The Software Sustainability Institute
SciNet
The UK Met Office
The MathWorks
The University of Toronto
Enthought
The Python Software Foundation
The Space Telescope Science Institute
Los Alamos National Laboratory
We also depend on contributions from people like you, who give us feedback...
...create lessons and exercises...
...and organize and deliver workshops.
For more information, to get involved, or for help organizing a workshop, please visit us online at ...
...follow https://twitter.com/swcarpentry on Twitter...
...or email team@carpentries.org.

Read More ›

Reorganizing This Web Site
Greg Wilson / 2012-01-31
It's time to reorganize this web site. Here's my plan; comments would be welcome. In particular, WordPress might not be the right tool to use going forward, but I'm not sure what else would be as easy to set up and maintain.

Overall design: the logo and "Software Carpentry" always appear at the top, along with a Javascript site menu; pages are fixed width (and all of that width is taken up with content—no sidebars); the footer contains a copyright notice and links to half a dozen pages (About, Contact, License, etc.). These footer links are redundant, since those links are in the Javascript menus as well, but this makes them more visible.

The Home Page: displays a short text blurb about Software Carpentry and a 3-minute video; has a 2×2 grid, each cell of which has an icon and a line of text to take people to "About", "News", "Lessons", and "Training".

About: a longer blurb about Software Carpentry (several paragraphs), and links to four sub-pages:
Sponsors: a brief note on how to contact us to sponsor; a logo and blurb for each current sponsor; a logo and name for past sponsors.
Contributors: a brief note on how to contact us to contribute, and the kinds of contributions we want; a photo and short bio for each contributor.
Impact (replaces "Testimonials"): pull quotes from past students, a summary of how we're assessing things now and what results we've found; links to PDFs of scholarly papers.
Elsewhere: a list of related courses and materials (people can submit links, but they need approval before posting).

News (replaces "Blog"): our blog, but with WordPress navigation on a separate page rather than cluttering things up in a sidebar. We should also display recent mentions on Twitter somewhere.

Lessons (replaces "Lectures"): the main page is a point-form list of topic and lesson titles, each linking to the appropriate page. The page for Topic PQR has its title, a paragraph about the topic, a list of lessons (each decorated with keywords, duration, and links for downloading audio, video, HTML, PowerPoint, PDF, etc.), and comments. The page for Lesson XYZ shows the topic title (PQR) and the lesson title; the slideshow or video; some up/next/previous links; download links for audio, video, and other formats; exercises; and comments (which may be on the lesson as a whole, on particular slides, or on particular points in or ranges of video). The "Lessons" section also has:
Reading (our annotated bibliography);
Glossary;
Version 4: a page showing a list of lists of links to pages of older material (the stuff we have now); and
Version 3: a page showing a similar list of lists of links to even older material.

Training (replaces "Boot Camps"): displays a map and calendar (preferably side-by-side) and links to per-workshop pages. The page "Training ABC/Date" is for on-site training at a specific site on a specific date. The "Past Events" page is just a list of links to old training pages.

Contact: our email addresses, a link to our Twitter account, and contact info for organizers of upcoming training. (This last also appears on each workshop's page, but there's no harm duplicating it.)

License: the full text of our Creative Commons and BSD licenses.

Read More ›

Terminology
Greg Wilson / 2012-01-29
Before going further with the redesign of the Software Carpentry curriculum, I need to define a few terms and their relationships. These definitions refer to another post on learners and their needs, which you may want to read first.

A concept is an atom of learning. It can be a fact (e.g., what a call stack is), a technique (e.g., how to pass parameters to a function), or a rule (e.g., when to copy data).

An association is any connection between two concepts. Obviously, all concepts in a topic should be strongly associated.

A dependency is a prerequisite association between two concepts. Circular dependencies may exist (you can't understand A unless you understand B, which depends on C, which in turn depends on A), but we'll work hard to avoid such cycles, since every particular learner encounters concepts in some sequential order.

A topic is a set of related concepts; for example, the topic "Parameters" includes all of the concepts described above. A concept may be part of several topics, but for learning purposes, we would like its first appearance for an individual learner to be gentler than subsequent appearances for that same learner.

A lesson is an atom of teaching. It will typically comprise a small number (1-4) of closely-related concepts, and be a few minutes long. Where possible, it should be followed immediately by some kind of reinforcement (discussed below). A concept may appear in several lessons (same idea, different contexts).

A tutorial is a specific sequence of lessons on related topics; for example, the tutorial "Functions" would include lessons on parameters, returning values, and so on. Tutorials aren't strictly necessary for free-range learners—they can go through lessons in any order that respects dependencies (there's a code sketch of this at the end of this post)—but we must define them: to give learners like Zuzel the larger narrative arcs they need (learners like Xanthe only need lessons); and to give instructors like Tahura, who are working in traditional classroom settings, guidance on how to fill an hour-long lecture.

Reinforcement is something done to help people absorb a concept, such as further examples, class discussion, quizzes, or exercises. All four are problematic:

Further examples are based on guesses of what learners might not have understood correctly; such guesses are often wrong.
Class discussion is hard to implement for asynchronous, self-directed learners.
Multiple choice quizzes can mislead or frustrate learners (most questions have more than one right answer), or fail to clear up learners' misconceptions ("You're wrong, but I won't tell you why").
Long-answer quizzes and exercises require a human assessor's attention (which is also hard to implement for asynchronous learners).

Assessment is anything done to determine how well something else has been done. "How well" is important: assessment will not always (or even usually) be black-and-white. Specific kinds are: formative assessment, which provides short-term diagnostic feedback to learners and teachers; summative assessment, which demonstrates attainment of some level; and evaluative assessment, which tells us how well the teaching is working. Formative assessment is a kind of reinforcement; summative and evaluative are (usually) not.

The diagram below shows these relationships. I've drawn most relationships as many-to-many, but in practice, we hope that most will actually be one-to-many, i.e., that concepts will fit neatly into topics, and topics into lessons.
The one-to-one relationship between summative assessments and topics is also a bit misleading: what it's meant to imply is that any particular learner will eventually demonstrate mastery of some topic (e.g., Yeleina will show that she understands functions). This is the smallest "chunk" for which we'd contemplate awarding a badge, though as discussed earlier, we're going to start with something coarser-grained. Read More ›
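To make the dependency idea concrete, here is a minimal sketch of what "any order that respects dependencies" means in code (the concept names are invented, and it assumes the cycles mentioned above have been avoided):

def learning_order(requires):
    """Return concepts ordered so each follows its prerequisites.
    'requires' maps each concept to the concepts it depends on;
    assumes the dependency graph has no cycles."""
    ordered = []
    seen = set()
    def visit(concept):
        if concept not in seen:
            seen.add(concept)
            for prerequisite in requires.get(concept, []):
                visit(prerequisite)
            ordered.append(concept)
    for concept in requires:
        visit(concept)
    return ordered

requires = {'functions': ['variables'],
            'parameters': ['functions'],
            'aliasing': ['variables']}
print(learning_order(requires))
# one valid order: ['variables', 'functions', 'parameters', 'aliasing']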

Learners and Their Needs
Greg Wilson / 2012-01-29
I originally wrote these descriptions as part of a post on formats for learning material. I'm finding them useful in other contexts as well, so I'm re-posting them separately. Our description of our audience describes four scientific users in more detail. Zuzel likes textbooks. More specifically, she likes prose that she can read and re-read at leisure. She'll use a tablet, but would rather listen to music while she learns than to someone lecturing. Yeleina prefers interactive learning. She wants to see things evolve on the whiteboard or on the screen; recordings of live coding sessions with voiceover are OK, but slide after slide of bullet points puts her to sleep. She also wants to be able to share ideas about learning content with her peers. Xanthe is a surfer, not a diver—she wants to skim the pages that Giggle searches turn up and piece things together herself. Bullet points and brief sentences work well for her, particularly in areas she's already familiar with. Wafiya teaches programming at a city library. She needs to remix content created by other educators to meet her learners' needs, but doesn't have a lot of time to do so. She also needs to be able to find content that fits into her (individualized and group) learning plans. Veronique is a programmer who is passionate about teaching. She spends several hours a week writing short tutorials, answering questions in Stuck Underflow and other online forums, and occasionally recording screencasts. She'd like to do more with less effort (she finds today's tools frustrating), to make the content she's creating more useful, and to get more feedback from its users. Ursula's preferences are more constrained than Zuzel's, Yeleina's, or Xanthe's because she is visually impaired. Her main assistive aid, a screen reader, can only "see" text (captions in PNG or JPEG images don't count), and becomes confused when pages are modified in place by Javascript. Tahura is an assistant professor at a medium-sized university. She has to teach three undergraduate courses each year, one of which is always "Introduction to Scientific Programming". While she'd like to experiment with new teaching methods, her priority right now is to get tenure, so she (reluctantly) sticks to the university's traditional format: three hour-long lectures a week, one homework exercise every three weeks, a midterm, and a final exam. Read More ›

Our Long Tail
Greg Wilson / 2012-01-26
[Figure: frequency with which each page is viewed (as a percentage of total views) vs. pages ordered by frequency; data taken over the last 90 days.] I guess there's something to this "long tail" stuff after all. Read More ›

Never Mind the Content, What About the Format?
Greg Wilson / 2012-01-26
I'm still gnawing on the problem of how to construct content for 21st Century learning—or, more prosaically, what I should use to build the next version of Software Carpentry. My starting point is the need to serve several different kinds of users [1], whose descriptions I have moved to a separate post on learners and their needs.

Textbook: big blocks of prose in some narrative order, with pictures, either printed or electronic, read at the learner's pace, alone. Zuzel likes this. Yeleina doesn't. Xanthe uses content out of order via the index or search bar. Wafiya remixes content from several textbooks to create lessons (by photocopying, merging PDFs, or whatever). Like Zuzel, she has read content in order, but like Xanthe, she mainly uses the index now. Veronique has thought about writing one, but (a) doesn't think she has that much to say about any single topic, and (b) is put off by the effort that would be required. Note: the comments below about the difficulty of copying, pasting, and altering also apply to electronic textbooks, as do the proposed remedies.

Static slideshow: a page-by-page dump of a PowerPoint deck, possibly accompanied by a transcript of what the lecturer would say when delivering it. Zuzel uses this as if it were a badly-written textbook, with the transcript as the prose and the slides as diagrams. Yeleina finds it distracting to switch attention back and forth from slides to transcript. Xanthe searches the transcript to find what she wants, then curses because her search engine can't "see" the text in the slides. She also hates the fact that she can't copy and paste the code in the slides (since they're PNGs embedded in a web page). Wafiya remixes this content like any other. She's too polite to curse, but she finds it tedious to re-type the code that's shown in the slides (but isn't duplicated as text in the accompanying transcript). She also finds it wearying to have to re-do diagrams: since the slides are PNGs, it's difficult for her to copy part of a slide, move its elements around, and add a few of her own. Veronique doesn't create material in this format because she thinks it's old-fashioned and not useful. Note: source code can be made available as copy-and-pasteable text directly in the page, or for download; diagrams can similarly be made available as SVGs to facilitate remixing. Doing either currently requires considerable extra work on the part of content creators.

Voice-over slideshow screencast: a video recording of the slides (as they would appear on screen in a lecture) with someone speaking over them, and subtitles. Zuzel ignores the video and reads the transcript as if it were a static slideshow. If a transcript isn't available, she (reluctantly) watches the video. Yeleina prefers this to a static slideshow, but prefers the doodling screencast described below even more. Xanthe hits the "back" button as soon as she realizes it's a video (unless there's a transcript, in which case she curses because she can't copy and paste code out of a video). Wafiya directs students like Yeleina to these, but finds them harder to remix than other formats. Veronique thinks this format is also old-fashioned and not useful. Note: I'm assuming the subtitles are duplicated as a transcript, or available in some other searchable form. I'm also assuming that code is available for download or duplicated in the page for copying and pasting, though all of this requires extra work.
Voice-over doodling screencast: a Khan Academy-style recording of someone doodling on a tablet or coding live. Zuzel treats this like a slideshow screencast. Yeleina likes this format a lot, particularly if she can add comments at specific points and see her peers' comments. Xanthe has mixed feelings: she dislikes explanations delivered this way, but frequently watches "how to" videos, since they're more likely to be accurate and complete than written descriptions. Wafiya treats these like slideshow screencasts. Veronique creates these fairly regularly: they're easy to do, and easy to re-do when systems change or she discovers a mistake. Note: I'm making the same assumptions about transcripts, code, and diagrams as above.

Recorded whiteboard lecture: someone with a camera has recorded someone giving a lecture in a lecture hall, and spliced that with whatever was on the lecturer's screen. Zuzel treats this like any other screencast. Yeleina prefers this to doodling screencasts because she can see the speaker's body language. Xanthe treats these like any other screencast, i.e., she'll use it if there's a searchable transcript and things to copy and paste, or if it's a recording of a live "how to" coding session. Wafiya treats these like slideshow screencasts. Veronique doesn't create these, partly because of the setup required, but also because she doesn't think seeing her adds value—the lesson's supposed to be about the stuff. Note: I'm assuming an electronic whiteboard, since video of someone writing on an actual whiteboard is usually illegible.

Radio drama: a voice-only podcast-style presentation. Zuzel ignores the audio and reads the text transcript. Ditto for Yeleina. Ditto for Xanthe. Ditto for Wafiya. Veronique doesn't create these. Note: but for Ursula, who is blind, this is the only format—all the others fold into it. She doesn't need code samples as text for copying and pasting: she needs them so that her screen reader can tell her what that code contains.

Star Wars: high-quality video with custom animations, cut scenes, and other special effects. Zuzel watches these sometimes, but doesn't learn any more from them than she would from a slideshow. Yeleina enjoys these, which means she pays more attention to them, which means she learns more (but no more than she'd learn from an engaging lecturer). Xanthe doesn't see the point. Unless something blows up. Wafiya likes their high production values, and remixes the special effects segments frequently. Veronique can't afford to produce this kind of material. Note: again, I'm making assumptions about transcripts, copy-and-pasting, etc.

Write-your-own-adventure exploration: typically a set of connected ideas or challenges with explicit dependency information (i.e., you should/must learn A and B before tackling C). Zuzel finds the lack of narrative difficult. Yeleina enjoys these if each node in the graph is in one of her preferred formats. She enjoys them even more if she is exploring with peers. Xanthe ignores the ordering and searches for what (she thinks) she needs. If content is locked down—i.e., if the system won't let her see or search C until she's "completed" A and B—she writes an angry tweet and moves on. Wafiya likes this format for several reasons, but only if everything is always visible. First, it tells her how other teachers think ideas connect (something that is missing or out-of-band for other delivery formats).
Second, it's easy to remix: again, providing it's open, she can reorder things as she thinks best for particular learners. Veronique would like to do this, but has discovered that creating the metadata about dependencies and recommended paths is as hard as writing a textbook.

Wander-around exploration: lots of little snippets, but no explicit dependency information. Zuzel finds this even more difficult. Yeleina likes this less than the "write your own adventure" format: she thinks it's no different than just using Giggle to find things. Xanthe likes this because it's just like using Giggle searches. In fact, she uses every other format as if it were this one. Wafiya feels the same way as Yeleina: she likes having stuff to remix, but she has to do that remixing before this material is useful to those of her students who aren't as independent as Xanthe. Veronique creates content like this almost without realizing it by answering questions at Stuck Underflow.

Jam session: a bunch of learners in a room working through material simultaneously. Zuzel doesn't like it: it's too noisy for her to concentrate, and she can't go back at her own pace to review. Yeleina thinks this is the best... thing... ever. Xanthe is Giggling for information as soon as the presenter tells people what the topic is, but will stop and watch carefully when the presenter is typing live on screen. Wafiya can only book space to do this occasionally, and even then, she doesn't enjoy improv teaching. Veronique enjoys doing this—she volunteers with a local free-range learning group—but can only find time once every couple of months. Note: in theory this can be combined with any of the formats above. In practice, it's almost always short, live lectures interspersed with hands-on practical work.

Personal tutoring: one-to-one instruction, a.k.a. "pair learning". Zuzel doesn't mind this, but she really does prefer books... Yeleina actually prefers jam sessions, since they tend to be more lively. Xanthe likes having someone available to answer questions on demand, but is happy Giggling on her own for most of what she needs. Wafiya wishes she could do this with every one of her learners, but there simply aren't enough hours in the day. The personalized lesson plans she draws up are the closest approximation she can manage. Veronique does this a lot, but is frustrated that it doesn't scale—she really wants to help more than one person at a time.

I'm sure some of the above is inconsistent or just plain wrong, but here are my takeaways: Different people want content in different formats. Yeah, OK, we knew that already, but: Everybody needs first-class content, in the programming sense of the term. In practice, it means that every kind of content can be copied and pasted without losing its meaning. A bunch of colored pixels in an image that look like letters aren't actually letters; if you copy a region of an image and paste it into a text editor, you don't get the text [2]. Similarly, search engines like Giggle can't "see" code evolving line-by-line in a video, so you can't search for that. Together, I think that point #1 and point #2 imply that: We need model-view separation in learning content. I apologize for the computerese, but I don't know any other way to say it. A model (more fully, a data model) is how information is stored, while a view is how people interact with it. Models should be designed to be easy for computers to work with; views should be designed to meet human needs, and the plural there is important: different people want to interact with information in different ways, and even a single person may want to use different ways at different times. Search engines want the information that's in the model, such as the captions on the boxes in a diagram, not some arbitrary view of it (like a bunch of pixels in a PNG). People usually want that as well when they're remixing, since their goals are to combine that information with information from other sources, and/or to present that information in different ways (i.e., views).
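To make this concrete, here is a minimal sketch in Python of what model-view separation for lesson content might look like. The field names and markup are invented for illustration, not a proposed standard:

```
# Model: plain structured data, easy for computers to work with.
# (All of the field names here are made up for illustration.)
lesson = {
    "title":    "Version Control Basics",
    "requires": ["shell-navigation"],   # machine-readable dependency
    "body":     "A commit is a snapshot of your whole project...",
}

def as_html(lesson):
    """One view: HTML5, with the dependency carried in a data-* attribute
    so that search engines and remixing tools can still 'see' it."""
    deps = " ".join(lesson["requires"])
    return ('<article data-requires="%s">\n'
            '  <h1>%s</h1>\n'
            '  <p>%s</p>\n'
            '</article>' % (deps, lesson["title"], lesson["body"]))

def as_text(lesson):
    """Another view of exactly the same model, for plain-text delivery."""
    title = lesson["title"]
    return "%s\n%s\n%s" % (title, "=" * len(title), lesson["body"])

print(as_html(lesson))
print(as_text(lesson))
```

The particular markup doesn't matter; what matters is that every view is generated from the same model, so no information is ever trapped in pixels.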
We also need first-class metadata. I haven't been able to find a standard format for summarizing and exchanging lesson objectives, learning dependencies, and everything else needed to stitch individual facts together. The closest thing seems to be SCORM, but I'd rather stick a fork in my eye [3]: it's bloated, it mixes data models with meta-models with presentation layers with everything else its authoring committee could think of, and did I mention the fork? I could provide metadata as data, e.g., put a point-form list at the top of a lesson saying, "Here's what you need to know before tackling this," but that mixes model and view: since it's just a convention, computers will have a hard time stitching things together accurately.

Finally, we need social learning. Even the Zuzels of this world learn best in collaboration with other people: peer learners are often better at understanding and clearing up misconceptions than instructors, and having a "running partner" helps people stay focused and motivated. This isn't really a matter of format, though, but of the tooling used to deliver content, so I'll skip over it below.

OK, so how well do today's tools and/or formats do by these measures? The fact that "PowerPoint" is both a tool and a format is one indication that the answer is going to be, "Not well."

Plain text is highly searchable and remixable, plays nicely with accessibility aids (i.e., screen readers), and runs everywhere, but: it doesn't do diagrams (unless you count ASCII art); it doesn't directly support metadata (except by convention); and it doesn't separate models from views.

HTML is also highly searchable and remixable—until you start doing dynamic updates with Javascript, at which point today's search tools (and accessibility aids) can't keep track of what's going on. Unlike text, it provides standard ways to include other media, so we'll delay discussion of images and video. And while it doesn't offer a standard way of providing metadata out of the box, HTML5's custom data attributes were designed with exactly this kind of use in mind. And modern HTML partially separates models from views: I can use CSS to tell the rendering engine (e.g., a web browser) to display things differently for different use cases.

DocBook, LaTeX, and wiki text separate models from views even more than HTML does. What's in the file is a description of content, information about the content, and just enough formatting to make things pretty when viewed in specific ways, e.g., "Break the page here to avoid an orphaned line." Diagrams and metadata can be handled the same way as they are for HTML; in fact, I can't see any advantage these formats have over modern HTML any longer [4], so I'm going to take them off the table.

PNG and other raster formats: fail the searchability and copy-and-paste tests. SVG and other vector formats: do better.
Since (some of) the content and relationships are explicit, search engines can find things in SVGs, and you can actually select and copy a box or an arrow, rather than a region of pixels. It only goes so far—Visio-style information about "this arrow connects the box labeled A to the box labeled B" is mostly implicit—but it's better than raster. I've seen people do entire lessons as a series of SVGs, or as one large SVG with progressive reveal; I'll talk about this more below.

PowerPoint and its kin: model, view, and authoring tool are inextricable from one another. You can copy and paste things, and modern search engines understand the format well enough to index textual content, but metadata is just a convention, and remixing takes a lot of work (even if the version you have is the original, rather than an exported ZIP file containing an HTML page that references PNG representations of the slides). That said, authoring rich presentations is easier than it is with HTML+SVG: You use the same tool to create textual and graphical content, rather than having to switch between tools and stitch content together. You can connect textual and graphical content, i.e., you can draw a circle around a word in one of your bullet points, then connect it with an arrow to a particular box in a diagram, just as you would when writing freehand on a whiteboard. This is what HTML-based slideshow packages lack: right now, they force authors to segregate text and graphics, which I view as a throwback to the era of hot metal typesetting. The fact is, most presenters continue to use PowerPoint (or something similar) because it makes it easy to create a reasonably good presentation in a reasonable amount of time [5]. HTML slideshow packages fail this test: authors must sacrifice the quality of the presentation (e.g., skip graphics, or embed segregated graphical files), and do a lot of non-content typing (tags, page IDs, and so on).

Video: fails all the "first-class content" tests [6], and isn't effective [7] unless: authors have the resources to produce Star Wars-quality content [8], or they're showing learners how to do something, like dissecting a frog or using a debugger.

So after all of this, what do I actually want? I want content stored in HTML5 with purely semantic markup, so that it can be searched, copied and pasted, and styled for presentation in a variety of ways [9]. I want an agreed-upon meta and data-* vocabulary for educational metadata, like dependencies, introduction of key terms, questions and answers, and so on. I want a similar vocabulary for commenting and other social interactions that plays nicely with things like the Salmon protocol. I want an authoring tool (note the singular there) that lets me: write and draw WYSIWYG instead of typing in tags and IDs; freely mix drawings and text; and manage parallel streams (or channels), so that I can keep slide content, presenter's notes, prose, and translations of all three into other languages together. I want to be able to animate my drawings and text, which is emphatically not the same as "embed video" (though I may want to do that too). Instead of recording the pixels drawn on the screen as I type Python into an editor, I want to record and play back the text that's being created, so that learners can pause the animation, copy the text, and paste it somewhere else.
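Here is a rough sketch in Python of what I mean by recording the text rather than the pixels; the event format is invented purely for illustration:

```
import time

# The editing session is stored as (delay-in-seconds, inserted-text)
# events rather than as video frames.
events = [
    (0.0, "def mean(values):"),
    (1.5, "\n    total = sum(values)"),
    (1.2, "\n    return total / len(values)"),
]

def replay(events, speed=1.0):
    """Play the session back: at every step, the code shown is real,
    copy-and-pasteable text, not a picture of text."""
    source = ""
    for delay, text in events:
        time.sleep(delay / speed)
        source += text
        print("--- frame ---")
        print(source)
    return source

replay(events, speed=10.0)
```

Pause the playback at any step and what you have is ordinary text that a clipboard, a search engine, or a screen reader can handle.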
Equally, instead of painting pixels to fool your eyes into believing that a box just moved off the screen, I want to move the damn box; once again, if you pause the animation, you should be able to click on the box, attach a comment to it, paste it into your own drawing, etc.

Freely mixing drawings and text feels like it ought to be doable today: we could either put the text in blocks inside a canvas element, or layer a transparent canvas over the page and dynamically resize it. Anchoring drawings to the underlying text (e.g., keeping the arrow from a term to the corresponding bit of the diagram in the right place) is "just" Javascript (for some value of "just"). Making it all WYSIWYG is just more Javascript [10]. But animation... Ah, that's a big one. It's an intrinsically hard problem, but canned effects can do a lot to put simple things within reach [11]. The big question is, how far do we push it? If I want to show you how to use a debugger, or how to draw something with a painting program, I can't re-create the whole UI—I'm going to have to record pixels off a screen. Or am I? I know this is never going to happen—we're not that organized a species—but just imagine what the world would be like if every interface was built using HTML5 and CSS. Any tool at all could export widget descriptions and a semantic trace of what they did (i.e., "the file menu was pulled down" rather than "the cursor moved to pixel (132,172) and the user clicked"), and any other tool could consume it and play it back. The consuming tool might draw the widgets differently, or display the interactions in its own way, but that would be exactly the same as applying a different skin to the original tool [12].

Returning to this universe for a moment, we can store things as HTML5 right now—I'm already using it for Version 5 of Software Carpentry. I could create a vocabulary for instructional metadata, but I'm not an information architect. WYSIWYG authoring tools for HTML5 abound, though the HTML5 they produce can be idiosyncratic (and doesn't play nicely with version control, but that's fixable). I haven't seen a WYSIWYG tool that supports freehand drawing mixed freely with text, or one that supports parallel content streams, but I think half a dozen people working could deliver something substantial in half a dozen months [13]. As for animation, I think we're stuck with video for now: prototyping an HTML5/SVG/Javascript animation framework for use in a learning tool would be a great research project, but we really do need to build a couple to throw away to find out if it's workable. If you'd like to tackle it, please let us know—I'd be happy to be your alpha tester.

Notes

[1] There was a lot of talk in the 1980s and 1990s about different people having different learning styles, inspired in part by Gardner's theory of multiple intelligences. The idea has mostly been discredited, but like many memes, it lives on in popular culture.

[2] Although I bet someone's working on an Emacs mode to do that...

[3] I've actually done this, so I know whereof I speak.

[4] Except that LaTeX and wiki text require slightly less typing than HTML, but if you're using a smart editor, even that advantage goes away.

[5] Please don't quote Tufte's complaints about PowerPoint at me—I don't think it encourages bad presentations any more than the tangled rules of English spelling and grammar encourage bad writing.
[6] In particular, almost all video content makes life harder for the visually impaired: a screencast in which someone talks over themselves typing in an editor or sketching on a tablet is tantalizing but useless to someone who can't see the pixels. I committed this sin when I created Version 4 of Software Carpentry; I'd like to do better in Version 5, and would like to see high-profile online learning sites make some kind of effort as well.

[7] But wait a second: if video isn't effective, why do MIT Open Courseware and the Khan Academy work so well? The short answer is, they mostly don't: if you take out the 15% of people who can learn almost anything, no matter how it's presented, watching videos and doing drill exercises works less well than other options. The longer answer is, watching a good teacher (and Khan is a great teacher) work through a problem, instead of just presenting the answer, moves the content into the "how to" category that video is well suited to.

[8] Research dating back to the early 1990s shows that higher-quality material improves student retention. I don't know whether it improves it enough to justify its higher production costs, though.

[9] HTML5 will also help with version control, since I expect HTML5-aware diff-and-merge tools to start appearing Real Soon Now. Of course, I've been saying that for almost ten years...

[10] These days, you can wave away almost any technical objection with "it's just more Javascript".

[11] In my mind, the animation interface looks more like Scratch than it does like PowerPoint's menus and dialogs. It definitely doesn't require people to type in code, unless they want to create and share an entirely new kind of animation effect.

[12] We could even call that format XUL...

[13] "6×6" is as big a team/timescale as I'm able to contemplate these days. Read More ›

The Big Picture
Greg Wilson / 2012-01-25
I'm trying to be systematic about re-designing the core curriculum of Software Carpentry. So far, I've identified 11 common questions:

Q01: How can I write a simple program?
Q02: How can I make the program I've written easier to reuse?
Q03: How can I reuse code that other people have written?
Q04: How can I share my work with other people?
Q05: How can I keep track of what I've done?
Q06: How can I tell if my program is working correctly?
Q07: How can I find and fix bugs when it isn't?
Q08: How can I get data into my program?
Q09: How can I manage my data?
Q10: How can I automate this task?
Q11: How can I make my program faster?

whose answers depend on three fundamental principles:

F01: It's all just data.
F02: Programming is a human activity.
F03: Better algorithms are better than better hardware.

These break down into 11 more specific principles:

P01: Code is just a kind of data.
P02: Metadata makes data easier to work with.
P03: Separate models and views.
P04: Trade human time for machine time and vice versa.
P05: Anything that's repeated will eventually be wrong somewhere.
P06: Programming is about creating and composing abstractions.
P07: Programming is about feedback loops at different timescales.
P08: Good programs are the result of making good techniques a habit.
P09: Let the computer decide what to do and when.
P10: Sometimes you copy, sometimes you share.
P11: Paranoia makes us productive.

which in turn translate into 11 recommendations:

R01: Use the right algorithms and data structures.
R02: Use a version control system.
R03: Automate repetitive tasks.
R04: Use a command shell.
R05: Use tests to define correctness.
R06: Reuse existing code.
R07: Design code to be testable.
R08: Use structured data and machine-readable metadata.
R09: Separate interfaces from implementations.
R10: Use a debugger.
R11: Design code for people to read.

Here's how I see all this mapping onto the curriculum (assuming we replace agile development with number crunching):

The Shell: files and directories; creating things; pipes and filters; permissions; shell scripts; finding things; variables; loops. Q03: How can I reuse code that other people have written? Q10: How can I automate this task? P04: We can trade human time for machine time and vice versa. P06: Programming is about creating and composing abstractions. R03: Automate repetitive tasks. R04: Use a command shell. R06: Reuse existing code.

Version control: update, edit, commit, and history; merging conflicts; recovering old versions; setting up a repository. Q04: How can I share my work with other people? Q05: How can I keep track of what I've done? Q09: How can I manage my data? F01: It's all just data. F02: Programming is a human activity. P01: Code is just a kind of data. P02: Metadata makes data easier to work with. P05: Anything that's repeated will eventually be wrong somewhere. P07: Programming is about feedback loops at different timescales. P11: Paranoia makes us productive. R02: Use a version control system. R03: Automate repetitive tasks. R08: Use structured data and machine-readable metadata.

Basic Programming in Python: variables and assignment; repeating things; lists; reading and writing; conditionals; nesting control structures; design patterns. Q01: How can I write a simple program? Q02: How can I tell if my program is designed well? Q08: How can I get data into my program? P04: We can trade human time for machine time and vice versa. P05: Anything that's repeated will eventually be wrong somewhere.
P06: Programming is about creating and composing abstractions. R01: Use the right algorithms and data structures. R11: Design code for people to read.

Interlude: aliasing. P10: Sometimes you copy, sometimes you share.

Interlude: text. F01: It's all just data.

Interlude: Booleans and while loops. R11: Design code for people to read.

Interlude: Using a debugger. Q01: How can I write a simple program? Q07: How can I find and fix bugs when it isn't? F01: It's all just data. R10: Use a debugger.

Functions and Libraries in Python: how functions work; aliasing (again); multiple arguments; returning values; libraries; standard libraries; functions as objects. Q01: How can I write a simple program? Q02: How can I tell if my program is designed well? Q02: How can I make the program I've written easier to reuse? F01: It's all just data. P05: Anything that's repeated will eventually be wrong somewhere. P06: Programming is about creating and composing abstractions. P10: Sometimes you copy, sometimes you share. R06: Reuse existing code. R09: Separate interfaces from implementations. R11: Design code for people to read.

Interlude: provenance. Q05: How can I keep track of what I've done? Q09: How can I manage my data? Q10: How can I automate this task? F01: It's all just data. P09: Let the computer decide what to do and when. R03: Automate repetitive tasks. R08: Use structured data and machine-readable metadata.

Program Development: creating a grid; randomness; neighbors; handling ties; putting it all together; fixing bugs; refactoring. Q01: How can I write a simple program? Q02: How can I tell if my program is designed well? Q11: How can I make my program faster? F02: Programming is a human activity. P06: Programming is about creating and composing abstractions. P07: Programming is about feedback loops at different timescales. P08: Good programs are the result of making good techniques a habit. R01: Use the right algorithms and data structures. R06: Reuse existing code. R07: Design code to be testable. R09: Separate interfaces from implementations. R11: Design code for people to read.

Interlude: configuring programs. F01: It's all just data.

Interlude: assertions; exceptions. P11: Paranoia makes us productive.

Testing: goals; tests as specifications; structuring unit tests; using a unit testing framework; design for test. Q02: How can I tell if my program is designed well? Q06: How can I tell if my program is working correctly? Q07: How can I find and fix bugs when it isn't? Q10: How can I automate this task? F02: Programming is a human activity. P01: Code is just a kind of data. P07: Programming is about feedback loops at different timescales. P08: Good programs are the result of making good techniques a habit. P09: Let the computer decide what to do and when. P11: Paranoia makes us productive. R03: Automate repetitive tasks. R05: Use tests to define correctness. R07: Design code to be testable.

Sets and Dictionaries: sets; storage; dictionaries; simple examples; longer examples. F03: Better algorithms are better than better hardware. Q11: How can I make my program faster? R01: Use the right algorithms and data structures.

Interlude: numbers. F01: It's all just data.

Number Crunching: basics; indexing; linear algebra; making recommendations; statistics. Q03: How can I reuse code that other people have written? Q11: How can I make my program faster? F03: Better algorithms are better than better hardware. P04: We can trade human time for machine time and vice versa. P09: Let the computer decide what to do and when.
R01: Use the right algorithms and data structures. R06: Reuse existing code.

Databases: selecting; removing duplicates; calculating new values; filtering; sorting; aggregation; joins; missing data; nested queries; transactions; programming with databases. Q08: How can I get data into my program? Q09: How can I manage my data? F01: It's all just data. P02: Metadata makes data easier to work with. P03: Separate models and views. P05: Anything that's repeated will eventually be wrong somewhere. P09: Let the computer decide what to do and when. R08: Use structured data and machine-readable metadata.

Comments and suggestions would be very welcome. Read More ›

Test-Driven Public Speaking
Greg Wilson / 2012-01-24
Once again, Cameron Neylon explains things much better than I ever could: "The impact factor of a journal is a better predictor of the chances of a paper being retracted than...of the number of citations." Read More ›

Take Out Agile, and Add...What?
Greg Wilson / 2012-01-24
Based on the feedback we've received so far (both as comments and by email), it looks like we should take development methodologies (i.e., agile development) out of the core curriculum and replace it with two hours on:

Nothing: there's already too much in the core.
Spreadsheets: because many scientists use them badly.
NumPy and/or Pandas: because many of them are crunching matrices/doing stats.
Visualization: which in practice would mean the basics of matplotlib.
Image manipulation: because it's fun as well as useful, and lets us talk about binary vs. text data.

I am quite arbitrarily limiting options to those five. Please cast your vote (one vote, not three out of five) in comments. We'd be grateful if you could include a brief explanation as well. Read More ›

Badging
Greg Wilson / 2012-01-24
One of the things we need to do in the next six months along with running workshops and updating our online content is to create some sort of badging to recognize people's skills and contributions. As we said in the proposal to the Sloan Foundation, "A badge program will provide near-term incentives for both learning and mentoring; a framework to support viral, peer-driven engagement with the program; and facilitate recognition by partner institutions and potential employers." We're going to rely on Mozilla's Open Badges project to handle the mechanics of storing and validating badges, so we only have three questions to answer: What do we award badges for? How do we determine that someone has earned one? What do they look like?

The obvious answer to the first (and most important) question would be, "You get a badge for completing the core curriculum." However, one of the purposes of badging is to provide a finer-grained inventory of people's knowledge and skills, so there's an argument to be made for giving one badge per topic, e.g., a version control badge, a Unix shell badge, a basic imperative programming badge, and so on. The argument for is that their meaning will be clearer: if I say, "Jane knows the basics of Subversion," that's more immediately understandable than, "Jane has completed the core of Software Carpentry." The argument against is that if someone has collected two hundred small badges, we're going to aggregate them anyway ("Jane knows basic software development skills"), so why not just do that in the first place. I've gone back and forth on this, but currently think that one badge for the core curriculum ("Basic Software Carpentry") will work best. We will offer two other badges as well: one for organizing a bootcamp, and one for contributing a medium-sized chunk of content (on the scale of one 5-minute video episode).

Having decided that, the next challenge is to determine when someone has earned a particular badge. The "Bootcamp Organizer" and "Content Contributor" badges are straightforward; telling when someone has mastered the core skills is not. We can tell that you've attended the bootcamp and viewed the videos, but how can we tell how much you've actually learned? "Solve this problem and email us the result" isn't good enough: you could get someone to do it for you [1], and even if you're honest, we can't tell how quickly you did it, how many blind alleys you went down, how often you did something in ten steps instead of one, and so on. In the short term, I think the solution is to do assessment in real time using desktop sharing, i.e., you share your desktop with me, I give you the problem to solve, and I watch you do it. This won't scale to hundreds or thousands of learners, but it'll get us through the next six months.

What will badges look like? A badge is just a small PNG file with a digital signature embedded in it (it's a neat little hack), so the graphic design is up to us. I like our current logo, but (a) it doesn't size down well, and (b) I've been wanting to redesign it anyway, since the blue-to-white fade in the background doesn't print well on t-shirts, coffee mugs, and other media. In keeping with our carpentry theme ("We're not teaching people how to build the Channel Tunnel, we're teaching them how to hang drywall"), I'd like an image that combines tools like hammers and saws with something like 1's and 0's to represent software, but I'm a lousy graphic designer—if any of our readers would like to take a crack at it, please let me know.
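For the curious, the hack works because a PNG file is just a signature followed by a sequence of typed chunks, and extra data can be tucked into a text chunk without affecting the image at all. Here is a sketch in Python of the general idea; the chunk keyword and payload below are invented for illustration, and the real Open Badges spec defines its own format and verification rules:

```
import struct, zlib

def bake(png_in, png_out, assertion):
    """Insert a text chunk carrying badge data into a PNG, just before
    the closing IEND chunk. (A sketch of the general idea only.)"""
    with open(png_in, "rb") as reader:
        data = reader.read()
    signature, rest = data[:8], data[8:]
    assert signature == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    payload = b"badge\x00" + assertion.encode("utf-8")
    chunk = (struct.pack(">I", len(payload)) + b"tEXt" + payload +
             struct.pack(">I", zlib.crc32(b"tEXt" + payload) & 0xffffffff))
    iend = rest.rfind(b"IEND") - 4   # back up over the chunk's length field
    with open(png_out, "wb") as writer:
        writer.write(signature + rest[:iend] + chunk + rest[iend:])

# Hypothetical usage: bake("logo.png", "badge.png", '{"recipient": "jane"}')
```

A verifier does the reverse: pull the chunk back out of the image and check its contents against the issuer.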
Finally, and most importantly, how can we get existing institutions—specifically universities—to recognize badges in some way? As much as we'd like people to value skills for their own sake, everyone is always busy, and always has more to do than time to do it. Can we persuade a few schools to list badges as non-credit items on students' transcripts (just as they might presently list a short course in presentation skills or entrepreneurship that doesn't count toward degree requirements, but required some work on the student's part)? If so, it would give people an extra incentive to complete the core curriculum, organize a workshop, or create some content for us, particularly in a tight job market where every small distinction counts. [1] It's unlikely that someone would cheat on a Software Carpentry exercise, but in general, if badges take off and actually start to matter, the people who sell college students essays on Steinbeck at $30 a pop will start offering to write their online exams for $50 each. Read More ›

Revising the Curriculum
Greg Wilson / 2012-01-23
I've been thinking some more about what the foundation and core of Software Carpentry actually are (and not just because Jon Pipitone keeps pestering me to do so). My last attempt had a foundation of seven principles and a dozen topics in the core. I think I can slim that down even further; in fact, I think three big principles form the foundation of computational thinking:

It's all just data, whose meaning depends on interpretation. This subsumes the notions that programs are a kind of data (which is the basis of things as diverse as functional programming and version control), and that we should separate models from views (because the most efficient ways for people and computers to interpret data are different). It doesn't really include the distinction of copy vs. reference, but I'm going to lump it in here because that idea doesn't seem big enough to deserve a heading of its own.

Programming is a human activity. The only way to build large programs (or even small ones) is to create, compose, and re-use abstractions, because our brains can only understand a few things at a time. Similarly, good technique (specifically version control, testing, task automation, and some rules for collaborating, be they agile or sturdy) is necessary because everyone is tired, bored, or out of their depth at least once in a while.

Better algorithms are better than better hardware. Computational complexity determines what's doable and what isn't, and no aspect of program performance makes sense without some understanding of it.

I also think we can reduce the core topics to just nine, though I can already hear protests from the back of the room about some of the omissions. I got this list by asking, "What's the minimum I think a graduate student needs to know to contribute to the computational work in a typical lab?" My answer is:

The Unix shell. Includes: basic commands (from ls and cd to sort and grep); files and directories; the pipe-and-filter model. Because: it's still the workhorse of scientific computing (and is experiencing a resurgence as cloud computing becomes more popular). Illustrates: "lots of little pieces loosely joined" is a good way to introduce modularity and tool-based computing; it lets us talk about the human time vs. machine time tradeoff. Omissions: find; shell scripts (particularly for loops); SSH.

Version control. Includes: update/edit/commit; merge (with rollback as a special case). Because: it's a key technique. Illustrates: the idea of metadata; programming as a human activity (the hour-long red-green-refactor-commit cycle). Omissions: branching; distributed version control.

The common core of programming. Includes: variables; loops; conditionals; lists; functions; libraries; memory model (aliasing). Because: we can't teach validation, associative data structures, or program design without this common core. Illustrates: programming as a human activity (programs must be readable, testable, etc.). Omissions: object-oriented programming; matrix programming.

Validation. Includes: structured unit tests; test-driven development; defensive programming; error handling; data validation. Because: defense in depth is key to building large programs, and trustworthy programs of any scale. Illustrates: trustworthy programs come from good technique. Omissions: testing floating-point code (since we don't really know how to).

Program construction. Includes: piecewise refinement; refactoring; design for test; first-class functions; using a debugger. Because: knowing the syntax of a programming language doesn't tell you how to create a program. Illustrates: creating and composing abstractions; interface vs. implementation. Omissions: structured documentation.

Associative data structures. Includes: sets (as a prelude); dictionaries; why keys must be immutable (see the sketch after this list). Because: useful in so many places. Illustrates: how the right algorithms and data structures make programs more efficient. Omissions: implementation details.

Databases. Includes: select; sort; filter; aggregate; null; join; accessing a database from a program. Because: useful in many contexts. Illustrates: separation of models and views; a different model of computation. Omissions: sub-queries; object-relational mapping; database design. Note: we could illustrate many of the same ideas with spreadsheets, but they're not as easy to connect to programs.

Development methodologies. Includes: agile practices (the usual Scrum+XP mix); sturdy (plan-driven) lifecycles. Because: ties many other lessons together. Illustrates: good technique makes good programs. Omissions: code review.
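Here is the promised sketch of why dictionary keys must be immutable, using the kind of small, high-payoff Python example this topic is meant to center on (the word-counting task is my illustration, not fixed curriculum):

```
# Counting word frequencies: a classic small win for dictionaries.
counts = {}
for word in ["newt", "frog", "newt", "axolotl", "newt"]:
    counts[word] = counts.get(word, 0) + 1
print(counts)              # {'newt': 3, 'frog': 1, 'axolotl': 1}

# Keys must be immutable so that their hash values cannot change
# while they are in the dictionary:
point = [0.5, 1.5]         # a list is mutable...
# counts[point] = 1        # ...so this would raise a TypeError
counts[tuple(point)] = 1   # ...but a tuple of the same values works
```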
If we use a two-day workshop to start, and follow up over six weeks with one lesson per week, I think we can cover:

Topic | Bootcamp | Online
1. Unix shell | ls and cd; files and directories | sort and grep; pipes
2. Version control | update/edit/commit; merge | rollback
3. Core programming | all of it (but see below) | not needed (but see below)
4. Validation | unit tests; TDD | defensive programming; error handling; data validation
5. Program construction | One extended example; one demo of a debugger | More examples; design for test; first-class functions
6. Associative data structures | none | everything
7. Databases | none | everything
8. Development methodologies | overview of agile | sturdy (plan-driven) lifecycle; evidence-based software engineering

Topic #3, core programming, is the hardest to manage. If people have programmed in Python before, it can be a quick review (or omitted altogether). If they've programmed in some other interactive language, it can also be covered pretty quickly, but if they've never programmed before, or took one freshman course ten years ago, there's no way to teach them enough to make a difference in half a day. Even if there was, the other learners would undoubtedly be bored. The only solutions I can see are to restrict participation to people who can already do a simple exercise in some language, or to run one day of pre-bootcamp training for non- or weak programmers. Neither option excites me...

Coming back to content, this plan means that we'll leave out a lot of useful things:

Spreadsheets: lots of scientists use spreadsheets badly, but while we'd like to show them how to do so well, the only one they actually use, Excel, isn't open source or cross platform, and it's much harder to build programs around spreadsheets than around databases.

Make: is very hard to motivate unless people are working with compiled languages—we've tried showing people how to build data pipelines using Make, but it's too clumsy to be compelling. Plus, Make's syntax makes a hard problem worse...

Systems programming: knowing how to walk directory trees and/or run sub-processes is useful, but we think people can pick these up on their own once they've mastered the core.

Matrix programming: really important to some people, irrelevant to others, and the people it's important to will probably have seen the ideas in something like MATLAB before we get them.
Multimedia programming (images, audio, and video): people can learn basic image manipulation on their own; audio and video are harder, mostly due to a lack of documentation, but they aren't important enough to enough people to belong in our core.

Regular expressions: are a great way to illustrate the idea that programs are data, and are very useful, but everything in the core seems more important, and it'll be hard enough to get through all that in the time we have. This is probably the one I most regret taking out...

HTML/XML: there are lots of excellent tutorials on writing HTML, and while XML processing is a good way to introduce recursion (and, if XPath is included, to talk about programs as data once again), I believe once again that it's not important enough to displace any of the material in the core.

Object-oriented programming: is probably the omission that raises the most eyebrows. We can introduce it fairly naturally when talking about design for test (more specifically, about interface vs. implementation), but in practice, most people get along fine using lists, dictionaries, and the classes that come with the standard library without creating new classes of their own. Plus, showing people how to do OOP properly takes a lot more time than just showing them how to declare a class and give it methods.

Desktop GUIs: an excellent way to introduce reactive (event-driven) programming and program frameworks, but less important than it was ten years ago (most people would rather have a web interface these days).

Web programming: the only thing we can teach people in the time we have is how to create security vulnerabilities.

Security: the principles are easy to teach, but translating them into practice requires more knowledge (especially of things like web programming) than we can assume our learners have.

Visualization: everybody wants it, but nobody can agree what it means. Should we show people how to use a specific library to create 3D vector flows? Or the principles of visual design so that they can make nicer 2D charts? And no matter what we teach, will they actually learn enough to make a difference?

Performance and parallelism: the most important lesson, which is in the core, is that the right data structures and algorithms can speed programs up more than any amount of code tuning. Everything after that is either inextricably tied to the specifics of a particular language implementation (performance tuning), or offers no low-hanging fruit (parallelism). The one exception is job-level parallelism, which could be included in the material on the Unix shell if an appropriate cross-platform tool could be found.

C/C++, Fortran, C#, or Java: mostly useful for introducing static typing and compilation, but these are relatively low-priority topics.

We're going to start implementing this plan (or some derivative of it) at the beginning of February, to be ready for workshops starting at the end of that month. We'd welcome feedback; in particular, have we taken something out of the core that you think is more important than something that's in, and that could be taught in the time that's actually available? If you have thoughts, please let us know. Read More ›

The First Bootcamp of 2012
Greg Wilson / 2012-01-20
We just wrapped up the first bootcamp of 2012 at the Space Telescope Science Institute. 14 scientists with a wide variety of computational backgrounds spent two days learning about testing, version control, program structure, the basics of Python, and the psychology of learning and programming. We're following up with 6 weeks of online material, partly because that's what fits everyone's schedules, and partly to see whether a blended approach works better than either strategy on its own. And on a completely different topic, this diagram from the Discover magazine web site sums up every scientist-vs-journalist debate ever: Read More ›

Why Is This Hard?
Greg Wilson / 2012-01-15
I've been teaching scientists to program since 1998 (or 1986, if you want to start with my first lunch-and-learn for grad students in physics at the University of Edinburgh). Technology has advanced by leaps and bounds in that time, but I don't think it's any easier than it used to be to get basic software skills into people's heads. What makes it hard?

Programming is intrinsically difficult. It's fashionable to claim otherwise, but abstract thinking is a fairly recent innovation in evolutionary terms, and our brains still find it hard. On the other hand, I don't believe that state machines and data transformations are any harder than high school algebra, and everyone we're trying to help has long since mastered that.

Today's languages and tools make it more difficult. Setup (particularly installation) is, if anything, harder than it was twenty years ago, and even the cleanest languages are full of accidental complexity (particularly in their libraries). (And if you think otherwise, try running a programming workshop for non-programmers working on half a dozen different operating systems, with two or three slightly different versions of your favorite language installed, and then post your dissenting comment.) It's heartening to see that people are finally reviving research from the 1970s and 1980s into the usability of programming languages, but as we found out the hard way, it will be a long time before computer "scientists" start accepting scientific answers to these questions, much less acting on them.

Our students' diverse backgrounds make teaching more difficult too. Our recent workshop at the University of Toronto had students from linguistics through chemistry to astronomy. Some of them had never used a command shell before; others were their labs' unofficial sys admins, and we saw similar variation in almost every other aspect of their computing knowledge. The solution, of course, would be to divide them into levels by topic, but—

We don't have resources to teach widely or deeply. Tens of thousands of people could teach scientists and engineers basic computing skills [1], but we have no way to reach them—yet. One of our goals for the next six months/five years is to increase the pool of instructors by several orders of magnitude [2]. Even on a five-year timescale, though, we'll have to continue to rely mostly on volunteers, because—

There's no room for computing in the curriculum. More precisely, faculty won't make room, because they think computing is less important than thermodynamics, phonology, or whatever other subjects make up the core of their discipline. I used to grumble about this, but I now accept that it's a rational choice: unless and until journal reviewers and grant agencies start asking hard questions about how scientists produce their computational results, investing time in improving computational skills is a cost with uncertain rewards. And yes, there are a few exceptions here and there, but until we move to five- or six-year undergraduate degrees, they'll continue to be exceptions. Realistically, I think the best we can hope for in the next decade is that computing has the same standing as statistics, i.e., everybody has to know the basics because their other work depends on it, but more advanced knowledge is acquired on a discipline-specific need-to-know basis.

Follow-through is hard. OK, so you just spent a couple of days at some kind of workshop: what now?
If you're lucky, you learned enough about Python or the shell to start automating a few data analysis tasks, so a positive feedback loop will kick in. But if the problem in front of you is to speed up 80,000 lines of legacy C++, those two days probably aren't going to make a big difference. Yes, there are a lot of tutorials online that are supposed to help you, but in practice, you'll probably find those more frustrating than anything else: they assume a lot of background knowledge you don't have, so you're not sure which ones actually move you closer to your goal. The proposed Computational Science area at Stack Exchange might help here, if it takes off, and we're hoping that running lesson-a-week online classes after workshops will help too, but it will always be hard for people to find time for "deep" learning, which is precisely what will make the next problem they run into easier to solve.

Most of today's online teaching tools implement bad models of teaching. We've known for decades—literally, decades—that watching a video and doing some exercises is a lousy way to teach (see recent posts by Frank Noschese and Scott Gray for discussion). In programming terms, the root of the problem is that canned instruction assumes the teacher can accurately predict how learners are going to interpret and mis-interpret lessons—in software engineering terms, it's plan-driven rather than adaptive. In practice, different learners will mis-interpret lessons in different (and hard-to-predict) ways; in order to be effective, teaching needs some sort of agile feedback loop to correct for this, but that's exactly what most approaches to web-based teaching take out of the equation [3].

So, is it hopeless? Of course not: over the next six months, and (hopefully) the next five years, I believe we can make real progress on several fronts. We can certainly recruit and train more workshop organizers and instructors, and experiment with different kinds of online learning to see which will make follow-through easiest and most effective (which in turn depends on us coming up with ways to assess the impact we're having). If you'd like to help, please get in touch.

[1] I get "tens of thousands" by taking a million competent programmers, multiplying by 10% (the proportion who can teach), and then multiplying by 10% again (the proportion who might be interested). Your made-up stats may vary.

[2] The other reason this has to be a priority is that our learners' needs are as diverse as their backgrounds. Our learners want to jump straight from "what's a for loop?" to "how do I detect glottal stops in lo-fi audio?" or "how do I visualize turbulent flow of interstellar gas?" We're never going to be able to cover these with just a handful of content creators.

[3] Note that I'm using "online" to mean recorded and/or automated, i.e., things that learners can do when they want. Other approaches that deliver traditional lectures or seminars over the web synchronously and interactively are a bit better, but don't scale: no webinar system I've ever seen gives the instructor the kind of feeling for the room that s/he'd get in a regular lecture hall. Read More ›

The What, Why, and How of Bootcamps
Greg Wilson / 2012-01-13
We've just added a single-page description of the two-day bootcamps we're planning to run in the next six months. In brief, their aim is to ensure that people have a few core skills, so that they can tackle our online material productively, and to help them get past startup hurdles such as software installation. If you have questions, comments, or suggestions, please add them to that page; if you'd like us to help you organize and run a bootcamp, please get in touch. Read More ›

Sloan Foundation Grant to Software Carpentry and Mozilla
Greg Wilson / 2012-01-11
We are very pleased to announce that the Sloan Foundation has generously agreed to fund six months of work by Software Carpentry and the Mozilla Foundation. The proposal we submitted, which outlines what we're going to try to do, is included below—it's a lot of work, but we're very excited to have the opportunity to move Software Carpentry forward. Details are below the fold...

Teaching Scientists to Think Like the Web: Accelerating Scientific Discovery Through the Effective Use of Technology

1. What is the main issue, problem or subject and why is it important?

Sharing information on the web, automating common processes, managing large volumes of data, and similar tasks are no longer the sole preserve of professional programmers. Increasingly, journalists, filmmakers, educators, artists, and other "end user programmers" find it necessary not just to use, but to create new software. This is especially true for scientists, yet the training available is usually outmoded, overly complex, or focused on the wrong skills. Mozilla and Software Carpentry hope to change all this.

A decade after Udell's seminal report Internet Groupware for Scientific Collaboration, only a small minority of scientists use computers and the web to their full potential. The hidden costs of this are painful: tasks that should take minutes wind up taking hours, insights are missed, and collaboration is impeded. A 2008 survey found scientists spend 30% of their time wrestling with software, and most expect this figure to increase. The root cause is a lack of the basic skills that allow scientists to create and customize software or use the web as more than a publishing medium. But it does not have to be like this: open source software and browsers' ubiquitous "View Source" allow everyone to "look under the hood" to see how things are done. Scripting languages, HTML5, GitHub, and the like permit a Lego-like approach to programming that can allow scientists to manipulate data sets, crowdsource solutions, and share findings — if they know how.

2. What is the major related work in this field?

A number of studies on how scientists use computers and the web have appeared in the past decade. The largest, by Hannay et al., found that most scientists learn what they know about using computers and the web through osmosis, which leads to crippling gaps in their skills. On the education and training side, only a handful of the more than one hundred papers presented at [SIGC11] focused specifically on scientists. Those that did invariably asked, "How can we use computers to teach science?" rather than, "How can we teach scientists to make technology do what they want?" Many recognize that this lack of skills is slowing scientists down, but most existing training meant to address the problem is flawed in one or more ways:

Does not target scientists' specific needs. Most "Computing 101" courses are run for students from a range of disciplines, so applications and examples often fail to engage students from the sciences.

Too much emphasis on programming. Programming is only one part of building useful software and using the web. Scientists almost always have to figure out "the other 90%" (discussed below) on their own.

Too much emphasis on calculation. Number crunching is also only one part of how scientists use computers today. Managing data and sharing ideas with colleagues are already key to effective practice, and becoming more so every day.

Too much emphasis on "big iron".
Scientific computing is often identified with high-performance computing, which skews discussion and training toward the concerns of a small (but vocal) minority.

Dr. Wilson's Software Carpentry project first started working to address these shortcomings in 1997. Now in its fourth major revision, the Software Carpentry web site has an active user base of 350 to 1000 individuals per day and its content regularly appears in courses delivered at laboratories and universities. Dr. Wilson's experience indicates that a modest investment in training can increase scientists' productivity significantly, while making their technology-based work more reliable and shareable. Hard data is difficult to obtain, but follow-on surveys, qualitative feedback, and testimonials often report "order[s] of magnitude" improvement in productivity; even the most conservative of these are typically phrased as "saving [me] a day a week". The key to increasing productivity is to focus on fundamental skills such as version control, testing, task automation, data management, and program design. While these are not as exciting as things with "cloud" and "peta" in their name, they are what actually empower scientists to solve today's problems efficiently and tackle new ones tomorrow. Our initiative will build on Software Carpentry's experience, and on Mozilla's efforts to make technology easy to understand and use through its work on Firefox, standards-based computing, and the Mozilla Developer Network.

3. Why is the proposer qualified to address the issue or subject for which funds are being sought?

Mozilla Foundation

Mozilla is a non-profit, 501(c)3 organization dedicated to promoting openness, innovation, and opportunity online. Best known as the maker of Firefox, we work to empower individuals to use and shape technology to their own ends. The principal activity of the Mozilla Foundation is teaching "webmaking" to non-programmers, such as scientists. The activities detailed in this proposal will build on three existing Mozilla programs that support this objective:

School of Webcraft (SoW): A partnership with P2PU and an online platform to support self-, peer-, and expert-led instruction and study groups. The program will house the online training developed and delivered under this initiative.

Mozilla Developer Network: A large repository of DIY resources and best practices on how to build and create with the technology and tools of the open web. More than 10 million people visited the MDN web site in the last year.

Open Badges: A distributed accreditation framework to support the award and display of badges by peers, experts, and institutions. Badges are a key mechanism through which to incent, recognize, and expand participation in distributed, peer-led, and other forms of learning.

Dr. Greg Wilson

Dr. Wilson is a 25-year veteran of the software industry who received ComputerWorld Canada's "IT Educator of the Year" award in 2010 and was co-winner of the Jolt Award for Best General Technical Book in 2008. His Software Carpentry initiative, which began as a training course at Los Alamos National Laboratory, has been accessed by more than 100,000 visitors since May 2007. The materials are freely available and have been used in courses at over a dozen universities and labs in six different countries.

4. What is the approach being taken?

A Five-Point Approach

To turn scientists from passive consumers of software into empowered users and makers, a successful approach must:

Target graduate students.
Their time is more flexible than that of undergraduates, but they are still focused on learning. They are often face-to-face with the challenge of making computers and the web work for them, instead of the other way around. Provide peer- and institutionally-recognized rewards to encourage students to make acquiring these skills, and passing them on, a priority. Solve immediate problems. Scientists always have pressing deadlines, so any training must be seen by them to solve problems that they realize they have. Use face-to-face instruction as a complement to online learning. A 2010 report from the US Department of Education found that combining the two produced better results than either on its own, which is consistent with our experience. Engage scientists in a larger learning community, so that they will pass their skills on to an ever-larger circle of colleagues. Institutional Engagement Much of Mozilla's work seeks to challenge and transform established practices within various fields. Borrowing from agile development methodology, Mozilla builds, launches, and tests new programs in short, iterative cycles; projects are allowed to experiment, fail, and regroup without significant up-front planning. Traditionally, this innovation takes place outside of institutional contexts to avoid potential pitfalls: burdensome process requirements, never-ending calls for "more research", and continual re-design at the planning stage. However, the nature of this project — working with graduate students — provides an impetus and opportunity for Mozilla to explore how to overcome these obstacles and engage with formal institutions. We have allocated a significant portion of the budget to engage computer science faculty in the delivery of the training content. Once we have established a successful framework, we will work with existing faculties and institutions to see the program become a standard component of their scientific training. Our hope is that the resulting exchange will provide a model to facilitate future collaboration between Mozilla and academic institutions. The Resulting Program Framework Under the experienced leadership of Dr. Wilson, Mozilla will: Migrate existing and produce new training materials that directly address the technology learning needs of scientists; Design and launch the first iteration of a self- and peer-led learning and badge program through the SoW; Organize and document the results of 4 in-person workshops: 2 through grassroots, peer-based instruction and 2 through conventional coursework at universities; Work with at least 4 institutions to examine their requirements for formal engagement and the necessity of this engagement to achieving the desired learning outcomes; and Gather and document the project findings to underpin future program iterations. Training Materials: The materials available through MDN and Software Carpentry will be tailored to the specific needs of scientists. Mozilla community members will produce screencasts showing how to perform tasks of use to scientists. The resulting videos will be enriched with instructional and reference metadata using Mozilla's Popcorn.js technology, which allows for the integration of web content and video. Online Learning: Self- and peer-led courses will be offered through the School of Webcraft. Participants will complete learning challenges, find support, ask questions, and connect with the broader community of scientists across disciplines and institutions.
A badge program will provide near-term incentives for both learning and mentoring; a framework to support viral, peer-driven engagement with the program; and recognition by partner institutions and potential employers. In-person Workshops: We will run 4 in-person workshops at colleges and universities across the United States, Canada, and the United Kingdom. (Letters from universities and colleges indicating their support for this are appended to this proposal.) The workshops will be hands-on, interleaving short tutorials with live coding sessions. Two workshops will follow a peer-led, grassroots model and take place at universities but outside of formal, faculty engagement. Two additional workshops will be offered in concert with faculty at universities. The resulting baseline will facilitate a comparison to inform future efforts. We presently have strong expressions of interest in hosting such workshops from: University of Wisconsin — Madison Michigan State University Georgia Tech University of British Columbia Utah State University Indiana University Queen Mary University London University of Toronto 5. What will be the output from the project? New, tailored technology training materials for scientists. Workshop curriculum and instructional materials. 15 hours of "how-to" video content produced by the Mozilla community and enriched with Popcorn.js-enabled metadata. Online training in 'webmaking for scientists'. A minimum of 80 students completing 10 self-led learning challenges, and a badge program offered through the SoW and supported by the Mozilla community. Pilot implementations of in-person workshops. Two workshops delivered through a grassroots, peer-led approach, and two additional workshops delivered in concert with university faculty, both continued online through the self-led learning challenges mentioned above. Documented analysis of the relative success of each approach, as well as comparisons to the online-only workshops. Recommendations and plan for institutional engagement. Results and feedback from institutions and computer science faculty regarding their interest and requirements to engage in future iterations of the project. Evidence-based recommendations regarding the importance of this approach. Metrics The ultimate measure of our success will be whether scientists can do more research in less time and tackle problems they could not have tackled before. Both are difficult to measure, especially in the short term, so we will use several proxy metrics to gauge the project's success: Repeat participation and peer recruitment. The percentage of online course participants that offer or plan to offer in-person workshops and study groups on their own, as well as recommend the online courses to their peers. Badge display and associated reputation. The number of participants showcasing their badges through social media and other web sites will provide insight into the real and perceived reputation of the program. Institutional engagement. The number of universities that choose to experiment with and/or integrate the materials into their overall scientific training. Former students become makers. Mark Surman, our Executive Director, recently wrote, "Everything we're doing is about learning through making and collaborating on the web." Early participants in this program creating content for others to use will be the surest possible sign of success. Read More ›

Setting Our Sights a Little Bit Lower
Greg Wilson / 2012-01-04
A couple of days ago, I posted replies to some of the comments that people had made on my posts about Software Carpentry's future. To recap, I want SC to: offer learning materials so that people can work through them on their own; be a repository where people can evolve those materials; coordinate people who are organizing live workshops, offering technical support [1], etc.; and coordinate a distributed research program to evaluate Software Carpentry's effectiveness. It would be easier to achieve these four goals if we had merge-friendly formats for learning materials (both micro and macro), a large team of core content developers, and stable long-term financial support. Since we don't have any of those unicorns, what should we try to do in the next six months? Offer learning materials for self-directed use. If I were grading our existing content, I'd give it a weak B. While only a few episodes are outright failures from a teaching point of view, quite a few could use an overhaul based on learner feedback. What would make more difference, though, would be supplementing them with partially-worked examples, self-tests, and errata-style lists of common misconceptions and their corrections. I think it would take one person-month each to do this for our core topics (the shell, version control, basic programming, testing, sets & dictionaries, and software engineering). Be a repository for evolving those materials. Anyone who wants our stuff can get it from our Subversion repository, but the "evolving" part is harder. I know a lot of groups have used our content, usually after some extensive tweaking to meet their particular needs. What I don't know is how to get them to give their material back—there isn't a "make, share, and remix" culture in teaching as there is in open source coding [2]. Absent any brilliant insights, I'm going to set this one aside for now. Coordinate workshops. The inimitable Katy Huff came up to Toronto in November to help us run a workshop, and there are a couple of others in the pipeline elsewhere. This is where I hope to make the biggest strides in the next six months: if all goes well, I will put out a call later this month for people to run workshops in their labs, schools, and workplaces this spring. Evaluate Software Carpentry's effectiveness. This is easily the most important item in this post: without some reliable way to tell what's working and what isn't, improvement will be slow (and might not happen at all). I still don't understand why most software developers ignore most empirical studies in software engineering, but if we do organize a bunch of workshops this spring, I think we can also do some before-and-after questionnaires and interviews [3]. Of course, this would all be easier with your help—if you want to help out, please get in touch. [1] "Technical support" could be Stack Overflow-style Q&A, but I think real-time desktop sharing with voiceover would be more helpful, since by definition, novices usually don't know what's important to describe and what isn't. [2] Lots of people (including me) Google for other teachers' slides when they're making up lessons of their own; what they don't do is send their hacks back to the original authors. There are exceptions, of course; I'm particularly interested in LessonCast, a site where teachers can share video tips about lessons and other ideas. [3] Perhaps modeled on the ones in this paper, but with more of an emphasis on human (development) time than computer (running) time. Read More ›

The Fire Last Time
Greg Wilson / 2011-12-31
Back in November, Justin Reich wrote a post titled "Will Free Benefit the Rich?" (re-posted as "Open Educational Resources Expand Educational Inequalities"). In it, he outlines two possible futures. In scenario 1, everyone benefits from free online resources in ways that narrow the gap between the rich and the poor. In scenario 2, everyone benefits, but the well-off benefit more, so the gap widens. What caused a fuss was his claim that the empirical data we have so far supports scenario 2: in the year of Occupy, saying that something is going to increase inequality is going to stir up strong emotions. Some of the responses were thoughtful: see, for example, Audrey Watters' "OER and Educational Inequality". Others, like Tom Vander Ark's "How Digital Learning Will Benefit Low Income Students", were more defensive. He is CEO of Open Education Solutions and a partner in Learn Capital, so he has a vested interest in online education being made of pure win. His post presents ten reasons why online ed will help low-income students, but (a) he doesn't directly address Reich's point that it will help upper-income students more, and (b) he doesn't back his arguments up with data. Meanwhile, in the world at large, Anu Partanen (a Finn herself) wrote a widely-linked piece for The Atlantic titled "What Americans Keep Ignoring About Finland's School Success", which in turn refers to Pasi Sahlberg's book Finnish Lessons: What Can the World Learn from Educational Change in Finland?. Partanen's piece closes with this: It is possible to create equality. And perhaps even more important—as a challenge to the American way of thinking about education reform—Finland's experience shows that it is possible to achieve excellence by focusing not on competition, but on cooperation, and not on choice, but on equity... The problem facing education in America isn't the ethnic diversity of the population but the economic inequality of society, and this is precisely the problem that Finnish education reform addressed. More equity at home might just be what America needs to be more competitive abroad. Again, that's a politically sensitive thing to say in the year capitalism became a dirty word in American politics [1], but I think it's relevant to Software Carpentry. First, as I've complained before, the small minority of scientists who do high-performance computing get the lion's share of both money and attention. The majority, who mostly think of themselves as scientists who happen to use computers instead of computational scientists, aren't just less well served—in most cases, they're completely ignored. I think that if funders and reviewers focused on the needs of the majority, the rising skills tide would lift everyone's computational boat. Second, and much more importantly, we want to help everyone, not just people in well-funded labs in first-world countries. We want to help geologists in the Nebraska outback, astronomers in Patagonia, and epidemiologists in Bihar do more with less pain. To do that, we're going to have to think hard about how our stuff is actually used, and listen to a lot more people. Coincidentally, Scott Gray just wrote a post titled "My Thoughts on Codecademy" that's relevant to this discussion. Scott has been doing online education longer than many of Software Carpentry's students have been doing arithmetic.
His post begins like this: There is yet another new wave of start-ups emerging in the educational technology space and like those that came before, most of this new wave neglects to address some critical issues. Every few years, a new set of companies comes out with what they refer to as, "the next wave in digital education." However, these "new" methods and technologies are rarely actually new. Experienced educators who have followed the evolution of digital education since its inception over fifty years ago, have seen it all. The new distribution technologies offered by the new web don't actually enable new pedagogies that haven't been tried yet. Since the mid-1980s, there has been adequate technology and tools available to allow us to try out the entire array of pedagogical theories. Believe me, every combination of existing tools has been employed, and with a slight variance from subject to subject, very few methods used in conjunction with technology have been effective at improving educational outcomes. His six lessons are based on a lot more experience than my five, and I think everyone who's gung ho about online education (not just computing education) should read it carefully—twice. Going through it, I was reminded of two books on my father's shelves: Neil Postman and Charles Weingartner's Teaching as a Subversive Activity, and a collection edited by John Birmingham titled Our Time is Now: Notes from the High School Underground (with an introduction by none other than Kurt Vonnegut, Jr.). Both date from about 1970, and both were written in the belief that yes, this time things would be different. This time, they really would fundamentally change the nature of education. (The second book even thinks that new media—in their case, videotape—would be the great enabler.) They were wrong. Forty years later, the only changes are higher costs, stagnant or lower outcomes, greater inequality, demoralization, and the hollowing out of training in skilled trades [2]. Most of the people who are now touting online education don't say, "This time will be different," because they don't even know there was a last time, or a time before that. Most have never heard of Larry Cuban's Oversold and Underused; hell, most have never heard of Larry, or even of Seymour Papert, and know more about JQuery than they do about what research tells us about how people learn. If we ignore history, we repeat others' mistakes. If we ignore context, we reinforce existing inequalities. Avoiding both traps is this project's official new year's resolution; what's yours? [1] I'm sure they'll get over it pretty quickly. [2] Many of the places that used to teach people how to be welders or sound engineers have tried and failed to turn themselves into universities, primarily because of the latter's higher social status. Read More ›

Some Responses to Some Comments
Greg Wilson / 2011-12-31
Several people have written some useful comments on my recent "where are we going?" posts. It's exactly the kind of feedback I was after, so here are my answers. Goal #1: helping thousands of people each year. You propose two very broad ideas for what this would mean: a) community "co-learning" initiatives like Hacker Within, presumably using Software Carpentry content, or somehow organised by SWC? and b) more people contributing to SWC content, as well as supporting others online. In essence, the vision is for both offline and online co-learning communities to exist. It doesn't sound like you envision SWC as an authoritative source for instruction or community, but it is to be a hub of some sort, right? Hm... I don't like the word "authoritative" (do you think of python.org that way?). Setting that aside, I'd be satisfied if we were either helping people who are running workshops, or a hub for people to share learning materials. Of course, I'd be happier if we were both, and I think they go hand-in-hand: Research has shown that blended learning is more effective than either offline or online on its own, so we should try to encourage that model. It's easier to run your first few workshops if you don't have to create all the material from scratch. If people are trying to meet local needs, they're going to be creating materials to meet those needs. Other people are likely to have the same or similar needs, so we ought to make it easy for them to find and recycle what others have done. The merging problem. You've identified the "merging problem" as central to ramping up SWC's reach...why is [it] so central? It isn't as "central" to Software Carpentry as coming up with a way to tell if we're on the right track or increasing the project's bus factor. I emphasized it in that post because (a) it isn't as widely recognized as an impediment to developing and sharing learning resources as it should be, and (b) it's something that lends itself to technical solutions (which I, as an engineer, am always more comfortable with than purely social solutions). ...even if the merging problem turns out to be a relevant hindrance... [it is] so deep and ill-defined that...I have no reason to believe the payoff from tackling it will arrive in any practical amount of time. Agreed. That said, there are things we can do to make it easier for people to contribute customizations and extensions. The most important is to use a mergeable format like hand-written HTML for our slides instead of PowerPoint. I moved away from this because it forces us to segregate text and graphics, where PowerPoint makes it easy to mingle them; I suspect that if we do switch, I'll decide a year from now that we should switch back (again). Institutional support. One hindrance you identify is that without institutional support, taking software classes will be hard for students/profs/etc. to justify doing. Does this suggest that another vision in 5 years is for there to be a certain level of institutional support for Software Carpentry? Realistically, I don't expect more institutional support or recognition five years from now than we have today. I think that when our existing system of higher education implodes, it's going to do so faster than anyone ever thought possible (cf. the final days of the Soviet Union), but I think it would be foolish to count on this happening within five years. Conversely, I think that today's model of scholarly publishing is going to last a lot longer than many optimists think it will.
That means that journals and funding bodies will pay as little attention to scientific software five years from now as they do today, which in turn means that most scientists still won't have a compelling reason to up their game. However, I hope that in five years many (most?) will believe that the writing is on the wall. Goal #2: We know what we're doing is helping. The 5 year vision is then...what? We have some justifiable and principled way of gauging the usefulness of these teachings, and that we actually are measuring and reporting on them? Yes to "gauging" and "reporting", but I don't know about "measuring". We could show that it takes people (much) less time to write a script to analyze their data after we've shown them a few things. We might even be able to show that their scripts are (much) less likely to be buggy. But what we're really trying to do in many cases is change what they're trying to do. We don't want them to copy and paste faster; we want them to write a script that does the copying and pasting, and then write a Makefile that runs the right scripts in the right order whenever there's new data to process, and then a cron job to poll for new data files, and so on (there's a sketch of this progression at the end of this post). Marian Petre, Jorge Aranda, and others finally made me understand that rigorous qualitative methods are a better way to tackle these kinds of questions; Mozilla's Geoffrey MacDougall has a good post about the fetishistic (ab)use of metrics in the public and non-profit sectors, and thinks that approaches like Most Significant Change are more useful. That lengthy caveat aside, yes, I think it's essential to develop some generally-accepted way to tell if we're actually doing good or not, to apply it regularly, and to share the results. Without that kind of feedback, we'll have to (continue to) rely on individuals' gut instincts to steer the project; just the thought makes me weary. Misconceptions. While "the space of possible misconceptions grows very, very quickly," does the space of common misconceptions grow that quickly? ... I would suggest that it likely pays to start collecting them in a formal way to see. I strongly agree; many of the improvements in math and physics education since the 1980s are built on the realization that clearing up students' misconceptions is at least as important as giving them new facts. Collecting and classifying misconceptions in order to sharpen teaching is what I'd be doing in academia if anyone had been willing to fund me, but as I've said elsewhere, NSERC, Google, Microsoft, and almost everyone else turned down every application I sent them. (The one exception was The MathWorks [1], whose support allowed us to survey how almost 2000 scientists use computers in their work.) Discussion and community. ...there is currently no way for a community to build up around this site that can communicate with each other. Maybe that's not something that you want, maybe a forum would take too much time to manage, but if you want people to get involved, I think you need to give them a space they can post their ideas/questions/comments and have other people respond to them. Agreed. We've had forums for courses, but have otherwise relied on comments and counter-comments for discussion of our topics. It would be easy to set up something more sophisticated, but getting people to come and use it would be much harder.
The computational science area on Stack Overflow that's currently in beta might turn into this for us; I'd be very interested in suggestions for ways to mash up our stuff and theirs. My thanks to Jon Pipitone, Elizabeth Patitsas, and Bill Goffe for their input; please keep it coming. [1] Thanks, Steve.
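As promised above, here is the simplest form that the script-then-Makefile-then-cron progression might take. It is a minimal sketch, and every file and script name in it is hypothetical:

    # Makefile: re-run the pipeline steps in the right order, but only
    # when the raw data is newer than the results.
    # (In a real Makefile, recipe lines must begin with a tab.)
    results/summary.csv : data/raw.csv clean.py summarize.py
            python clean.py data/raw.csv > cleaned.csv
            python summarize.py cleaned.csv > results/summary.csv

A crontab entry (again hypothetical) then polls for new data once an hour; make does nothing at all if the results are already up to date:

    0 * * * * cd $HOME/project && make --quiet

Read More ›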

Fork, Merge, and Share
Greg Wilson / 2011-12-30
As George Pólya said, sometimes the best way to solve a problem is to solve a more general one. In that spirit, this post was originally going to be about the mechanics of helping thousands of people a year (which is the first of our five-year goals). After getting feedback from a few people on early drafts, though, it has morphed into a discussion of something that I hope you'll find more interesting [1]. Let's start with version control. As this potted history explains, what really made version control invaluable wasn't its "infinite undo". Instead, it was the ability to merge things, which meant that many people could work independently and then bring what they'd done together when it made sense to do so. CVS was the first system built on this model, but the model's latest incarnations, like Mercurial and Git, have pushed the idea even further. With them, there is no "master copy"; instead, every copy is a peer of every other, so that anyone can merge with anyone at any time. Yes, it can be chaotic, but the last couple of years have proven that the benefits—particularly the increased freedom to tinker that this model supports—outweigh the risks. GitHub is the poster child for this. Like SourceForge before it, GitHub allows anyone to create a repository for an open project. Crucially, though, it also makes it easy for people to clone projects, make changes, and then offer those changes back to the author of the original. This was always possible with earlier systems, but GitHub has made it routine. And when I said "open project", I didn't just mean software: there are books being developed through GitHub as well. Admittedly, most are on technical topics, but there's no reason the model couldn't be used for other kinds of content [2]. Could it be used for learning materials? I.e., would it be possible to create a "GitHub for education"? Right now, I think the answer is "no", because today's learning content formats make merging hard. PowerPoint remains the tool (and format) most commonly used for individual lessons, but there aren't good open tools to merge PowerPoint files [3]. As a result, if someone takes the Software Carpentry lecture on regular expressions, moves a few slides around, and cleans up a few examples, it can take me almost as long to merge their changes back into my copy as it would take me to make those changes myself. Shifting from micro to macro, the closest thing we have to a standard format for lessons is SCORM, but it's as clumsy and expensive to work with as SOAP. What's more, to the best of my knowledge there aren't any tools out there to help people find differences between two SCORM packages, much less merge them. And having the kind of metadata that's in SCORM really does matter if we want to reach lots of people. There's more to teaching than putting facts in front of people; when it's done well, teaching is about organizing those facts into a coherent narrative so that learners can see how the facts fit together. Using open source software as an analogy once again, learning plans are like architectural documentation; you don't have to have it, but people will find it a lot easier to understand, use, adapt, and contribute to your project if you do. Whatever a "GitHub for education" would look like, it would not be yet another repository of open learning materials. There are lots of those already, but almost all their content is write-once-and-upload, i.e., they seem to be thinking in terms of re-use rather than collaboration.
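To make the contrast concrete, here is the entire fork-and-merge loop for code; this is a minimal sketch, and the repository URLs, branch name, and commit message are all made up for illustration:

    # Get a copy of someone else's lessons:
    git clone https://github.com/someauthor/swc-lessons.git
    cd swc-lessons
    git checkout -b cleanup-regexp-examples    # do the work on a branch
    # ...edit the regular expression examples...
    git commit -a -m "Clean up regular expression examples"
    # Publish the branch to your own copy of the project, then ask the
    # original author to pull it:
    git push git@github.com:me/swc-lessons.git cleanup-regexp-examples

Pulling that branch back in is a one-line operation for the original author; nothing in the PowerPoint or SCORM world comes close.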
Sites like the Khan Academy and P2PU don't do this either: both are free, but the first isn't open (I can't hack their content), and the second is about setting up courses, rather than sharing course content in a reusable, remixable way. And that, I think, is going to be the key to reaching our goal of helping thousands of people a year. Research has shown that blended learning—the combination of traditional synchronous classroom instruction with its online asynchronous counterpart—works better than either on its own. Its concrete realization for Software Carpentry would be to combine intensive two- or three-day workshops with weeks of slower self-directed exploration [4]. Since every group's needs will be slightly different, we need to make it easy for people to clone material (each other's as well as ours), customize it, and then share those changes. The third is currently missing, which is why this project's bus factor is still 1. We don't have the resources to build the tools, hub, and community that would solve this problem, but other interested parties do. As I said at the outset, maybe the way to solve Software Carpentry's problem is to solve one that's more general... [1] And less despondent. It's hard to talk about the online teaching tools that are available today without sinking into an epic grump of nearly Scottish magnitude. [2] This description makes GitHub sound a lot like some weird kind of wiki. It certainly does share some of the social aspects of things like Wikipedia, but version control works a lot better for complex content (like source code or high-quality learning materials). [3] An attempt to get some built as part of GSoC 2011 led nowhere; there are some closed source options, but those are mostly aimed at Word and Excel. [4] Combined with desktop sharing and crowdsourced assessment, but those are subjects for a future post. Read More ›

Yet Another Survey
Greg Wilson / 2011-12-29
Prakash Prabhu and others recently published "A Survey of the Practice of Computational Science" based on information from 114 researchers at Princeton University. The emphasis is different from that of the survey Hannay and others (including me) did in 2008-09, but the findings are broadly similar. Read More ›

What Success Looks Like Five Years Out
Greg Wilson / 2011-12-24
Having talked about what I've learned and how well our teaching measures up, I'd like to explore what success would actually look like for Software Carpentry. Our long-term objective is to make productive and reliable computing practices the norm in science and engineering. My model is the way statistics became a "normal" part of most scientists' training in the 20th Century. Most psychologists and geologists are emphatically not statisticians, but most of them know (or at least have known, at some point) what a t-test is, what a correlation coefficient means, and how to tell when they're out of their depth. Equally, most scientists should know how to pipe a few simple filters together to clean up their data, how to use version control to track and share their work, whether their computational results are as trustworthy as their experimental results [1], and when to go and talk to either a real numerical analyst or a professional software developer. On a five-year timescale, this translates into three concrete goals: We are helping thousands of people every year. We know that what we're doing is helping. We have a double-digit bus factor. We are helping thousands of people every year. Right now, we change the lives of a few dozen each year—certainly no more than a hundred. In order to scale up, we need: dozens of people running Hacker Within-style workshops in labs and at universities, and dozens more contributing content (particularly exercises), answering questions, offering one-to-one support [2], etc. We need to do a lot to make this happen: create meta-content to tell people how best to teach and learn the actual content, build a distributed community, and so on. The key piece, though, is some kind of institutional recognition for participation, such as non-credit transcript items for grad students who organize workshops. Self-sacrifice for the general good only goes so far: given a choice between cranking out one more paper, however obscure, or organizing a workshop, most scientists have to do the former (at least until they have tenure), because the latter isn't taken into account by promotion scoring formulas. I think Mozilla's Open Badging Initiative is a good solution to the general problem of accrediting non-classroom competence, but we need to find a way to translate that into something that universities and industrial research labs can digest. Why not just get schools to offer Software Carpentry as a credit course? We've been down that road, and the short answer is, they mostly won't. Every curriculum is already over-full; if a geology department wants to run a course on programming skills, they have to cut thermodynamics, rock mechanics, or something else. I might believe our stuff is more important, but existing scientific disciplines don't consider it part of their core (not even computer science). Until we build a critical mass of supporters [3], we'll have to work in the institutional equivalent of that windowless room in the basement that still smells like the janitorial storage closet it once was. We know that what we're doing is helping. Testimonials from former students are heartwarming, but how do we know we're actually teaching the right things? I.e., how do we know we're actually showing scientists and engineers how to do more with computers, better, in less time? 
The only honest way to answer the question would be to do some in-depth qualitative studies of how scientists use computers right now in their day-to-day work, then go back and look at them weeks or months after they'd had some training to see what had changed. I think this kind of study has to be done qualitatively, through in-person observation, for two reasons: We ran the largest survey ever done of how scientists develop and use software. Almost 2000 people responded to dozens of questions, and while we discovered a lot about when and how they learn what they do know, it didn't tell us anything about how much they know. The problem was calibration: if someone says, "I'm an expert at Unix," does that mean, "I've hacked on the Linux kernel" or "I use a Mac when everyone around me uses a PC"? Follow-up questions uncovered both interpretations; without some probing one-to-one analysis as a foundation, we have no way to estimate their distribution. Our goal isn't really to change how quickly people do the same old things; it's to change the things they do. Someone who understands ANOVA probably designs and runs their experiments differently (and more effectively) than someone who doesn't; similarly, someone who understands the value of unit testing probably writes more reusable and more reliable code. Time-on-task measurements don't reveal this. As I've said many times before, though, nobody seems to want to fund this kind of study. I've personally applied to three different government agencies and four different companies, without luck, and I know other people who have similar stories. It wouldn't take much: $50K to get started, or even $250K to do the whole thing properly, is peanuts compared to what those same agencies and companies are throwing into high-performance computing, or to the cost of the time scientists are wasting by doing things inefficiently. But as I've also said many times before, most people would rather fail than change... Ranting aside, Software Carpentry needs to do this kind of study on a regular (ideally, continuous) basis in order to keep its offerings relevant and useful. That means we need to have stable long-term funding, and that brings us to the third (and possibly most important) point: We have a double-digit bus factor. Software Carpentry is still essentially a one-person show. If we're to help more people and do the kinds of studies needed to find out if we are actually helping, we need more people to get involved. I dislike the phrase "building community", but that's what we need to do. [1] See Cameron Neylon's post on good practice in research coding for a thought-provoking discussion of what it's fair to expect from scientists. [2] Support could be either classical tech support ("Share your desktop with me for a minute, and I'll see if I can figure out why the Python Imaging Library won't install with JPEG support on your machine"), or personalized feedback and assessment ("I just looked at your solution to exercise 22, and here's what I think"). As I said in the previous post in this series, research has shown that this kind of "deep feedback" is crucial to real learning, but it is exactly what automated grading of drill exercises doesn't provide. [3] We actually only need supporters in two camps: journal editors (who could insist that scientists provide their code for peer review, and that those reviews actually get done), and funding bodies (who could insist that code actually be reusable).
Given those requirements, it would be in most scientists' own interests to improve their computing practices. Oh, and I'd also like a pony for Christmas... Read More ›

Organizing Instruction and Study to Improve Student Learning
Greg Wilson / 2011-12-24
I had breakfast a couple of days ago with Jon Pipitone, a former student who has helped out with Software Carpentry off and on in the past. When we discussed my post summarizing what I've learned so far about online education, he had several questions and suggestions (thanks, Jon). I'm still digesting everything he said, but there is one point I'd like to act on now. Back in 2007, the US Department of Education's Institute of Education Sciences published a 60-page report Organizing Instruction and Study to Improve Student Learning. It's a great resource: seven specific recommendations are summarized in clear language, along with the evidence that backs them up. The report even classifies that evidence into three levels: Strong: supported by studies with both high internal validity (i.e., ones whose designs can support causal conclusions) and high external validity (i.e., studies that include a wide enough range of participants and settings to support generalization). This includes the gold standard of experimental research: randomized double-blind trials. Moderate: supported by studies with high internal validity but moderate external validity or vice versa. A lot of studies necessarily fall into this category because of the difficulty (practical or ethical) of doing randomized trials on human subjects, but quasi-experiments and/or experiments with smaller sample sizes qualify. Low: based on expert opinion derived from strong findings or theories in related areas and/or buttressed by direct evidence that does not meet the standards above. For example, applying a well-proven theory from perceptual psychology to classroom settings, but only validating it with experiments involving small numbers of students, would be considered a low level of evidence. The recommendations themselves are: Space learning over time. Arrange to review key elements of course content after a delay of several weeks to several months after initial presentation. (moderate) Interleave worked example solutions with problem-solving exercises. Have students alternate between reading already worked solutions and trying to solve problems on their own. (moderate) Combine graphics with verbal descriptions. Combine graphical presentations (e.g., graphs, figures) that illustrate key processes and procedures with verbal descriptions. (moderate) Connect and integrate abstract and concrete representations of concepts. Connect and integrate abstract representations of a concept with concrete representations of the same concept. (moderate) Use quizzing to promote learning. Use pre-questions to introduce a new topic. (minimal) Use quizzes to re-expose students to key content (strong) Help students allocate study time efficiently. Teach students how to use delayed judgments of learning to identify content that needs further study. (minimal) Use tests and quizzes to identify content that needs to be learned (minimal) Ask deep explanatory questions. Use instructional prompts that encourage students to pose and answer "deep-level" questions on course material. These questions enable students to respond with explanations and support deep understanding of taught material. (strong) This summary doesn't do the report justice: it devotes several pages to each recommendation, and includes advice both on how to implement them, and how to overcome common roadblocks. So how well does Software Carpentry implement these ideas? More specifically: What things are we doing already? Why aren't we doing the things we're not?
If the answer is, "Because it's hard," can we make it easier? 1. Space learning over time. No. On one hand, students are dipping into the material when and as they want, and are free to return to any part of it at any time. On the other hand, they don't have to return to any of it, and when they do, they're returning to exactly the same material, not a different take on the same content. 2. Interleave worked example solutions with problem-solving exercises. We don't do this now. We could provide the worked examples, but the exercises would be much harder to do: We have no practical way to assess students' performance. (In fact, we can't even tell if we got their fingers on the keyboard.) Yes, we can check that they produce the right output for simple programming problems, but (a) that doesn't tell us if they got that output the right way (or a right way, since there's usually more than one), and (b) that doesn't work for second-order skills like designing a good set of tests for a particular function. In practice, I believe the majority of students wouldn't do the exercises unless the web site forced them to. I know I wouldn't, even when presented with the research behind this recommendation. The only way to make people do them would be to lock them out of some site content until they had successfully completed exercises on prerequisite material, which (a) conflicts with our open license, and (b) makes the site less useful to people who are really just looking for a solution to a specific problem that's right in front of them. 3. Combine graphics with verbal descriptions. The word "verbal" is important here. Research by Richard Mayer and others has shown that if you present images, text, and audio simultaneously, learning actually goes down, because the brain's two processing centers (linguistic and visual) have to handle two streams of data each in order to recognize words in text and synchronize them with the audio and images. That said, we have to provide textual captions and transcripts for people with visual disabilities, people whose English comprehension isn't strong, and people who prefer to learn from the printed page. What we really need is tools that do a better job of disentangling different content streams (video, audio, narration, and diagrams); they are starting to appear, but the 21st Century replacement for PowerPoint that I really want doesn't exist yet, and we don't have the resources to create it. 4. Connect and integrate abstract and concrete representations of concepts. We certainly try to. I don't know how to gauge whether we succeed. 5a. Use pre-questions to introduce a new topic. The report recommends giving students a few minutes to tackle a problem on their own with what they already know to set the scene for introducing a new concept. We don't do this per se, though some lectures (like the ones on regular expressions and Make) do use a problem-led approach. Again, I doubt students working on their own would actually take the time to do the exercise unless we forced them to, but perhaps we could show them someone going up a couple of blind alleys (e.g., trying to parse text with substring calls instead of regular expressions; there's a sketch of that particular blind alley at the end of this post) before introducing the "right" solution to the problem? 5b. Use quizzes to re-expose students to key content. I used to hate cumulative midterms and final exams because they forced me to dredge up things I hadn't used in weeks. Weeks! Now, as a teacher, I think they're great, and the evidence showing they reinforce learning is strong.
But while we could make some sort of cumulative self-test available, most students wouldn't come back to our site to do it, and so wouldn't benefit. Even if they did come back, we have no way to give them a meaningful assessment of their performance: the richer the question, the more ways to answer it there are, and the less useful automated marking would be. 6a. Teach students how to use delayed judgments of learning to identify content that needs further study. We can't do it. All we can do is tell them that they'll learn more if they come back and review things later to see how much they've actually understood, but (a) they've heard that before, and (b) if they're coming to our material for help with a specific problem that's in front of them right now, they're unlikely to make time in a month to review what they learned. 6b. Use tests and quizzes to identify content that needs to be learned. This recommendation is meant to be read from the student's point of view: they should use tests to figure out what they don't understand so that they can focus their review time more effectively. Again, we can create self-test exercises, but we have no practical way to give students useful feedback on any but the simplest. 7. Ask deep explanatory questions. The recommendation continues, "These questions enable students to respond with explanations..." Once again, we're limited by the fact that we can't assess anything except drill-level skills. Our overall score is pretty poor: of the nine specific practices, we only do three or four. The others stumble over two issues: Most students won't voluntarily revisit material (they're too busy solving their next problem). There's no practical way to assess how well they're doing on anything except simple drill exercises. The second worries me more. As I wrote several weeks ago: To paraphrase Tolstoy, successful learners are all alike; every unsuccessful learner is unsuccessful in their own way. There's only one correct mental model of how regular expressions work, but there are dozens or hundreds of ways to misunderstand them, and each one requires a different corrective explanation. What's worse, as we shift from knowing that to knowing how—from memorizing the multiplication table to solving rope-and-pulley dynamics problems or using the Unix shell—the space of possible misconceptions grows very, very quickly, and with it, the difficulty of diagnosing and correcting misunderstandings. Some research has found that crowdsourcing assessment among peers can be just as effective as having someone more knowledgeable do the grading. We don't know if that works for programming, though, and even if it does, will enough people voluntarily give feedback on other people's work (promptly enough) to be useful? Locking down content until they have will just drive people away rather than helping them learn. Solving the "meaningful feedback" problem is, in my opinion, one of the biggest challenges open online learning faces; I'd welcome your thoughts on where to start.
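As promised above, here is the substring-versus-regular-expressions blind alley in miniature; it is a minimal sketch, and the record format is invented for illustration:

    import re

    line = "Site: JB-23 Date: 2011-12-24 Reading: 6.13"

    # The blind alley: counting characters by hand. This breaks as soon
    # as the label or the spacing changes.
    start = line.find("Date:") + len("Date: ")
    date = line[start:start + 10]

    # The way out: describe the pattern we actually mean.
    match = re.search(r"Date:\s+(\d{4}-\d{2}-\d{2})", line)
    if match:
        date = match.group(1)

Read More ›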

What I've Learned So Far
Greg Wilson / 2011-12-20
I worked on Software Carpentry full-time for a year starting in May 2010. In that time I created over 140 PowerPoint decks, about 12 hours of video, and taught the course six times (twice online and four times in person). I also learned a few things that I hope other people will find useful: Online teaching isn't as effective as in-person teaching. Today's tools suck. If content is king, then community is queen, pope, emperor, and president-for-life. If you don't know how to tell if you succeeded, you're going to fail. If you don't take the time to explore what other people have done, you deserve to fail. Online teaching isn't as effective as in-person teaching. "I talk but don't listen, you listen but don't speak" is a lousy teaching model, but that's all recorded video is. Ask-and-answer is much more effective, particularly when you can see the baffled looks on students' faces, but no present-day web-based technology can deliver that for more than a handful of students at a time. In my gloomier moments, I fear there's some sort of conservation law at work: what we gain in reach, we lose in effectiveness. However, the experiments we did with chat sessions seemed to work well; we should have experimented with that more, and with desktop sharing as well. (But hey, did you see what I did there? I said that one kind of teaching isn't as effective as another, but didn't tell you what I mean by "effective", how I'm assessing it, or what data I've gathered. I'll come back to this later.) Today's tools suck. More specifically, they suck time out of instructors. It takes me half a day to put together a one-hour lecture on something I know well; doing an hour of video takes three days or more. That might be economical if measured in hours of prep per student reached (even when "reached" is scaled down as per the previous point), but what really hurts is how long it takes to do revisions. If I spot a typo in a PowerPoint deck, I can fix it in a couple of minutes. If I want to patch a video, though, it'll take at least half an hour; that's a fifteen-fold increase, which is a powerful disincentive to incremental improvement. What's worse, the result will be disappointing: a person's voice never sounds exactly the same from one day to the next, so patches to a lecture's sound track are painfully obvious without expensive professional editing. And what am I supposed to do if I want to recycle someone else's material? Re-record the whole thing, or drop a couple of minutes of Canadian tenor into someone else's Australian contralto? The only practical solution I can see is to break things down even further than I have: instead of five-minute videos, I should do thirty-second clips so that re-recording an entire clip is an acceptable cost [1]. I should also: Deliberately mix up the narrators so that voice changes are regular occurrences instead of occasional jarring distractions. Provide separate audio tracks for people who want to listen along while paging through the slides, rather than watching a video. (We get a lot of hits from developing countries, where broadband can't be taken for granted.) Use something like Popcorn.js to link the media, the transcripts, and the screen captures. (Right now, the videos are on the same page as the transcripts and screen captures, but otherwise disconnected.) Allow in situ commenting.
Embed the source code for all our example programs in the pages as well (assistive tools for the blind don't understand things that appear only as pixels in a screen capture, and neither do search engines). Embed the diagrams in a format I can hyperlink and dynamically reformat, e.g., as SVG rather than as JPEGs or PNGs. None of these things is particularly hard except the last—the tool I want doesn't appear to exist yet. They'd all take time, though, both up front and for maintenance. Before anyone puts in that time, there's another point to consider: If content is king, then community is queen, pope, emperor, and president-for-life. And not just for users: it's at least as important for developers. I should have stopped creating new content in December 2010 and spent the next four months getting more people involved in the project. Running the course online the way we did wasn't nearly enough; only three grad students made episodes (for pay) since work on Version 4 started in May 2010, and only one volunteer has contributed stuff pro bono. (Thanks, Ethan—you're a star.) A few other people have helped teach the course, but its bus factor is still 1. That isn't just a threat to its future: if only one person is deeply into the project, who can they bounce ideas off? Who will hold down the fort when they're busy with other things? And who will keep them going when their enthusiasm flags? The answer to this one is something I've long resisted. Lots of open teaching hubs have sprung up in the past few years, like P2PU, Udemy, Student of Fortune, and BlueTeach. They all have some kind of community tooling (some more than others), but none seem to offer better tools for building content than I'm using already. The main argument for moving Software Carpentry there is the same argument behind open source portals like SourceForge and GitHub: better findability, the increased usability that comes from having a standard interface, and so on. The main argument against is the risk of platform capture; the real reason I haven't done it, though, is that I'm reluctant to see my pet project become just another bunch of pages on someone else's site. I'm sure I'll work through that eventually... My final two takeaways are somewhat personal as well. If you don't know how to tell if you succeeded, you're going to fail. Software Carpentry's goal is to help scientists and engineers use computers by teaching them basic, practical skills. I think I can tell who "gets it" in person, but I have no idea how to tell remotely. How long did it take someone to solve that exercise? How many blind alleys did they go down? And do they now understand the concepts well enough to apply them to the next problem they run into? Scores on timed drill exercises might tell us some things, but without (literally) years of painstaking refinement, they can't help us diagnose misconceptions [2]. More generally, Software Carpentry's real aim isn't to show people how to do today's work faster. It's to show them how to do entirely new kinds of work, things that were out of reach of their old skills. The only way to tell if we've accomplished that would be to observe them before and after the course [3]. Surveys and other kinds of self-reporting can't tell us, not unless they've been calibrated by exactly this kind of before-and-after observation.
The problem is, no-one seems to want to fund it: they'll spend millions on supercomputers, and millions more on salaries for faculty and grad students, but not a penny to figure out whether those investments are working, or how to make them pay higher dividends. Things are different in K-12 education, where there's actually too much emphasis on assessment right now (or at least, too much emphasis is misdirected), but even there, we're still struggling to figure out what web- and computer-related skills people ought to have, much less how to tell if they've got 'em. Finally, if you don't take the time to explore what other people have done, you deserve to fail. This was my biggest mistake; it's no consolation to see others making it as well. Imagine someone came to you and said, "I'm going to build a massive online multiplayer fantasy game, and boy howdy, it's going to be great!" If you asked them how it was going to be different from World of Warcraft©, and their response was, "What's that?", you'd probably feel as weary as I now do when I hear someone brush aside decades of pedagogical research. How many of the people who are trying to teach online have ever taken an online course themselves? How many have taken more than one? I haven't: I signed up for an online course about online teaching at a local university last summer, but dropped out when it became clear that my fellow students were there to earn professional advancement credits rather than to learn. Now that I am doing my homework, I've discovered dozens of wheels I didn't need to reinvent, and dozens of misconceptions that I shared with most other well-meaning but uninformed newcomers [4]. I'm still struggling to figure out how to apply what I'm learning to the specific problems in front of me, but that's only fair. After all, I'm asking Software Carpentry students to do that every day. So here's the bottom line. Some people will figure things out no matter what or how you teach, so survivor bias will easily fool you into thinking that your teaching is responsible for their success. Knowing what other people have done, how effective they've been, and how we know isn't just a matter of professional courtesy. It's what you have to do if you're serious about maximizing the odds that someone actually learns from what you're doing. [1] Lowering production values is of course another option, but some research has shown that this leads to lower student retention. If anyone knows of more recent or more relevant results, I'd welcome a pointer. [2] As Will Rogers said, "It isn't what we don't know that gives us trouble, it's what we know that ain't so." One of the reasons peer instruction is so effective is that it's better than lecturing at correcting students' misconceptions. [3] If they go through the material in some kind of course. If they're dipping into it on their own, as they need it, then I have no idea how to tell how much we're helping. [4] I've also discovered that most books about education are badly written—it's as if Jossey-Bass took Strunk & White, put "don't" in front of each recommendation, then gave it to their authors. I think this is one of the reasons people like me don't do their background research: it's hard to keep plugging away when it feels like being smothered under a pile of wet sheep. Of course, I could just be looking for excuses to skip the thinking bits and get back to coding. Read More ›

It Just Keeps On Hurting
Greg Wilson / 2011-12-20
I received email a few days ago from someone who had just found this site (reproduced here with permission): I am working on getting myself set up to do scientific programming in Python on a MacBook Pro. I plan to use MySQL and Pentaho. I am new to open source, Mac, and Python. Where I've run into the most problems is getting a working environment established with all the tools in place. For instance, I ran into a huge problem trying to install mysqldb, the Python/MySQL interface package. Trying to resolve that, I've encountered a bewildering array of alternative ways to do things, each accompanied by strong opinions and each entailing a different group of packages. One example being package managers: easy_install, pip, various brews, macports, etc. I don't really care about some absolute right way—I just need a working, consistent tool set. I responded: I wish I could help you—I really wish I could help you, but installation is Python's weakest point. Every time we run a class it seems as if students spend almost as much time wrestling with packages as they do with everything else put together. "Bewildering" is an accurate description, and while there's some hope of things getting better, we aren't there yet. Virtualenv + pip isn't a complete solution, but it does make it easier to back out failed installations (which I guess is something). I'm sorry I don't have anything better to offer. And a colleague's response was: Instead of having a systematic package installation solution, I mostly just plow through a series of failures. That is, if I find I don't already have a module via the EPD [the Enthought Python Distribution], I usually also find it's not available through macports. Next, I usually end up trying to easy_install it. If that doesn't work, I resort to pip. Doing this for each new package that I need is not pretty. I wouldn't recommend it, and I applaud you for trying to bring order to your life instead... Anyway, unfortunately for you, neither pentaho nor mysqldb is installed with the EPD, and neither appears to be available through macports, so my only pieces of positive advice here are really quite useless. This continues to be the biggest headache our students face, not least because they have to deal with it on day zero, before they've learned what they need to know in order to diagnose and fix problems. I'm slowly coming around to the notion that we should just give students a virtual machine to use for the first couple of days, and only then try to get things installed natively on their machines. Opinions on this (particularly ones backed up by experience) would be appreciated. Read More ›
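For anyone in the same bind, the virtualenv-plus-pip workflow mentioned above looks roughly like this as a shell session (the package name is just an example):

easy_install virtualenv          # once, to get the tool itself
virtualenv sandbox               # create an isolated environment
source sandbox/bin/activate      # use it for this shell session
pip install MySQL-python         # try the install...
pip uninstall MySQL-python       # ...and back it out cleanly if it misbehaves

If the environment becomes hopelessly tangled, deleting the sandbox directory throws the whole thing away without touching the system Python.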

New Features in Excel for Scientists
Greg Wilson / 2011-12-13
Carly Strasser, of the Digital Curation for Excel Project, has posted some ideas about new features that could be added to Excel to make it more useful for scientists. In brief, the list is: Generate metadata. Generate a data citation for the data file. Check the spreadsheet for export compatibility. Link to archive services. Feedback and suggestions would be welcome... Read More ›

How to Teach Webcraft and Programming to Free-Range Students
Greg Wilson / 2011-12-07
I will be running a P2PU course starting in January on teaching free-range learners how to program and build stuff on the web. The blurb is below; anyone who wants to can sign up to follow along or take part (we expect it will require 3-4 hours/week from mid-January to some time in April). I'm not an expert on these subjects by any means, but I've learned a few things from running Software Carpentry that I think are worth sharing, and hope that this course will give me a chance to learn more. If you're interested in teaching scientists how to do things with computers, please come and join us. How to Teach Webcraft and Programming to Free-Range Students What do we know about how novices learn webcraft and programming, why do we believe it, and how can we apply that knowledge to free-range learners? Right now, people all over the world are learning how to write programs and create web sites, but for every one who is doing it in a classroom there are a dozen free-range learners. This group will focus on how we, as mentors, can best help them. Topics will include: What does research tell us about how people learn? Why are the demographics of programming so unbalanced? What best practices in instructional design are relevant to free-range learners? What skills do people need in order to bake their own web? How are grassroots groups trying to teach these things now? What's working and what isn't? Read More ›

Three Short Thoughts
Greg Wilson / 2011-11-29
A BBC article titled "Coding — the New Latin" resonated: Latin was the language of learned discourse in the formative years of modern science, but not something most people spoke day-to-day. I think that's a good model for computing in the sciences; like statistics, it requires familiarity, not expertise. Jorge Aranda's review of Codermetrics talks about the limits of quantification in software engineering. I've said before that we need to measure how much time is lost due to poor computing skills in order to get people to take this kind of training seriously, but I'm very conscious of just how much measurement can't tell us. Another thought-provoking post by Cameron Neylon asks what it's reasonable to expect from scientists, and their software. I'm sure he'd enjoy hearing what you think... Read More ›

Building a Bibliography
Greg Wilson / 2011-11-25
With help from several of our regular readers, we have assembled a bibliography of research related to software engineering and computational science to go with our recommended reading list for students. We hope you find it useful, and we would welcome corrections and additions. Read More ›

Knowledge of the Second Kind
Katy Huff / 2011-11-19
Over the last three years, a group of students has quietly been converting snacks and enthusiasm into scientists who can program. The Hacker Within is a student club at the University of Wisconsin — Madison which came about when a number of nuclear engineering graduate students needed a forum to exchange tools and share best practices for their increasingly software-intensive research. The success that followed provides an example of an educational model that has fostered necessary software skills among science and engineering graduate students. Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. —Samuel Johnson Since 2008, we've met every other week for an hour to discuss some useful computing tool (and eat snacks). We cover a broad range of topics, from peer-taught fundamental skills to more technical invited talks. The meetings attract students mostly from engineering, biology, and physics, but also have regular members from less predictable fields, such as psychology and limnology (the study of lakes!). We also pool the skills of our members to teach three- and four-day intensive, example-driven bootcamps. These attempt to impart fundamental programming skills using tools such as C++, UNIX, and Python, with a curriculum largely inspired by Software Carpentry. The bootcamps have received great praise from attendees (who hail from a staggering array of disciplines, see this cool chart). This bootcamp model has an advantage over a traditional course: the time-intensive nature of scientific coursework limits the feasibility of a formal curriculum in software skills for scientists. That is to say, even if the right course were offered (and what would that be, exactly?), scientific curricula leave no room for a software development course (or worse, many). For this reason, students in scientific disciplines typically lack the software skills with which to conduct computational research effectively, but are unwilling or unable to invest time in formal training. The current state of affairs in academic research is often one in which students and researchers are programming in a vacuum, teaching ourselves computational tools unfamiliar to peers in our fields, and then using those tools to do our 'peer reviewed' research. This toxic situation demands a real change in the way we educate students in preparation for scientific computing. The Hacker Within community model has the potential to alleviate this situation in any institution that has a few individuals to spearhead it. A few snacks and some enthusiasm can replace a disconnected collection of researchers scattered across disciplines with an inter-departmental forum in which those researchers can find and share knowledge efficiently with their peers. Read More ›

Show Me the Data
Greg Wilson / 2011-11-18
I got mail from a colleague at a prominent US university yesterday saying (in part, and elided to protect the guilty): ...the graduate student representative to the curriculum committee reported that the students did not want a scientific computing course, that they would instead figure it out themselves.... How does one respond to statements like this...that have...basically frozen skill levels? The options I see are formal ("in curriculum") training, bootcamps and workshops, and letting them "figure it out themselves". Are there arguments about the successes of each? There are certainly arguments: the problem is, there's practically no data. After 14 years, the conclusion I've reached is that we will be ignored until we do empirical field studies to show people just how many potential research hours are being wasted due to inadequate computational skills. Surveys won't tell us: we need to get someone out in the field to shadow grad students for a few weeks, watching what they actually do and how they do it, so that we can compare the median with the 90th percentile (or 75th, or whatever). I estimate it would take one person 4-5 months to do a preliminary version, and then another 15-20 researcher-months to collect enough data to show senior faculty just how bad things are. Of course, many would ignore the results (just look at how many doctors smoke), but I'd like to think it would change at least a few minds, and I frankly don't know what else will. We know that such studies are possible, but I haven't found anyone willing to fund one in this particular area: I asked NSERC—Canada's equivalent of the NSF—twice in the three and a half years I was a professor; they said "no" both times, and I've had no more success elsewhere. As scientists, shouldn't we study the effectiveness of training just as rigorously as we'd study the effectiveness of a new treatment for diabetes? And if we're not going to do that, shouldn't we stop calling ourselves scientists? Read More ›

Quantifying Installation Costs
Greg Wilson / 2011-11-18
A few months ago, I tried to quantify the cost of poor software skills. A recent post from Adam Klein gives us a good excuse to try to do something similar for the cost of installing software. In his post, Klein describes the 17 steps he went through to set up a Python data hacking environment on a new machine. If we assume that each step has a 5% chance of failing for some reason (packages have moved on, the compiler isn't exactly the same version as Klein was using, whatever), then the chance of the whole process working is (1 - 0.05)^17, or roughly 42%. In other words, his process will fail to work the first time for over half of the people who try it. In some cases, they'll be able to figure out why, fix the problem, and move on, but in many others they won't—as I said earlier this week on my personal blog, we've taken something that may or may not be intrinsically hard (programming), and made it much harder by burying it under layer upon layer of grief. The end result is that when a scientist sits down to try something new, s/he has no way of knowing whether it will take an hour, a day, or forever. It's hard to build a career on top of that kind of uncertainty... Read More ›
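The arithmetic is a one-liner to check in Python:

print(0.95 ** 17)   # 0.4181..., i.e. roughly a 42% chance that all 17 steps succeed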

Accessible to All?
Greg Wilson / 2011-11-18
I just posted an article on my personal blog about the (in)accessibility of online educational material (including Software Carpentry's). As I said there, there aren't any easy answers, but if we do find funding to keep this project going, I'd like to find ways to make our content easier for everyone to use. Read More ›

Surviving the Tsunami
Greg Wilson / 2011-11-14
The October 2011 issue of ACM Queue features an article by Bruce Berriman and Steven Groom titled "How Will Astronomy Archives Survive the Data Tsunami?" The figures are scary: astronomers already have a petabyte of publicly available data, and are adding half a petabyte per year, a rate which will increase dramatically as new instruments come online. The only way to avoid this all becoming write-only is to bet on emerging technologies, from general-purpose GPUs to cloud computing. The problem, of course, is that "emerging" usually means "flaky", both because the tools haven't had time to mature, and because we, their users, don't have the years of experience needed to know how best to use them. (As far as I'm concerned, we're still trying to figure out how best to use object-oriented programming in science, and we've been at it for thirty years...) But here's the good news. Instead of just the usual perfunctory nod toward education and training, Berriman and Groom put a spotlight on it: An archive model that includes processing of data on servers local to the data will have profound implications for end users, who generally lack the skills not only to manage and maintain software, but also to develop software that is environment-agnostic and scalable to large data sets. Zeeya Merali [...] and Igor Chilingarian and Ivan Zolotukin [...] have made compelling cases that self-teaching of software development is the root cause of this phenomenon... Berriman and Groom go on to recommend that we "...make software engineering a mandatory part of graduate education, with a demonstration of competency as part of the formal requirements for graduation." As I've discussed before, there's little chance of this happening in the short or medium term: everyone's curriculum is already over-full, and senior professors who only know what they taught themselves a generation ago are unlikely to push aside core courses in stellar dynamics or planetary physics to make room for version control and design patterns. What we can do, I think, is make resources like Software Carpentry more usable, and implement some sort of badging system to give students recognition for having completed the training themselves, and for passing it on to others (which would in turn encourage the formation of self-help groups like the University of Wisconsin's Hacker Within). All we need is funding for a couple of people for a couple of years... Read More ›

Clearing Up Code
Greg Wilson / 2011-11-14
The November/December 2011 issue of IEEE Software has a good article by the Climate Code Foundation's Nick Barnes and David Jones titled "Clear Climate Code: Rewriting Legacy Science Software for Clarity". In it, they describe how and why they rewrote a program used to calculate and compare global surface temperatures based on historical data. The original had been attacked by climate change denialists, first because it wasn't publicly available, and then because it was tangled and hard to run. Their rewrite produced something smaller, faster, and much easier to understand; most importantly, though, it validated the results of the initial program. Which made me wonder: what scientific program would you most like to see rewritten? To keep the question realistic, it has to be something small enough that two good programmers could do it in six months or less. What would you like rebuilt, and what do you think the benefit would be? Read More ›

Successful Bootcamp
Tommy Guy / 2011-11-11
Our 2011 Software Carpentry bootcamp, hosted with help from The Hacker Within and SciNet, was a huge success. We hosted 25 students for two very full days of hands-on introductions to Python, the shell, Nose tests, SVN, and SQLite. So what did we learn? We're still waiting on participant feedback, but a few things come to mind. First, wow, how did we screw software installation up so very badly? Most of our technical problems came from inconsistencies between various Python builds. Some students installed numpy separately from Python and ran into version mismatches. The demo packages were missing from some people's IPython installations (including my own!) That doesn't even mention Cygwin. We got by thanks to the help of the wonderful volunteers who were able to sort out most of the problems, but it's hard to pitch a product that is that difficult to get up and running. Imagine if we tried to publish papers as inaccessible as our code. Second, this bootcamp was a success because we got people using software on their machines from the very start. People have the programs, they have example code, and they are ready to use the things we taught them. It's no small thing to meld good exercises with lectures, especially when people get confused or ask questions and get behind. It takes a team of people who can jump in and help participants through error messages, lost connections, typos, and bugs. So in the end, it was all about the people. Thanks to Jonathan Dursi, Jonathan Deber, Keven Brown, David Wolever, Katy Huff, Orion Buske, and Greg Wilson for making this bootcamp a success. Here's picture proof! Read More ›

The Ladder of Abstraction and the Future of Online Teaching
Greg Wilson / 2011-11-08
Up and Down the Ladder of Abstraction is one of the most thought-provoking things to hit the web in a long time. Its author, Bret Victor, doesn't just talk about the design process—he shows us what a great interactive tutorial ought to look like. (For a shorter, simpler, but equally inspiring example, have a look at the home page for his Tangle project, and ask yourself what learning would be like if learners could play with every diagram and quantitative statement in their "textbooks" that way...) Examples like these, and reflection on things I've learned by following people like Mark Guzdial and Audrey Watters, have made me realize that there's a big difference between online teaching and online learning. Unfortunately, Software Carpentry has focused on the former rather than the latter—on presenting content, rather than on how (and how much) people actually learn. Partly, this is because the former is easier, since I have control over the notes, the videos, and so on. Partly too, though, it reflects an academic culture in which professors focus on lecturing, rather than on changing students' understanding of the world. (We've all had students who got B's, or even A's, without really understanding the course material...) And partly, I've focused on production rather than consumption because the latter is very hard to assess. Even when we're teaching this stuff in person, as we're doing right now in Toronto, it's very difficult to get a handle on how much students have actually absorbed. For example, suppose we're teaching Python (which we are), and one of the exercises is to read a bunch of numbers from a file and print their mean. Ignoring floating-point issues, there's only one right answer, but that doesn't mean we can say, "If your program prints 6, then you understand loops, file I/O, and string-to-number conversions." What we'll actually see is people hacking and tweaking their code, more or less at random, until voila, a 6 pops out and they're done. Their programs will be littered with unused variables and multi-stage assignments like:

a = 5
b = a
c = b
print c

and so on (what Katy Huff called cargo cult programming). We will have taught, and students will have produced the right answer for one specific case, but in many cases, they won't have learned. Now, you'd think this would be easy to check: give them a similar problem, and see if they can transfer their knowledge. But that's not going to work, because they can hack and tweak their way to an accidentally-correct answer to the second problem just as they did for the first. We could time them, on the assumption that if they've learned, they'll solve the second problem faster than the first, and the third faster than the second, but all that's going to do (at best) is identify the people who aren't learning; it isn't going to tell us why they aren't, or what they don't understand, which in turn means that we won't know what to explain to them to clear things up. And that's the real problem with many production-oriented approaches to online learning, from Software Carpentry to the Khan Academy. To paraphrase Tolstoy, successful learners are all alike; every unsuccessful learner is unsuccessful in their own way. There's only one correct mental model of how regular expressions work, but there are dozens or hundreds of ways to misunderstand them, and each one requires a different corrective explanation.
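(For the record, a straightforward solution to that mean exercise, assuming one number per line in a file called data.txt, is only a few lines of Python:

total = 0.0
count = 0
for line in open("data.txt"):
    total += float(line)
    count += 1
print(total / count)

The hard part isn't producing this answer; it's telling whether a learner wrote it deliberately or tweaked their way to its output.)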
What's worse, as we shift from knowing that to knowing how—from memorizing the multiplication table to solving rope-and-pulley dynamics problems or using the Unix shell—the space of possible misconceptions grows very, very quickly, and with it, the difficulty of diagnosing and correcting misunderstandings. In the long run, we may be able to develop expert systems (or cognitive tutors) to help with some of these issues. Right now, though, I think the only option is to keep a human mentor in the loop. I only have to look over someone's shoulder for a couple of minutes to see whether they've understood a lesson or not; watching how they produce an answer tells me more about their learning than the answer itself. Using today's tools, it would be relatively easy to have students record screencasts of themselves solving simple programming exercises and submit those along with their source code. However, I suspect most learners would be uncomfortable doing this, as it would feel very Big Brother-ish. I'd welcome your thoughts on all of this: on how we can shift Software Carpentry's focus from teaching to learning, and on how to assess the latter so that we can tell what's working and what isn't. After all, we're here to learn too... Read More ›

The Best vs. the Good
Greg Wilson / 2011-11-08
Cameron Neylon recently posted another thought-provoking piece, this one titled, "Building the perfect data repository...or the one that might get used". In it, he talks about why big institutional efforts to create scientific data repositories have mostly failed to take off, and points at simpler grassroots efforts that scientists might actually adopt because they immediately and obviously solve problems that scientists actually realize they have. One that he points to is DataStage, "a secure personalized 'local' file management environment for use at the research group level, appearing as a mapped drive on the user's PC." Another is If This Then That, which is the simplest "dataflow" tool you could imagine, and which I've been using regularly since it burst on the scene a couple of months ago. Read More ›

Nirvana on Monday Night
Greg Wilson / 2011-11-06
We are running a two-day bootcamp at the University of Toronto tomorrow and Tuesday (Nov 7-8). As part of that, we'll be heading to Nirvana, at 434 College Street (just east of Bathurst) around 6:00 pm on Monday for pints, good food, and conversation. You're all welcome to join us. Read More ›

Research Without Walls
Greg Wilson / 2011-10-22
Open science, reproducible research, and better computational skills aren't necessarily connected, but in my experience, people who care about one usually care about the others as well. In that light, I just signed the "Research Without Walls" pledge: effective today, I will assist in the peer review process (as a reviewer, board/committee member, chair, editor, etc.) only for conferences, journals, and other publication venues that make all accepted publications available to the public for free via the web. If you believe that sharing ideas is the heart and soul of science, please sign up as well. Read More ›

Slides from Hans-Martin
Greg Wilson / 2011-10-21
Hans-Martin von Gaudecker has posted the slides for his course "Effective programming practices for economists". They're a nice complement to the ones on this site—thanks, HM. Read More ›

American Scientist Article on Empirical Studies of Software Engineering
Greg Wilson / 2011-10-19
An article that Jorge Aranda and I wrote for American Scientist about empirical studies of software engineering is now up on the web. We hope it's a good introduction to the area, and we look forward to your feedback. If you'd like to know more about the area, please check out our joint "neat results" blog at http://www.neverworkintheory.org/. Read More ›

Updating to HTML 5
Greg Wilson / 2011-10-14
We are looking for a volunteer to update our episodes on HTML to the new HTML 5 standard. The changes are (probably) fairly minor, and as always, we'll help with post-production (editing of audio, etc.). If you're interested, please get in touch. Read More ›

The Science Code Manifesto's Five C's
Greg Wilson / 2011-10-14
The Science Code Manifesto comprises five core principles: Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper. Copyright: The copyright ownership and license of any released source code must be clearly stated. Citation: Researchers who use or adapt science source code in their research must credit the code's creators in resulting publications. Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition. Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication. In a way, Software Carpentry's goal is to give people the skills they need to do these things right: to create code that other people can read, to share and evolve that code in maintainable ways, and to create code that's useful in the first place. If you agree with their principles, please take a moment to endorse them, and please help make them a reality. Read More ›

Four New Episodes on Databases Using Microsoft Access
Greg Wilson / 2011-10-07
Thanks to the indefatigable Ethan White, we have just added four more episodes on using databases with Microsoft Access: aggregating data, combining data with 'join', nested queries, and handling missing data. We hope you enjoy them. Read More ›

Revamping This Site
Greg Wilson / 2011-10-05
We have been using WordPress as a content management system (CMS) for Software Carpentry since the launch of Version 4 in July 2010, and while nothing in particular is broken, I'm increasingly dissatisfied with its interface and performance. Looking at other online tutorial sites, there are a lot of things we could add; I'd be grateful for your feedback on which you would actually use, and what we might have missed. Our feature list is: Hierarchical organization of topics and episodes. Explaining things is only the first half of teaching; the other half is organizing information into a coherent, comprehensible narrative so that learners aren't constantly stumbling over things they don't yet know. Pages and sub-pages with the occasional cross-link are as good a way to do this as any, so we'll stick with that. Video, text, and images for each episode. The combination of video+audio for those who prefer learning by watching, and text for those who'd rather skim things, seems to work well. The text also makes things more findable, since Google and Bing don't know how to search video or images. Going forward, we'd really like to integrate these all better: instead of screen captures of slides, for example, we should use (searchable and stylable) HTML5, but as I've blogged several times, the tools to mix freehand drawing with text aren't there yet. We'd also like to show more live coding in our videos, but again, the tools we need to do this affordably don't seem to exist. A way to ask and answer questions about particular episodes. So far, we've been encouraging people to do this via threaded comments on the WordPress pages for particular episodes; the only advance I can think of is to allow people to attach those comments to particular locations in the episode transcript, rather than requiring them to tack 'em on at the bottom. A way to report mistakes. I'm not sure whether this should be separate from the above or not. On the one hand, the more channels there are, the more likely people are to miss something (which in this case means duplicate postings). On the other hand, saying, "You misspelled 'mispell' on slide 8," is different from, "I don't understand why you divided by zero on slide 10." A channel for larger questions, comments, and discussion. Some of the things people want to talk about—careers, open publication, etc.—aren't tied to specific topics or episodes, though we hope they are at least tangentially related. I can't think of anything better for this than the kinds of forums we've all been using for years... Sub-sites for study groups. Our material is designed for self-directed use, but we also want to encourage groups to work through it concurrently—having a study partner who's struggling with the same things you are makes learning a lot more fun. Last autumn and winter we just set up a few sub-pages on the WordPress site, but we clearly need something more: at the very least, something students can write as well as read. A wiki per study group seems like the logical choice, but my experience with wikis has been pretty disappointing: after an initial flurry of interest, they (almost) always seem to go cold. Exercises with solutions (either tied to study groups or not). If someone is leading the study group, or if students are doing more than just perusing the material, they'll need problems to solve. I think this is just a sub-case of the item above, though there are obviously questions about when solutions are visible, and to whom. A blog. 
There needs to be a channel for announcements (like this one). There should also be an iCal calendar feed of upcoming events, and both should be echoed to Twitter. Aggregation of external comments and discussion. Every time we apply for funding, we have to scour the web for mentions of Software Carpentry to show what impact we're having. I think this information would be useful to other people as well, so something that pulls together links and comments from elsewhere would be very helpful. Downloadable source code, data files, etc. Our Subversion repository is publicly-readable, but people regularly ask for a downloadable tar or zip file of source code (and, somewhat less frequently, for the source of the lectures themselves). Access to the raw material of videos. The Camtasia project files and the raw audio and video are archived, but not under version control (they are megabytes each). We've only had a couple of requests for these in 18 months, but as new technologies make these more mashable, we should provide them. One thing that isn't on this list is a Facebook group. That's because we've had two in the past, both of which quickly went cold. I know a lot of people are using it for study groups of various kinds, so maybe it could serve that role, but I'd need to be convinced. Talking about Facebook brings up the most interesting question of all: how much of this should we host ourselves? Should we run the wiki, or should we grab space somewhere else? What about forums—should we host those, or should we point everyone at some public discussion area? And commenting—there are now web services that allow people to attach comments to arbitrary web pages, so should we rely on one of those, or install and integrate something ourselves? Your thoughts would be welcome... Read More ›

2011 Software Carpentry Bootcamp Sold Out!
Tommy Guy / 2011-10-04
Software Carpentry is teaming up with The Hacker Within to offer a 2-day bootcamp for scientific programmers on November 7-8. Forty tickets and ten faculty observer slots filled up within two weeks of announcing the workshop on campus. So what are we going to cover? From what we know, the participants are a diverse group. They include industrial engineers, physicists, psychologists, and people from many other departments on campus. One thing they all deal with is data, and often lots of it. Data analysts need a place to store data, so we'll introduce databases using SQLite and version control with Subversion. Data analysts also need programming tools to reformat, clean, and analyze their input. We will spend over half of the time introducing Python, and we'll also introduce the shell. How can you help? One thing we really want to get right is a running example that we can use to motivate the topics we'll be teaching. What is a good topic that is accessible and interesting to engineers and people in the life sciences, that looks like data analysis, and that would benefit from SVN, databases, and Python? Read More ›

Plus Ça Change...
Greg Wilson / 2011-09-22
Once again, Compute Canada has sent out a document for "review" without leaving time for people to provide meaningful input. And once again, it's all about "big iron", as if scientists were somehow, magically, going to acquire the basic skills needed to write modular code, test it, maintain it, and—oh, what's the use? I've known for years that the only way we're going to fix this is to show a younger generation of scientists that doing things the right way also makes them more productive, and then wait for them to have enough seniority to fix our backward-looking institutions. Now, back to trying to figure out how to explain packaging and installation to geologists... Read More ›

I'm Not Normally Lost for Words
Greg Wilson / 2011-09-20
I mentioned last week that I'm trying to put together a lecture on packaging and installation. It's proving harder than I expected: I'm not normally lost for words, but I'm struggling to get these ones to come together. My goal is to help people understand what happens when they install software on a computer, so that they can: diagnose and fix problems when things go wrong; understand why the things that are in packaging tools are there, and how to use them. After a few false starts, I think the best way to do this will be challenge/response, i.e., trace the evolution of a typical scientific software package from a single Python file shared by email to a mix of Python and C that plays nicely with distutils2. Steps I've identified are: A single-file Python script that I email to my labmates whenever I make changes to it. Adding a version number to that file so that I can keep track of which copy someone has (see the essay on provenance for details). Splitting that file into multiple Python files and putting them in a directory with an __init__.py file to make a package, and distributing the result as a gzip'd tar file (again, by email). What are the next steps? At some point this lecture will have to explain PATH and PYTHONPATH, what .so and .dll files are, what /etc, .rc files, and the Windows registry are... How do I get there from here? What steps have your projects gone through, and in what order? Later: just to clarify, challenge/response is a teaching style in which you introduce a simple problem, demonstrate a simple solution, point out a flaw or shortcoming in that solution, show how a slightly more complex solution addresses the flaw, and repeat. It's similar to Lakatos' "proofs and refutations", and I find it a good way to help people understand why complex things are complex. Read More ›
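To make those first steps concrete, here is a hypothetical sketch (all names invented for illustration) of what the project might look like after the split in step 3:

mypackage/
    __init__.py    # marks the directory as a package; also holds __version__ = "0.3"
    io.py          # reading and writing data files
    stats.py       # the actual calculations

bundled up as mypackage-0.3.tar.gz and mailed around. Each later step would then add one more piece of machinery (a setup.py, declared dependencies, and so on) in response to one more way this simple scheme breaks down.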

The Simplest Web That Could Possibly Work
Greg Wilson / 2011-09-17
A new web tool called If This Then That generated a flurry of interest this week (see for example Scott Hanselman's blog post). Simply put, IFTTT lets you connect web services together using if-then rules. It's as simple as: Click on "this" to start. Select "Twitter", and tick off boxes to create a rule (such as "when Alan Turing tweets"). Click on "that". Select what you want to happen (e.g., echo the tweet to your own Facebook page). There is no step 5. That's the beauty of it: there is no step 5. It knows how to work with all sorts of services, from Instapaper and Pinboard to weather reports (yes, weather reports), and there is no step 5. Just as Twitter stripped blogs down to their bare essentials, IFTTT takes graphical workflow tools like Yahoo! Pipes and says, "What's the simplest version of this idea that could possibly be useful?" What's the equivalent for scientists? Michael Nielsen's answer is, "IFTTT itself." "Facebook for scientists" and "Twitter for scientists" projects have foundered because we already have Facebook and Twitter, and they already do most of what scientists need—why would this be different? But as Cameron Neylon pointed out: The problem on the research side is the variety of "output types". There are lots of inconsistent and non-standard outputs that can never quite be connected up the way you want, formats wrong, header broken, wrapped up in the wrong ASCII encoding or whatever. Once again, it comes back to the third of Jon Udell's principles of computational thinking: knowing the difference between structured and unstructured data. A lot of what's on scientists' hard drives cannot be understood by programs, despite being digital. A lot of the rest falls into the "long tail" trap: so few people (and files) use the format that writing tools to handle it isn't economical (by which I mean, general-purpose services won't, and the scientists who have their data in that format are too busy getting their next paper out to get around to it). This is scientific computing's "last mile"; if we really want to make the world writable, we need to focus on this rather than petaflops. Read More ›

Progress Of A Sort
Greg Wilson / 2011-09-13
As I mentioned a few months ago, I'm going to turn Software Carpentry into a book. Here's the present status, chapter by chapter, with word counts:

Introduction (768)
Spreadsheets (3924)
Subversion (6405)
Python (6345)
Interlude: What Is Text? (1071)
Functions and Libraries (9409)
Interlude: Boolean Logic (921)
Case Study: Invasion Percolation (8929)
Interlude: How Are Numbers Stored? (1828)
Testing (4549)
Error Handling (1529)
The Shell (12919)
Make (5668)
Interlude: Provenance (1810)
Sets and Dictionaries (7627)
Case Study: Phylogenetic Trees (1689)
Systems Programming (60)
Interlude: Configuring Programs (2146)
Numerical Programming (6193)
Multimedia Programming (4125)
Steganography (1657)
Installation (135)
HTML and XML (6404)
Databases (10676)
Regular Expressions (8037)
Object-Oriented Programming (4002)
Building Desktop GUIs (137)
Interlude: Persistence (5984)
Web Programming (9639)
Security (1285)
Performance (8043)
Parallel Programming (931)
Software Engineering (6391)
Epilog (873)
Acknowledgments (202)
Glossary (12074)
Bibliography (3378)
Total: 167,763

167,763 words might seem like a lot, but based on past experience, I think I'm about halfway to a readable book—editing is always as much work as writing stuff in the first place. Read More ›

What Happens When You Install Something?
Greg Wilson / 2011-09-08
The most frustrating part of this course is always getting things set up on students' machines. (Yes, we could give them all a Linux VM with everything installed properly, but if they're going back to Windows or Mac OS when the course is over, that's what they want to learn on.) Setting up a development environment is often just as frustrating for professionals (it took many of us half a day to get Lernanta running on our machines a few weeks ago), so I think we'd be doing Software Carpentry students a favor if we taught them what actually happens when you install software. Knowing that would help them debug things when they go wrong (and they always go wrong), and also help them understand why packaging their own work to share with others is as complicated as it is. My first cut at a syllabus for such a lecture is: how $PATH and variants work (there's a sketch below); how file type associations work; how .dll and .so files work; what /etc, .rc files, and the registry are; and, with that background in place, what needs to happen when something is installed. But then there's the problem of versions and dependencies, so talk about apt-get, easy_install, and other lookup-on-the-web installation tools, and compare them to .exe-based installers on Windows, DMG-based installers on the Mac, and "build from source" on Linux and elsewhere. I know that's a lot—probably three or four episodes of ten minutes each—but by the end, people should understand the kinds of things that can go wrong during an install, what version hell really is, and what they can do to make their own stuff more shareable. I don't think we can get them as far as creating RPMs or Python eggs, but I'd be willing to be pleasantly surprised... Thoughts? Read More ›
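Here's that sketch of $PATH lookup, in Python (real shells cache results and handle more corner cases, but the logic is the same):

import os

def find_on_path(program):
    # Return the first executable called 'program' in any directory
    # listed in $PATH, or None if there isn't one.
    for directory in os.environ["PATH"].split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.access(candidate, os.X_OK):
            return candidate
    return None

print(find_on_path("python"))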

Where is the Puck Going to Be?
Greg Wilson / 2011-09-05
Looking at the schedule for Science Online London 2011 makes me feel that Software Carpentry is showing people how to solve yesterday's computational problems—that it's answering the questions people had (or should have had) in 1995, when desktop applications were the only game in town, and when computers were primarily used for calculating, rather than for sharing. They still are, and I think that version control and regular expressions and what-not are still the rock on which more novel things are built, but a lot of other things are taking shape in the fog. You can call it "open science" or "science 2.0" or whatever you want, but its focus is on sharing information more effectively by changing publication models, enabling reproducibility, making data findable, encouraging scientists to use groupware (sorry—we're supposed to call it "social media" these days), and so on. I'd like Software Carpentry to help people get ready for this brave new world. No, scrap that: I'd like Software Carpentry to help people create it, but I'm finding it hard to make up a syllabus that will prepare people for something that doesn't yet have a clearly-defined core. Should we teach people how blogs actually work, on the assumption that RSS (or something like it) will be how scientific information is exchanged and aggregated in the future? Should we spend more time on provenance, or on next-generation build tools, or something else entirely? And crucially, what should we drop to make room? Scientists certainly don't have any more time to learn this than they did when we started in 1997; how do you think priorities have changed since then, and what should we do about it? Read More ›

Teaching Security to Scientists
Greg Wilson / 2011-09-02
Thanks to everyone for their suggestions regarding what we should teach about computer security if we only have one hour (the usual constraint for topics in this course). The outline below is based in part on the lecture on security from Version 3 of this course, in part on Rick Wash's excellent study of folk models of computer security, and in part on mistakes I've seen (or made) myself in the past five years. Feedback would be very welcome, but remember: we're teaching scientists and engineers who are programming as a way to do science, not as an end in itself. Introduction "steal your data" is the Hollywood threat correlating your data is just as big a threat, and one we all ignore injecting data (corrupting your database with evil intent) steal your credentials to attack something else denial of service attacks botnet: use your computer for spam, click fraud, DDOS not after your data in particular, so infection isn't particularly visible more is better, so attackers are not just after big fish Overview a hacker is a human criminal (yes, the word used to have another meaning, get over it) not geeky teenage graffiti artists (though some kids run canned warez) live real-time attacks by human beings are rare because they're not cost-effective social engineering attacks are a much "better" use of human criminals' time a virus is a piece of software that infects a target computer and tries to propagate itself what "infection" means (usually relies on access control failure) not the same as "buggier than usual" although bugs in software are often the targets of attack what anti-virus software does no, your Mac/Linux machine is not magically immune to all viruses ways that viruses can spread "download and run this program" "open this attachment" (you explicitly run the program) "put this USB into your computer" (computer may run it automatically) "open this text file in your editor" almost certainly can't infect your machine unless there's a bug in the text editor — see below on stack smashing so keep patches up to date but those aren't the only ways to spread in a networked world "click this link" may run some JavaScript in your browser honestly, viruses have nothing to do with pop-ups many more programs (services) running on your computer these days than you realize all of which can be attacked, even if you don't click on anything send them data to trick them into doing things — see below on stack smashing (again) 'ps' command or equivalent shows what they are (there are lots) port scanning (what command to use?) shows how many are listening for data what a firewall does Framework and Examples need a framework for thinking about this authentication (something you know, something you have, something you are) authorization (who's allowed to do what) access control (enforcement of rules) usability (can/will people actually understand and use the above) digital signatures have been around for years... ...and almost nobody uses them ("Why Johnny Can't Encrypt") running example: WebDTR is a password-protected web interface to a database of drug trial results example: someone steals your laptop you really should have encrypted the answers you downloaded and saved...
but "password at reboot" isn't really much security (I haven't rebooted my machine in a month) example: easily-guessed password is an authentication and usability failure dictionary attacks on encrypted passwords (never store as plain text) XKCD cartoon idea: requiring 8 characters and unmemorable is less secure than longer phrase confirming identity for bad passwords is a bad idea: tells the attacker "this is a valid ID" example: listening to unencrypted network traffic to steal password access control failure replay attacks example: getting a file defeated by "../.." attack authorization: the web server shouldn't have read permission access control: program shouldn't be able to reach that file example: using user ID in URL to track the user authentication failure (is this the actualy user?) authorization failure: not checking that this person (logged in as someone else) actually allowed to do things example: burying the user ID in a hidden form field same as above: someone can craft an HTTP POST example: SQL injection authorization: you're not supposed to be able to run code example: displaying stack trace for exceptions useful for debugging but now the attacker knows some of the libraries you're using, and can look up exploits that target them log the stack trace instead of displaying it but remember: security by obscurity doesn't work example: flood the application with login requests no information lost, but no service provided example: phishing the text displayed with a link has nothing to do with where the link sends you what a page looks like tells you nothing about where that page is actually hosted example: smashing the stack wave hands really use this example to show people that code is data Keep Calm and Carry On how does this all apply to scientists? have to do everything that regular people do to stay safe plus everything programmers do when creating web services, sharing code libraries, etc. are you sure that FFT library you downloaded doesn't contain an attack? is its author sure that the compiler she used doesn't inject attacks without her knowing about them? plus everything IT departments do when managing data patient records and other sensitive information are obvious ClimateGate: if your science actually matters, someone will want to cast doubt on it, honestly or otherwise it's easy to be crippled by fear or to use fear as an excuse for clutching at power which would be a tragedy, since the web has so much potential to accelerate science the bigger picture (or, please help us engineer a more secure world) computer security is a matter of economics: extent of damage vs. cost to prevent or clean up keep usability in mind facial recognition software to spot terrorists 1% false positive rate, 300K passengers per day in an airport, equals one false alarm every 30 seconds do you think the guards will still be paying attention to the alarms on Tuesday? Risk Importance Discussion Denial of service Minor Researchers can wait until the system comes back up Data in database destroyed Minor Restore from backup Unauthorized data access Major If competitors access data, competitive advantage may be lost Backups corrupted, so that data is permanently lost Major Redoing trials may cost millions of dollars Data corrupted, and corruption not immediately detected Critical Researchers may make recommendations or diagnoses that lead to injury or death what do do? 
top-down mandates of precautions against specific threats haven't worked, and won't criminals are smart people who adapt quickly best model is how we deal with credit card fraud make credit card companies liable for losses, then let the free market figure out the balance need changes to legislation so that: creators of software vulnerabilities are liable for losses whoever first collects data is liable for its loss, no matter where it is when it's stolen on their own, such changes would stifle open science scientists are broke university bureaucrats don't like risk result is no sharing ever so we need: equivalent of copyright's "fair use" provisions meaningful academic credit ("points toward tenure") for creating data and software that people use in the short term, get professional help! the only time in this course that we've said that so please take it seriously Read More ›
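As a footnote, the false-alarm arithmetic above is easy to verify; a quick sketch in Python:

alarms_per_day = 300000 * 0.01           # 1% false positives on 300K passengers: 3,000 alarms/day
print(24 * 60 * 60 / alarms_per_day)     # roughly 29 seconds between alarms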

Renting Cycles Has Never Been Easier (For Some Definition of 'Easier')
Greg Wilson / 2011-09-01
This video, titled "Building an AWS Cluster in 10 Minutes", is both inspiring and depressing: inspiring, because you really can set up a cheap supercomputer for almost no money in just a few minutes, and depressing, because it's effectively out of reach for most scientists, who won't understand the terms being thrown around (certainly not well enough to get themselves out of trouble if anything goes wrong). The good news is, Titus Brown's course has shown that scientists can get from zero to cluster in just two (full-time) weeks. Read More ›

Demos Reinforce Errors, and Confusion is Good
Greg Wilson / 2011-08-17
Mark Guzdial has posted a good summary of this year's ICER keynote by physics education guru Eric Mazur, in which he reported the results of several recent experiments. The most important for us are: Giving people a demo of something actually results in them understanding it less well, because they fit what they've seen into their preconceptions (which are then reinforced). Guzdial interprets this to mean that CS educators need to do more live coding. Students like teachers who clarify things, but students who are confused are actually more likely to learn and understand. Students' self-reported understanding of a topic has no relation to their actual understanding of it (which highlights once again the fact that self-assessment is useless). There are obvious implications for this course... Read More ›

Introducing Programming a Different Way
Greg Wilson / 2011-08-08
Our quick introduction to Python is the module I'm least happy with, so I've been thinking about how to re-design it. I've included a new outline below; comments would be very welcome. Programming is what you do when you can't find an off-the-shelf tool to do what you want Scientists' needs are so specialized that there often isn't such a tool Even if it exists, the cost of finding it may be too high (can't Google by semantics) Why is programming hard to teach/learn? Trying to convey three things at once: Here's something interesting that you might actually want to do. Here's the syntax of whatever programming language we're using. Here are some key ideas about programs and programming that will help you do things. #1 is what engages your interest #2 is what you'll grapple with (and will need to master in order to do #1) #3 is what's most useful in the long term But it's hard or impossible to learn the general without first learning some specifics And you have deadlines: getting that graph plotted is more important right now than the big picture We will teach basic programming by example Show you small programs, then explain the things they contain Also explain a few principles along the way We will use Python Widely used for scientific programming, but that's not the main reason Our experience is that novices find it more readable than alternatives And it allows us to do useful things before introducing advanced concepts (e.g., OOP) Will not start with multimedia programming, 3D graphics, etc. Guzdial et al. have found that it's more engaging for students But getting those packages installed (and for us, maintaining them) is hard work We assume that you've done some programming, in some language, at some point Have at least heard terms like "variable", "loop", "if/else", "array" Quick test: can you read a list of numbers, one per line, and print the least and greatest? Before we dive in, what is a program? Cynical answer: "anything that might contain bugs" Classic answer: "Instructions for a computer" Our answer adds: "...that a human being can understand" Takes a lot of work to turn the things we write into instructions a computer can execute Time to solution is (time to write code that's correct) + (time for that code to execute) The latter halves every 18 months (Moore's Law) The former depends on human factors that change on a scale of years (languages) or generations (psychology) Programs store data and do calculations Use variables for the first Write instructions that use those variables to describe the second Put the following in a text file (not Word) and run it # Convert temperature in Fahrenheit to Kelvin. temp_in_f = 98.6 temp_in_k = (temp_in_f - 32.0) * (5.0 / 9.0) + 273.15 print "body temperature in Kelvin:", temp_in_k body temperature in Kelvin: 310.15 Variable is a name that labels a value (picture) Created by assignment [Box] versus declaration (static typing) or create-by-read (and why the latter is bad) Usual rules of arithmetic: * before +, parentheses Print displays values Put text (character strings) in quotes Print automatically puts a space between values, and ends the line Need to know it: what happens if you use "5/9" instead of "5.0/9.0" # Convert temperature in Fahrenheit to Kelvin. temp_in_f = 98.6 temp_in_k = (temp_in_f - 32.0) * (5 / 9) + 273.15 # this line is different print "body temperature in Kelvin:", temp_in_k body temperature in Kelvin: 273.15 Run interpreter, try 5/9, get 0 Shows that Python can be used interactively Integer vs.
float, and what division does Automatic conversion: 5.0/9 does the right thing [Box] Why are so many decimal places shown in 5.0/9 Need to know it: sometimes Python doesn't know what to do # Try adding numbers and strings. print "2 + 3:", 2 + 3 print "two + three:", "two" + "three" print "2 + three:", 2 + "three" 2 + 3: 5 two + three: twothree 2 + three: Traceback (most recent call last): File "add-numbers-strings.py", line 5, in <module> print "2 + three:", 2 + "three" TypeError: unsupported operand type(s) for +: 'int' and 'str' In this case, "2three" would be sensible But what about "1" + 2? The character "1" is not the number 1 On your own, try "two" * 3 Back to useful things Computers are useful because they can do lots of calculations on lots of data Which means we need a concise way to represent multiple values and multiple steps Writing out a million additions would take longer than doing them # Find the mean. data = [1, 4, 2, 3, 3, 4, 3, 4, 1] total = 0 number = 0 for value in data: total = total + value number = number + 1 mean = total / number print "mean is", mean mean is 2 Use list to store multiple values Like a vector in mathematics Use loop to perform multiple operations Like Σ in mathematics But we have to break it down into sequential steps And since we're doing that, have to be able to update variables' values Can trace execution step by step manually or in a debugger An important skill Want to write programs so that tracing is easy See what this means when we talk about functions Did you notice that the result in the example above is wrong? 25/9 is 2, but 25.0/9.0 is 2.77777777778 Problem is that total starts as an integer, we're adding integers, we wind up doing int/int (again) Could fix it by initializing total to 0.0 Or use a function to do the conversion explicitly # Find the mean. data = [1, 4, 2, 3, 3, 4, 3, 4, 1] total = 0 number = 0 for value in data: total = total + value number = number + 1 mean = float(total) / number # this line has changed print "mean is", mean mean is 2.77777777778 Functions do what they do in mathematics Values in, values out Spend a whole chapter on them, since they're key to building large programs Right now, most important lesson is that just because a program runs, doesn't mean it's correct Could check the original program by using a smaller data set E.g., [1, 4] produces 2 instead of 2.5 Writing programs so that they're checkable is a big idea that we'll return to Need to know it: the len function # Find the mean. data = [1, 4, 2, 3, 3, 4, 3, 4, 1] total = 0 for value in data: total = total + value mean = float(total) / len(data) # this line has changed print "mean is", mean mean is 2.77777777778 Need to know it: lists are mutable # Calculate running sum by creating new list. data = [1, 4, 2, 3, 3, 4, 3, 4, 1] result = [] current = 0 for value in data: current = current + value result.append(current) print "running total:", result running total: [1, 5, 7, 10, 13, 17, 20, 24, 25] Start with the empty list result.append is a method Like a function, but attached to some kind of object like a list (noun and verb) Important enough that we'll spend a whole chapter on this, too How to double the values in place? # Try to double the values in place.
Did you notice that the result in the example above is wrong? 25/9 is 2, but 25.0/9.0 is 2.77777777778. The problem is that total starts as an integer and we're adding integers, so we wind up doing int/int (again). We could fix it by initializing total to 0.0, or use a function to do the conversion explicitly:

    # Find the mean.
    data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
    total = 0
    number = 0
    for value in data:
        total = total + value
        number = number + 1
    mean = float(total) / number   # this line has changed
    print "mean is", mean

    mean is 2.77777777778

Functions do what they do in mathematics: values in, values out. We'll spend a whole chapter on them, since they're key to building large programs. Right now, the most important lesson is that just because a program runs doesn't mean it's correct. We could check the original program by using a smaller data set: e.g., [1, 4] produces 2 instead of 2.5. Writing programs so that they're checkable is a big idea that we'll return to.

Need to know it: the len function:

    # Find the mean.
    data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
    total = 0
    for value in data:
        total = total + value
    mean = float(total) / len(data)   # this line has changed
    print "mean is", mean

    mean is 2.77777777778

Need to know it: lists are mutable:

    # Calculate running sum by creating a new list.
    data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
    result = []
    current = 0
    for value in data:
        current = current + value
        result.append(current)
    print "running total:", result

    running total: [1, 5, 7, 10, 13, 17, 20, 24, 25]

- Start with the empty list.
- result.append is a method: like a function, but attached to some kind of object, such as a list (noun and verb). This is important enough that we'll spend a whole chapter on it, too.

How do we double the values in place?

    # Try to double the values in place.
    data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
    for value in data:
        value = 2 * value
    print "doubled data is:", data

    doubled data is: [1, 4, 2, 3, 3, 4, 3, 4, 1]

New values are being created, but never assigned to list elements; this is easiest to understand with a picture.

Need to know it: list indexing.
- Mathematicians use subscripts; we use square brackets.
- We index from 0..N-1 rather than 1..N for reasons that made sense in 1970 and have become customary since. Python and Java use 0..N-1 (like C); Fortran and MATLAB use 1..N (like human beings).

    # Double the values in place.
    data = [1, 4, 2]
    data[0] = 2 * data[0]
    data[1] = 2 * data[1]
    data[2] = 2 * data[2]
    print "doubled data is:", data

    doubled data is: [2, 8, 4]

This clearly doesn't scale: we need to get all the indices for a list of length N. The range function produces a list of numbers from 0..N-1 (examples), which is exactly the indices for such a list. You will almost never be the first person to need something; it's probably in the language, or in a library, and the hard part is finding it...

    # Double the values in a list in place.
    data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
    length = len(data)        # 9
    indices = range(length)   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
    for i in indices:
        data[i] = 2 * data[i]
    print "doubled data is:", data

    doubled data is: [2, 8, 4, 6, 6, 8, 6, 8, 2]

We can fold this together by combining function calls (like \sqrt{sin(x)}):

    # Double the values in a list in place.
    data = [1, 4, 2, 3, 3, 4, 3, 4, 1]
    for i in range(len(data)):
        data[i] = 2 * data[i]
    print "doubled data is:", data

    doubled data is: [2, 8, 4, 6, 6, 8, 6, 8, 2]

We usually won't type in our data; we store it outside the program. [Box] Explain the difference between memory, local disk, and remote disk, and why we don't do everything one way or the other: performance vs. cost.

    # Count the number of lines in a file.
    reader = open("data.txt", "r")
    number = 0
    for line in reader:
        number = number + 1
    reader.close()
    print number, "lines in file"

    9 lines in file

What about the mean?

    # Find the mean.
    reader = open("data.txt", "r")
    total = 0.0
    number = 0
    for line in reader:
        total = total + line
        number = number + 1
    reader.close()
    print "mean is", total / number

    Traceback (most recent call last):
      File "mean-read-broken.py", line 7, in <module>
        total = total + line
    TypeError: unsupported operand type(s) for +: 'float' and 'str'

The data in the file is text, so we need to convert it:

    # Find the mean.
    reader = open("data.txt", "r")
    total = 0.0
    number = 0
    for line in reader:
        value = float(line)
        total = total + value
        number = number + 1
    reader.close()
    print "mean is", total / number

    mean is 2.77777777778

Notice that we're using the original program as an oracle. Pro: start simple, make it more complex, and gain confidence at each step. Con: a bug in the original is perpetuated indefinitely.

Real-world data is never clean. Count how many scores were not between 0 and 5:

    # Count number of values out of range.
    data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
    num_outliers = 0
    for value in data:
        if value < 0:
            num_outliers = num_outliers + 1
        if value > 5:
            num_outliers = num_outliers + 1
    print num_outliers, "values out of range"

    3 values out of range

Need to know it: combine tests using and and or:

    # Count number of values out of range.
    data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
    num_outliers = 0
    for value in data:
        if (value < 0) or (value > 5):
            num_outliers = num_outliers + 1
    print num_outliers, "values out of range"

    3 values out of range

Need to know it: in-place operators:

    # Count number of values out of range.
    data = [0, 3, 2, -1, 1, 4, 4, 6, 5, 5, 6]
    num_outliers = 0
    for value in data:
        if (value < 0) or (value > 5):
            num_outliers += 1
    print num_outliers, "values out of range"

    3 values out of range

You don't actually "need" to know this one, but it's a common idiom in many languages.

Data cleanup: values are supposed to be monotonically increasing, so check that they are, and report where it fails if they're not:

    # Report where values are not monotonically increasing.
    data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
    for i in range(1, len(data)):
        if data[i] < data[i-1]:
            print "failure:", i

    failure: 8

Group by threes:

    # Combine successive triples of data.
    data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
    result = []
    for i in range(0, len(data), 3):
        sum = data[i] + data[i+1] + data[i+2]
        result.append(sum)
    print "grouped data:", result

    Traceback (most recent call last):
      File "group-by-threes-fails.py", line 6, in <module>
        sum = data[i] + data[i+1] + data[i+2]
    IndexError: list index out of range

13 values = 4 groups of 3 and 1 left over. The first question must be: what's the right thing to do scientifically? Let's assume "add up as many as are there":

    # Combine successive triples of data.
    data = [1, 2, 2, 3, 4, 4, 5, 6, 5, 6, 7, 7, 8]
    result = []
    for i in range(0, len(data), 3):
        sum = data[i]
        if (i+1) < len(data):
            sum += data[i+1]
        if (i+2) < len(data):
            sum += data[i+2]
        result.append(sum)
    print "grouped data:", result

    grouped data: [5, 11, 16, 20, 8]

But this is clumsy. How do we add up the first three values, or as many as are there? We don't want to keep modifying the list as we try out ideas, so use a list of lists:

    # Add up the first three, or as many as are there.
    test_cases = [[],                 # no data at all
                  [10],               # just one value
                  [10, 20],           # two values
                  [10, 20, 30],       # three
                  [10, 20, 30, 40]]   # more than enough
    for data in test_cases:
        print data

    []
    [10]
    [10, 20]
    [10, 20, 30]
    [10, 20, 30, 40]

We can now try all our tests by running one program. Back to our original problem: the sum of at most the first three values:

    # Sum up at most the first three values.
    test_cases = [[],                 # no data at all
                  [10],               # just one value
                  [10, 20],           # two values
                  [10, 20, 30],       # three
                  [10, 20, 30, 40]]   # more than enough
    for data in test_cases:
        limit = min(3, len(data))
        sum = 0
        for i in range(limit):
            sum += data[i]
        print data, "=>", sum

    [] => 0
    [10] => 10
    [10, 20] => 30
    [10, 20, 30] => 60
    [10, 20, 30, 40] => 60

That looks right, though if there were 100 test cases, we would want different output; we'll come back to this idea later. (One possibility is sketched just below.)
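Here is one shape that different output could take (my sketch, not the outline's): pair each test case with its expected answer, and print only the failures plus a one-line summary.

    # Hypothetical checker: report only the failing test cases.
    tests = [([], 0),
             ([10], 10),
             ([10, 20], 30),
             ([10, 20, 30], 60),
             ([10, 20, 30, 40], 60)]
    failures = 0
    for (data, expected) in tests:
        actual = 0
        for i in range(min(3, len(data))):
            actual += data[i]
        if actual != expected:
            failures += 1
            print "FAIL:", data, "gave", actual, "instead of", expected
    print failures, "of", len(tests), "tests failed"

With 100 test cases, a passing run would then print a single summary line instead of 100.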
We need one more tool: nested loops.

    # Loops can run inside loops.
    for i in range(4):
        for j in range(i):
            print i, j

    1 0
    2 0
    2 1
    3 0
    3 1
    3 2

This is easiest to understand with a picture. The final step is, instead of starting at zero every time, to start at 0, 3, 6, 9, etc. We need more test cases, but we don't need to test everything (which is why we skip from 40 to 60 to 80); we'll come back to how we decide what is or isn't a useful test case.

    # Sum up in groups of three.
    test_cases = [[],
                  [10],
                  [10, 20],
                  [10, 20, 30],
                  [10, 20, 30, 40],
                  [10, 20, 30, 40, 50, 60],
                  [10, 20, 30, 40, 50, 60, 70, 80]]
    for data in test_cases:
        result = []
        for i in range(0, len(data), 3):
            limit = min(i+3, len(data))
            sum = 0
            for i in range(i, limit):
                sum += data[i]
            result.append(sum)
        print data, "=>", result

    [] => []
    [10] => [10]
    [10, 20] => [30]
    [10, 20, 30] => [60]
    [10, 20, 30, 40] => [60, 40]
    [10, 20, 30, 40, 50, 60] => [60, 150]
    [10, 20, 30, 40, 50, 60, 70, 80] => [60, 150, 150]

Understand this in pieces:
- The outer for loop is selecting a test case, so think about its body in terms of one test case.
- The inner loop is going in strides of three, so think about its body for some arbitrary value of i.
- limit is as far as we can go toward three values up from i, so range(i, limit) is guaranteed to produce valid indices for the list.

Human beings can only keep a few things in working memory at once ("seven plus or minus two"). How we actually understand this program is:

    for data in test_cases:
        result = sum_by_threes(data)
        print data, "=>", result

    to sum_by_threes, given a list data:
        result = []
        for i in range(0, len(data), 3):
            limit = min(i+3, len(data))
            sum = sum_from(data, i, limit)
            result.append(sum)

    to sum_from, given a list data and start and end indices:
        sum = 0
        for i in range(start, end):
            sum += data[i]
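For the curious, that pseudocode turns into runnable Python along these lines (my rendering, under the same names; writing it properly is exactly what the next chapter is about):

    # Hypothetical decomposition of the grouping program into functions.
    def sum_from(data, start, end):
        # Add up data[start], ..., data[end-1].
        total = 0
        for i in range(start, end):
            total += data[i]
        return total

    def sum_by_threes(data):
        # Sum successive groups of at most three values.
        result = []
        for i in range(0, len(data), 3):
            limit = min(i + 3, len(data))
            result.append(sum_from(data, i, limit))
        return result

    for data in [[], [10], [10, 20, 30, 40]]:
        print data, "=>", sum_by_threes(data)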
The computer doesn't care one way or another, but what we need is a way to write our programs in pieces, then combine the pieces. That's the subject of the next chapter. Read More ›

Computing in Physics 101: What We're Doing Wrong
Greg Wilson / 2011-08-04
Mark Guzdial and his colleagues do top-notch research on computing education—that's "teaching people computing", not "using computers to teach people", though for obvious reasons, the two frequently overlap. He recently wrote three blog posts that I think everyone pushing for more computing in the classroom should read. In them, he describes the results of Daniel Caballero's PhD research, in which he compared first-year physics students doing a traditional course with ones doing an equivalent course which included a large programming component. His findings were:

1. The students in the traditional course came out with a better grasp of basic physics. This isn't surprising—their homework assignments were all on physics, rather than on a mix of physics and programming—but it does show that any extra insight that comes from playing with computational models doesn't compensate for the time required to learn how to compute (at least not on the timescale of one year).
2. Students who took the computationally-oriented course had less favorable attitudes toward computational modeling after the course than they had at the start; their attitudes were also less favorable than those of students who took the conventional course.

Guzdial summarizes this work by saying: We need to produce STEM graduates who can model us[ing] computers and who [have] positive attitudes about computational modeling. The challenge for computing education researchers is that...we don't know how to do that yet. Our tools are wrong (e.g., the VPython errors get...in the way), and our instructional practices are wrong (e.g.,...students are more negative about computational modeling after instruction than before). These are sobering conclusions, particularly for someone who has spent a year or more building material to teach computing to scientists. Caballero's research may not tell us what we should do (though Mark's comments about the value of live coding have got me thinking once again), but knowing that we're doing it wrong right now is a necessary first step. Read More ›

Software Carpentry in HPCWire
Greg Wilson / 2011-07-22
HPCWire has run an interview with me about Software Carpentry (a follow-up to one they did several years ago). Regular readers will have seen the main points before, but I hope it's a good summary of the current state of play, and of what's wrong with equating "scientific computing" and "high performance computing". Read More ›

The Case of Abinit
Greg Wilson / 2011-07-20
The latest issue of Computing in Science & Engineering has a good article by Yann Pouillon, Jean-Michel Beuken, Thierry Deutsch, Marc Torrent, and Xavier Gonze titled "Organizing Software Growth and Distributed Development: the Case of Abinit" (unfortunately behind a pay wall). It describes the infrastructure and practices used to manage Abinit, a half-million line open-source program. It's a lot more engineering than most teams need, but it provides a lot of insight into what small projects might grow into. If anyone would like to do a similar paper about what a smaller-scale team uses, it would be a welcome counterpoint. Read More ›

Material from Newcastle Workshop Now Available
Greg Wilson / 2011-07-20
Back in June, we mentioned a workshop on computing skills at the University of Newcastle. It reportedly went very well, and presentations and other materials are now available. Many thanks to Elizabeth Petrie, the other organizers, and the presenters. Read More ›

How Much Do You Need?
Greg Wilson / 2011-07-20
Michigan State's Titus Brown has posted a good discussion of what kind of computing hardware you need to do bioinformatics, and why. Long story short, it's about $100K/year (US). Read More ›

And Speaking of Titus Brown...
Greg Wilson / 2011-07-20
Michigan State's Titus Brown recently ran his course on analyzing next-generation sequencing data for the second time. Judging from his report, it was just as successful as last year's. Congrats! Read More ›

Architecture of Open Source Applications Webinars Tuesday July 13 and 20
Greg Wilson / 2011-07-11
Smart Bear Software is hosting two online panel discussions about The Architecture of Open Source Applications, at 1:00 pm EST on Wednesday, July 13, and again at the same time (with different panelists) a week later. You can sign up on their site; we look forward to seeing/hearing from lots of you. Read More ›

Stanford Course Went Well
Greg Wilson / 2011-07-10
Prof. Risa Wechsler, along with Alex Ji and Zahan Malkani, recently ran a short course at Stanford based in part on Software Carpentry called Physics 91SI: Practical Computing for Scientists. The course notes and handouts are available online, and the open source bits of code are on GitHub. We hope you find them useful... Read More ›

Reproducible Computational Geophysics
Greg Wilson / 2011-07-06
A summary of a recent workshop on Open Software Tools for Reproducible Computational Geophysics is now online. Lots of interesting stuff, and once again, the things we teach in this course are prerequisites for doing/using/understanding much of it. Read More ›

Mentioned in Nature Methods
Greg Wilson / 2011-07-01
A recent article in Nature Methods by Jeffrey Perkel titled "Coding your way out of a problem" makes mention of Software Carpentry. Elsewhere, Mike Croucher has written a nice article called "In defense of inefficient scientific code", which makes many of the same points we've been making here. Read More ›

It Will Never Work in Theory
Greg Wilson / 2011-06-29
Inspired in part by Lambda the Ultimate, which reports on what's new in programming language research, Jorge Aranda and I have started a new blog called "It Will Never Work in Theory" to bring you the latest results in empirical studies of software engineering. The first posts discuss:

- Rahman and Devanbu's "Ownership, Experience, and Defects: A Fine-Grained Study of Authorship", which found that code worked on by one developer (rather than many) is more often implicated in defects, but that a developer's experience with a particular file (rather than the project in general) reduces defect rates.
- Stolee and Elbaum's "Refactoring Pipe-like Mashups for End-User Programmers", which applies the "code smells" meme to Yahoo! Pipes (and by implication shows that refactoring ideas can be applied to other end-user programming systems).
- Mockus's "Organizational Volatility and its Effects on Software", which found that an influx of newcomers into a project doesn't increase fault rates (since they're usually given simple tasks to start with), but that organizational change can still account for about 20% of faults.

Our aim in starting this blog is to continue the work begun in Making Software: to let practitioners know what researchers have discovered, and what kinds of questions they can answer, and to give researchers feedback on what's useful, what isn't, and what they ought to look at next. We look forward to your feedback. Read More ›

Michael Nielsen Talks About Open Science in San Francisco on June 29
Greg Wilson / 2011-06-22
As per his blog post, the inimitable [1] Michael Nielsen will be talking about "Why the net doesn't work for science—and how to fix it" next Wednesday in San Francisco. It's sure to be both informative and enjoyable—hope you can make it. [1] Well, you try to imitate an Australian quantum physicist turned open science advocate... Read More ›

Doing the Math
Greg Wilson / 2011-06-20
Let's do some math. Suppose that working through the Software Carpentry course takes the average scientist five full-time weeks. It doesn't matter whether that's one five-week marathon, or whether the time is spread out over several months; the cost is still roughly 10% of the scientist's annual salary (if you're thinking like an administrator) or 10% of their annual published output (if you're thinking like the scientist herself). How big a difference does it have to make to her productivity to be worthwhile? Well, the net present value of n annual payments of an amount C with an interest rate i is P = C(1 - (1+i)^{-n})/i. If we assume our scientist only keeps doing research for another 10 years after taking the course (which I hope is pessimistic), and depreciation at 20% (which I also hope is pessimistic), then the present value works out to 4.2 times the annual savings. Doing a little long division, that means this training only has to improve the scientist's productivity by 2.4% in order to pay for itself. That works out to just under an hour per week during those ten years; anything above that is money (or time) in the bank.

Now suppose the feedback we get from former students is right, and that this training saves them a day per week or more. Let's assume the average scientist (whatever that means) costs $75,000 a year. (That's a lot more than a graduate student, but a lot less than the fully-loaded cost of someone in an industrial lab.) 20% of their time over the same ten years, at the same 20% discount rate, works out to roughly $63,000; at a more realistic discount rate of 10%, it's roughly $93,000. That's roughly a ten-fold return on $7500 (five weeks of their time right now at the same annual salary). (The arithmetic is checked in the short script below.)
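Since this post is about doing the math, here is a sanity check of those figures (my sketch; the helper function and its name are mine, not the post's):

    # Net present value of 'years' annual payments of 'annual' at
    # discount rate 'rate': P = C * (1 - (1 + i)**-n) / i.
    def present_value(annual, rate, years):
        return annual * (1 - (1.0 + rate) ** -years) / rate

    print present_value(1.0, 0.20, 10)           # ~4.19 times the annual savings
    print 1.0 / present_value(1.0, 0.20, 10)     # ~0.24, i.e. break-even near 2.4%
    print present_value(0.20 * 75000, 0.20, 10)  # ~$63,000
    print present_value(0.20 * 75000, 0.10, 10)  # ~$92,000 (the "roughly $93,000" above)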
So my question is, why do scientists—who are certainly supposed to be able to do basic math—ignore this? More to the point, why do the people who organize conferences on "e-science" persist in ignoring two facts:

1. The biggest bottleneck for the overwhelming majority of scientists (90% or more if you believe our 2008-09 survey) is development time, not CPU cycles. Faster machines can improve turnaround times a bit, but mastering a few basic skills will make a much bigger difference.
2. Even those scientists who really need supercomputers to do their work would get more done faster if they were wasting less time copying files around, repeating tasks manually, and reinventing sundry wheels. They are trying to solve two open problems at once: whatever is intrinsic to their science, and high-performance parallel programming. Tackling the latter without a solid foundation is like trying to drive an F1 race car on the highway before you've learned to change lanes in a family car. I know from personal experience that the crash and burn rate is comparable...

I will believe that computational science is finally outgrowing its "toys for boys" mentality when I see an e-science conference that focuses on process and skills: on how scientists develop software at the moment-by-moment, week-by-week, and year-by-year scales. I will believe that people really care about advancing science, rather than about the bragging rights that come from having the world's biggest X or its fastest Y, when supercomputer centers start requiring courses on software design, version control, and testing as prerequisites to courses on GPUs and MPI. I'll believe it when journals like Nature and Computing in Science & Engineering require every paper they publish to devote a section to how (and how well) the code used in the paper was tested. And I'll believe in Santa Claus when I see him up on my roof saying, "Ho ho ho." What I won't do is take bets on which will happen first. Read More ›

Health Informatics Resources
Greg Wilson / 2011-06-18
Via William Hopper, a list of online healthcare informatics resources that might be of interest to some readers. If you have others, I'm sure he'd enjoy hearing from you. Read More ›

New Episode: MATLAB Structs and Cell Arrays
Greg Wilson / 2011-06-15
The title says it all: thanks to the tireless Tommy Guy, we have a new episode on MATLAB structs and cell arrays. Read More ›

A New Look
Greg Wilson / 2011-06-14
I'm fond of the Software Carpentry logo, but the blue-to-white color fade is difficult to print on coffee mugs, and impossible to embroider on shirts. Thanks to the talented Veronica Wong, we have a new one: We'll be converting things over piece by piece as we rebuild the website over the summer. Read More ›

Audio Processing in Python
Greg Wilson / 2011-06-10
Thanks to Becky Stewart, we now have a 12-minute episode on audio processing in Python. We hope you find it useful—as always, feedback is very welcome. Read More ›

Practical Computing for Everyone (not just biologists)
Greg Wilson / 2011-06-07
Steven Haddock and Casey Dunn: Practical Computing for Biologists. Sinauer Associates, 2010, 0878933913. My copy of Practical Computing for Biologists arrived last week, and I've been very impressed. It is a well-written, well-paced guide to basic computing skills for scientists and engineers of all stripes (not just biologists). Many of the topics will be familiar:

- editing text files (including how to use regular expressions in an editor)
- the Unix shell
- basic Python programming (including debugging strategies)
- relational databases
- SSH
- installing and configuring software

There are also a few that we don't cover, such as interacting with hardware, and some that are covered in more depth than we give them, like image manipulation. The pace is gentler than Software Carpentry, but the last couple of years have convinced me that's a good thing: I think Haddock & Dunn have it right for this target audience. And it's beautifully produced: full-color printing and great graphical design make this book a joy to read. If I ever do turn Software Carpentry into a book, I might skip the topics PCB covers and just tell people to go and buy it. Recommended. Read More ›

Programming for Scientists at Newcastle University: June 20, 2011
Greg Wilson / 2011-06-04
From the announcement: Programming is becoming an increasingly important part of scientific research, yet many scientists are self-taught programmers with little formal training. This means that we are often unfamiliar with simple tools that can make programming and dealing with data faster, more reliable and more reproducible. This event is a day-long workshop to develop awareness of the skills and tools that help make computing more efficient and provide results that are less prone to error. If you've ever thought "Surely there must be a better way to do this", then this is the event for you! There is also a fuller description—check it out. Read More ›

Five on Systems Programming
Greg Wilson / 2011-06-02
Thanks to the Software Sustainability Institute's Mike Jackson, we now have five episodes on how to inspect and manipulate files and directories from inside a Python program—many thanks. If you would like to contribute to this project as well, please get in touch. Read More ›

Workshop at CEF'11
Greg Wilson / 2011-06-01
I will be giving a one-day workshop on software skills for computational economists at the 17th International Conference on Computing in Economics and Finance (CEF 2011) in San Francisco on Tuesday, June 28. I'm going to talk about version control, tracking provenance, systematic testing, and key results in empirical software engineering; there will also be a guest lecture from Michael Nielsen about the future of scientific practice in a networked world. Read More ›

'The Architecture of Open Source Applications' is Now Available
Greg Wilson / 2011-05-23
It has been slightly over a year in the making, but it's finally here: The Architecture of Open Source Applications has been published. You can buy the book directly from Lulu.com at http://www.lulu.com/product/paperback/the-architecture-of-open-source-applications/15819207, or view the contents online at http://aosabook.org. My thanks to all the people who contributed to the book, and especially to Amy Brown, my tireless and diligent co-editor. We hope you enjoy it. Read More ›

More Interested in the Asides
Greg Wilson / 2011-05-14
So how did the term go, you ask? Here's what traffic on this site looked like: The three spikes in March where we had up to ten times our usual 250-a-day visitors were the articles on tuple spaces, literate programming, and graph layout—in other words, the articles that weren't about teaching scientists and engineers basic computational skills. As the man said, "So it goes." Read More ›

Damn the Torpedoes (but I could use some help navigating)
Greg Wilson / 2011-05-13
Despite my own calculations (which were optimistic to begin with, and are more so now that I have a new job), I'm going to try to turn Software Carpentry into a book. Here's what I'm planning to include, and its present status; I would be very grateful for feedback, especially from alumni, about what doesn't need to be here, and what's missing. I think I'm still overshooting the knowledge and needs of far too many people; if you think so too, please let me know. Status codes: done, partial, undone.

    Total                            129623
    The Shell                         12709
    Databases                          9778
    Regular Expressions                7987
    Performance                        7590
    Sets and Dictionaries              7566
    Software Engineering               6378
    Matrix Programming                 6232
    Make                               5389
    Phylogenetic Tree Example          1652
    Invasion Percolation Example       1276
    Spreadsheets                       3893
    Configuring Programs               2126
    Tracking Provenance                1786
    Version Control                    6569
    Persistence                        5978
    XML                                5957
    Functions and Libraries            5335
    Testing                            4501
    Images and Sound                   3720
    Web Programming                    3372
    Data Types and Data Management     2823
    Parallel Programming                905
    Security                            318
    Lists, Loops, and Conditionals       79
    Systems Programming                  45
    Objects and Classes                  45
    Building GUIs                        24

Read More ›

The Architecture of Open Source Applications
Greg Wilson / 2011-05-06
My apologies for how quiet this site has been recently—finding a new job takes time [1], and we had to make one final push on the other project I've been working on. But I'm pleased to announce that The Architecture of Open Source Applications is now at the printer's—we're going to give the first copy a final going-over, then make it available for purchase (hopefully within two weeks). I'm very pleased with how it looks, and very grateful to my co-editor Amy Brown, the contributors, and the reviewers for all their hard work. One note, though: if and when you buy it, please buy it directly from Lulu.com, rather than through Amazon or another outlet—it makes an enormous difference to how much money we raise for charity:

    You buy from:   Lulu     Amazon
    You pay:        $35.00   $35.00
    Lulu gets:      $3.74    $0.94
    Amazon gets:    $0.00    $17.50
    Amnesty gets:   $14.98   $3.78

Read More ›

The Hacker Within at MSU in June
Greg Wilson / 2011-05-03
The University of Wisconsin's Hacker Within team are running a bootcamp at Michigan State University June 4-5 2011—see the announcement for details. Space is limited, so sign up now! Read More ›

Managing Data
Greg Wilson / 2011-05-02
We have just posted a screencast on managing data written by Orion Buske, a graduate student in bioinformatics at the University of Toronto (and former Software Carpentry TA). We hope you enjoy it. Read More ›

Chapters
Greg Wilson / 2011-04-23
I've converted three more Software Carpentry lectures to chapter format, bringing the total to four:

- the shell
- number crunching with NumPy
- regular expressions
- software engineering

It's all CC-A licensed, and yeah, the images could be a lot better: they're just screen captures from the original PowerPoint slides, because re-doing them as SVG felt even more like yak shaving than converting the text. Please let me know if this is useful. Read More ›

In Praise of Street Fighting
Greg Wilson / 2011-04-22
Sanjoy Mahajan's Street-Fighting Mathematics is subtitled "The Art of Educated Guessing and Opportunistic Problem Solving". As the author says in the introduction, "Too much mathematical rigor teaches rigor mortis: the fear of making an unjustified leap even when it lands on a correct result." It's only 134 pages long (including the index), but it's packed full of practical ideas for tackling mathematical problems; in a sense, it aims to train people to do the kind of rough-and-ready calculations that David MacKay deploys so effectively in Sustainable Energy—Without the Hot Air. Books like this make me wonder what a computing equivalent might look like. What are some useful heuristics for tackling programming problems? Does it even make sense to think in those terms, given that programming is mostly a non-numerical activity (our programs may push numbers around, but we don't produce them via calculation)? Still pondering... Note: both SFM and SEWHA are available online. Read More ›

Holding Up a Mirror
Greg Wilson / 2011-04-18
Cameron Neylon always has interesting things to say. In a recent talk, he commented on my tendency to idealize laboratory practice when contrasting it with the sorry state of computational work. Thought-provoking... Read More ›

Prototyping
Greg Wilson / 2011-04-11
The numbers might not make sense, but I'm still curious about whether Software Carpentry would make sense as a book. To find out, I spent five hours converting the existing lecture on regular expressions to chapter format. I'd welcome feedback about how useful this is. In particular, without the highlighting and animation of PowerPoint slides and video, how much harder is it to understand what's going on? What could be done to make it easier? More pictures? (I really want to draw arrows to point at the tabs and spaces in the sample data files...) Fully hyperlinked audio? Side-by-side display of code and text (which would be difficult to squeeze in, but not impossible)? Please let me know... Read More ›

By The Numbers
Greg Wilson / 2011-04-09
So what next for Software Carpentry? One possibility is to turn the scripts for the episodes, the examples, and the diagrams in the slides into a book. Let's crunch some numbers to see how feasible that is:

    Topics                          24
    ...mostly done (say, 80%)       18
    ...not even started              6
    Words per topic              5,000
    Words to write              48,000
    Words per day                1,500
    Days required                   32
    Editing, diagrams, etc.         ×2
    Total days                      64
    Code complete        July 15, 2011

That completion date assumes I start this coming Monday (April 11, 2011). The schedule assumes a 5-day rather than a 7-day week, but that's because the longer hours would actually be less productive (see Evan Robinson's "Why Crunch Mode Doesn't Work" for a summary of the research). 1500 finished words per day, and the factor of two for editing and diagrams, are both based on my experience with several previous books, so there's not much give there. Unfortunately, mid-July is two and a half months past the end of my funding. So close, and yet so far... Read More ›

Using Bein
Greg Wilson / 2011-03-31
Many thanks to Frederick Ross for putting together a short screencast on Bein, a workflow manager and miniature laboratory information management system (LIMS) built in Python that fills the gap for the working scientist between the classical shell and big workflow managers like Galaxy and major LIMS systems like OpenBIS. Please have a look and see if it could help make your computational life easier. Read More ›

Harder Than It Should Be
Greg Wilson / 2011-03-31
Someone once said, "Chemistry is basically anything chemists will give each other awards for doing." Or something like that—Google doesn't find matches for that exact quote. Even if I've mangled it, the idea is sound: art is no more and no less than what great artists accept as being art. So what is computer science? More particularly, what constitutes the core of computer science? What's the stuff that everyone who calls themselves a "computer scientist" should know, or at least have seen? One way to answer the question would be to look at what people are given prizes for, but that's turning out to be harder than I expected, and the reason highlights a gap in this course. Let's start with the two biggest academic prizes open to the whole spectrum of CS: the ACM Doctoral Dissertation Award, and the A. M. Turing Award, which is often called "the Nobel Prize of computing". The page I linked to lists the names of the Dissertation Award winners from 1978 to the present, but those links take you to pages that have nothing more on them than the name of the prizewinning thesis (and in some cases, a press release or a photo of the winner accepting a check). There's no useful metadata anywhere to be seen: not keywords (which is what I'm after), not links to scholarly databases (so that I could write a script to harvest keywords), nothing. I could write a script to googlewhack the author's name and thesis title, but the half-dozen pages I looked at were formatted in three different ways, so that smells like a lot more effort than I'm willing to put in to do something that my local second-hand stereo parts store has supported since 2005 (or maybe even earlier). The Turing Award site is a bit better: once you figure out that you have to select a sorting order to get the landing page to display more than the most recent winner, the sub-pages that the main page links to do contain a few sentences explaining why each winner won. There's still no structured metadata, though, so something that I know could be done in 10 minutes looks like it would take half a day, which means I'm not going to do it. Software Carpentry doesn't really talk about this issue anywhere. It shows you how to use a database, and the essay on provenance nods to the value of structured metadata without going over to say hello, but that's about it. I'm constantly taken aback by how much time real scientists spend looking things up and chasing things down (journal editors are unlikely to take "or something like that" as sufficient citation for a quote like the one that started this post). We really should include something about the computational side of knowledge management and discovery in this course, but for the life of me, I don't know what—if you do, please tell me. And if you have any clout with the ACM, please point out that since they require people to specify topic keywords when submitting papers for publication, it would be only fair of them to give us back a few keywords when we need them... Read More ›

Spring 2011 Course Over
Orion Buske / 2011-03-30
The Spring 2011 course is now over! We had a wonderful time, learned a lot, and hope the same is true for everyone who participated in the course. We would love it if you would take a minute or two and give us any comments, feedback, and suggestions you have regarding the course and your experience in it. To start the ball rolling, Erin Osborne gave us some awesome feedback, posted with permission:

Orion, I'm really glad I took this class. I often had experiences where something that we covered in the Software Carpentry course was brought up in a lecture or in lab the following week. Lucky me! So I think I was at the perfect level to benefit from the class.

I think the most difficult lectures for me were:
— Python
— Testing
— Objects and Classes — This one was challenging, but I was used to the pace by the time we got to this part of the course.

The easiest lectures for me were:
— The shell command
— Regex
— mysql

I think the biggest determining factor as to whether a module was easy or hard for me was based on previous experience. This must make it challenging for you guys to teach this class since everyone has some different set of previous experience. I don't think I would have been able to get through the course without 1) the TA's and 2) outside reading materials. Once I realized I was in for more than I expected, I went to the library and rented a lot of books listed on the web pages... especially python books. These were really helpful.

The sections I will make use of most from here on out are:
— svn — A lot of my programs were already set up by labmates using SVN, but I wasn't taking full advantage of SVN's capabilities.
— regex — I use regex all the time, but some of the themes covered in the regex lecture helped me to branch out of my typical searches.
— piping in the shell — My shell commands are much more streamlined and efficient now
— testing — I have incorporated tests into some of my existing programs
— sql — I would really like to use this more, and there are some existing sql databases available in my field!

Though I really don't think I'll be using python too much after this class, I'm glad I was exposed to it and I wish I had learned it earlier. My lab is deeply entrenched in perl, so I'll probably stick with that. However, I really found it fascinating to understand the python way of thinking. Very elegant and nice! Thanks for sharing.

I tried to think of a few things that could make the class more effective. Some of these issues may purely have been me missing something obvious, but it may help.
— A clear syllabus with dates on it. I didn't know until the end what the syllabus was. Maybe it was just somewhere obvious but I couldn't find it.
— I could have really used the answers to the previous week's homework at some point. I think I learn a lot from reading other people's scripts and from deciphering how the codes are different from my own. In previous courses I have taken, the TA's just dumped a selection of different students' work into a folder for perusal. It was nice to read the different strategies.
— I could have used a little more intro and instruction into python and in the testing lecture. At the very least, some links to web pages or materials with background would have been helpful.

You guys did a great job. Thanks so much! Erin.

This is exactly the sort of information that is crucial to making this course as beneficial as it can be. What worked? What didn't?
We intended to highlight a variety of student solutions on the forum each week, but it never ended up happening. What else did we miss that would have helped you? Thank you all again. It has been a pleasure, Orion Read More ›

Practical Computing for Scientists at Stanford
Greg Wilson / 2011-03-30
Prof. Risa Wechsler writes from Stanford: We are teaching a course at Stanford this term inspired by Software Carpentry. The course, "Practical Computing for Scientists", will be a Student Initiated Course, led by two physics undergrads (former summer research students with my group)—the course will be targeted at physics undergrads with a bit of programming experience to prepare them for summer research, but we expect participation from undergrads from other fields and some grad students and postdocs as well. I am hopeful that we will be able to keep this going as a regular student-led course with richer material as it develops. We expect that we will use some of your materials as well as some materials developed here. The website for the course is http://physics91si.stanford.edu/, where you can find the syllabus and the preliminary handouts. We start tomorrow—it's a 10 week course. We'll be happy to keep you posted on how it all goes if you are interested. Read More ›

And I'm on a Horse
Greg Wilson / 2011-03-26
Patrick Mackenzie (whom I've never met) gave a good lightning talk at the Business of Software that sums up a lot of what we haven't done for Software Carpentry. Any reworking of the material really (really) has to be built around what you (the researchers) need, rather than what we (the programmers) know. Read More ›

A Better Way to Teach Programming to Scientists
Greg Wilson / 2011-03-24
This year's SIGCSE conference on computer science education featured a very cool paper by Robbins, Senseman, and Pate, of the University of Texas at San Antonio, called "Teaching Biologists to Compute using Data Visualization". In it, they describe CS 1173: Data Analysis and Visualization using MATLAB, which introduces students in the life sciences to programming using a problem-first approach. Like the media-first approach that Mark Guzdial and his colleagues introduced at Georgia Tech, this rewards students right away—they don't have to wade through a "CS first" morass of data types and arcane rules about Boolean expressions for three or four weeks in order to get to the useful bits. Given another year, I'd have rewritten our intro to Python this way... Read More ›

Our First Episode on Microsoft Access
Greg Wilson / 2011-03-23
We have just posted our first episode on using a database with Microsoft Access — many thanks to Utah State's Ethan White for creating it. Read More ›

You'll Need a Large Screen
Greg Wilson / 2011-03-22
I've been working on a graph showing the connections between the questions this course tries to address, our answers to them, the knowledge and skills needed to understand and apply those answers, and the big concepts behind that knowledge and those skills. The latest version is linked from the thumbnail below. I know it's tangled, but I hope it's the first step toward a roadmap for redesigning this course to serve your needs better. Feedback and assistance would both be very welcome. Read More ›

I'd Settle for 0.1%
Greg Wilson / 2011-03-22
In a recent article about computational thinking, Carnegie-Mellon's Jeannette Wing says: ...every scientific directorate and office at the National Science Foundation participates in the Cyber-enabled Discovery and Innovation, or CDI, program, an initiative started four years ago with a fiscal year 2011 budget request of $100 million. CDI is in a nutshell "computational thinking for science and engineering." 0.1% of that would keep Software Carpentry going for another year... Or if I'm allowed to be a curmudgeon for a moment, hands up those people who believe that $100 million is going to do 1000 times more for science and engineering than another year of work on these materials? *sigh* Read More ›

Videos of Autumn School Lectures
Greg Wilson / 2011-03-21
I'm pleased to announce that video recordings of the Software Carpentry lectures I gave in London last fall are now online at http://soundsoftware.ac.uk/autumnschool2010video. Please excuse the mustache—I don't know what I was thinking... Many thanks to Chris Cannam, Ivan Damnjanovic, Luis Figueira, and of course Mark Plumbley. Read More ›

Using a Debugger
Greg Wilson / 2011-03-21
We've just uploaded a new video showing how to use a debugger to track down a problem. Using a debugger instead of 'print' statements will save you a lot of time, and the skill transfers directly to pretty much any language. Please let us know what you think. Read More ›

On a Personal Note...
Greg Wilson / 2011-03-18
Since money hasn't materialized for another year of full-time work on this course, I'm now looking for a job. My CV is up to date, and my interview shoes are polished, so if you know of anything meaningful and interesting in downtown Toronto [1], please give us a shout. I'd obviously prefer something related to scientific computing, Python, and/or education, but wherever I wind up, Software Carpentry will keep going. As always, if you'd like to help with that, please get in touch. [1] After all the work we've done on this house in the last two and a half years, if I suggest relocating to my wife, I'll be doing it as a single man :-) Read More ›

Questions and Answers
Greg Wilson / 2011-03-17
A first cut of a question-and-answer matrix for this course is now up for viewing. It doesn't include everything, and the answers aren't particularly helpful without a second matrix showing how they connect to specific lecture topics and big principles, but I hope it's a start. Please give feedback as comments on this post; we'll add more Q's and A's in the coming days. Read More ›

Next-Generation Sequencing Course at MSU
Greg Wilson / 2011-03-16
Analyzing Next-Generation Sequencing Data
June 6th — June 17th, 2011
Kellogg Biological Station, Michigan State University
Instructors: Dr. C. Titus Brown, Dr. Ian Dworkin, and Dr. Istvan Albert.

Applications must be received by March 25th for full consideration. See the full announcement at http://bioinformatics.msu.edu/ngs-summer-course-2011 for more information.

Course Description: This intensive two week summer course will introduce students with a strong biology background to the practice of analyzing short-read sequencing data from Roche 454, Illumina GA2, ABI SOLiD, Pacific Biosciences, and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq. No prior programming experience is required, although familiarity with some programming concepts is helpful, and bravery in the face of the unknown is necessary. 2 years or more of graduate school in a biological science is strongly suggested.

Students will gain practical experience in:
- Python and bash shell scripting
- cloud computing/Amazon EC2
- basic software installation on UNIX
- installing and running maq, bowtie, and velvet
- querying mappings and evaluating assemblies

Materials from last year's course are available at http://ged.msu.edu/angus/ under a Creative Commons/use+reuse license. Read More ›

Graph Layout, Models vs. Views, and Computational Thinking
Greg Wilson / 2011-03-16
Money for me to keep working full-time on Software Carpentry hasn't materialized, so as I've mentioned in a couple of recent posts, I'm trying to find a way to organize the course material more coherently in the hope that other people will (finally) start to contribute content as well. As part of that, I have spent the last two days trying to draw a graph showing the questions Software Carpentry seeks to answer, and the concepts that underpin the answers. It's been harder than I expected, and I think the reasons might give some insight into how a computational thinker thinks.

Option #1: use a generic drawing tool like Microsoft Paint. The upside is, it's easy. The downside is, it stops being easy as soon as I want to change things or collaborate with other people. Paint doesn't have any notion of boxes, circles, or text: instead, it manipulates the pixels in images directly. If I create a box with a text label, I can't group them together to make a single object, because there are no objects. I could select and cut a region containing the box and label, then paste it elsewhere, but that wouldn't move the links connecting the box to other boxes. Storing my graph as an image also makes it hard to collaborate. I can put the image in a version control repository, but if Grace edits her working copy while I'm editing mine, how do we merge our changes? It seems strange to me that image diff-and-merge tools don't exist for Subversion, Mercurial, and other systems, but that's a yak I'm not going to shave today.

Option #2: use an object-based drawing tool like Visio (or a reasonably modern PowerPoint). This lets me group things, and links will stay attached as I move things around, but I still can't collaborate. Switching to OpenOffice or Inkscape doesn't help: yes, they can save things as XML instead of in a binary format, but existing diff and merge tools don't understand the structure of that XML, never mind its semantics, so they report differences at the wrong level. It's as if my diff tool was working at the bitwise level, and reporting this:

    01101101 01100001 01110100
    01101101 01100001 01101110

instead of:

    m a t
    m a n

The same is true of interactive graph editors like yEd, Gephi, and so on. If I have to eyeball two versions of a file and copy differences by hand, collaborating is going to be slow and error prone.

Option #3: store my graph in a textual form that can be diffed and merged, and convert that textual form into the graphical representation I want (where "graphical" in this case means "visual", not "nodes and edges"). This is what LaTeX and HTML do: the human being creates content, and a tool transforms that content into something more readable. Most of the translation is automatic, but all tools of this kind provide some way to control things more exactly, e.g., to force hyphenation at a particular point in a word, to center-align a title, and so on. The best-known tool of this kind for graphs is probably GraphViz. Here's a snippet of the GraphViz .dot file I've written over the last couple of days:

    strict graph Course {
        q_automation [label="How can I automate this task?"];
        q_avoid_bugs [label="How can I avoid creating bugs in my programs?"];
        q_code_reuse [label="How can I make my code easier to reuse?"];
        ...more of these...
        a_algorithm_data_structure [label="Use the right algorithms and data structures"];
        a_binary_data [label="Manipulate data at the bit level"];
        a_build_tool [label="Use a build tool"];
        ...more of these...
        q_automation -- a_build_tool;
        q_speedup -- a_parallelize;
        q_team_programming -- a_code_review;
        ...more of these...
    }

So far so good: nodes and edges occupy a single line each, so differences will be easy to see. And if I'm brave, and speak a little C, I can put C preprocessor commands in my file to make it look like this:

    #define ANSWER(name, str) name [shape=box,fontcolor=red4,color=red4,margin="0.05,0.0",label=str]
    #define QUESTION(name, str) name [shape=octagon,fontcolor=navyblue,color=navyblue,margin="0.05,0.0",label=str]
    #define QA(q, a) q -- a [arrowhead=open]

    strict graph Course {
        QUESTION(q_automation, "How can I automate this task?");
        QUESTION(q_avoid_bugs, "How can I avoid creating bugs in my programs?");
        QUESTION(q_code_reuse, "How can I make my code easier to reuse?");
        ...more of these...
        ANSWER(a_algorithm_data_structure, "Use the right algorithms and data structures");
        ANSWER(a_binary_data, "Manipulate data at the bit level");
        ANSWER(a_build_tool, "Use a build tool");
        ...more of these...
        QA(q_automation, a_build_tool);
        QA(q_speedup, a_parallelize);
        QA(q_team_programming, a_code_review);
        ...more of these...
    }

Why would I do this? Well, I'm eventually going to add two more kinds of nodes: concepts (like "metadata") and specific lecture topics (like "regular expressions"). I may want to show all four kinds in a single graph, but I will probably also want to show just the answers and lecture topics, or just the questions and concepts, and so on. With an interactive tool like Gephi, I'd have to hide some nodes, then rearrange the ones that were still visible (and then put them back when I un-hid the hidden nodes). If I'm compiling, on the other hand, I can undefine the macros for the nodes and links I'm not interested in on the command line when I run the C preprocessor, and then feed the output to GraphViz for layout.
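For concreteness, the compile-and-lay-out step might look something like this (my sketch: the file names and the HIDE_CONCEPTS guard are hypothetical, while cpp's -P flag, which suppresses line markers, and GraphViz's dot are real):

    # Expand the macros, hiding one category of node, then lay out the graph.
    cpp -P -DHIDE_CONCEPTS course.dot.in > course.dot
    dot -Tsvg course.dot -o course.svg

For this to work, the .dot source would wrap the concept-node definitions in #ifndef HIDE_CONCEPTS ... #endif.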
The key idea here is the separation between model and view. The model is the stuff: in this case, the nodes in the graph and the edges connecting them. The view is how that model is presented to a human being, such as a static image (almost impossible to edit meaningfully, but easy to understand), or a dynamic rendering in a tool like Gephi (easy to edit, and also easy to understand). The textual representation is actually just another view: it isn't the model any more than what's on screen in the Gephi GUI is. We often think of the textual representation as being the model because it's what we store, and what other tools that are more obviously view-ish take as input.

At this point, I'd like to say "Q.E.D." and move on, but there's still one big problem: my compiler is broken. Well, it's not really mine—I didn't write the GraphViz tools—and it isn't really "broken", it just does a lousy job of laying out my particular graph. I've tried tweaking various layout parameters to no avail; what I've fallen back on in frustration is to store my nodes and edges in a .dot file, then load it into Gephi, let it lay things out, then tweak the results manually. This is time consuming, but I'm willing to live with that: I know that graph layout is a wickedly hard problem, and anyway, I only expect to re-organize the graph every once in a while. For the question-and-answer graph, the best result I've obtained so far looks like this (with labels removed for clarity): What I can't live with is that this approach doesn't let me round-trip my edits.
What I have in my file isn't actually a GraphViz graph; instead, it's a bunch of C preprocessor macros that compile into such a graph: Gephi can save my changes in a .dot file, but that's not what I want to store. I want the thing I save to be written in terms of my macros. Yes, I could write a small program to read node coordinates out of the Gephi-generated .dot file and stuff them back into my source file, or build an output adapter for Gephi, but that would be yak shaving: my goal here is to redesign a course, not to write Java to store a graph in a format no more than three people will ever use. I don't have a tidy solution yet, and probably never will—as Tom West said, "Not everything worth doing is worth doing well." But as I said at the outset, I hope this story gives a bit of insight into how I think when I'm thinking computationally, and helps you figure out how to manage your data when the time comes. Read More ›

Twenty Questions (Minus Two)
Greg Wilson / 2011-03-15
Following up on last week's musings about reorganizing the course, we've drawn up eighteen questions that we think cover the reasons people come to this course. That leaves us two short of the traditional twenty: what would you add to this list?

1. How can I automate this task?
2. How can I read/control hardware?
3. How can I control my program?
4. How do I count things?
5. How can I clean up this data?
6. How can I get insight into my data?
7. How do I track down bugs in my program?
8. How can I parse this legacy data file?
9. How can I save data so that I can read it later?
10. How can I tell if my program is working correctly?
11. How can I make this program easier to use?
12. How can I use remote machines?
13. How can I reuse this legacy program?
14. How can I share my work with others?
15. How can I make my program faster?
16. How can I make my code easier to reuse?
17. How can I keep track of my work?
18. How can I plug my code into a framework like Galaxy or Taverna?

Read More ›

Call for Participation
Greg Wilson / 2011-03-15
Are you a computational scientist (but not a computer scientist) who develops scientific software? That is, do you use mathematical models to describe scientific processes and then implement these models in your software? Is your main reason for software development to advance science? Is your software used by a wider community of scientists (and maybe not only scientists)? Would you be willing to spare about one hour of your time to participate in a PhD study? If you answered yes to all the above questions, please email Aleksandra Pawlik, a PhD student from the Open University whose research focuses on software development practices of computational scientists. Your participation will only involve being interviewed for approximately one hour at a time suitable for you. The study is completely anonymous. Read More ›

What To Demand
Greg Wilson / 2011-03-12
Peter Norvig (formerly of NASA, now at Google) recently gave a talk titled "What to Demand From a Scientific Computing Language". It's a good talk (and not just because he explains why he's a fan of Python). I was a bit disappointed, though, by this list: Shouldn't there be a version control system on this list? And some sort of provenance tool? OK, that's a trick question: there aren't any provenance tools in widespread use, but what about testing tools? There are lots of those, and there's even a name ("xUnit") for the whole JUnit-style family of tools across different languages. Saying "these aren't really core to scientific computing" is sort of like saying "disinfection isn't really core to surgery". Read More ›

Science Illustrated
Greg Wilson / 2011-03-11
Videos of lectures from a two-day symposium on scientific visualization called Science Illustrated are now online. Many thanks to Mubdi Rahman for organizing it. Read More ›

Musing About Reorganization
Greg Wilson / 2011-03-11
I'm increasingly unhappy with the organization of this course. On the off chance that funding materializes and we're able to undertake a major redesign, I'd like to explain why and ask for your input. Right now, our lectures are broken into topics along lines a computer scientist would instantly recognize: basic programming, regular expressions, databases, and so on. That is not how members of our intended audience see things when they first come to us—if it was, they probably wouldn't need this course. They start with problems like:

- How do I read this data file?
- How can I share my program with other people?
- How should I keep track of thousands of input and output files?
- How do I save the state of my program so I can restart it?
- How can I use the program my supervisor wrote ten years ago to solve my current problem?

Their answers cut across traditional CS divisions: re-using a legacy program, for example, may require basic programming, the shell, systems programming (such as subprocesses and I/O redirection), and some parsing. The traditional solution is to view this as a matrix, and order topics to get to problems as quickly as possible. If the matrix is:

                Topic
                A   B   C   D
    Problem X   +   .   .   .
            Y   +   .   +   .
            Z   .   +   +   +

then the "best" order for teaching is [A, C, {B, D}]. (A toy version of this ordering idea is sketched at the end of this post.) Of course, this assumes that we know the problems, and how they depend on topics. We had some vague ideas a year ago, and know a lot more now, but there's something else we ought to take into account: the big ideas of computational thinking. For example, the idea that "programs are data" crops up in many different places in this course: a version control system treats the source code of a program as data, while passing a function as a parameter or storing it in a list only makes sense if you understand that runnable code is just bits in memory. So should we build a matrix of problems vs. principles? Or a cube of questions, CS topics, and principles? I think the answer is "no", because I believe these principles cannot be taught or applied directly. In my experience, the only way to get them across is to come back after learners have been doing things that depend on them and point out the unifying principle.

I therefore think that the next big step for this course is to:

1. draw up a list of representative computational problems in science and engineering;
2. figure out what researchers need to know in order to solve them;
3. build the matrix;
4. derive a topic order; and
5. figure out when each principle can be pointed out.

The tricky bit is that when we say "representative problems", most people think in terms of traditional disciplinary boundaries and offer us one fluid flow problem, one gene sequencing problem, and so on. Our notion of representative is different: we're thinking of things like reformatting data files, improving performance, sharing or testing code, and so on. That's why we need your help. Have another look at the list at the top of this post. What should we add? What problems are you wrestling with, and what have you needed to know to solve them? "How do I use the shell?" is the wrong kind of answer—we want to know what problem you think the shell is the solution to, and why.
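To make the ordering idea concrete, here is a toy version of "order topics to get to problems as quickly as possible" (my illustration; the greedy rule is an assumption for the sake of the sketch, not a claim about how we'll actually do it, and the data is just the matrix above):

    # Greedy ordering: repeatedly teach whatever unlocks the nearest problem.
    needs = {'X': {'A'},
             'Y': {'A', 'C'},
             'Z': {'B', 'C', 'D'}}   # problem -> topics it depends on

    taught = set()
    order = []
    while any(deps - taught for deps in needs.values()):
        # Pick the unsolved problem that needs the fewest new topics...
        problem = min((p for p, deps in needs.items() if deps - taught),
                      key=lambda p: len(needs[p] - taught))
        # ...and teach those missing topics next.
        new_topics = sorted(needs[problem] - taught)
        order.extend(new_topics)
        taught.update(new_topics)

    print order   # ['A', 'C', 'B', 'D'], i.e. [A, C, {B, D}]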

High Tech That Looks Low Tech
Greg Wilson / 2011-03-09
These videos about climate change are great: they look low tech, with hand-drawn diagrams and low-fi narration, but I imagine a lot more work went into their animation than into our slideshows. Read More ›

Advanced Scientific Programming in Python
Greg Wilson / 2011-03-09
Advanced Scientific Programming in Python
A Summer School by the G-Node and the School of Psychology, University of St Andrews

Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only a few scientists actually use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques, incorporating theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game.

We use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to become a standard tool for the programming scientist.

This school is targeted at PhD students and post-docs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. Basic knowledge of Python is assumed. Participants without any prior experience with Python should work through the proposed introductory materials before the course.

Date and Location: September 11—16, 2011. St Andrews, UK.

Preliminary Program:
Day 0 (Sun Sept 11) — Best Programming Practices: agile development and Extreme Programming; advanced Python: decorators, generators, context managers; version control with git
Day 1 (Mon Sept 12) — Software Carpentry: object-oriented programming and design patterns; test-driven development, unit testing and quality assurance; debugging, profiling and benchmarking techniques; programming in teams
Day 2 (Tue Sept 13) — Scientific Tools for Python: advanced NumPy; the Quest for Speed (intro): interfacing to C with Cython; best practices in data visualization
Day 3 (Wed Sept 14) — The Quest for Speed: writing parallel applications in Python; programming project
Day 4 (Thu Sept 15) — Efficient Memory Management: when parallelization does not help: the starving CPUs problem; data serialization: from pickle to databases; programming project
Day 5 (Fri Sept 16) — Practical Software Development: programming project; the Pac-Man Tournament
Every evening we will have the tutors' consultation hour: tutors will answer your questions and give suggestions for your own projects.

Applications: You can apply on-line at http://python.g-node.org. Applications must be submitted before May 29, 2011. Notifications of acceptance will be sent by June 19, 2011. No fee is charged, but participants should take care of travel, living, and accommodation expenses. Candidates will be selected on the basis of their profile. Places are limited: the acceptance rate in past editions was around 30%. Prerequisites: you are supposed to know the basics of Python to participate in the lectures. Please consult the website for a list of introductory material.

Faculty:
Francesc Alted, author of PyTables, Castelló de la Plana, Spain
Pietro Berkes, Volen Center for Complex Systems, Brandeis University, USA
Valentin Haenel, Berlin Institute of Technology and Bernstein Center for Computational Neuroscience Berlin, Germany
Zbigniew Jedrzejewski-Szmek, Faculty of Physics, University of Warsaw, Poland
Eilif Muller, The Blue Brain Project, Ecole Polytechnique Federale de Lausanne, Switzerland
Emanuele Olivetti, NeuroInformatics Laboratory, Fondazione Bruno Kessler and University of Trento, Italy
Rike-Benjamin Schuppner, Bernstein Center for Computational Neuroscience Berlin, Germany
Bartosz Telenczuk, Institute for Theoretical Biology, Humboldt-Universitat zu Berlin, Germany
Bastian Venthur, Berlin Institute of Technology and Bernstein Focus: Neurotechnology, Germany
Pauli Virtanen, Institute for Theoretical Physics and Astrophysics, University of Würzburg, Germany
Tiziano Zito, Berlin Institute of Technology and Bernstein Center for Computational Neuroscience Berlin, Germany

Organized by Katharina Maria Zeiner and Manuel Spitschan of the School of Psychology, University of St Andrews, and by Zbigniew Jedrzejewski-Szmek and Tiziano Zito for the German Neuroinformatics Node of the INCF. Read More ›

Literate Programming
Greg Wilson / 2011-03-07
Last week's post about the tuple space programming model was so popular that I thought readers might enjoy a discussion of another beautiful idea that failed: literate programming. Like Lisp and other toenail-based languages, it inspires a kind of passion in its fans that is normally reserved for gods, sports teams, and angsty rock bands. And, like them, it leaves everyone else wondering what the big deal is. Literate programming was invented by Donald Knuth (one of the few real geniuses ever to grace computer science) as a way of making programs easier to understand. His idea was that the code and the documentation should be a single document, written in a free-flowing mixture of Pascal and TeX, C and LaTeX, or more generally, a text markup language and a programming language. Functions, classes, modules, and other things could be introduced and explained in whatever order made sense for human readers. One tool would extract and format the text-y bits to create documentation, while another would extract and compile the code-y bits to produce the runnable program. It's a great idea, and for about six months in the late 1980s, I was convinced it was the future of programming. I could use δ as a variable! I could call a function for calculating sums Σ(...)! My explanation of what the code was doing, and the code itself, were interleaved, so that whenever I changed one, I would naturally change the other, so that they never fell out of step! And with a bit of tweaking, I could produce a catalog of functions (this was before I started doing object-oriented programming), or present exactly the same content in breadth-first order, the way it was executed (which was usually easier for newcomers to understand). Cool! But then I had to maintain a large program (20K lines) written with literate tools, and its shortcomings started to become apparent. First and foremost, I couldn't run a debugger on my source code: instead, my workflow was:

1. "compile" the stuff I typed in—the stuff that was in my head—to produce tangled C;
2. compile and link that C to produce a runnable program;
3. run that program inside a debugger to track down the error;
4. untangle the code in my head to figure out where the buggy line(s) had come from;
5. edit the literate source to fix the problem; and
6. go around the loop again.

After a while, I was pretty good at guessing which lines of my source were responsible for which lines of C, but the more use I made of LP's capabilities, the more difficult the reverse translation became. It was also a significant barrier to entry for other people: they had to build a fairly robust mental model of the double compilation process in order to move beyond "guess and hack" debugging, whereas with pure C or Fortran, they could simply fire up the debugger and step through the stuff they had just typed in. I also realized after a while that the "beautiful documentation" promise of LP was less important than it first appeared. In my experience, programmers look at two things: API documentation and the source code itself. Explanations of the code weren't actually that useful: if the programmer was treating the code as a black box, she didn't want to know how it worked, and when she needed to know, she probably needed to see the actual source to understand exactly what was going on (usually in order to debug it, or debug her calls to it).
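To make that "tangle" step concrete, here's a toy version in Python: it pulls the code chunks out of a literate source file and writes them to something a C compiler can handle. The chunk markers follow noweb's style, but this sketch just concatenates chunks in order; a real tangler also resolves cross-references between chunks.

import re
import sys

def tangle(literate_text):
    """Return only the code chunks, in the order they appear."""
    chunks, in_code = [], False
    for line in literate_text.splitlines():
        if re.match(r'<<.*>>=\s*$', line):    # noweb: start of a named code chunk
            in_code = True
        elif line.strip() == '@':             # noweb: back to documentation
            in_code = False
        elif in_code:
            chunks.append(line)
    return '\n'.join(chunks) + '\n'

if __name__ == '__main__':
    with open(sys.argv[1]) as src, open(sys.argv[2], 'w') as dst:
        dst.write(tangle(src.read()))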
The only role in between those two levels where LP was useful lay in giving an architectural overview of how things fit together, but: that was something people only really needed once (though when they needed it, they really needed it), and that level of explanation is really hard to write—exactly as hard, in fact, as writing a good textbook or tutorial, and we all know how rare those are. So I moved on, and so did most other fans of LP. But then Java happened, and history repeated itself, not as tragedy, but as farce. The first time I saw Javadoc, I thought it looked like it had been invented by someone who'd heard about literate programming in a pub, but had never actually seen it. I later realized that was unfair: Javadoc was the closest thing to LP that Java's inventors thought they could get away with, and it actually did lead more programmers to write more documentation than they ever had before. But saints and small mercies, look at what it doesn't do:

There's no checking: you can document parameters that don't exist, or mis-document the types and meanings of parameters that do.
You can only put Javadoc at the start of a class or method, rather than next to the tricky bit of code in the middle of the method that implements the core algorithm. (Though to be fair, if the method is long enough that this is a problem, it should probably be refactored into several smaller methods.)
There's no logical place for higher-level (architectural) documentation: Javadoc really is designed for describing the lowest (API) level of code.
You have to type and view HTML tags.

That last point might seem a small one, but it's the key to understanding what's actually wrong with this model. Think about it: everyone who's writing Java has, on their desktop, a WYSIWYG tool such as Microsoft Word that renders italics as italics, links as links, tables as tables, and so on. When they start writing code, though, they have to type <strong>IMPORTANT</strong> to emphasize a word, or something as barbaric as:

<table border="1">
  <tr>
    <td colspan="2" rowspan="2" align="center">Result</td>
    <td colspan="2" align="center">left input α</td>
  </tr>
  <tr>
    <td>>=0</td>
    <td><0</td>
  </tr>
  <tr>
    <td rowspan="2" align="center">right<br/>input<br/>β</td>
    <td>>=0</td>
    <td>1</td>
    <td>0</td>
  </tr>
  <tr>
    <td><0</td>
    <td>0</td>
    <td>-1</td>
  </tr>
</table>

to get something that anyone else in the 1990s (never mind the 21st Century) would create with one menu selection:

                   left input α
    Result        >=0      <0
    right   >=0    1        0
    input
    β       <0     0       -1

And don't get me started on diagrams: every decent programming textbook has block-and-arrow pictures of linked lists, dataflow diagrams, and what-not, because these aid understanding. Not source code, though; the closest you can come is to create a diagram using some other tool, save it as a JPEG or PNG, put it somewhere that you hope it won't be misplaced, and include a link to it in your source code. The picture itself won't be visible to people looking at your code, of course—they'll have to decode the link and open the picture manually, assuming of course that it hasn't been misplaced—but hey, if their intellects are so weak that they need pictures, well, what are they doing looking at code anyway? The tragedy (or irony) is that we know how to solve this problem, because we've been solving it for other people for almost forty years.
Electrical engineers and architects don't use Microsoft Paint to draw circuit diagrams and blueprints; instead, they use CAD tools that: store a logical model of the circuit or building in a form that's easy for programs to manipulate; display views of that model that are easy for human beings to understand and manipulate; and constrain what people can do to the model via those views. What's the difference? In an architectural CAD package, I can't put a door in the middle of nowhere: it has to be in a wall of some kind. In Emacs or Eclipse, on the other hand, I can type any gibberish I want into a Java file, or write Javadoc about an integer parameter called threshold when in fact I have two floating point parameters called min and max. That CAD package will let me show, hide, or style bits of the model: I can see plumbing and electrical, but not air vents, or windows and doors but not floors, and so on, and I can see those things in several different ways. When I'm looking at source code, I can't even see my Javadoc rendered in place. The root of the problem is that programmers—including the ones who design programming languages—still insist that programs have to be stored as sequences of characters, and that that's all that will be stored. Even new languages created by really smart people stay stuck in this sandpit. Why? Because that's all that compilers and debuggers and other tools understand? Well, you're writing new ones anyway, aren't you? No, I'm convinced that the real reason is that plain old text is the only common denominator that programmers' editors understand. Most programmers will change language, operating system, nationality, even gender before they'll change editors. (Hell, I'm typing this in Emacs, rather than using a WYSIWYG HTML editor—how sad is that?) Most therefore assume, probably correctly, that if a language requires people to give up the years they have spent learning what Ctrl-Alt-Shift-Leftfoot-J does, they will ignore it. They'll continue to build level editors for computer games, but use a souped-up typewriter to do it. Sooner or later, though, one of the many multi-modal CAD tools for programmers that people have built over the years will take off, just as object-oriented programming and hypertext eventually did after gestating in obscurity for years. I've argued before that the most likely candidate is a proprietary programming environment like Visual Basic or MATLAB, where a single vendor with a more or less captive audience can roll out a whole toolchain at once without having to argue it through standards committees. I'm not holding my breath, though; while the recent surge of interest in "innovative" programming languages is welcome, it feels to me like everyone is trying to design better balloons rather than saying, "Hey, birds are heavier than air—why don't we give that a try?" Read More ›

Tuple Spaces (or, Good Ideas Don't Always Win)
Greg Wilson / 2011-03-01
I've resisted adding a module on high-performance computing to this course for a lot of reasons: I think other things are more important, there's enough coverage elsewhere, the software is hard for novices to set up... But there's another reason, one that may not be as good, but still has a seat at the table. Deep down, the reason I'm reluctant to teach MPI (the de facto standard for parallel programming) is that there's a much better model out there, one that works on all kinds of hardware, is comprehensible to novices, and delivers good performance on a wide range of problems. Its name is tuple space, its most famous implementation is Linda, and unfortunately, for a lot of reasons that I still don't understand, it somehow became an "also ran" in parallel programming. How easy is Linda? The examples in this article, and this well-written little book, are pretty compelling, but since the first is behind a paywall, and the second is out of print, here's a short overview. A tuple space is, as its name suggests, a place where processes can put, read, and take tuples, which are in turn just sequences of values. ("job", 12, 1.23) is a tuple made up of a string, an integer, and a floating-point number; a tuple space can contain zero or more copies of that tuple, or of tuples containing other types of values, simple or complex. A process puts something in tuple space with put(value, value, ...). It can take something out with take(...), or copy something (leaving the original in tuple space) with copy(...). The arguments to take(...) and copy(...) are either actual values, or variables with specific types; values match themselves, while types match things of that type. For example:

put("job", 12, 1.23) puts the tuple ("job", 12, 1.23) in the tuple space;
if f is a floating point variable, take("job", 12, ?f) takes that tuple out of tuple space, assigning 1.23 to f;
but take("job", 15, ?f) blocks, because there is no tuple in tuple space matching the pattern (12 doesn't match 15);
and if i is an integer variable, copy("job", ?i, ?f) assigns 12 to i and 1.23 to f, but leaves the tuple in tuple space.

There are non-blocking versions of take(...) and copy(...) called try_take and try_copy (the names vary from implementation to implementation) that either match right away and return true, assigning values to variables in their patterns, or fail to match, don't do any assignment, and return false. There is also eval(...), which takes a function and some arguments as parameters and creates a new process. Whatever (tuple of) values that function returns when it finishes executing is then put in tuple space—this is how one initial process can spawn many others. And that's it. That's the whole thing. It's easy, easy, easy for beginners to understand—much easier than MPI. And compile-time analysis of tuple in/out patterns can make it run efficiently in most cases; adhering to some simple patterns can help too. But for a whole bunch of reasons, it never really took off: not as a language extension to C, not as JavaSpaces, not in various homebrew implementations for agile languages like Python, and that makes me sad. It's as if the metric system had failed, and we had to do physics with foot-acres and what-not. But I guess that's the world we live in... Read More ›
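To give a feel for the model, here's a minimal, single-process sketch of those operations in Python. It is an illustration of the matching rules, not a real Linda: there's no blocking, no eval(...), and no communication between processes, and Linda's ?f placeholders become plain Python types like float.

class TupleSpace:
    def __init__(self):
        self.tuples = []

    def put(self, *values):
        self.tuples.append(tuple(values))

    def _match(self, tup, pattern):
        # Types match any value of that type; other values must match exactly.
        return len(tup) == len(pattern) and all(
            isinstance(v, p) if isinstance(p, type) else v == p
            for v, p in zip(tup, pattern))

    def try_copy(self, *pattern):
        for tup in self.tuples:
            if self._match(tup, pattern):
                return tup
        return None

    def try_take(self, *pattern):
        tup = self.try_copy(*pattern)
        if tup is not None:
            self.tuples.remove(tup)
        return tup

ts = TupleSpace()
ts.put("job", 12, 1.23)
print(ts.try_copy("job", int, float))   # ("job", 12, 1.23); tuple stays put
print(ts.try_take("job", 15, float))    # None: 12 doesn't match 15
print(ts.try_take("job", 12, float))    # ("job", 12, 1.23); tuple removed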

We Got a Mention in Comm. ACM
Greg Wilson / 2011-02-25
The March 2011 issue of Communications of the ACM has an article on grid computing that mentions Software Carpentry and Titus Brown's course at Michigan State—yay! Unfortunately, Software Carpentry didn't get a URL, and the bit.ly URL for Titus's course currently redirects to the wrong place. Later: the bit.ly URL is apparently working again. Read More ›

An Easy Place to Start: Systems Programming
Greg Wilson / 2011-02-24
As a follow-up to recent posts on how to contribute, what better looks like, and Elango Cheran's secure shell episode, here's a specific request: we would like three episodes on systems programming, by which we mean writing programs to:

manipulate files and directories (chdir, stat, etc.);
work with archives (tar, zip, etc.); and
run other programs (the subprocess module in all its fearsome glory).

A typical use case is writing a Python script to run a compiled legacy application a few hundred times for slightly different parameter values, putting each run's output in a separate directory that's created on the fly, and so on (sketched below). If you're interested, please get in touch — it's easier than you think. Read More ›
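For a feel of what such an episode would cover, here's that use case sketched in Python; the legacy program's name and command-line flags are invented for the example.

import os
import subprocess

for temperature in [0.1, 0.2, 0.5, 1.0]:
    run_dir = os.path.join("runs", f"T_{temperature}")
    os.makedirs(run_dir, exist_ok=True)          # create the directory on the fly
    with open(os.path.join(run_dir, "output.txt"), "w") as out:
        subprocess.run(["./legacy_sim", "--temperature", str(temperature)],
                       stdout=out, check=True)   # redirect this run's output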

Ask, And Ye Shall Receive
Greg Wilson / 2011-02-23
Thanks to Elango Cheran, we now have an episode on the secure shell—yay! Read More ›

What Better Looks Like
Greg Wilson / 2011-02-22
Paradise is exactly like where you are right now... only much, much better. — Laurie Anderson

It's hard to make things better if you don't know what "better" looks like, so in the wake of some email responses to last week's post on how to contribute, here are some thoughts on what we'd like Software Carpentry to look like when it's finished (or as finished as something like this ever is). First, this site will offer short tutorials that are directly relevant to scientists and engineers who want to get more done with computers in less time and with less pain. Each of those tutorials will be available in several formats, including recorded video, plain HTML, and downloadable slides. Topics will be both practical (e.g., common Unix shell commands) and foundational (e.g., how hash tables work); the latter will be included to help people understand, connect, and generalize the former. Second, this material will be used in lots of courses: half-day conference tutorials on single topics, week-long bootcamps, single-semester classroom courses, peer-to-peer study groups, and of course self-directed online study. And we'd like to see lots of instructors: grad students helping other grad students work through it, corporate trainers repackaging it and selling on-site delivery, professors combining some of our material with their own notes on bioinformatics or combustion dynamics, and so on. Third, there will be an active community discussing, updating, fixing, and enlarging the content. Right now, Software Carpentry has a bus factor of one: Version 3 languished from 2005 to 2010 while I was busy with other projects, and unless a "Wikipedia effect" kicks in (or more funding materializes so that I can keep going for another year), Version 4 will start going stale on May 1 as well. As I have said elsewhere, our long-term goal is to raise both standards and skill levels among the overwhelming majority of scientists and engineers who don't think of themselves as computationalists. We will know we have succeeded when their computational results are as reliable as their physical experiments, and when programming doesn't inspire any more fear and loathing than firing up an oscilloscope. Read More ›

Three More Episodes on MATLAB
Greg Wilson / 2011-02-19
Three more episodes on MATLAB are now available, two on input and output, and one on plotting. Thanks to Tommy Guy for putting them together; as always, feedback is appreciated. Read More ›

Scientific Computing Podcast
Greg Wilson / 2011-02-18
inSCIght is a podcast that focuses on scientific computing in all of its various forms. Every week we have four panelists engage head-to-head on poignant and interesting topics. The panelists are drawn from all across the scientific computing community. From embedded systems experts to very high level language gurus, from biologists to nuclear engineers, the hosts of inSCIght use computers to solve science and engineering problems everyday. This podcast throws people, ideas, and opinions into an audio-blender hoping to educate and entice each other and the world. Read More ›

Mirroring Software Carpentry
Greg Wilson / 2011-02-18
As another followup to the recent post on how to contribute, I'd welcome pointers to places that might host or mirror Software Carpentry material. Efforts like MIT's OpenCourseware and CMU's Open Learning Initiative seem to host only the content they produce themselves; the Khan Academy seems open to contributions, though I don't see any, and then there's Academic Earth. Where else could we put our content to get it in front of more people? Read More ›

Reddit on Scientific Programming
Greg Wilson / 2011-02-17
Nature's article on scientific computing has attracted 75 comments on Reddit. Some interesting ideas being kicked around; please go ahead and add your own. Read More ›

I Want Their Software
Greg Wilson / 2011-02-16
This blog post about the Khan Academy might gush a bit, but the embedded video is a must-watch for anyone who cares about education. I really want their software... Read More ›

How to Contribute
Greg Wilson / 2011-02-16
Following up on yesterday's post, I have written a short guide explaining how you can help Software Carpentry improve and grow. Reporting bugs, suggesting changes, contributing content, or running the course yourself: they're all a lot easier than you think, and a great way to gain some experience with new kinds of teaching and learning that are becoming more important with every passing year. Read More ›

Top Ten Why Nots
Greg Wilson / 2011-02-15
We've had quite a few firsts and successes in the past ten months, but our biggest failure continues to be the lack of contributions from users and educators: while quite a few people are using our material, only one person has (so far) volunteered to create material for us. If that doesn't change, Software Carpentry will stagnate as soon as I have to move on to other things (which, given my lack of success in raising another round of funding, will probably happen this spring). Coincidentally, Wynn Netherland recently posted a list titled "Top ten reasons why I won't use your open source project". It's not about contributing per se—I'll have to go back to Karl Fogel's excellent Producing Open Source Software to start figuring that out—but it's still worth going through.

1. You don't have a README. We have an explanation of who this course is for and what it's about, but nothing that explains how to contribute.
2. You don't include tests, specs, features, examples. Doesn't really apply.
3. You have no project home page. We do.
4. You need design help. I think we look OK.
5. You don't have a domain name. We do.
6. You don't have a Twitter account. We do.
7. Your licensing is unclear. Our license is very clear and easy to find.
8. You don't reach out to me. I think we pass this test.
9. You don't speak about your project at conferences or meetups. We could do more; the question is where?
10. You didn't submit it to the Changelog, which apparently is this week's hot "what's on in open source" blog. *shrug*

All in all, #1 seems like the only major gap. If we fixed this—if we wrote a "how to contribute" guide—how many of you would be interested in creating short screencasts for us? It would help your peers, and look shiny on your CV too... Read More ›

First Four MATLAB Episodes
Greg Wilson / 2011-02-15
We have just posted the first four episodes of our introduction to MATLAB; more should be up this week and next. As always, feedback is very welcome. Read More ›

Two More Episodes on Spreadsheets
Greg Wilson / 2011-02-14
We have added two more short episodes on spreadsheets, covering pivot tables and named cell ranges. Please let us know if you find them useful, and what else we should cover to help researchers who use spreadsheets to work with data. Read More ›

Audio for Three Software Engineering Episodes
Greg Wilson / 2011-02-14
We've added audio (and a few images) for three of the episodes on software engineering (the introduction, agile development, and the principles of computational thinking). It's a slightly different format than most of our previous episodes, and we need to choose a better way of embedding MP3 audio in web pages, but we hope it's useful as is. Read More ›

Updates to Spreadsheet Lecture
Greg Wilson / 2011-02-11
We have just posted updates to the first seven episodes of the lecture on spreadsheets: the new version uses Excel 2010, and the screencasts are larger to make things easier to see. Please let us know what you think. Read More ›

What Computational Science Means to Me
Greg Wilson / 2011-02-08
My latest attempt to define what "computational thinking" actually means is now on the web in draft form—comments would be very welcome. In brief, the eight principles are:

1. It's all just data.
2. Data doesn't mean anything on its own—it has to be interpreted.
3. Programming is about creating and composing abstractions.
4. Models are for computers, and views are for people.
5. Paranoia makes us productive.
6. Better algorithms always trump better hardware.
7. Automation is the key to acceleration.
8. The tool shapes the hand. Read More ›

Scripts for Two More Software Engineering Episodes
Greg Wilson / 2011-02-03
We've posted the scripts for two more episodes on software engineering (but not the slides or videos, since they don't exist yet). I'm not happy with the discussion of agile: it feels like too much has been left out, but I'm not sure what to add without breaking our self-imposed 10-minute limit. I also don't know how much our audience needs to know: these episodes are meant to be end-of-course pointers to other material they could look at next, so we're after a big-picture guide to "should I dig deeper" rather than the details. As always, feedback is welcome... Read More ›

Three Months, Two Spikes, One Conclusion
Greg Wilson / 2011-02-02
Here are the traffic stats for the last three months at software-carpentry.org (charts for November 2010, December 2010, and January 2011). It's hard to make out what's happening because of the big spike in December, and the less prominent (but still significant) spike in November, so here's a summary: once these spikes are removed, the average number of distinct visitors is slowly increasing each month; posting a lecture on something popular boosts our readership dramatically, but only briefly, and only for that topic—most of those visitors don't stick around. So what were the spikes? The first, in November, came when we posted our Python lectures. The second, in December, was Tommy Guy's episode on how to build a simple recommendation engine, which in turn was based on an example from Toby Segaran's excellent book Programming Collective Intelligence. That was more popular than anything else we've ever put on the site, which is both encouraging (if we build it, you will come) and disheartening (I have no idea what else to build that will be that popular). Read More ›

First Episode on Software Engineering
Greg Wilson / 2011-02-01
I know we promised a lecture on high-performance computing, but today wound up going in a different direction: we have just posted an episode on empirical results in software engineering, which we hope will set the stage for discussion of how to manage larger teams and projects. We hope you enjoy hearing about some of the science behind the things this course teaches; as always, feedback is welcome. Read More ›

A Competence Matrix for Software Carpentry
Greg Wilson / 2011-01-31
Last summer, we tried to create a concept map showing how the big ideas in Software Carpentry related to each other. We weren't satisfied with it, so we set it aside. With our first anniversary coming up in three months, though, it's time to revisit the idea of a high-level overview of the course. To that end, I have posted a first attempt at a competence matrix that organizes skills into levels. Please leave comments on that page telling me what's wrong or missing. Read More ›

Research Study: How Do You Test Your MATLAB?
Greg Wilson / 2011-01-27
Have you ever wondered how scientists test their code? We have, and we'd like you to help us find out. If you use MATLAB, please have a look at our new research project—and please forward the link to other groups and lists where we might be able to recruit people. (Of course, we'd also welcome input from NumPy users...) See also these posts from Steve Eddins, who built the MATLAB xUnit testing framework, and thanks in advance for your help. Read More ›

Notes Toward a Lecture on High-Performance Computing
Greg Wilson / 2011-01-27
As I've said several times now, we're going to add a lecture on high-performance computing to the course. We have rough outlines of the content in the version control repository, and feedback would be very welcome:

Introduction
The Example Application
Improving Sequential Performance
Profiling
Parallelism Defined
Computer Architecture in Ten Minutes or Less
Task Farming
Basic MPI
Collective Operations in MPI Read More ›

Bootcamp
Greg Wilson / 2011-01-27
A couple of weeks ago, I went to the University of Wisconsin — Madison to speak at a three-day software skills bootcamp run by The Hacker Within, a grassroots student group that's taking the same "teach what you know, learn what you don't" approach as the Peer-to-Peer University and other efforts. I was very impressed then, and am even more so now that I've had time to reflect on what I saw (and recover from my close encounter with Delta Airlines, whose motto appears to be, "We don't care because we don't have to"). And then, a couple of days ago, I got mail from a graduate student in Mississippi who'd like to organize an online run of Software Carpentry through P2PU. We'll post details soon, but it's got me thinking more about how we could experiment with different delivery models to get this stuff into as many hands as possible, as quickly as possible. I would therefore like to ask if you (yes, you) would like to organize a bootcamp on site wherever you are. We can provide content, share our experiences, and (perhaps most importantly) send someone who has done one of these before to wherever you are to help out. If you think you can get a couple of dozen grad students, faculty, lab mates, or fellow fans together for three days to learn and share, please add a comment to this post to let us know. Work as though you lived in the early days of a better nation. — Alasdair Gray Read More ›

Thinking Like the Web
Greg Wilson / 2011-01-26
Jon Udell's 1999 book Practical Internet Groupware was a revelation for me: it was the first coherent explanation I'd ever read of how the disparate collection of technologies and social conventions that we call "the web" fit together, and what the deeper patterns and concepts beneath them are. After a lot of further work and thought, Jon has condensed those ideas into seven principles—or as he puts it, "Seven Ways to Think Like the Web". These concepts are the most meaningful definition yet of what the phrase "computational thinking" actually means, and of what people who aren't programmers need to know in order to use the web effectively. As I said in the post on Tom Limoncelli's plea to software vendors, we'll know we're teaching the right things, the right way, when people who have done this course understand these principles and how to apply them. Read More ›

The Case Against Peer Review
Greg Wilson / 2011-01-26
Cameron Neylon recently made the case against peer review once again; the dialog near his posting's end is too accurate to be funny. In this light, calls for peer review of scientific software are probably misplaced; we need a better mechanism than one that's already mostly failing. Read More ›

Software Carpentry Sprint in July
Greg Wilson / 2011-01-26
Well this was a nice birthday present: the Python Software Foundation announced today that they will provide some support for a Software Carpentry sprint in Toronto in July. Thanks, folks—we'll post details about what, when, and where as soon as we have them. Read More ›

Fighting Spam
Jon Pipitone / 2011-01-26
We recently experienced a spate of spam and fake user accounts on the course discussion forums. In this post I'll briefly explain what we're doing to stop that from happening again, and how the technology works. The course forums are powered by bbPress, the forum software written by the WordPress folks. When I first installed it I left it with the default configuration settings and without doing anything in particular to lock it down from spammers. Bad idea. When I checked this morning we had over fifty fake user accounts, and two spam postings. I had to manually remove the fake user accounts — luckily this is easy to do since either the names (e.g. "google directions plus") or the user's website is obviously spammy. I then activated two plugins: Akismet to filter out any future spam posts and reCAPTCHA to stop automated computer programs (i.e. bots) from registering fake user accounts. Akismet works by inspecting posts or comments for known spam-like features and flags them as spam if they match. It is offered as a web service, so all we needed to install was a simple plugin for bbPress that sends each post and comment to the Akismet server and receives back an answer as to whether it is suspected to be spam or not. The reCAPTCHA plugin adds a visual or audio test to the registration form on the course forums. The test involves writing down scrambled words presented in an image or an audio clip and the test is designed to be possible for humans to answer correctly but not for computer programs. The reCAPTCHA system is unique in that by correctly solving the tests you are also helping to digitize books, newspapers and radio audio. That's because words for the reCAPTCHA tests come from one of those sources when they have been flagged as unreadable during an automated digitization process. Thus the words are known to be currently unreadable by computer programs and this guarantees that bots will be unable to pass the test. When a human passes the test they will have had to correctly read the word and the correct spelling is sent back to the folks digitizing the material. Nifty. You can learn more about exactly how the reCAPTCHA system works here. Read More ›
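For the curious, the round trip the Akismet plugin makes for each new post looks roughly like this in Python. The call is Akismet's documented comment-check request; the key, site URL, and comment fields below are placeholders.

import urllib.parse
import urllib.request

API_KEY = "your-akismet-key"                          # placeholder
ENDPOINT = f"https://{API_KEY}.rest.akismet.com/1.1/comment-check"

fields = urllib.parse.urlencode({
    "blog": "https://example.org/forum",              # placeholder site URL
    "user_ip": "203.0.113.7",                         # address the post came from
    "user_agent": "Mozilla/5.0",
    "comment_content": "google directions plus",
}).encode()

with urllib.request.urlopen(ENDPOINT, data=fields) as response:
    is_spam = (response.read() == b"true")            # Akismet answers "true" or "false"
print("spam" if is_spam else "looks OK")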

Scientists Aren't Stupid: Software Is
Greg Wilson / 2011-01-21
Last night, Mike Bayer (@zzzeek) tweeted: Why are "scientists", who are so dramatically smarter than me, such dumdums when it comes to basic programming skills: http://bit.ly/fWPtjW Taavi Burns (@jaaaarel) replied: Ask @gvwilson of @swcarpentry? So here I am, and what I want to say is: Scientists aren't stupid: software is. Seriously. I spent an hour and a half last night trying and failing to get Thunderbird and Dreamhost's email to play nicely together. Today, I'm wrestling with the fact that Python's multiprocessing library gets lost in infinite recursion if you try to do something crazy like, oh, I don't know, leave out the if __name__ == '__main__' check (but only on Windows). In both cases, the parties involved can explain why it does what it does, and in both cases, I just... don't... care. So here's what I want to say to software developers who wonder why scientists are dumdums. When a scientist says "a>b doesn't work for complex numbers", it's not their fault. When a programmer says "a>b doesn't work for email messages", odds are good that it's because two standards committees didn't talk to each other, or it's a deliberate fail for backward compatibility with something written in 1997, or something like that. The key concept here is one that Irving Reid introduced me to: learned helplessness. If something fails repeatedly, people learn that it's not worth trying to make it work again. That lesson sticks, even when circumstances change in ways that make success possible. Once someone has wasted hours trying and failing to get A, B, and C to play together nicely on their laptop—once they have spent hours figuring out why that failed, only to find that what they learned is useless when the next problem crops up—it's quite reasonable for them to stop trying, because the odds are against them succeeding without superhuman effort. As long as software conditions people to believe that, to first order, they're doomed—as long as "doubling your chance of succeeding" means raising the probability from 1% to 2%—maybe they're right to put that time into the science they wanted to be doing in the first place. Read More ›
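For anyone who has hit the same wall, here is that multiprocessing gotcha in miniature: on Windows, child processes re-import the main module, so creating the Pool at the top level (without the guard) re-runs the process creation on import.

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':        # omit this guard and Windows recurses
    with Pool(2) as pool:
        print(pool.map(square, range(10)))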

MIT Rethinking OpenCourseWare
Greg Wilson / 2011-01-20
MIT's OpenCourseWare initiative was (and probably still is) the highest-profile "open and online" initiative in higher education. According to this article, they're rethinking their approach to help the self-directed learners who have emerged as their biggest users. I'm really excited by this, and by P2PU's peer-to-peer model, and all the others that are emerging (including the hybrid models we're trying out). I also think that asking "which model will win?" misses the point: each one is a best fit for a different combination of educator, content, and learner. Now, if only someone would build a presentation tool that played nicely with these models... Read More ›

How to Cite Software Carpentry
Greg Wilson / 2011-01-20
If you are citing Software Carpentry in papers or technical reports, the web site and the 2006 article in Computing in Science & Engineering are probably best; the other articles below provide other background that might also be useful. I hope to update the 2006 article some time this year, but a lot of other things are ahead of it in the queue...

@article{wilson-bottleneck,
  author  = {Gregory V. Wilson},
  title   = {Where's the Real Bottleneck in Scientific Computing?},
  journal = {American Scientist},
  month   = {January--February},
  year    = {2005},
  note    = {Discusses the difference between machine speed and human productivity, and explains why the latter is more important for most computational scientists.}
}

@article{wilson-software-carpentry,
  author  = {Greg Wilson},
  title   = {Software Carpentry: Getting Scientists to Write Better Code by Making Them More Productive},
  journal = {Computing in Science \& Engineering},
  month   = {November--December},
  year    = {2006},
  note    = {Summarizes the what and why of Version 3 of the course.}
}

@article{wilson-learn-history,
  author  = {Greg Wilson},
  title   = {Those Who Will Not Learn From History...},
  journal = {Computing in Science \& Engineering},
  month   = {May--June},
  year    = {2008},
  note    = {Argues that equating "scientific computing" and "high performance computing" is bad for the former, and detrimental to most computational scientists.}
}

@inproceedings{hannay-scientific-software-survey,
  author    = {Jo Erskine Hannay and Hans Petter Langtangen and Carolyn MacLeod and Dietmar Pfahl and Janice Singer and Greg Wilson},
  title     = {How Do Scientists Develop and Use Scientific Software?},
  booktitle = {Proc. 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering},
  year      = {2009},
  note      = {Summarizes the largest survey ever done of how scientists actually use computers, what they know, and what they find difficult.}
}

@article{wilson-scientists-really-use-computers,
  author  = {Gregory Wilson},
  title   = {How Do Scientists Really Use Computers?},
  journal = {American Scientist},
  month   = {September--October},
  year    = {2009},
  note    = {A short (and more readable) summary of the survey reported in Hannay et al.}
}

@misc{software-carpentry,
  author       = {Greg Wilson},
  title        = {Software Carpentry web site},
  howpublished = {http://software-carpentry.org},
  accessed     = {December 2010},
  note         = {Main web site for Software Carpentry, replacing http://swc.scipy.org.}
}
Read More ›

Version Control and Newline Conventions
Luis Zarrabeitia / 2011-01-19
There are two widely used conventions for representing the end of a line and the start of the next. Unix-like systems have traditionally used one character, \n (ASCII 10, "line feed"), while Microsoft DOS and Windows used the sequence \r\n (ASCII 13, "carriage return", followed by \n). Being invisible, the difference between conventions usually goes unnoticed, especially because most modern software can correctly display text written with either one. However, there can be interoperability issues when using editors that follow different conventions. One such issue can manifest when using a version control system like Subversion. Suppose that one team member accesses the repository to make a small change. Most text editors will try to detect the convention used in the document, but some (broken) editors will display the text correctly, yet save it using their default convention, regardless of the original. If one of these editors is used to make a change, a one-line modification could potentially span the whole file: the one line in which the modification was made, plus every line in the document where the editor decided to change the end-of-line symbol. If this change is committed, the diff for that revision will be nearly useless, because instead of highlighting only the change that was made, it will appear as if the whole document was removed and retyped. This revision will also be very difficult to merge or to revert in the future, and if another team member is also editing the same file, they will surely get a conflict, which may be hard to solve because it will be very difficult to spot the actual difference. There are other invisible characters that can cause similar issues, like tabs and spaces. It should be agreed beforehand which convention is going to be used, and each team member should make sure that their own commits follow it. It is a good practice to review your local changes before doing a commit (svn diff from the command line, or your favourite graphical diff tool). If lines that are apparently equal are reported to be different, that may be a sign that the text editor is changing the invisible characters. Reviewing the changes before committing can also help you make sure that only the desired changes make it to the repository, that throwaway or unnecessary changes are discarded, and that the commit contains one, and only one, logical change. Read More ›
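Two practical footnotes. Subversion itself can enforce a convention per file via the svn:eol-style property (for example, svn propset svn:eol-style native myfile.c). And if you want to see which convention a file actually uses, a few lines of Python will tell you; this is a minimal sketch that only looks for LF and CRLF:

import sys

def conventions(path):
    """Report which newline conventions appear in a file."""
    with open(path, newline='') as f:     # newline='' turns off translation
        text = f.read()
    found = set()
    if '\r\n' in text:
        found.add('CRLF (\\r\\n)')
    if text.replace('\r\n', '').count('\n') > 0:
        found.add('LF (\\n)')
    return found

for path in sys.argv[1:]:
    names = ', '.join(sorted(conventions(path))) or 'no newlines'
    print(path, '->', names)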

Making System Administrators' Lives Easier
Greg Wilson / 2011-01-19
Google's Thomas Limoncelli (author of Time Management for System Administrators) has an article in ACM Queue titled "A Plea to Software Vendors from Sysadmins—10 Do's and Don'ts". His list is:

1. Do have a "silent install" option.
2. Don't make the administrative interface a GUI.
3. Do create an API so that the system can be remotely administered.
4. Do have a configuration file that is an ASCII file, not a binary blob.
5. Do include a clearly defined method to restore all user data, a single user's data, and individual items.
6. Do instrument the system so that we can monitor more than just, "Is it up or down?"
7. Do tell us about security issues.
8. Do use the built-in system logging mechanism.
9. Don't scribble all over the disk.
10. Do publish documentation electronically on your Web site.

Every one of these applies equally to scientific software, big and small (a small example of rule 8 follows below). We'll know we're teaching the right things, the right way, when people who have done this course know how to write software that obeys these rules and (more importantly) understand why they should. Read More ›
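As a tiny illustration of rule 8, here is what using the built-in logging mechanism looks like from a Python program; the logger name and message are made up, and the /dev/log socket is the usual location on Linux (other systems differ).

import logging
import logging.handlers

logger = logging.getLogger('mysimulation')            # hypothetical program name
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.SysLogHandler(address='/dev/log'))
logger.info('run 42 started: parameters loaded from params.txt')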

Exercises for Shell Posted
Greg Wilson / 2011-01-18
Orion Buske has posted exercises to go with the lecture on the Unix shell. These will get you to do real things with real bio data; please post questions and comments in the forum as you work on them. Read More ›

Demographics (part two)
Greg Wilson / 2011-01-17
Here are summaries (slightly edited) of what people taking the course do. There's quite a range... Stem cell biology, high-throughput screening, high-content analysis. My research aims to improve the resolution of ultrasonic images created during ultrasonic inspections of thin metal sheets and metal welds though the use of digital signal processing algorithms. I am currently working on learning Django to create websites. I am also developing new course content in programming and scientific computing for a 4-year degree in GIS. I do simulations of protein folding and assembly for naturally disordered proteins, the kind that are implicated in degenerative diseases like Alzheimer's and Parkinson's. I also use dimensionality reduction techniques to find clusters in the conformation spaces of the proteins. I am working on computational simulation, trying to understand the basic rules underlying protein dynamics I am currently working on modeling the accretion disks of cataclysmic variable stars. I also study High Mass XRay Binaries, as well as Low Mass XRay Binaries. We build and maintain global databases of fisheries statistics and analyze this information to uncover ways by which we can manage global fishery resources more sustainably. Use classical molecular dynamics running on clusters and supercomputers to simulate the dynamics of individual proteins from a variety of species. In particular I focus on proteins that are embedded in the cell membranes as these are involved in many different important physiological functions e.g. transport and cell signalling. Technical solutions to health and safety problems I am a molecular biologist and study how different genes affect cellular phenotypes. For example, I switch off specific genes (alone or in combination) using RNAi. Typically, I perform thousands of experiments using automated liquid handling stations (robots) and record the resulting phenotypes (typically as numbers). In addition, I study how gene expression is affected using next-generation sequencing techniques (RNAseq). Chemical health and safety, mostly data analysis of biological monitoring (that's urine and blood samples); creating software tools to help non-mathematicians run mathematical models (check out http://www.advancedreachtool.com/); some PBPK modelling (systems of differential equations of how chemicals move through the body). We study how information about sensory stimuli is transduced, transformed, and represented in the central nervous system. At present we focus on such processes in the early visual system. I am currently working on the fluid dynamics of blood droplet impact using a simulation method called Smoothed Particle Hydrodynamics. I am currently writing my code in Matlab, but may need to move to C or C++ to increase the speed of the computations. Developing algorithms for neuroimaging data analysis. This involves implementation of image processing algorithms as well as statistical tools. I am using statistical downscaling, which involves different types of multi-variate regressions, to predict surface observations from information provided by course resolution global climate models. Currently I am working on a project in which we are using downscaling to make future projections of the severity and likelihood of wildfire in BC. I am using R almost exclusively for this project. Running hydrologic models in a unix environment. Coding in R to analyze outputs including plotting and mapping. 
Querying data from netcdfs and adjusting code in a statistical downscaling model that can be in c, fortran, or shell. On the bioinformatics front I am comparing the evolution of the overlapping regions of the 4 genes from Hepatitis B Virus. I have used python to cut, copy and curate the sequences downloaded from Genbank to get a useable and well annotated dataset. Quite a task considering the sorry state of the data in the HBV database. I do research to understand how we recognize speech and how we learn to recognize speech. By speech recognition, I mean the mapping from the real-valued speech signal to abstract linguistic structure, like sentence structure and meaning. Methodology includes recording speakers of different languages, running psychological speech perception experiments on adults and infants, and computational modeling using machine learning algorithms. My research is in the fields of hydrology and aquatic ecology. I use both field and model approaches to understand the linkages between physical and chemical hydrologic processes and the resulting impact on aquatic ecosystems. At Census Bureau: Small area estimation for local government surveys. Ongoing: Statistical applications of specialized optimization methods; statistical modeling of systems of differential equations. Working on a Brownian dynamics simulation of secreted protein mass transport from adherent mouse embryonic stem cell (mESC) culture with perfusion of laminar flow; culturing mESCs for self-renewal or differentiation induces the secretion of many different signalling proteins. I'm working with a database of location and health information for 20,000 diabetes patients in California and trying to evaluate how/if their health condition is related to their neighborhood environments. I work with GIS to characterize their access to healthy and unhealthy food resources, parks, public transportation, and other environmental factors as the data becomes available. Applied Statistics with focus on Statistical Genetics. Numerical simulation of an impacting drop on to a solid surface with an open cavity which has a wide range of application such as optimizing the production cost of water repellent fabrics, predicting the quality of arc welding, improving printing quality... Long story short, application of CFD in simulation of multiphase fluids processes. The physics for blood flight in blood pattern analysis. Ultimately we're trying to create a program that can recognize blood stains and draw the path of their flight back to their source. Attention and working memory. Currently attempting to program a useful field of view task, and an experiment that tests memory for facial features. I study stem cell bioengineering, but try to bring a computational twist into things. So I'm working on models of cell proliferations and differentiation, and starting to look at some optimization methods to help with the engineering aspects of many projects within the lab. Investigation into the role of international seafood trade in the expansion of the world's marine fisheries. There is a lack of surface markers used to classify early cardiac progenitor cell types during differentiation from mouse embryonic stem cells. My project is to use mass spectrometry and microarrays to identify surface markers (and other proteins) involved in the differentiation process. Using bioinformatics and potential cell-cell interaction modeling, I would like to mine the data to extract potentially biologically relevant data pertaining to this process. 
I'm studying the effects of ocean acidification on marine organisms. This involves constructing a lab to manipulate and monitor water chemistry. Currently, I'm trying to understand if growing under different chemical conditions results in biological materials with different structural properties. I'm working on the molecular evolution of the small subunit of the prokaryotic ribosome. This is a structural informatics project dealing with analyzing 2 and 3-dimensional molecular structures and implementing energy algorithms for ribosomal RNA molecules. My current stage of work involves wetlab RNA production and functional assays on RNA fragments produced in the previous stage of analysis. Implicit Large-Eddy Simulation (ILES) and Direct Numerical Simulation (DNS) of transition to turbulence. Looking at both the physics of transition and the numerical methods require to model it accurately. I am trying to develop a new way of stimulating paralyzed muscle. I am using a technology called surface Functional Electrical Stimulation, which uses electricity to contract skeletal muscle. This method is very rough and does not give precise joint torque vectors. I am looking for a way to use this technology such that the joint torque created by the contracting is precise and repeatable. I'm currently studying the mechanisms by which neurons in the superior colliculus selectively respond to some visual stimuli but not others. What are the properties of these neurons and the input they receive that allow them to respond to specific sensory features? My work involves statistical downscaling, a way of relating future projections made by climate models at the large scale to the scale of weather stations. Climate models work well for broad studies of the climate system but are poor at describing small scale effects such as the climates of cities because of their low resolution (grid cells with sides on the order of hundreds of kilometres). In downscaling, statistical functions are used to relate the low resolution model data to weather station observations during past times. These functions can then be used to relate future climate projections to the smaller scales, yielding projections with much greater detail. Currently focusing on Application Security; identifying Cyber threats and mitigation strategies. Would like to become more proficient in programming and design to be able to develop my own Application Security tool set. A central theme in my research is the development of advanced (model-based) process control methodologies on the basis of such observed data, with a particular focus on chemical and metallurgical systems. This involves extracting complex, nonlinear regularities and trends in measured plant data. Subsequently, models are derived for the purposes of, among other, nonlinear system identification, process diagnostics, as well as insights into underlying process behavior. To this end, a number of mathematical and computational techniques are used including: * Nonparametric learning methods such as kernel methods, ensemble methods and artificial neural networks * Nonlinear time series analysis * Multivariate statistics and other exploratory data analysis techniques. I primarily do Computational Fluid Dynamics (CFD) modelling of combustion systems, ranging in size and application from pilot/experimental scale to industrial/power utility scale. 
The experimental scale work tends to be in new technologies, such as gasification and high pressure combustion, whereas the industrial scale work tends to look at design and/or operational issues in existing systems. For my PhD, I investigated a type of piezoelectric ultrasonic motor. There were two parts to my research. The first part was the coupled axial-torsional vibration of pretwisted beams; I derived beam equations to model the structure and compared the predicted resonance frequencies with results from finite element method. In the second part I studied the nonlinear dynamical system of a disk bouncing on a vibrating platform as a model of the stator-rotor interaction of the motor. My current area of interest is voting behavior and party strategies. I aim to combine two assumptions about voting behavior into one more general model. Such a heterogeneous voter model can tackle many empirical puzzles and explain political outcomes of democratic processes more accurate. Supporting 300+ scientists in biotech research field. I am doing research in statistical genetics. It is well known that genetic variants are related to many complex diseases. My work is to develop statistical methods for analysis this type of data. Many of the most pressing issues in ecology require understanding how biological systems respond to global scale changes in climate and habitat. In order to understand these impacts it is necessary to study ecology at large scales. Research in my lab focuses on using quantitative macroecological approaches, including large ecological databases, advanced statistical methods, and theoretical modeling to understand broad scale ecological patterns. Supporting clinical research data management at a commercial company. I'm about to enroll in a bioinformatics masters program with Johns Hopkins University and aspire to work on microbial ecological genomics (e.g. Human Microbiome Project etc). I write image and statistical analysis software in Matlab, more of statistical analysis at the moment than image analysis.The SW is a GUI tool for users (mostly our team at the moment) to run statistical analysis on neuroimages image sets. The image sets are mostly PET, MRI, and fMRI studies. Working on protein sequence analysis, phylogenetics, and evolutionary models. I do modeling and theoretical work related to superconducting quantum bits. Previously, I modeled a particular design for a superconducting quantum bit. At present, I am trying to design a protocol to transfer the state of a superconducting quantum bit to another, via a superconducting transmission line. I am working on methods for estimating the size of a genetic effect from large datasets in cases where the same dataset or same individuals is used for multiple purposes in sequence. Dataset sizes include 1M + variables, 2K + patients. Methods I'm using include bootstrap resampling, in which analysis of the dataset is repeated 10K-50K times. My current research will provide musicians a system to compose and perform with a sound corpus by exploring a three-dimensional space by means of non-contact gestures. I am also planning to do research in automatic music recommendation and playlist generation. I'm currently conducting psychiatric genetic research. This involves analyzing data from high-through DNA genotyping experiments. The focus of my thesis is to develop an reinforcement learning (RL) agent for augmenting the supervisory controllers on an industrial grinding circuits. 
The majority of these circuits depend on expert systems (ES) for supervisory control, which are by and large more static than they are adaptive. My research is focused on augmenting these systems; specifically, to introduce online adaptability to ES using RL. I am working on hydrologic modelling projects for several watersheds in British Columbia. This involves model set-up, calibration (using multi-objective optimization tools), and uncertainty analysis. The models will later be forced with downscaled outputs from GCMs for future climate projections. I am also comparing the results with the RCM projections for the same catchment. High throughput imaging system. I am working on the connection between meteorological model output and hydrological model input for the purposes of water supply forecasting. Resource modeling for healthcare systems. I am a Political Scientist, working on policy processes in regional trade agreements. Statistically, I rely on multivariate statistics, above all survival models. In my most recent work I started to have a look at networks, too. Clinical trial epidemiology, genome-wide association studies, and database management. CFD and Fluid-Structure Interaction studies of the response of flexible bodies to various fluid loadings. Solid state quantum computing, particularly superconducting qubits and doped diamond spin assemblies to build a scalable quantum architecture. I am working on Computational Fluid Dynamics applied to reacting flows. Also interested in compressible flows over blunt bodies. Customizing a Plone content management system for use in a parliamentary context, and using the Deliverance theming application to present two applications to the end user via a single unified user interface. My research is focused on how neurons encode and decode odor information. I am working on the background reduction for the PICASSO Experiment. Human factors research. Statistical analysis on crash data. Exploring driver distraction's effects on injury severity. I am currently investigating different copy number algorithms by doing a large-scale evaluation of many different algorithms. I am using bioinformatic methods to study microbial genomes and evaluate the effects of microbial communities on ecosystems. Bioinformatics support. I am responsible for the publicly accessible databases, and for maintaining and updating public tools like the UCSC genome browser, the Ensembl browser, and the Galaxy server for internal use. I perform data analysis on high-throughput data from experiments designed to investigate the molecular mechanisms of how hypoxia (low oxygenation levels) in cancer tissues arises, and how cancer cells cope with this adversarial condition, with the hope that sufficient insight may eventually lead to novel therapeutic strategies. I work with both microarray and next-generation sequencing data. I work in statistical and bioinformatics analysis of complex disease. My previous work focused on GWAS of longitudinal traits related to diabetes. Currently I work on sequence and expression analysis of head and neck cancer. My work involves a lot of figuring out how to work with massive datasets and complex algorithms. I am a field ecologist switching to using digital evolution (Avida) for studies of evolutionary ecology. In my work, I look at questions related to the evolution of tolerance, group formation, and group stability, prerequisites for the evolution of sociality.
My research includes developing experiments for in-silico testing of hypotheses of the evolution of cooperation and sociality, which are based on in-situ observations of carnivore ecology. Research on mesoscale variability in the California Current using quasi-Lagrangian isobaric floats. The main focus is on westward transport off the Central California coast, its kinematic characteristics and mechanisms. Besides Lagrangian data, we use satellite altimetry, model outputs, and standard oceanographic data. Housing market and monetary policy. I have a wide range of research interests, from understanding the effects of climate change to understanding anthropogenic effects on coastal systems. At the moment, I am working on developing models to help improve our understanding of the processes affecting marine snow and particle flux. I am developing techniques to perform deep sequencing on single cells. I am interested in how RNA transcripts partition asymmetrically after cell divisions critical in development. My research has a long-term goal of better understanding cell division processes that are typically mis-regulated in cancer cells. I study phylogenetic and biogeographic relationships among terrestrial mammals. I am interested in how geography and evolutionary processes interact to produce and maintain biodiversity. I also do some research on phylogenetic methods and ways of analyzing geographically explicit DNA data. I'm studying how agricultural practices affect microbial communities and the impact that these changes have on greenhouse gas flux from soils. I'm using metagenomic data and a microbial ecology approach to determine the diversity and composition of these communities. I'm also developing approaches to analyze metagenomic data. Read More ›

Demographics (part one)
Greg Wilson / 2011-01-16
As near as we can tell, here's where this term's students are from. It's quite a mix... BEACON Center for the Study of Evolution in Action, Michigan State University Biostatistics Division, University of Toronto Boston University Centre for Addiction and Mental Health, University of Toronto Department of Aerospace Science and Engineering, University of Toronto Department of Biochemistry, University of Oxford Department of Biochemistry, University of Toronto Department of Economics, Queen's University Department of Geography, University of British Columbia Department of Mechanical Engineering, Polytechnic Institute of New York University Department of Mechanical and Industrial Engineering, University of Toronto Department of Microbiology and Molecular Genetics, Michigan State University Department of Molecular Medicine and Haematology, University of the Witwatersrand Department of Physics, Yale University Department of Physiology, University of Wisconsin — Madison Department of Process Engineering, University of Stellenbosch, South Africa FTW Telecommunications Research Center, Vienna, Austria Faculty of Science, University of Ontario Institute of Technology Fisheries Centre, University of British Columbia Friday Harbor Labs, University of Washington Georgia Institute of Technology German Cancer Research Center Health and Safety Laboratory, UK Howard Hughes Medical Institute IMS, Inc. Institute of Biomaterials and Biomedical Engineering, University of Toronto Institute for Environmental and Spatial Analysis, Gainesville State College James Franck Institute, University of Chicago Janelia Farm Research Campus, Howard Hughes Medical Institute Lineberger Cancer Center, University of North Carolina McGill University Mechanobiology Institute, National University of Singapore Michigan State University Nankai University National Evolutionary Synthesis Center National University of Singapore Natural Resources Canada Ontario Institute for Cancer Research Pacific Climate Impacts Consortium, University of Victoria Physics Department, California State University, Fresno Predictek Inc. SAIC Samuel Lunenfeld Research Institute of Mount Sinai Hospital, Toronto Stem Cell Bioengineering Lab, University of Toronto US Census Bureau United Nations Department of Economic and Social Affairs University of North Carolina — Chapel Hill University of British Columbia University of California Berkeley University of California Los Angeles University of Georgia University of Mannheim, Center for Doctoral Studies in Social Science Utah State University self employed unemployed Read More ›

The Hacker Within
Greg Wilson / 2011-01-14
I got back to Toronto late last night from visiting The Hacker Within, a grassroots student organization at the University of Wisconsin — Madison that helps science and engineering students figure out how to use computing in their research. I think it's a great model for schools elsewhere to adopt, and despite truly awful "service" from Delta Airlines [1], it was worth going down to chat with Katy, Nico, Paul, and everyone else, and to see a large auditorium filled with people (80? more?) learning how version control, the shell, Doxygen, and other tools can make them more productive researchers. And as a bonus, I can now take "sleep in a frat house" off my list of 1000 things to do before I die... :-) [1] Check out @gvwilson + #delta on Twitter for the blow-by-blow. Read More ›

Our Funding Pitch
Greg Wilson / 2011-01-14
A couple of people have recently asked, "How do you go about asking for money?" Applying for grants from NSERC, the NSF, and other agencies is something they understand (though it never actually worked for me), but cold-calling people, or emailing them out of the blue, isn't something most academics have ever done. I'm now doing another round of this, and since this is supposed to be an open-everything project, I've posted the short pitch I use as a starting point when approaching someone new. I hope you find it useful, and of course, feedback is always welcome. Read More ›

The Spring 2011 Course Begins
Jon Pipitone / 2011-01-10
The spring (or, if you'd rather, winter) run of the Software Carpentry course has started. We have 86 students currently registered. Here is the welcome email we sent out to the students late last week: Hello, Welcome to Software Carpentry. I'm Jon, one of the teaching assistants for this course, and I'm joined by TAs Orion Buske, Tommy Guy, Luis Zarrabeitia, and the course creator, Greg Wilson. In this email I'll tell you a bit about how the course is organised. First off, the course starts next Monday, Jan 10th and runs for 10 weeks. It is both a self-paced course and a guided course. Almost all of the core course content is already up on the course website, and we are continuing to create lectures or revise existing ones based on feedback we get from you and others. We encourage you to explore the online material in whatever order suits you and to ask the TAs and other students for help when you need it. As well, each week we will cover a particular topic according to our lecture schedule[1]. Our schedule is designed to lead you sensibly through the online content. Each week we will send you an email notifying you of the week's topic, and we will prepare exercises for you to work on. We may also prepare the occasional quiz that covers several lectures at once, or other assignments. Being an online course, we are going to use a few different ways of communicating with one another. This mailing list is for course announcements from us. The course forums[2] are where you can ask and answer questions of each other and of the TAs. We also have a course blog and twitter account[3] where we discuss new ideas for course material or reflect on how the course is going. If you are interested in contributing to this project or giving your feedback as we develop it you should tune in there. You can always send us questions about course content or feedback privately. As well, each month we will be asking that you check in with us over Skype, phone, chat or email to let us know how you are progressing and to give us feedback on the course so far. Running this course in an online format is quite new for us — this is only our second time doing it. Because of this we will be looking to you for feedback about what works and what doesn't[4]. Be mean if you have to, we can take it! We will also be expecting and encouraging you to interact with the other students taking this course. You each have a lot of great real-world experience to share with one another but the online format can be unfamiliar and maybe a bit daunting, so please just jump in. I look forward to working with you all, Jon. [1] http://software-carpentry.org/spring-2011 [2] http://software-carpentry.org/forums [3] http://software-carpentry.org/blog/ and @swcarpentry [4] We wrote a blog post summarising what went right and what went wrong last semester Read More ›

Software Carpentry in One Picture and Five Words
Greg Wilson / 2011-01-10
This 1995 Pirelli ad pretty much sums up what Software Carpentry is about: Read More ›

Slower Than Expected
Greg Wilson / 2011-01-10
We've started work on the high-performance computing lecture, and as part of that, we'd like to set a puzzle for our more advanced readers. We've written several versions of invasion percolation; the two of interest here use a list-based representation for the underlying grid, and a NumPy array. Run either program as:

python filename.py -g -n 31 -v 100

where -g turns off the end-of-run graphical display, -n 31 tells the program to use a 31×31 grid, and -v 100 tells it to initialize cells to random values between 0 and 100. Try varying the size, compare the speeds of the two programs, see if you can figure out why one is faster than the other, and then tell us: (1) what you found, and (2) how you found it. #2 is actually more important to us than #1, since we'd like to figure out how programmers actually go about diagnosing performance problems. Please post your findings as a comment on this blog after Wednesday, Jan 12 (to give everyone else a chance to try it out too). Update: WordPress seems to "fix" indentation (leading white space) more or less randomly when I paste code into a <pre> area, so I'll link to the two files: Using Lists and Using NumPy. [Later version of lecture on invasion percolation] Read More ›
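If you want a feel for the kind of measurement we mean before downloading the programs, here is a small, self-contained sketch. It is not the invasion percolation code itself, and the grid operation and sizes are arbitrary; it just times the same whole-grid sum done with a list of lists and with a NumPy array:

import time
import numpy as np

n = 1000
grid_list = [[(i * n + j) % 100 for j in range(n)] for i in range(n)]  # list of lists
grid_array = np.array(grid_list)                                       # equivalent NumPy array

start = time.time()
total_list = sum(sum(row) for row in grid_list)
print("lists:", time.time() - start, "seconds")

start = time.time()
total_array = grid_array.sum()
print("numpy:", time.time() - start, "seconds")

assert total_list == total_array  # same answer, very different speed

The interesting part of the exercise is not the raw numbers, but how you go about explaining them.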

Funding (A Plea for Contacts)
Greg Wilson / 2011-01-09
I've been working full-time on Software Carpentry since the beginning of May 2010. As I said at the end of November, progress has been steady, but we're only at 60-70% of plan. I am therefore now looking for more funding so that I can continue developing material and delivering the course online this summer and fall. I would welcome pointers to potential backers, and introductions even more. (But please, only ones that are fairly focused: "have you tried the NSF?" or "there must be companies that need this stuff!" aren't actually very useful...) Read More ›

What I Learned From Software Carpentry
Greg Wilson / 2011-01-06
Anna Maidens, who works for the UK Met Office in Exeter and took part in our October 2010 Software Carpentry course there, did a presentation a few weeks ago to her team about what she learned. She has kindly given permission for us to post those slides; as always, we're eager to hear what other people have taken away from the course as well. Read More ›

First Half of Lecture on Object-Oriented Programming
Greg Wilson / 2011-01-06
The first four episodes of our lecture on object-oriented programming in Python are now available: Introduction, Basics, Interfaces, and Inheritance. There's (lots) more to come, but early feedback on these would be very welcome. Read More ›

Software Carpentry Bootcamp Jan 12-14 in Madison
Greg Wilson / 2010-12-31
Registration is now open for a three-day Software Carpentry bootcamp Jan 12-14, 2011, at the University of Wisconsin — Madison, organized by the folks at The Hacker Within. I'll be speaking on the first morning, and hanging out the rest of the time trying to learn as much as I can. Look forward to seeing you there! Read More ›

More Detailed Outline for HPC Lecture
Greg Wilson / 2010-12-30
I've added a more detailed outline for a lecture on high-performance computing to the site; feedback on the content and order would be very welcome, as would pointers to tools that are easy to teach with (as opposed to being powerful in the hands of experienced users). Read More ›

Open Research Computation
Greg Wilson / 2010-12-27
By now, many of you have (hopefully) seen the announcement of Open Research Computation, a new journal devoted to "...peer reviewed articles that describe the development, capacities, and uses of software designed for use by researchers in any field." The editorial board includes several friends of this course; as one of them, Titus Brown, observed in his blog: ...the problem with the online world for scientists [is] there's no real systematized incentive to any of this online stuff. And that makes it really tough. I'm going through Reappointment right now... Nowhere on there is there a place for "influential blog posts" — how would you measure that, anyway? Same with software — I listed my various software releases on the "scientific products" page of the form, and have since been asked to describe and discuss the impact of my software. Since I don't track downloads, and half or more of the software hasn't been published yet and can't easily be cited, and people don't seem to reliably cite open source software anyway, I'm not sure how to document the impact. ...I'm extra-specially-pleased to be on the board of editors, not least because so far it seems like this journal is trying to break significant new ground. Our ed board discussions so far have included discussions on how to properly "snapshot" version control repositories upon publication of the associated paper...and considerations for "repeat" publishing of significant new software versions, as the software matures, in order to help encourage people to actually update and release their software. This new journal isn't a panacea, of course. It's going to take 3-5 years, or even more, to make a real impact, if it ever does. But I'm enthusiastic about a venue that speaks to a major theme of my own scientific efforts — responsible computing — and that could help in the struggle to place responsible computing more squarely in the scientific focus. I wish them the best of luck, and hope to see many contributions from alumni of this course in coming years. Read More ›

Elimination
Greg Wilson / 2010-12-27
I'm working up another essay on software design, and would like to ask readers of this blog how they handle something that comes up when simulating interacting agents. If your program models the behavior of a flock of birds, it probably looks something like this:

# create birds
birds = []
for i in range(num_birds):
    new_bird = Bird(...parameters...)
    birds.append(new_bird)

# simulate movement
for t in range(num_timesteps):
    for b in birds:
        b.move(birds) # need access to other birds to calculate forces

There's a flaw in this—or at least, something questionable. By the time you are moving the last bird for time t, every other bird is effectively at time t+1. There are many solutions, the simplest of which is to calculate each bird's new position in one loop, then update the birds in another:

# simulate movement
for t in range(num_timesteps):
    new_pos = []
    for b in birds:
        p = b.next_position(birds) # doesn't move the bird
        new_pos.append(p)
    for (b, p) in zip(birds, new_pos):
        b.set_position(p)

(If you haven't run into it, the built-in function zip takes two or more lists, and produces tuples of corresponding elements. For example, zip('abc', [1, 2, 3]) produces ('a', 1), ('b', 2), and ('c', 3).) So far so good—but what if the things we're simulating can produce offspring, die, or eat one another? Offspring are relatively simple to handle: we just put them in a temporary list (or set), then append them to the main list after everything else has been moved. Removing creatures that have died is a bit trickier, because modifying a list as we're looping over it may cause us to skip elements (we delete the element at location i, then advance our loop counter to i+1, and voila: the item that was at location i+1 but has been bumped down to location i is never seen in the loop). We can handle that either by "stuttering" the loop index:

i = 0
new_pos = []
while i < len(birds):
    state, p = birds[i].move(birds)
    if state == ALIVE:
        i += 1
        new_pos.append(p)
    else:
        del birds[i]

or by moving creatures that haven't died into another list, and swapping at the end of the loop:

temp = []
for b in birds:
    state, p = b.move(birds)
    if state == ALIVE:
        temp.append((b, p))
birds = []
for (b, p) in temp:
    b.set_position(p)
    birds.append(b)

I think the second is less fragile—modifying structures as I'm looping over them always gives me the shivers—but either will do the job. But now comes the hard case. What happens if birds can eat each other? If bird i eats bird j, for i<j, it's no different from bird j dying. But if bird j eats bird i, we have a problem, because bird i is already in the list of survivors. Do we search for it and delete it (in which case, the stuttering solution above is definitely not the one we want, because the indexing logic becomes even more fragile)? Or... or what? Set a "but actually dead" flag in the bird's record in the temporary list, and not move it back into the bird list after all in the second loop? What would you do, and why? Read More ›
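To make that last option concrete, here is one possible shape of the "flag it as dead" approach as a self-contained toy. The Bird class below is hypothetical and deliberately silly; this is a sketch of the bookkeeping, not a recommendation. Note the wrinkle it exposes immediately: a bird that was eaten has already taken its turn, so its own victims still count, which may or may not be what you want.

import random

ALIVE = "alive"

class Bird:
    def __init__(self, pos):
        self.pos = pos
        self.victims = []
    def move(self, birds):
        # toy rule: occasionally "eat" a random other bird
        if random.random() < 0.1:
            prey = random.choice(birds)
            if prey is not self:
                self.victims.append(prey)
        return ALIVE, self.pos + random.choice([-1, 1])
    def set_position(self, p):
        self.pos = p

birds = [Bird(i) for i in range(10)]
for t in range(5):
    moved = []                      # records: [bird, new_position, still_alive]
    for b in birds:
        b.victims = []
        state, p = b.move(birds)
        if state == ALIVE:
            moved.append([b, p, True])
    for b, _, _ in moved:           # second pass: flag anything that was eaten
        for prey in b.victims:
            for record in moved:
                if record[0] is prey:
                    record[2] = False   # "but actually dead"
    birds = []
    for b, p, alive in moved:
        if alive:
            b.set_position(p)
            birds.append(b)
    print("timestep", t, ":", len(birds), "birds left")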

Local Subversion Repositories
Greg Wilson / 2010-12-26
A colleague in the UK who is going to teach Software Carpentry asked about setting up repositories. In particular, he doesn't have a server where he can create accounts and repos, so he was thinking of using Git or Mercurial, and having students host their repos on their own machines. That's not actually necessary: if you're going the locally-hosted route, and giving each student a separate repository, you can still use Subversion: just use the "file:" protocol for connecting instead of "http:" or "svn+ssh:". For example:

$ pwd
/users/gvw
$ mkdir demo
$ cd demo
$ svnadmin create jon
$ svn checkout file:///users/gvw/demo/jon mycopy
$ ls
jon  mycopy
$ cd mycopy
$ touch example.txt
$ svn add example.txt
A         example.txt
$ svn commit -m "Checking in an example file"
Adding         example.txt
Transmitting file data .
Committed revision 1.

The repository can be anywhere on the local file system—I just put it and the working copy in the same directory so that they'd be easy to delete afterward. And a repository that you're accessing via the "file:" protocol can also be accessed through other protocols—SVN does a good job of separating protocol from storage. The only thing I trip over when I'm doing this is the triple slash: the protocol spec is "file://" (two slashes) and then there's the absolute path to the repo (which starts with another slash) making for three in all. Read More ›

Extended Examples
Greg Wilson / 2010-12-23
We'd like to add more extended examples to this course, both because they're fun and because they're a good way to show how our topics relate to one another. Right now, we have: an entire lecture on invasion percolation (which we'll shorten), phylogenetic tree reconstruction, and how to recommend papers to people. We plan to add a simple N-body simulation (easy to do badly, motivates testing, visualization, and a bit of numerical analysis), but after that, we'd like to find some examples from psychology, linguistics, and other areas that are starting to do more computational work. Parsing text files full of patient records, putting the information in a database, doing some analysis using SQL, and visualizing the results is one possibility; what else would you recommend or like to see? Read More ›
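To show roughly what we have in mind for the patient-records idea, here is a minimal sketch. The record format is made up for illustration (one comma-separated record per line), and the "file" is a list of strings so the example runs on its own:

import sqlite3

lines = ["anne,52,7.9", "badri,34,6.1", "chen,61,8.4"]  # stand-in for a text file

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patients(name TEXT, age INTEGER, score REAL)")
for line in lines:
    name, age, score = line.strip().split(",")
    db.execute("INSERT INTO patients VALUES (?, ?, ?)", (name, int(age), float(score)))

# a first bit of SQL analysis: average score for patients over 40
print(db.execute("SELECT AVG(score) FROM patients WHERE age > 40").fetchone()[0])

The full example would read real files, handle malformed records, and end with a visualization.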

Compute Canada's 'Strategic' Plan Isn't
Greg Wilson / 2010-12-21
Last Friday—December 17—I received an email from Compute Canada. The emphasis is mine: A Strategic Plan for Compute Canada was a key recommendation of the International Review Panel. This plan draws on suggestions from that panel as well as the information and discussions from the Town Hall Meetings held earlier this year. It has been many months in preparation and review by Compute Canada's committees and must be submitted to CFI before the end of December 2010... Your comments are invited and should be sent...by end of day Monday, December 20th. Call me cynical, but I have to wonder how much feedback they really want if they're sending out more than 50 pages on the Friday of the weekend before Christmas, and insisting on replies by Monday... The plan itself is an even bigger disappointment. It is supposed to lay out the next decade's goals for the entire Canadian HPC community, but of the six goals listed in the executive summary, only one talks about people (or in government terms, "highly qualified personnel"), and the "Implementation Strategy" given is as vague as it could possibly be: "Work with universities to develop HPC support expertise and train researchers to use HPC effectively and efficiently." Sections 4.1.4 and 9.1.4 are equally vague—the latter acknowledges that "The key 'product' academia provides to businesses is Highly Qualified Personnel", but the only concrete plan I see is decoupling funding for people from funding for hardware. Again, call me cynical, but I expect that will result in less money for the former, not more... Nowhere do I see any mention of what matters most: giving scientists and engineers the foundational computational skills they need to use computers effectively—all kinds of computers, of all sizes. Big computers are vital pieces of experimental apparatus, and as the biggest line items in Compute Canada's budget, the priorities for choosing them need to be stated (and argued for). Without skilled people, though, those fancy machines are just space heaters with blinking lights. The best way to see what Compute Canada's Strategic Plan should focus on is to redraw their tired old pyramid to show what things really look like: (Figure: Compute Canada's pyramid, versus reality.) If Compute Canada really wants to help academia and industry use high-performance computers more effectively, the picture on the right is the one that matters. If changing that doesn't become their #1 priority, the gap between what Canadian scientists and engineers can do and what they could do will continue to grow, to the detriment of all. Read More ›

Executable Papers
Greg Wilson / 2010-12-20
Elsevier is sponsoring an "Executable Paper Grand Challenge". If you have more than just ideas about the future of scholarly publication in computational science, it may be a good way to get them some press. Read More ›

Building a Recommendation Engine with NumPy
Greg Wilson / 2010-12-15
Tommy Guy's explanation of how to build a recommendation engine in NumPy, based on an example from Toby Segaran's excellent book Programming Collective Intelligence, is now online. Read More ›

Presents for the Holidays
Greg Wilson / 2010-12-14
Some of the best presents I have ever received have been recommendations: "Oh, you'd like this author," or, "You really should listen to this album." So, in the holiday spirit, please take a moment and share (in the comments) some pointers to computational resources that you think deserve to be better known. To get the ball rolling, I'm having fun playing with Ruffus, a simple Python library for constructing pipelined workflows. I'd also recommend Jonathan Weiner's Time, Love, Memory: A Great Biologist and His Quest for the Origins of Behavior. It's ostensibly a biography of Seymour Benzer (who spent his entire career exploring how genes determine behavior), but it's actually the best description I've ever read of how a successful long-lived research program actually works. Read More ›

Slides for First Five OO Episodes Online
Greg Wilson / 2010-12-13
Slides for the first five episodes of our lecture on object-oriented programming are now online; we'd welcome feedback on the approach we're taking. If you like what you see, we'll try to have recordings up before the holiday break. Read More ›

Winter 2011 Signup vs. Spam Filters
Greg Wilson / 2010-12-10
If you asked to enrol in the Winter 2011 run of the course, you should have received an email request from us on Wednesday pointing you at a form we'd like you to fill in, and another message today clarifying one of the questions. You should also then have received a message saying, "Some people are reporting that they only got #2 of the above, presumably because their spam filters ate #1, so please check." If you didn't get Wednesday's message, or this morning's repeat, please contact us directly and we'll sort it out. Read More ›

Performance and Parallelism
Greg Wilson / 2010-12-10
Some topics for a lecture on parallel programming:

- how to measure/compare performance (raw speed, weak scaling, strong scaling, Amdahl's Law, response time vs. throughput; see the sketch after this post)
- the register/cache/RAM/virtual memory/local disk/remote storage hierarchy and the relative performance of each (order of magnitude)
- in-processor pipelining (or, why branches reduce performance, and why vectorized operations are a good thing)
- how that data-parallel model extends to distributed-memory systems, and what the limits of that model are
- the shared-memory (threads and locks) model, its performance limitations, deadlock, and race conditions
- the pure task farm model, its map/reduce cousin, and their limitations
- the actors model (processes with their own state communicating only through messages, as in MPI)

It's too much (each point should be an hour-long lecture in its own right, rather than 10-12 minutes of a larger lecture); what do we cut, and what's in there that doesn't need to be? Read More ›
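As promised in the first bullet, here is Amdahl's Law in executable form, a sketch with arbitrary numbers: if a fraction p of a program can be parallelized across n processors, the best possible speedup is 1 / ((1 - p) + p / n).

def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in [2, 8, 64, 1024]:
    print(n, "processors:", round(amdahl_speedup(0.9, n), 2))
# even with 90% of the code parallelizable, the speedup never exceeds 10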

Where Are My Keys?
Greg Wilson / 2010-12-09
I was looking through some Python code a few days ago, and noticed that its author was using this:

if something in dict.keys():
    dict[something] += 1

instead of:

if something in dict:
    dict[something] += 1

It seems like a small difference, but it's actually a very important one. The second form checks to see whether something is a key in the dictionary dict. The first form, on the other hand, creates a list of all the keys in the dictionary, then searches that list from start to finish to see if something is there. They both produce the right answer, but the fast form does it in constant time, while the slow form (the one with the call to dict.keys()) takes time proportional to the size of the dictionary. The really bad news is, the "error" is silent. What other examples have you seen of people getting the right answer the wrong way? And what more could we do to teach people how to avoid these traps? Does it have to be done case by case, or is there something larger or more general we could try to convey? Read More ›
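If you'd rather see the difference than take my word for it, here is a quick sketch; the sizes are arbitrary, and note that in newer versions of Python, dict.keys() returns a view rather than a list, so the list() call below recreates the list-building behavior described above:

import timeit

for size in [1000, 100000]:
    setup = "d = dict((i, 0) for i in range(%d))" % size
    fast = timeit.timeit("-1 in d", setup=setup, number=1000)
    slow = timeit.timeit("-1 in list(d.keys())", setup=setup, number=1000)
    print("%6d entries   in d: %.4fs   in list(d.keys()): %.4fs" % (size, fast, slow))

The first column of times barely moves as the dictionary grows; the second grows right along with it.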

How Do You Manage a Terabyte?
Greg Wilson / 2010-12-08
This question has come up a couple of times, and I'd welcome feedback from readers. Suppose you have a large, but not enormous, amount of scientific data to manage: too much to easily keep a copy on every researcher's laptop, but not enough to justify buying special-purpose storage hardware or hiring a full-time sys admin. What do you do? Break it into pieces, compress them with gzip or its moral equivalent, put the chunks on a central server, and create an index so that people can download and uncompress what they need, when they need it? Or...or what? What do you do, and why? Read More ›
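For concreteness, here is the chunk-compress-index option in miniature, one possible shape with names and sizes made up:

import gzip
import hashlib
import json
import os

def store(data_path, chunk_dir, chunk_size=64 * 1024 * 1024):
    os.makedirs(chunk_dir, exist_ok=True)
    index = []
    with open(data_path, "rb") as reader:
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:
                break
            name = hashlib.sha1(chunk).hexdigest() + ".gz"
            with gzip.open(os.path.join(chunk_dir, name), "wb") as writer:
                writer.write(chunk)
            index.append(name)
    with open(os.path.join(chunk_dir, "index.json"), "w") as writer:
        json.dump(index, writer)  # readers fetch chunks by name, in order

Naming chunks by their hash means re-uploading unchanged data is free; everything else (the server, access control, what goes in the index) is where the real decisions live.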

Approaching Objects from a New Direction
Greg Wilson / 2010-12-07
We have just posted early drafts of slides for the first two episodes of our lecture on object-oriented programming. These take a "nuts and bolts" approach, showing more about how objects work than is usual in an introductory course in the hopes of dispelling some of the mystery and confusion that often surrounds them. Feedback would be very welcome... Read More ›

Pins, Balls, and Arbitrary Decisions
Greg Wilson / 2010-12-06
We'd like to include more extended examples like invasion percolation in this course, but they're surprisingly hard to write. One that seems simple at first is a Galton box simulator. As the diagram below shows, this is simply a vertical board with interleaved rows of pins. Small balls are dropped in at the top, and bounce left and right as they hit pins until they land in the boxes at the bottom. Since each bounce randomly goes left or right, the distribution of balls is binomial, which, in the limit, approximates a normal distribution. At first glance, this should be pretty simple to code up. Our objects are: a world, which is W wide and H high; two walls at X=0 and X=W; P pins, each of which has an XY location; B non-overlapping bins, each W/B wide; and N balls, each of which starts life at X=W/2 and Y=H (i.e., the middle of the top of the board). The physics is pretty simple too: balls fall under the force of gravity until they hit a wall, a pin, or a bin. If they hit a wall or a pin, we calculate their new velocity using conservation of energy and a bit of trig. If they hit a bin, we increment its counter so that we can see how closely our final result approximates a bell curve. In order to calculate a new trajectory for a ball, though, we need to know exactly where it struck the pin, which means we need to know its radius and the pin's radius as well as the locations of their centers. We can simplify by making one or the other a point instead of a circle, but we obviously can't do that for both of them—the chances of any collisions at all would be pretty close to zero. The choice isn't symmetrical: if the ball has a radius, then it can rotate as a result of collision. That will affect its trajectory because its rotation will dissipate energy, so it won't bounce as vigorously. Point masses should give the same distribution as real balls, but there's another trap waiting for us. Think about what happens when we drop a ball vertically on the top center pin. Its velocity when it hits is (0, vy), so if the coefficient of elasticity is E, its post-bounce velocity is (0, E·vy), so it goes uuuuuuup, and comes back dooooown, and bounces with velocity (0, E²·vy) and goes up, and comes down, and on and on. Barring numerical catastrophe, it never hits any other pins at all: after a few iterations, it's just vibrating infinitesimally in the Y axis on the top edge of the first pin. We can "fix" this by introducing balls with some initial X velocity, but how much should they have? Alternatively, we could randomize bounces a little bit, but how? Do we calculate a post-bounce angle and perturb it by a small random amount? Or jitter the X and Y post-bounce energies? Or something else equally artificial? And no matter what we choose, how can we turn around after throwing together half a dozen arbitrary decisions and say that this is "simulating" anything? There are days when I really miss the Game of Life... Read More ›
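For contrast, the way to dodge every one of these decisions is to abstract the physics away entirely and treat each pin as a fair coin flip. That is what the sketch below does (row and ball counts are ours, chosen arbitrarily), and it is also why it feels like a statistics demo rather than a simulation of anything:

import random
from collections import Counter

def drop_ball(num_rows):
    # each row of pins bumps the ball one half-step left or right
    return sum(random.choice([-1, 1]) for _ in range(num_rows))

counts = Counter(drop_ball(12) for _ in range(10000))
for position in sorted(counts):
    print("%3d %s" % (position, "#" * (counts[position] // 50)))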

Red-R
Greg Wilson / 2010-12-02
From a reader, a link to Red-R, a visual programming environment for R. The six-minute screencast on the documentation page shows what it's capable of. I don't know of any similar projects for Python—I'd appreciate pointers. Read More ›

Programmer Competency Matrix
Greg Wilson / 2010-12-02
There's no scientific research behind this tabulation of what programmers ought to know, and some of the categorizations are unactionably vague, but it's still a useful guide. Most of Software Carpentry is Level 1, with some Level 0 and Level 2 stuff blended in. Read More ›

Prerequisites (or, When to Say No)
Greg Wilson / 2010-12-02
How much should Software Carpentry assume students know before they start? Or to put a sharper point on it, how much should this course require students to know? To date, we have said that students should have at least one prior programming course, and have to understand loops, conditionals, functions, arrays, and simple file I/O: the kinds of things that are usually (but not always) covered in a CS-1 course, and hopefully remembered thereafter. To make this more concrete, we've said that students should be able to solve this problem in the language of their choice: Write a program that reads a rectangular matrix of numbers from a file, transposes that matrix, and writes the result to another file. The values in the matrix are separated by spaces, the number of rows and columns is not known in advance, and the program is not allowed to overwrite its input file. For example, if the program is invoked as transpose input.dat output.dat, it reads this:

0 1 2 3 4
5 6 7 8 9

from input.dat, and writes this:

0 5
1 6
2 7
3 8
4 9

to output.dat. Many of the students who took this course this fall online or at UCAR, the Met Office, or in London could have done this, but a substantial minority wouldn't have been able to. We therefore have three choices:

1. Teach basic programming. We don't like this because it makes the course less useful to people who already know the basics, and because it's harder to teach the basics at arm's length than second-order stuff.
2. Ignore the problem (basically, tell the students "sink or swim"). We don't like this because it feels like setting people up to fail. We also don't want people to come away feeling "programming isn't for me" or (even worse) "I just can't do it", which is often what happens in "sink or swim" scenarios.
3. Give potential registrants a proficiency test like the one above, and encourage people who don't have the background knowledge we require to go elsewhere for now and then come back to us. We don't like this because there often isn't an "elsewhere" for those people: courses like the University of Toronto's CSC120 are pretty rare, and most free online material has a low signal-to-noise ratio or assumes too much background knowledge. (If that wasn't the case, Software Carpentry would consist primarily of links to other people's stuff...)

So, what should we do? Please vote by posting comments... Read More ›
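For reference, here is one possible solution in Python (a sketch, not the only acceptable answer; any language and any equally direct approach would do):

import sys

# usage: python transpose.py input.dat output.dat
with open(sys.argv[1]) as reader:
    rows = [line.split() for line in reader if line.strip()]

with open(sys.argv[2], "w") as writer:
    for column in zip(*rows):
        writer.write(" ".join(column) + "\n")

zip(*rows) does the transposing; the other requirement, not overwriting the input, is satisfied because the result goes to a second file.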

Peer to Peer
Greg Wilson / 2010-12-02
One of the things we think didn't work well in this term's online run of the course was peer-to-peer discussion among students. Such discussion is one sign of a vibrant open source project or educational community, but our mailing list was almost silent except for announcements we posted ourselves, and other than a few bug reports, there were very few comments from students on the lecture pages. We think students will get more out of the course if they talk more amongst themselves for several reasons:

- Learning: you don't really know something well until you teach it yourself.
- Scalability: there aren't enough of us to give everyone personal help (particularly not with numbers tripling next time 'round).
- Relevance: grad students in geology are more likely to know what other geology grad students need and will understand than we are.
- Sustainability: right now, Software Carpentry has a bus factor of 1. The more course participants help each other, the more likely it is that this project will survive its founder being hit by a bus (or abducted by aliens, or having to get a real job).

What could we have done this time around to encourage more peer-to-peer discussion? What could we do next time? Please add your thoughts as comments... Read More ›

Fall 2010: What Went Right, What Went Wrong
Ainsley Lawson / 2010-12-02
As we're getting close to wrapping up this semester's offering of Software Carpentry, we've been putting some thought into the things that worked and didn't work in terms of how we administered the course. Here's what we came up with:

What Worked Well

- Version Control — Students really seemed to recognize the value of incorporating version control into their work practices. Of all the topics covered so far, we've received the most positive feedback about version control.
- Midterm Quiz — This was useful for assessing the students' learning, and getting an idea of how many of the students were keeping up with the material.
- Check-ins — About a month ago, we had each of the students meet with us on Skype for 10 minutes or so. Again, this helped us get an idea of how the students were doing with the course material, and also allowed the students to give us feedback on things they'd like to see changed/added to SWC.
- TA Organization — Although we got off to a bit of a slow start, the Software Carpentry TAs managed to get organized in terms of the administrative aspects of the course. Division of labor, and explicitly assigning tasks to specific people, seemed to work well.
- Asking Students for their Problems — We spoke to a few students about the kinds of computer-related problems they are facing in their research. Next semester we hope to write weekly blog posts in which we will work through the solutions to the problems that the students are reporting.
- They like us! — The feedback we've been getting from students indicates that they are enjoying the SWC course, and see the value of the material.

What Went Wrong

- Silent Running — With the exception of the midterm quiz and check-ins, we were in the dark about how the students were doing, and whether or not they were keeping up with the lectures. There also wasn't much conversation among the students, or between the students and TAs.
- Prerequisites — The Software Carpentry course assumes that students have had some basic programming experience. We were not clear enough (or firm enough) about this. Greg Wilson, the course organizer, wrote a separate post about this.
- Initial Organization — Getting things running at the beginning of term took some time, and we weren't very successful at communicating to the students what they should be working on each week.
- Pacing — This term's lecture schedule was uneven. The first three topics (version control, spreadsheets, and databases) are very light compared to the fast-paced Python lecture series that follows. It appears that many students may have fallen behind at this point (but that may also have something to do with other coursework and deadlines). Perhaps the Python lectures assumed a higher level of prior programming experience than the first three lectures, and we saw a drop-off because we had not fully expressed what the prerequisites were.
- The Second Check-in — Since the initial check-ins went so well, we decided to do them again this week. This time, very few students have taken the time to speak with us. Perhaps this was too soon to be asking for a second meeting, or we were encroaching on other courses' exams and projects.
- Exercises — We were late in creating the exercises for many of the lecture topics. Also, the style of the exercises is inconsistent, and we do not have solutions posted for some of them.
- Office Hours — Office hours were not well attended. In general, the only time that students signed up for office hours was when we specifically requested that they check in with us.
Anything we missed? Current students: please help us out by leaving some comments! Thanks. Read More ›

Cast Your Votes
Greg Wilson / 2010-12-02
The solstitial holiday is approaching fast, but with a bit of luck, we should be able to get one more lecture up on the web before it arrives. Topics we could tackle are: a survey of empirical results in software engineering, building desktop GUIs, handling XML, or object-oriented programming. (Web programming isn't on the list because it will take more time than we have.) If you have a preference, please cast your vote in a comment on this post... Read More ›

First Four Episodes on Multimedia
Greg Wilson / 2010-11-30
Our first four episodes on multimedia programming, which cover basic image processing, are now online—comments are welcome. Read More ›

Winter 2011 Online Course Now Full
Greg Wilson / 2010-11-29
As of this morning, 102 people have signed up for our Winter 2011 offering—it will be the largest run of the course ever. Registration has now closed, but we will send out notices when we have dates for the next one. Read More ›

Next Part of Persistence Essay Online
Greg Wilson / 2010-11-26
The second part of the discussion of persistence is now online. We're almost up to how Python's 'pickle' module works; should get there over the weekend or on Monday. Read More ›
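For the impatient, the whole topic hangs off two calls, shown here in a minimal round trip (a teaser, not the essay's treatment):

import pickle

data = {"grid": [[0, 1], [2, 3]], "label": "run 42"}
blob = pickle.dumps(data)          # object to bytes
assert pickle.loads(blob) == data  # bytes back to an equal object

Things get interesting when the data contains instances of user-defined classes, which is where the essay is headed.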

Hours So Far
Greg Wilson / 2010-11-25
The chart below shows cumulative hours on Software Carpentry since we started working on Version 4 in May. We're not as far along as I'd like, but progress has been pretty steady... Read More ›

Phylogenetic Trees
Greg Wilson / 2010-11-23
Elango Cheran has recorded an episode for the sets & dictionaries lecture on reconstructing phylogenetic trees. It's a neat application of dictionaries to a problem that's usually explained as a matrix. Please check it out and let us know what you think. Read More ›
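As a taste of the idea (this is not Elango's algorithm, just the flavor of using a dictionary where a distance matrix is usually drawn), the sketch below keeps pairwise distances in a dictionary and repeatedly merges the closest pair, averaging distances UPGMA-style; the numbers are made up for illustration:

distances = {
    ("human", "chimp"): 1.0, ("human", "gorilla"): 4.0,
    ("chimp", "gorilla"): 4.0, ("human", "orang"): 9.0,
    ("chimp", "orang"): 9.0, ("gorilla", "orang"): 9.0,
}

def dist(d, a, b):
    return d[(a, b)] if (a, b) in d else d[(b, a)]

groups = ["human", "chimp", "gorilla", "orang"]
while len(groups) > 1:
    pairs = [(dist(distances, a, b), a, b)
             for i, a in enumerate(groups) for b in groups[i + 1:]]
    _, a, b = min(pairs, key=lambda t: t[0])
    merged = (a, b)  # nested tuples stand in for internal tree nodes
    for other in groups:
        if other is not a and other is not b:
            distances[(merged, other)] = (dist(distances, a, other) +
                                          dist(distances, b, other)) / 2.0
    groups = [g for g in groups if g is not a and g is not b] + [merged]

print(groups[0])  # e.g. ('orang', ('gorilla', ('human', 'chimp')))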

Four Episodes on Matrix Programming
Greg Wilson / 2010-11-23
Tommy Guy has recorded four introductory episodes on matrix programming with NumPy, covering an introduction, basic operations, indexing, and linear algebra. Please have a look and let us know what you think. Read More ›
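For anyone deciding whether to watch, here is a thirty-second taste of the four topics (a sketch of ours, not taken from the episodes):

import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])           # introduction: creating arrays
print(a * 2 + 1)                                 # basic operations: elementwise arithmetic
print(a[a > 2.0])                                # indexing: boolean masks
print(np.linalg.solve(a, np.array([1.0, 1.0])))  # linear algebra: solve Ax = b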

Repository URL Change
Jon Pipitone / 2010-11-21
We've been having problems hosting the course's Subversion repository at its current URL (http://software-carpentry.org/swc), and so we've moved it to:

http://svn.software-carpentry.org/swc

This URL points to the same repository as before. If you already have a working copy checked out, you can switch it to use the new URL by running the following shell command in the top-level folder of your working copy:

svn switch --relocate http://software-carpentry.org/swc http://svn.software-carpentry.org/swc

Read More ›

Mid-term Quiz Results
Ainsley Lawson / 2010-11-20
A few weeks ago we gave a quiz to the SWC students on the first three topics of the course: Version Control, Spreadsheets, and Databases. We did this in addition to our student check-ins in order to get a sense of how the students were doing with the material. Half of the class (24 students) submitted their answers. Results: Everyone did really, really well! The majority of the mistakes were small SQL syntax errors, or other similarly minor things. Problem areas:

- Spreadsheets — Absolute References: This was probably the most common error made on the quiz. There were quite a few instances where students failed to recognize the necessity of using absolute references in their solutions. (Questions 2 and 6 in the Spreadsheets section of the quiz.)
- Databases — Nested Queries: The final question of the quiz involved nested queries. A few of the students did not use nested queries correctly, or failed to use them at all.
- Version Control — Comments: Ah, of course. There was a tendency for students to skip the "give a meaningful comment" part when explaining how to commit changes to a repository.

Overall, the students did very well. We did not assign numeric scores, but the majority of the quizzes were perfect or near to it. Well done, Fall 2010 class! Read More ›

Now Annotated
Greg Wilson / 2010-11-19
We have added brief notes on the books in the course's bibliography—we hope you find them useful. Read More ›

Summary of student check-ins
Jon Pipitone / 2010-11-18
Over the last two weeks we've had short one-on-one conversations with the current students about how the course is going for them, and what they've enjoyed and haven't. We've heard from 30 students of the ~48 signed up in the course. For most students, their feedback was about the first three lectures (Version Control, Databases, Spreadsheets) and to some extent the Python lectures. Here is a summary of the feedback:

Overall comments about the course: Generally, the feedback was pretty much all positive. Students are enjoying the course so far and see the benefit in learning the material. Some are already using techniques or technologies they've learned (mostly, version control), or are referring back to the course material. Almost everyone that responded liked the short episode screencast format. One student said that they had dropped out of the course because it was hard to decide in advance whether to commit without a better idea of the structure/time commitment involved.

Pacing: The opinion on pacing is generally mixed. Some students are finding the pacing fine, at least one feels we are moving too quickly, and several have found the pace too slow. A few of those that have said the pace was slow have also pointed out that spreadsheets & databases aren't applicable to them, or that their interest is more in Python, so their opinion might change as the course progresses.

Time allocated for course work: The most common answer was about 1-2 hours a week spent on the course. Many students (over half) said that they have been busy and haven't had enough time to really work through the course material. A couple of students have said that they didn't intend to spend much time on exercises, but instead will just dip into the course if the material seems applicable to them. Several students mentioned that they find it challenging to do the work at their own pace.

Exercises & Quizzes: Generally students have said the exercises and quizzes have been helpful and are good motivators to keep working on the course. Several students mentioned that the quiz was lengthy but not difficult.

Version control: Most students that gave feedback about the version control lecture said that they were excited by it and thought it would be useful. A recurring issue that came up was how to set up an SVN repository for themselves now that they understood how to use one. They liked the idea that we write up a short HOWTO directed to their sys admins.

Spreadsheets & Databases: Generally students said they were familiar with spreadsheets and so didn't find the lectures useful. Not many students commented on the database lectures, but those that did either said it was easy or irrelevant to their work. As with SVN, the issue of how to create a database was raised (which we don't cover in our lectures).

Python: Almost everyone seems excited about starting Python, but at the time of our check-ins not many people had gotten into the material. Those that had asked about accessing databases from Python and about numerical programming (which we are creating a lecture on), and raised the concern that there was a significant jump in difficulty from the spreadsheet lectures to the Python lectures.

Other topics to cover: We had lots of interest, suggestions and requests for future topics students wanted to see covered in the course.
They include:

- more advanced Python programming
- web programming (under construction)
- programming in Matlab, and integrating it with Python
- working with text (the lecture on regular expressions may be helpful)
- testing
- the shell, Make, or generally automating scripts
- Perl
- algorithms and parallel processing

Read More ›

New Section for Essays
Greg Wilson / 2010-11-17
By popular request, we have moved the longer essay-style posts to a new section of this web site. We have also added a new essay on saving and loading data that also talks about metadata, registries, and the open/closed principle. Read More ›

Making Software Screencast
Greg Wilson / 2010-11-17
A short screencast about Making Software is now up on Amazon. In each of the book's 30 chapters, leading researchers in software engineering explain what we actually know about code review, statistical prediction of software faults, how programmers communicate, whether using design patterns makes code better, the relative quality of open and closed source software, and a host of other topics. We hope you enjoy it. Read More ›

Ratios and Rework
Greg Wilson / 2010-11-16
It's been six months and a bit since we started working on Version 4 of this course, so I'd like to share two things we've learned about creating online tutorial videos: (1) it takes a long time, and (2) a lot of that time feels like it should be unnecessary. Let's start with "a long time". It takes me 3-5 hours to prepare a good slide deck for an hour-long lecture that I'm going to deliver in person. A slide deck for a screencast takes 8-10, because I need to prepare more slides: since there isn't a lecturer making eye contact and pointing at things to keep viewers engaged, a slide deck for online use has to have many more small transitions. Writing a script to go with those slides is another 3-5 hours, much of which is spent rearranging the slides as I discover things that don't work as well as I thought. I don't know if it's fair to count this as an extra cost, or whether it's just bringing time I would spend on the second run of a course forward to the first run, but it's still time. Recording audio takes about 1.5 times as long as the audio (if I stumble, I just pause, take a breath, and restart that sentence), so call it 1.5 hours. Editing takes longer than recording, so let's say 4-5 hours all in (again, including some rework as I notice glitches that escaped me before). Getting everything off my machine and onto software-carpentry.org is another hour by the time all is said and done, so an hour-long lecture made up of 5-6 episodes comes in between 20 and 25 hours. But now we come to #2: the stuff that feels like it should be unnecessary. If I want to make a change after an episode has been posted—even a small one—it's substantially more work than changing a few slides in a PowerPoint deck would be. First, I need to record new audio, or get whoever originally created the episode to find a microphone and record some snippets of MP3 for me (having the voice change for a slide or two in the middle of a screencast is very jarring). Second, I need to re-export the PowerPoint slides as PNGs, and if the number of slides has changed, rename some of them: Camtasia refers to imported image files by name, and if I simply re-export and re-import, I have to go back and change the time boundaries for all the images after the ones that have been updated. How bad is this? Well, I just fixed a small mistake in the episode on Python lists. It took almost an hour, start to finish, and the change in audio quality where the fix was made is painful. And then there's version control. PowerPoint is effectively a binary format: yes, there's XML in there, but version control treats that XML as lines of text, which means its diffs and merges are senseless. I've used HTML and LaTeX in the past—at least I can structure them so that line-oriented diff/merge is sensible—but both formats force a strong separation between text and image, where PowerPoint allows me to mix them freely as I would on a whiteboard. I don't like PowerPoint, but the final result is easier on the eyes than what I can do with today's HTML or LaTeX with equivalent effort. What do I want instead? I want something that:

- plays nicely with version control (i.e., diff and merge just work);
- allows me to mix text and images freely (PNGs ghettoed in text isn't good enough);
- has stable open source authoring tools with a long likely life in front of them (so that investing in them won't be insanely risky); and
- includes a decent text-to-speech engine (something better than Xtranormal's, please) so that when I update what I'm displaying and the script of what I'm saying, I can just push a "compile" button and get a seamless video out the end.

What's that? "And a pony for Christmas?" Yeah, that too. But let me share a secret with you: whoever builds this thing first will be rich, famous, and popular, because this—content creation and maintenance that the average instructor can afford—is the real bottleneck in online education. Read More ›

Counting Things (Part 2)
Greg Wilson / 2010-11-07
This post has been moved. Read More ›

Done In London
Greg Wilson / 2010-11-05
We wrapped up this week's class in London today: I think a lot of the students felt that they'd been in the wind tunnel most of the week, but the feedback was fairly positive:

Good

- new content (hadn't seen before)
- covered a lot (good to see things)
- easy to apply / pragmatic
- group exercises were fun / useful
- Subversion
- entertaining instructor (blah blah blah)
- better understanding of best practices
- persuasive arguments for best practices
- nice to meet everybody!
- accidental knowledge transfer (vertically and sideways)

Bad

- too much too fast (got lost on Tuesday)
- spent time thinking about programming tasks while instructor was showing solution
- covered a lot (would have liked warmup before arrival)
- too much hummus (could have used more variety in food)
- not enough networking
- how does it run on my machine??
- not enough connection with sound
- introduced software we may never use
- not long enough (need more time)
- no incentive to do exercises / homework

Read More ›

Counting Things (Part 1)
Greg Wilson / 2010-10-31
This post has been moved. Read More ›

Would You Prefer...
Greg Wilson / 2010-10-30
Here are three instructional videos that I enjoyed watching:

- The Crisis of Credit Visualized
- Math is Not Linear
- Changing Education Paradigms

What they have in common: they're informative, they're fun to watch, and it must have taken a ton of work to create them. So my questions are: Do you like them more than what we're doing? Are they "better enough" than what we're doing to justify the extra effort? We'd produce half as much material if we did this—would that be a good tradeoff? Read More ›

Need Something to Debug
Greg Wilson / 2010-10-30
Following up on the previous post about the way Paul Dubois organizes good practices as "defense in depth", we would really like to include a lecture on debugging that does more than present a few general rules like "make sure you know what the right output is supposed to be" or "divide and conquer". More specifically, we need an example that contains a handful of qualitatively different bugs that can be found in qualitatively different ways: one by a code analyzer, another using coverage ("Wait, why isn't the 'else' ever being executed?"), a third by unit testing, a fourth by differential comparison of pre-change and post-change output, a fifth using breakpoints and single-stepping, and so on. If you have such an example, or are willing to help us construct one, please let us know. (And if you have ever wondered why it takes 35-40 hours to create an hour of instructional video, this is part of the reason: coming up with an example that can motivate and illustrate several related points is hard.) Read More ›
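To show the flavor of what we mean, here is a sketch of just one layer: one seeded bug, and one unit test that catches it. (The real example needs several such bugs, each visible to a different tool.)

def running_mean(values):
    """Mean of each prefix of values."""
    means = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        means.append(total / (i + 2))  # seeded bug: should be (i + 1)
    return means

def test_running_mean():
    assert running_mean([2.0, 4.0]) == [2.0, 3.0]

test_running_mean()  # fails, pointing straight at the off-by-one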

Dubois on Maintaining Correctness
Greg Wilson / 2010-10-30
Something else I didn't get to at the Met Office last week: Paul F. Dubois: "Maintaining Correctness in Scientific Programs". Computing in Science & Engineering, May-June 2005, http://doi.ieeecomputersociety.org/10.1109/MCSE.2005.54. Please, please, please, if you're building scientific software of any kind, find a copy and read it, because it's the best explanation I've ever found of why the right way to do it is, well, the right way to do it. Paul ties together a banker's dozen good ideas under the banner defense in depth: no matter how scrupulous we are, any of these might fail, so we have others in place to catch the mistakes that creep through. To quote the list that opens the paper, they are:

- Protocol for source control. Policies and procedures for managing the source can isolate errors when they occur.
- Language-specific safety tools. Each computer language has some facilities for ensuring correctness, but they're often underutilized.
- Design by contract. Bertrand Meyer's design by contract (DBC) methodology is a good fit with scientific programming, and its optional runtime checking of the contracts catches many errors.
- Verification. Defending against bad user input or data is separate from checking contracts.
- Reusing reliable components. We use third-party libraries for many things; the biggest benefit of reusing code isn't that you don't have to write it, but that the software is more likely to be correct already.
- Automating testing. A simple automation of testing procedures makes it easier to do as much testing as you ought to.
- Unit testing. Hand-in-hand with DBC, unit testing ensures component integrity.
- To-main testing policy. We insist on a certain level of testing before committing developments to our "main line."
- Regression testing. Additional nightly or weekly testing on all target platforms catches problems caused by our own errors as well as those caused by changes in environment.
- Release management. A disciplined approach to release management gives most users a stable experience.
- Bug tracking. Simple open-source tools can help make sure issues don't get lost.

It's only six pages long, including a couple of ads that you can skip: please, like I said, give it a read. (And note to self: I really need to turn this into a lecture...)
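Of the list above, design by contract is probably the least familiar to scientists, so here's a toy illustration of my own (not from the paper): guard a function with explicit precondition and postcondition checks.

    def mean_temperature(readings):
        '''Average a list of temperature readings in kelvin.'''
        # Preconditions: at least one reading, all physically possible.
        assert len(readings) > 0, 'need at least one reading'
        assert all(r >= 0.0 for r in readings), 'temperatures must be >= 0 K'

        result = sum(readings) / float(len(readings))

        # Postcondition: the mean lies between the extremes (with a
        # little slack, because floating point addition rounds).
        epsilon = 1e-9
        assert min(readings) - epsilon <= result <= max(readings) + epsilon
        return result

Python's assertions can be switched off wholesale with the -O flag, which is one way to get the "optional runtime checking" the paper describes. Read More ›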

Provenance (Or, What We Didn't Quite Get to at the Met Office)
Greg Wilson / 2010-10-29
This post has been moved. Read More ›

How We've Helped
Greg Wilson / 2010-10-28
We have started to collect testimonials from former students; if you have a story to share, please send it in. Read More ›

Feedback at UKMO
Greg Wilson / 2010-10-28
I've just finished teaching a four-day version of the course at the UK Met Office in Exeter. I think it went reasonably well, even if we did version control last instead of first :-). Here's the feedback the students gave me:

Good
- Most of what the instructor told us was probably true!
- Liked the emphasis on the human aspects of software development
- Liked the emphasis on evidence-based software engineering
- Lots of details on how to do testing
- Liked it when the instructor went off on tangents
- Liked having pointers to more material
- Instructor enthusiastic
- Lots of stuff on course web site
- Enjoyed learning a new language
- Finally had someone explain why to make code readable
- Nice mix of concepts and details
- Modular course design allowed me to catch up when I got lost

Bad
- No overall roadmap or guide
- Some (many?) parts too fast
- Didn't have access to the code the instructor was writing "live"
- Instructor went off on too many tangents
- The links the instructor mentioned aren't collected anywhere
- There weren't any tea breaks on the first two days (tiring)
- Need citations for claims that "studies show"
- Some people left behind in some parts
- Squeezing it into a week made some parts indigestible
- Had to learn a new language
- Don't know where to go next with Python
- The first two days felt rushed
- Some of the examples could have been better fits for this audience

Read More ›

ComputerWorld Canada Educator of the Year
Greg Wilson / 2010-10-27
ComputerWorld Canada has named Software Carpentry's Greg Wilson the 2010 IT Educator of the Year for "...recognizing the application of innovative techniques and development of new curriculum and delivery of programs that stimulate learning." Greg would like to thank Prof. Karen Reid and Prof. Marsha Chechik for the nomination; the students who have taken or worked on this course; and most especially, his mum and dad, who taught him the things that matter most. Read More ›

Configuration Files
Greg Wilson / 2010-10-24
This post has been moved. Read More ›

Slides Available as PDF and PPT
Greg Wilson / 2010-10-21
The home pages of the following lectures now have links to PDF and PowerPoint versions of the slides used in each episode:

- Python
- Sets and Dictionaries
- Testing
- The Unix Shell
- Regular Expressions
- Version Control
- Make
- Program Design

We haven't provided slides for the spreadsheets and databases lectures because they aren't slide-based. Please let us know if you find these useful, or if you notice any typos that we should fix. Read More ›

How Did You Find Us?
Greg Wilson / 2010-10-18
Hope you don't mind another administrative meta-question, but: how did you find this site?

- A peer or colleague pointed you at us?
- Your boss or supervisor did so?
- A Google search for one of the topics we cover?
- A mention by someone else you follow?
- A pointer in some kind of publication (American Scientist, Nature, Computing in Science & Engineering, etc.)?
- We contacted you at some point?
- Other (please specify)?

Please add a comment and let us know. Read More ›

Final Four Episodes of Python Lecture
Greg Wilson / 2010-10-18
The last four episodes of our lecture on Python are now available, covering: libraries, tuples, slicing, and what is text, anyway? The last isn't really about Python, but it's something people trip over pretty quickly when they start using it. Hope you find them useful... Read More ›

Ratings Revised
Greg Wilson / 2010-10-17
We asked, you answered: here are the latest results from our survey of what topics you'd most like us to cover, with links to the ones that have been posted. A few notes:

- N = 188 responses.
- This is the first time data visualization has dropped out of the top 5.
- It's also the first time that nerdish subjects like computational complexity and functional languages have placed anywhere near the top.
- There's a noticeable mismatch between things we think people should know (like version control) and things people think they want (building desktop GUIs).
- Conclusion: more data ≠ more insight.

2.53 Automating Repetitive Tasks
2.51 Basic Programming
2.47 Build a Desktop User Interface
2.45 Coding Style
2.42 Computational Complexity
2.40 Create a Web Service
2.38 Data Structures
2.38 Data Visualization
2.37 Debugging with a Debugger
2.34 Design Patterns
2.34 Designing a Data Model
2.33 Functional Languages
2.27 Geographic Information Systems
2.22 Handling Binary Data
2.18 Image Processing
2.17 Integrating with C and Fortran
2.13 Introduction
2.09 Matrix Algebra
2.07 Object-Oriented Programming
2.07 Packaging Code for Release
2.05 Parallel Programming
2.03 Performance Optimization
1.99 Refactoring
1.95 Reproducible Research
1.95 Static and Dynamic Code Analysis Tools
1.82 Systems Programming
1.76 Testing and Quality Assurance
1.74 Using the Unix Shell
1.73 Version Control
1.65 Working in Teams/on Large Projects
1.40 XML

Read More ›

Six More Python Episodes
Greg Wilson / 2010-10-15
We've added six more Python screencasts to the site:

- basics
- control flow
- I/O
- aliasing
- functions
- first-class functions

We hope you like them—please let us know. Read More ›

Three Python Screencasts Up
Greg Wilson / 2010-10-14
Three Python screencasts narrated by Dominique Vuvan, an alumnus of the Summer 2009 run of the course, are now online: introduction (episode 1), lists (4), and strings (6). We'll be recording and filling in the other episodes just as soon as construction in the office slows down... Read More ›

Nature Article on Scientific Programming
Greg Wilson / 2010-10-14
Nature has just published an article by Zeeya Merali titled "Computational science: ...Error" that looks at what's wrong with scientific computing today. Software Carpentry gets a mention, along with some of the scientists who are trying to raise awareness and standards. And in a companion article, Nick Barnes makes the case for scientists publishing their research software. Here's hoping both will be widely read. Read More ›

Five Rules for Computational Scientists
Greg Wilson / 2010-10-14
Stepping back from the details for a moment, here are five rules every computational scientist should (try to) follow:

1. Version Control

Put every primary artifact (source code, raw data files, parameters, etc.) in a version control system so that you have a record of exactly what you did, and when. There's no need to store things you re-create, such as the graphs you generate from your data files, as long as you have the raw material archived and timestamped. The one major exception to this rule is very large data sets: tools like Subversion aren't designed to handle the terabytes and petabytes that come out of the LHC or the Hubble. However, the teams managing those experiments include people whose job is archiving and managing data using specialized (and often one-of-a-kind) systems.

2. Provenance

Track the provenance of your code and data. Museums use the term "provenance" to mean the paper trail of ownership and transfer for a particular piece; in the scientific world, provenance is a record of what raw data was combined or processed to produce a particular result, what tools were used to do the processing, what parameters were given to those tools, and so on. If raw data sources and source files have unique version numbers (which they will if you're keeping them in a version control system), then it's a simple matter of programming to copy those IDs forward each time you derive a new result, such as an aggregate data set, a graph for a paper, or the paper itself. The good news is, tools to do this tracking automatically are finally entering production: see the Open Provenance Model website for updates on efforts to standardize the kinds of information they record, and how they communicate.

3. Design for Test

Write testable software. Tangled monolithic programs are very hard to test; instead, programs should be built out of small, more-or-less independent components, each of which can be tested in isolation. Building programs this way requires discipline on the part of the developer, but there are lots of places to turn for guidance, such as Michael Feathers' excellent book Working Effectively With Legacy Code. Modularizing code and defining clear interfaces between modules also helps speed things up. One application programmer working at Lawrence Livermore National Laboratory found that, simply by tidying up the code scientists brought to him, he could typically speed it up by a factor of 10 or 20, even before parallelizing it (which he could only do after cleaning it up).

4. Test

Actually test the software you've written. Yes, it's much harder to test most scientific applications than it is to test games or banking software, both because of numerical accuracy issues, and because scientists usually don't know what the right answer is. (If they did, they'd be writing up their paper, not writing software.) However, as Diane Kelly and other researchers have found, there's a lot scientists can do: run simple cases that can be solved analytically; compare the program's output against experimental data; compare the output of the parallel Fortran version using the hyper-efficient algorithm against the output of the sequential MATLAB version using the slow, naive, but comprehensible algorithm; and so on. (A concrete sketch of the first of these follows at the end of this post.) And do code reviews: study after study has shown that having someone else read your code is the most effective, and most cost-effective, way to find bugs in it.

5. Review

Finally and most importantly, insist on access to the software used to produce the results in papers you are reviewing. No, you won't be able to read or review all of the ATLAS particle detector software, and no, the folks at Wolfram Research aren't going to give you the source of Mathematica, but not having access to the engineering schematics of today's high-throughput sequencing machines doesn't stop us from reviewing the rest of our peers' wet lab protocols. Most scientific software is neither very large nor closed source; we can and should start to treat it according to the same rules we've used for physical experiments for the last 300 years.
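As promised under rule 4, here's a minimal illustration (my own toy example, not something from the course): check a numerical routine against a case whose answer is known in closed form, and compare with an explicit tolerance rather than exact equality.

    import math

    def trapezoid(f, a, b, n):
        '''Integrate f from a to b using n trapezoids.'''
        h = (b - a) / float(n)
        total = 0.5 * (f(a) + f(b))
        for i in range(1, n):
            total += f(a + i * h)
        return total * h

    def test_against_analytic_solution():
        # The integral of sin(x) from 0 to pi is exactly 2.
        approx = trapezoid(math.sin, 0.0, math.pi, 1000)
        assert abs(approx - 2.0) < 1e-4

The tolerance matters: the trapezoid rule is only approximate, so an exact comparison would fail even when the code is correct. Read More ›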

Dexy
Greg Wilson / 2010-10-14
From Ana Nelson, a neat new reproducible research tool called Dexy that takes the pipe-and-filter model to new heights (or extremes, depending on your point of view :-) ). There's a quick intro here, and a video of her recent talk about Dexy here. I'm already thinking about ways of combining this with consistency-checking tools so that if the descriptions of results get out of step with the results themselves, alarms will go off. Check it out, and let us know what you think—is it useful and approachable enough to include in this course? Read More ›

Three More Episodes on Spreadsheets
Greg Wilson / 2010-10-13
We have uploaded episodes 5, 6, and 7 of Jason Montojo's lecture on spreadsheets, which cover conditionals, pivot tables, and the basics of charting. Comments would be very welcome—and if you'd like to contribute material as well, please get in touch (we're particularly looking for help with data visualization). Read More ›

Python Lecture Coming Online
Greg Wilson / 2010-10-12
Slides for the first few episodes of our lecture on Python are now online; recordings should follow tomorrow and Thursday, and slides for the rest of the lecture should be up by the weekend. I think the content is OK, but it feels like there's too much shoveling: we're throwing a lot of features at people without showing them how they'll make life better. I think there's inevitably some of that in an intro like this—people need to know a few things before any of them can be put to work—but I'd welcome suggestions on reorganizing the material. Read More ›

Using Subversion from the Command Line
Greg Wilson / 2010-10-05
Ainsley Lawson, one of the TAs for the Fall 2010 online course, has recorded an 8-minute episode showing how to use Subversion from the command line. We hope you find it useful. Read More ›

Aaaand We're Off!
Greg Wilson / 2010-10-04
Our first all-online offering of the course starts today: over 40 graduate students and researchers are taking part. We're going to use email, Skype, DimDim, and whatever else comes in handy to supplement the recorded lectures with tutoring and tech support—I'm looking forward to finding out how well all this Internet stuff actually works :-) Read More ›

What Questions Do You (Frequently) Ask?
Greg Wilson / 2010-10-03
As part of redesigning this site, we're finally putting together an FAQ. It's pretty brief right now: what questions have you had about Software Carpentry that we could add (and answer)? Read More ›

Do You Use Software Carpentry?
Greg Wilson / 2010-10-03
If you use any of the Software Carpentry material in courses of your own, please add a link in the comments section below: our sponsors would like to know what our reach is (i.e., the more links we get, the more likely it is that we'll be able to continue helping you :-) ). Read More ›

Tracking Utility and Impact
Greg Wilson / 2010-09-30
Mark Guzdial recently posted some interesting (and for us, slightly depressing) statistics about MIT Open Courseware. Long story short, it looks like that flagship effort isn't as widely used as many of us had hoped or believed. I can't find equivalent stats for the Khan Academy (a scrappy "agile" alternative to MOC that has been getting a lot of attention in the geek community), but even if I could, those numbers probably wouldn't answer my real question: who is this reaching, and what impact is it having? As Guzdial points out, asking users to self-report is subject to large sampling bias, as is googling for links back to lectures. Our sponsors want to know who's using our stuff, and whether it's helping them: how do we answer those questions? Read More ›

Ten Short Papers Every Computational Scientist Should Read
Greg Wilson / 2010-09-30
No, we don't have a list—not yet—but we'd like to. What short, readable papers or articles do you think every scientist doing computational work should read at some point in their career (preferably early in their career)? Paul Dubois' 2005 article in Computing in Science and Engineering on maintaining correctness in scientific programs is a favorite of mine; so is Evan Robinson's summary of research on the effects of overwork on productivity. (Actually, I think everyone should read Robinson's article, not just computational scientists...) What else should be on the list? To qualify, entries must be short (up to a few pages long), well written, of broad general interest, and have something important to say that's relevant to our target audience. Suggestions in the comments, please... Read More ›

A New Site Design
Jon Pipitone / 2010-09-28
We've just made a few changes to the layout and look and feel of the Software Carpentry site:

- The blog and new content are now accessible directly at http://software-carpentry.org/
- Version 3.0 of the course material is still available at http://software-carpentry.org/3_0.
- We've put all of the new content under the Topics menu, and shuffled a few of the other pages around to make things easier to find.
- We've updated the WordPress theme to one that works better for embedded screencasts.

Let us know in the comments how you like the new configuration, and if you find any glitches. Read More ›

Software Carpentry at UCSF
Greg Wilson / 2010-09-23
Via a comment from Scooter Morris: the University of California San Francisco is offering a "short course" variant of Software Carpentry under the course codes BMI-280/BMI-219. Their slides are online in S5 format, and include lots of useful stuff that isn't in the stock version. If you know of other offerings, please send pointers! Read More ›

Response Has Been Overwhelming
Greg Wilson / 2010-09-22
I'm very pleased to announce that the Fall 2010 offering of this course to Ontario graduate students is now full: we'll be sending acceptance notices to applicants tonight and tomorrow. We hope to offer it again starting in January 2011, so if you didn't make it in this time, please check back in December. Read More ›

I'm No Graphic Artist...
Greg Wilson / 2010-09-21
I'm not much of a graphic artist (in both my startups, I was told after a few weeks that I was never to work on the user interface again), so I'd appreciate your help designing a flyer/noticeboard poster for Software Carpentry. This PDF is what I have so far; anything more constructive than, "My eyes! My eyes! Aargh!" would be very welcome. Read More ›

Your Favorite Running Examples?
Greg Wilson / 2010-09-20
I've been fond of invasion percolation since I first encountered it: the problem is simple to state, but its implementation brings up quite a variety of useful ideas, from testing in the face of randomness, to associative data structures, to the limits of parallelism. It's a good size, too: I can develop a first version in front of students in about 30 minutes, then branch off in various directions for almost as long as they'll listen to me. We now have three volunteers working on similar problems: phylogenetic tree reconstruction, a Galton box simulator, and an image processing widget that will find and label stars in photos of the night sky. Today's question is, what other small apps do you think would make good examples for this course? Candidates must be related to science in some way, and must be buildable in a couple of hundred lines of not-insanely-intricate code. (And of course, we'll award bonus marks to people who volunteer to implement their ideas too :-) )
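For anyone who hasn't run into invasion percolation before, here's a minimal sketch of the core loop (my own stripped-down version, not the one I use in class): fill the lowest-valued cell on the frontier at every step, which is where the associative data structure comes in.

    import heapq
    import random

    def invade(size, steps):
        '''Invade a size x size grid of random resistances from the centre.'''
        grid = [[random.random() for _ in range(size)] for _ in range(size)]
        centre = (size // 2, size // 2)
        filled = set([centre])
        frontier = []  # priority queue of (resistance, x, y)

        def add_neighbours(x, y):
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in filled:
                    heapq.heappush(frontier, (grid[nx][ny], nx, ny))

        add_neighbours(*centre)
        for _ in range(steps):
            while frontier:
                _, x, y = heapq.heappop(frontier)
                if (x, y) not in filled:
                    break
            else:
                break  # nothing left to invade
            filled.add((x, y))
            add_neighbours(x, y)
        return filled

    print(len(invade(11, 30)))

Testing this in the face of randomness (seed the generator, or check structural properties of the result) is exactly the discussion it's meant to provoke. Read More ›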

Survey: Help Needed
Greg Wilson / 2010-09-16
Dr. Roscoe Bartlett (Sandia National Laboratory), Dr. Jeffrey Carver (University of Alabama), and Dr. Lorin Hochstein (University of Southern California) are conducting a survey of software development practices among computational scientists. This survey seeks to understand current software development practices and identify areas of need. The goal is to produce a report on the status of scientific software development which can serve as the basis for future work in this area. The survey should take approximately 15 minutes to complete. The survey can be found here, and has been approved by the University of Alabama's IRB. Read More ›

Testing Scientific Software
Greg Wilson / 2010-09-15
Yesterday, Michael Feathers tweeted that, "The hardest bit of TDD [test-driven development] for ppl in scientific computation is that they often don't know what their intermediate results should be." I'd go further: if the average scientist or engineer knew what the output of their program was supposed to be, they wouldn't need to run the program. And if you don't know what answer is right, what do you compare your tests' output to?

Coincidentally, one of our readers sent us this a couple of days ago:

"Forgive my ignorance, but I have just watched all the lectures on testing, and I'm a little fuzzy on how I can put all this knowledge to use for me. I do CFD programming to solve fluid flow and heat transfer problems. I write in Fortran, code that solves an entire problem start to finish (obviously a collection of subroutines and modules). Then I use Python as a "driver" to set the values of my input variables, create input files, run the compiled code, write output files and gnuplot files to create some plots. I loosely follow the logic in Hans Petter Langtangen's book. Most of your lectures pertained to testing functions. I hardly ever use them—should I be? And how can I independently test a subroutine—or should I be? Let's say that I run the code to solve a known problem—should I use the same pass/fail criteria that you talk about in the lecture? What about testing to find the right values for variables—i.e., grid resolution studies. Should I automate these? I really would like to get these concepts clear, because I don't do ANY regression testing right now, and I know (as you point out) how dangerous that can be."

I have some answers, but before I post them, I'd be interested in hearing from other readers: what do you do? What do you expect others to have done when you're reviewing their papers?

Note: readers interested in this subject might enjoy Rebecca Sanders and Diane Kelly's paper, "Dealing with Risk in Scientific Software Development" (IEEE Software, 25(4), July 2008).
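One answer I'll offer right away, since the reader mentioned regression testing: even without knowing the right answer, you can pin down today's answer and detect when it drifts. This sketch is mine, and the file names in it are made up for illustration:

    def read_column(filename):
        '''Read one number per line from a program's output file.'''
        with open(filename) as f:
            return [float(line) for line in f if line.strip()]

    def test_regression(tolerance=1e-6):
        # 'reference/...' is saved output from a run we trust;
        # 'output/...' is what the current version of the code produced.
        reference = read_column('reference/heat_profile.dat')
        current = read_column('output/heat_profile.dat')
        assert len(reference) == len(current)
        for ref, cur in zip(reference, current):
            assert abs(ref - cur) <= tolerance * max(1.0, abs(ref))

A test like this doesn't prove the answer is right; it only shows the answer hasn't changed unexpectedly, which is exactly what regression testing is for. Read More ›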

Five Episodes on Make
Greg Wilson / 2010-09-15
The first five episodes of the lecture on Make have been recorded, edited, and posted. I still need to do one on using Make to track data provenance, but as I discovered when I started to write it, there are a few other topics I need to fill in first. I will do an episode on managing C/C++ compilation with Make first (probably after I return from teaching in Colorado next week). As always, comments are welcome. Read More ›

Software Carpentry Offered Online in Fall 2010 (for Ontario students)
Greg Wilson / 2010-09-14
I'm pleased to announce that we'll be running Software Carpentry online October-December 2010 for up to 40 graduate students and researchers at Ontario universities. Please see the Fall 2010 page for details (as we get them), or contact us if you would like to enrol. The course is free, but non-credit. Read More ›

Will America's Universities Go The Way Of Its Car Companies?
Greg Wilson / 2010-09-13
Two days before I flew south to speak at Michigan State University, I read an article in The Economist that asked the question in the title of this post. As it says (quoting US News & World Report), "If colleges were businesses, they would be ripe for hostile takeovers, complete with serious cost-cutting and painful reorganisations." I agree with many of their points: the universities I have worked at (six of them, on three continents) all suffered from the sedimentary buildup of red tape that you'd expect of a large organization that hasn't faced real competitive pressure in living memory, and struggled to achieve either of their core goals of fostering research and passing knowledge on. To quote a friend who's still in the system, "We're here to do research, they pay us to teach, we spend our time on administration." It's tempting to look to technology for a solution, but I don't think webcasting or interactive game-style tutorials will have any real impact on students unless they're supported by large-scale institutional change. To continue with the auto analogy, GM didn't fail because its machinery wasn't as good as Toyota's; it failed because its business model and thought processes weren't as good. That's why I think that the most interesting part of what we're doing isn't the screencasts we're making, but the peer support and just-in-time tutoring we're going to start putting on top of them next month. If we can do for students from anywhere what The Hacker Within has done at the University of Wisconsin, then we'll be doing more than help grad students in science and engineering program better: we'll be helping, in a small way, to figure out what a 21st Century education ought to look like. Read More ›

And For My Next Trick...
Greg Wilson / 2010-09-09
One of the things people have voted right up to the top of our poll is reproducible research. The phrase means different things to different people, but part of it must be accurately tracking the provenance of every bit of data a scientist touches: where it came from, what was done to it, what was done to the results, and so on. Once I've done the next episode of the Make lecture (on macros), I'd like to spend 10 minutes showing people how to do some of this by:

- embedding Subversion keywords like $Id:$ in original files, and
- having tools copy those IDs, plus their own version numbers and settings, into the files they generate.

For example, if the command line to generate summary-1.dat is:

    stats.py --alpha 3.5 data-1-1.dat data-1-2.dat > summary-1.dat

then data-1-1.dat should contain the line:

    $Id: data-1-1.dat 138 2010-09-08 21:30:43Z cdarwin $

which Subversion will update each time the file is checked in. data-1-2.dat should have a similar line, as should stats.py; its own ID should be embedded in a string so that it can be printed:

    # inside stats.py
    version = "$Id: stats.py 71 2010-08-17 08:13:17Z aturing $"

When stats.py runs, it copies its own ID string, its parameters (--alpha 3.5), and the ID strings of data-1-1.dat and data-1-2.dat into summary-1.dat. If some other program then processes summary-1.dat to create something else, it copies all of that information again, so that each file has a header with its complete ancestry. Yes, a proper provenance tool would be more robust and more flexible, but this technique will convey the idea, and implementing it is a good medium-sized exercise in Python. Thoughts? (And if you're interested in reproducible research, you'll probably enjoy a recently-published declaration of principles drawn up by Victoria Stodden and others.)
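Here, roughly, is the kind of thing I have in mind for that exercise. This is an unpolished sketch of my own; the header format and helper names are invented for illustration:

    import re

    ID_PATTERN = re.compile(r'\$Id: [^$]+\$')

    version = "$Id: stats.py 71 2010-08-17 08:13:17Z aturing $"

    def gather_provenance(filenames):
        '''Collect the $Id:...$ header line from each input file.'''
        ids = []
        for filename in filenames:
            with open(filename) as f:
                for line in f:
                    match = ID_PATTERN.search(line)
                    if match:
                        ids.append(match.group(0))
                        break
        return ids

    def write_with_provenance(outfile, values, inputs, params):
        '''Prefix output with our own ID, our parameters, and our inputs' IDs.'''
        with open(outfile, 'w') as out:
            out.write('# %s\n' % version)
            out.write('# parameters: %s\n' % params)
            for file_id in gather_provenance(inputs):
                out.write('# input: %s\n' % file_id)
            for value in values:
                out.write('%s\n' % value)

Downstream tools would treat lines starting with '#' as ancestry to copy forward, so each derived file carries its complete history. Read More ›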

Slides for the First Four Episodes on Make
Greg Wilson / 2010-09-08
I don't like Make, but I'm resigned to teaching it: other tools have different, rather than lesser, flaws. I've posted slides for the first four episodes of the lecture about it, and hope to have scripts and screencasts up soon. I'd welcome feedback—in particular, if anyone knows a clean solution to the problem at the end of the fourth episode (getting summary data to depend on stats.py), please let me know.

- Introduction
- Basics
- Patterns
- Rules

Later: scripts for these four episodes have been added; screencasts should go up next week. And for those who are wondering, the fifth and sixth episodes will cover macros, and how to use Make with Subversion to track the provenance of created files. Read More ›

Getting the Source
Greg Wilson / 2010-09-06
In response to several queries: if you'd like to get the source for both the Version 3 material (the old static HTML pages) and the new Version 4 stuff, they're both in our Subversion repository at http://software-carpentry.org/swc. The whole world should have read permission; please let us know if you run into any problems. And if you're minded to contribute material, we'd welcome help: please contact us by email to talk about what you could do. Read More ›

Eight Episodes on the Unix Shell
Greg Wilson / 2010-09-03
I had planned to leave it out, but your votes count: the first eight screencasts of our lecture on the Unix shell are now online. Topics covered are:

- Introduction (4:09)
- Files and Directories (9:55)
- Creating and Deleting (6:23)
- Pipes and Filters (9:11)
- Permissions (10:54)
- Finding Things (9:22)
- Job Control (5:37)
- Variables (6:49)

I've already got a list of changes to make (thanks to feedback from IsaacG2—my apologies, but I didn't see your comments until after I'd recorded), but I hope this will be a good start. And for those who are interested, the eight episodes together took 36.5 hours to write, revise, record, and edit, for 62:20 of screen time. That's better than the 50:1 ratio I was clocking in June, but I need to squeeze it further if I'm going to finish everything on time. Read More ›

Three More Sets of Slides
Greg Wilson / 2010-09-01
As promised in yesterday's post, the slides for three more episodes on the Unix shell are now online, covering how to find things, job control, and shell variables. Next up: public/private keys and SSH. Read More ›

Five Episodes on the Shell (and Three to Come)
Greg Wilson / 2010-08-31
I've just posted the slides for the first five episodes of the lecture on the Unix shell, covering:

- Introduction (or, why would anyone do this to themselves, really?)
- Files and directories
- Creating and deleting things
- Pipes and filters
- Permissions

Minor things I have yet to cover:

- Drive letters on Windows, and how Cygwin hacks around them
- The 'echo' command
- (Briefly) Windows access control lists
- Wildcards in filenames ('*' has been covered)

Major things (the remaining three episodes):

- Environment variables, the export thereof, and how to configure things with '.bashrc'
- Job control (Ctrl-Z, 'jobs', 'fg', and the 'ps' command)
- Finding things with 'grep' and 'find'

I'll do the other three slide decks tomorrow, then record and post the screencasts on Thursday and Friday. Please let me know if I've left out something absolutely essential... Read More ›

Four More Screencasts on Testing
Greg Wilson / 2010-08-27
After far too many delays, the next four screencasts on testing are finally available, covering systematic unit testing with Nose, why you should test interfaces rather than implementations, why testing floating point is hard, and how to create and manage fixtures systematically. I hope you enjoy them—have a good weekend. Read More ›

Another Update on What You Want
Greg Wilson / 2010-08-26
Responses have slowed, so here are the final scores for the topics we're including (or thinking about including) in this course. It looks like the Unix shell just might go back in...

2.51 Automating Repetitive Tasks
2.50 Reproducible Research
2.50 Data Visualization
2.47 Version Control
2.44 Performance Optimization
2.41 Data Structures
2.39 Testing and Quality Assurance
2.39 Coding Style
2.38 Basic Programming
2.35 Using the Unix Shell
2.35 Parallel Programming
2.35 Debugging with a Debugger
2.29 Computational Complexity
2.22 Object-Oriented Programming
2.20 Working in Teams/on Large Projects
2.19 Designing a Data Model
2.15 Refactoring
2.10 Matrix Algebra
2.09 Static and Dynamic Code Analysis Tools
2.07 Systems Programming
2.04 Integrating with C and Fortran
2.03 Design Patterns
2.01 Packaging Code for Release
1.95 Functional Languages
1.94 Handling Binary Data
1.82 Image Processing
1.74 Build a Desktop User Interface
1.73 XML
1.65 Create a Web Service
1.39 Geographic Information Systems

Read More ›

What Don't You Understand That You'd Like To?
Greg Wilson / 2010-08-23
What don't you understand about computers and computing that you'd like to? I'm not talking about big stuff, like "How does it all work, anyway?" or, "How do I write a game that's as addictive as BubbleSpinner?" What don't you understand about smaller-scale stuff, like handwriting recognition, or how Excel knows which cells to update when you change a number, or how your machine decides that it's time to update Java and what it does next? Please let me know, or if you're feeling shy, point your friends at this page and have them ask for you—we have to use something as motivating examples for this course, and if we can explain something interesting while teaching you fundamental computing concepts, so much the better. Read More ›

Slides and Scripts for the Next Two Episodes
Greg Wilson / 2010-08-19
I have posted slides and transcripts for the next two lectures on testing, which cover unit testing with Nose, and testing interfaces instead of implementations. I was hoping to record the screencasts today, but setting up a new 64-bit desktop machine took longer than expected—hopefully they'll be up early next week, along with episodes on floating point and setting up fixtures. Comments, as always, would be greatly appreciated. Read More ›

43% Independent
Greg Wilson / 2010-08-16
Cecilia d'Oliveira and colleagues recently wrote an essay in Science about MIT's OpenCourseWare initiative, ten years after its inception. Among the stats: OCW currently receives upwards of 1.5 million visits each month from 900,000 unique individuals. Students have grown to 42% of the audience, and educators and independent learners now constitute 9% and 43% of visitors, respectively. Twelve percent of educators responding to a March 2010 visitor survey indicated that they do incorporate OCW materials into their own content as anticipated, but educators more frequently use OCW for personal learning (37%), to adopt new teaching methods (18%), and as a reference for their students (16%). Students were largely expected to use the site as a supplement to materials they received in their own classes, a use identified by 40% of students. Just over 43%, however, indicated that they also use OCW for personal learning beyond the scope of their formal studies, and a further 12% use it as an aid in planning their course of study. Independent learners use OCW in a variety of personal (41%) and professional (50%) contexts, including home-schooling children and keeping up on developments in their professional field. 66% of visitors indicate they are mostly or completely successful at meeting their educational goals for visiting the site. I wish there was more detailed analysis of what's worked and what hasn't (I'd obviously like to imitate their successes and avoid their mistakes), but even without that, it's clear that bums-in-seats is not the future of higher education... Read More ›

Interview with Cameron Neylon
Greg Wilson / 2010-08-12
Today's interview is with Cameron Neylon, a noted advocate of open science.

Tell us a bit about your organization and its goals.

I work for the UK Science and Technology Facilities Council. We are a research funder but although we provide some direct funding our main role is to build and run or subscribe to large scale research infrastructure on behalf of UK scientists. For instance we run telescopes, pay the UK subscription to CERN, as well as supporting and running synchrotrons, neutron sources, high powered lasers, microfabrication facilities and large scale computing infrastructure. I work at the ISIS Neutron Scattering Facility which hosts several thousand scientists a year doing hundreds of experiments on around 20 different instruments. We help to select which experiments get done, support sample preparation, assist with the planning and running of experiments, as well as data analysis, sometimes all the way to publication. My group focusses on support and development of new techniques for biological scientists.

Tell us a bit about the software your group uses.

We use a big mix of things. Like most experimental scientists Word and Excel figure a lot in basic analysis and record keeping. We use a blog based laboratory notebook system (biolab.isis.rl.ac.uk) developed in collaboration with the University of Southampton. The instruments are highly specialised and are run with software developed in house, and first-stage analysis is moving to a new framework called Mantid (mantidproject.org). After the first stage we move to all sorts of tools based on what we need and the scientific problem. Specialist analysis software, usually built by individuals or groups, often requiring some sort of proprietary framework (MatLab is common and Igor from Wavemetrics is quite often used), is put together in ad hoc pipelines to attack a problem from several different directions. This is often quite haphazard. Some examples include RaSCAL (MatLab: http://sourceforge.net/projects/rscl/), the ATSAS suite (a closed-source, mostly command-line-driven suite for scattering analysis: http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html), and the NIST SANS analysis tools (Igor Pro: http://www.ncnr.nist.gov/programs/sans/data/red_anal.html).

Tell us a bit about what software your group develops.

The Mantid project has provided us with a Python scripting and GUI environment which has made it possible to provide some simple tools for ourselves and some users and to help us integrate this with our blog based record keeping system. Most of what I do is based on the immediate needs of our group, but with an eye to making it more useful to a wider community. Often it involves trying to make those disparate data analysis pipelines easier to use, more consistent, and to enable easier and better record keeping of the analysis process. We use our experience of problems to try and build things that are useful for our wider community.

What's the typical background of your scientists, developers, and/or users?

Most of the scientists we deal with have no specific experience of programming. In rare cases they have a little experience of scripting or command line work. They are focussed on outcomes and getting results rather than tools. This leads to ad hoc procedures and pipelines that are usually inefficient and badly recorded. Most could look at simple scripts and manipulate those for their needs. However, the lack of experience in programming "properly" and a lack of knowledge of best practice leads to messy, incomprehensible, and often unusable tools. An understanding of test-driven software design and versioning for safe development is rare. Those scientists who do build software and are comfortable with programming rarely have any skill or experience in user interface design, leading to hard-to-use interfaces and GUIs that confuse users.

How do you hope Software Carpentry will help them?

Good practice, good testing, good documentation, and availability of code for checking. On top of this, a good understanding of how to think about the design of a specific piece of software, and some knowledge of common design patterns to aid in the more rapid development of good and re-usable software.

How will you tell what impact the course has had (if any)?

I'll see some comments in people's code and I'll be able to get at it in an appropriate repository. When I get this code and read the comments I'll be able to understand how I might re-use it for my own purposes. If the course can achieve that or steps towards that I'll be very happy!

Read More ›

Software Carpentry for Audio and Music Researchers
Greg Wilson / 2010-08-05
I will be teaching a version of Software Carpentry tailored for audio and music researchers at Queen Mary, University of London from November 1st to 5th, 2010. From the full announcement:

We are seeking nominations for a small number (up to 15) of UK-based PhD students or early career researchers to attend the Autumn School. The SoundSoftware.ac.uk project will pay for reasonable travel and accommodation costs for attendees. Due to the level of interest we anticipate in this initial Autumn School, and to obtain a balance of attendees from several UK research groups, we are asking Research Groups to nominate potential attendees. If you are a PhD student or researcher in a UK research group, please contact your PhD supervisor, line manager, or Head of Group to ask them to nominate you.

I look forward to meeting everyone there! Read More ›

An Answer That Most Students Won't Understand
Greg Wilson / 2010-08-05
Two days ago, I asked how to generate tests from tables of fixtures using Nose:

...does Nose already have a tool for running through a table of fixtures and expected results? My hand-rolled version is:

    Tests = (
        # R1                  R2                  Expected
        ( ((0, 0), (0, 0)),   ((0, 0), (0, 0)),   None ),
        ( ((0, 0), (0, 0)),   ((0, 0), (1, 1)),   None ),
        ( ((0, 0), (1, 1)),   ((0, 0), (1, 1)),   ((0, 0), (1, 1)) ),
        ( ((0, 3), (2, 5)),   ((1, 0), (2, 4)),   ((1, 3), (2, 4)) )
    )

    def test_table():
        for (R1, R2, expected) in Tests:
            yield run_it, R1, R2, expected

    def run_it(R1, R2, expected):
        assert overlap(R1, R2) == expected

which is simple enough if students already understand generators and function application, but hell to explain if they don't—and they won't. After some back and forth, Jacob Kaplan-Moss (of Django fame) came up with this:

    def tabletest(table):
        def decorator(func):
            def _inner():
                for args in table:
                    yield tuple([func] + list(args))
            _inner.__name__ = 'test_' + func.__name__
            return _inner
        return decorator

    table = [(1, 2), (3, 4)]

    @tabletest(table)
    def check_pair(left, right):
        assert left > right

The outer function tabletest takes the table of fixtures as an argument, and produces a function of one argument. That argument is supposed to be the function that is being wrapped up by the decorator, so:

    @tabletest(table)
    def check_pair(...):
        ...

means:

    decorator = tabletest(table)
    check_pair = ...what the 'def' creates...
    check_pair = decorator(check_pair)

With me so far? Now, what decorator does is take a function F as an argument, and create a new function F' that produces each combination of the original F with the entries in the table: in jargon, it creates a generator that yields F and the arguments that F should be applied to. But what's that _inner.__name__ stuff? That's to make sure that the wrapped function's name starts with the letters "test_", because that's how Nose knows to run it. This does exactly what I wanted, but sparks three comments:

1. Thanks, Jacob: I can understand the solution once it's in front of me, but it would have taken me a long time to figure this out myself.
2. Treating programs as data, i.e., manipulating code just as you'd manipulate arrays or strings, is incredibly powerful.
3. Only a tiny fraction of the students who complete this course will understand how this works. I'm sure they all could, if they wanted to invest the time, but given their usual starting point, they'd have to invest a lot of time.

#3 is what many advocates of new technology (functional languages! GPUs! functional languages on GPUs!) consistently overlook. What Jacob did here is really quite elegant, but in the same way that the classic proof of Euler's theorem is elegant: you have to know quite a lot to understand it, and even more to understand its grace. People who have that understanding often forget what the world looks like to people who don't; we're trying hard not to, and would be grateful if readers and viewers could tell us when we slip up. Read More ›

Open Source, Open Science in 1999
Greg Wilson / 2010-08-03
A long time ago, in a galaxy far, far away... Actually, it was 1999, and the venue was Brookhaven National Laboratory in New York—it just feels like a different time and place. The event was a conference called "Open Source, Open Science", and Stephen Adler, Tom Throwe, and Sean McCorkle have very kindly resurrected its web site at http://openscience.bnl.gov/ for those interested in the history of an idea whose time may finally have come. It's interesting to see what has changed in the intervening decade, and what hasn't: supercomputing is still disproportionately represented, but I think the open side has won the debate that was the subject of the main panel discussion, "Overcoming the obstacles...in using and contributing to Open Source technologies." What I remember most from the conference, though, was a professor from somewhere in the Midwest saying, "This is all very noble, but in reality, if I spend two or three years developing a software package while my pre-tenure rivals are cranking out results, then make that software open so that they can crank out even more results, I'm committing career suicide, because I won't get any credit for their use of my software." Nobody had a good answer then; sadly, I don't think anyone has a good one now, either. As I rediscovered during my three and a half years at the University of Toronto, big organizations' own rules often don't allow people working for them to do the right thing... And if you're interested in the history of the idea of openness in science, Michael Nielsen has provided a couple of pointers: You've probably already seen it, but if not you may like to take a look at some of Paul David's work, of which http://www.bepress.com/cas/vol3/iss2/art5/ is an interesting starting point. He's been writing on many aspects of open science, including the history, for 20 odd years. Also, if you haven't seen it, Peter Suber's "Open Access Timeline" is a superbly useful overview of some parts of open science since the 1960s: http://www.earlham.edu/~peters/fos/timeline.htm [ed.: now at http://oad.simmons.edu/oadwiki/Timeline]. When I'm looking up historical stuff, those are both treasure troves, which I've personally barely touched the surface of.... Read More ›

A Question About Nose
Greg Wilson / 2010-08-03
I'm putting together an episode of the testing lecture to introduce unit testing frameworks. In the past, I've used unittest, but colleagues have had good experiences with Nose, which doesn't require students to understand classes and methods in order to write tests. As an example, they're going to test a function that finds the overlap between fields in aerial photographs of Saskatchewan (where it's safe to assume that everything is a rectangle :-) ). My question is, does Nose already have a tool for running through a table of fixtures and expected results? My hand-rolled version is:

    Tests = (
        # R1                  R2                  Expected
        ( ((0, 0), (0, 0)),   ((0, 0), (0, 0)),   None ),
        ( ((0, 0), (0, 0)),   ((0, 0), (1, 1)),   None ),
        ( ((0, 0), (1, 1)),   ((0, 0), (1, 1)),   ((0, 0), (1, 1)) ),
        ( ((0, 3), (2, 5)),   ((1, 0), (2, 4)),   ((1, 3), (2, 4)) )
    )

    def test_table():
        for (R1, R2, expected) in Tests:
            yield run_it, R1, R2, expected

    def run_it(R1, R2, expected):
        assert overlap(R1, R2) == expected

which is simple enough if students already understand generators and function application, but hell to explain if they don't—and they won't. So, is this already in Nose, and I've just missed it? If not, who wants to help me design it, and what's the likely elapsed time between submission of a patch and its appearance in a release? Read More ›

Interview with Sergey Fomel
Greg Wilson / 2010-08-02
Sergey Fomel is a professor at the University of Texas at Austin, and the leader of the Madagascar project.

Tell us a bit about your organization and its goals.

I work at the University of Texas at Austin, with a joint appointment between the Bureau of Economic Geology and the Department of Geological Sciences. My group conducts research on computational geophysics, with applications to petroleum exploration. I also serve as a project leader for the Madagascar open-source project. The goal of Madagascar is to provide a convenient and powerful environment and a convenient technology transfer tool for researchers working with digital image and data processing in geophysics and related fields.

Tell us a bit about the software your group uses.

We use Madagascar for all relevant scientific computations. For general data manipulations, Madagascar can be as general as tools like Matlab. The main difference is that multidimensional hypercube objects do not sit in memory but reside in files on disk or are passed through Unix pipes. This allows us to work with data that might be too large for RAM. For theoretical work, we occasionally use symbolic math software such as Mathematica. Currently, I am trying to switch from Mathematica to SAGE, which seems like a good open-source replacement. Supporting tools: Subversion for version control (a must), SCons for compilation and data processing flows, LaTeX for writing papers. Madagascar number-crunching programs are typically written in C, although there are interfaces to many other languages. We assemble elementary binary programs (compiled from C) into data processing flows using our extensions to SCons. We also use SCons (in combination with customized latex2html) to publish final results in a "reproducible research" format (papers with links to software code and data necessary for reproducing computational experiments).

Tell us a bit about what software your group develops.

We build software for use in our own research but also for sharing with others. Some codes get distributed first to our industrial sponsors, who incorporate them for their own use in industrial geophysical data processing. As for the general research framework, I believe it is very important to share it with as many other research groups as possible so that we can reproduce and verify each other's results (computational experiments) once they get published.

What's the typical background of your scientists, developers, and/or users? What do you see as their strengths, and where are the gaps in their knowledge?

Most of the people in the Madagascar developer community have a background in geophysics or related fields (petroleum or electrical engineering, physics, applied mathematics, etc.) The strength of this is that every developer is also a user; people join the development effort to "scratch a personal itch". The weakness is the lack of formal training in software engineering, computer science, or numerical analysis. Geophysicists are inventive people, many of them with a natural talent for software development of numerical computations, but sometimes one just needs to follow good rules.

How do you hope Software Carpentry will help them—that is, what big ideas and what specific skills do you want students to learn from it?

Software Carpentry is exceptionally useful. It fills the gap in education for scientists like us, who use software tools every day but have only a fragmentary knowledge of them. By making this knowledge systematic, Software Carpentry simply makes us better at what we do. I could name some specific ideas (testing, scripting, debugging, version control), but actually all of them are good; the most important thing is to be as systematic with our tools as good carpenters are with theirs.

Read More ›

Interview with Davor Cubranic
Greg Wilson / 2010-07-31
Today's interview is with Dr. Davor Cubranic, a statistician who lives and works in Vancouver, B.C. Davor recently ran a workshop for faculty and grad students in statistics that covered many of the same ideas as Software Carpentry.

Tell us a bit about your organization and its goals.

I work in the Department of Statistics of a large research university. Our goals are production of research papers, often in collaboration with researchers in other departments, such as life sciences, engineering, or forestry.

Tell us a bit about the software your group uses.

Primarily R, with some C/C++, Matlab, and SAS. A few groups use version control (Subversion, with some thought given to migrating to distributed version control systems). I'm probably the only one using automated tests (with the RUnit testing framework for R). A number of researchers use ESS, an Emacs front-end to statistical packages.

Tell us a bit about what software your group develops.

In-house development for our own use, although it is typically made publicly available over the web. I suppose some statistical packages we develop might be used as components in other researchers' software, but I'm not familiar with any specific cases of this.

What can you tell us about your course?

It was a two-hour workshop on lightweight software engineering practices that help improve the quality of the research software we create, and so indirectly improve the quality of the research itself. This is the first time I gave such a workshop, and there was considerable interest in it. I hope to grow it into a more extensive course that would borrow from Software Carpentry, but using tools, languages, and problems that are more familiar to members of our department.

How do you tell what impact the course has had?

I don't know yet.

What are your plans for future work?

Grow the workshop in duration and number of topics covered, and make it a regular fixture of the orientation given to incoming students every September.

Read More ›

Stats for July
Greg Wilson / 2010-07-30
[Charts showing the site's visitor and page-view counts for July.] Read More ›

A Little Bit of Javascript
Greg Wilson / 2010-07-30
As I've mentioned before, one actionable finding in educational research is that faded examples—ones in which progressively less of the solution is shown to students as they progress—are a very effective teaching tool. I've been thinking about how to add them to this course, and have an idea I'd like to try out. It requires more Javascript than I know, though, so I'm hoping someone who reads this blog will be willing to write it for us. (And in general, if anyone wants to help produce material for this course, please get in touch: we're looking for scripts, slides, voiceovers, examples, artwork, and everything else that an open education project needs.)

My idea is to create something like a simple folding editor to progressively expand solutions in place in a controlled order. I want to put specially-formatted comments in code to mark folds:

    import sys, re

    '''Find all duplicated words in an input file.'''

    # <4> Finally, define a pattern that will match duplicated words.
    pattern = r'(\b\w+\b)\s+\1'
    # </4>

    # <2> Process lines of text with a regular expression using the looping pattern we've seen before.
    def process(lines):
        result = set()
        for the_line in lines:
            for match in re.findall(pattern, the_line):
                # <3> Extract data from matches. This is specific to *this* problem, and has to sync with the pattern.
                word = match.split()[0]
                result.add(word)
                # </3>
        return result
    # </2>

    if __name__ == '__main__':
        # <1> Write the main body of the program first using the read/process/write pattern we've seen before.
        lines = open(sys.argv[1], 'r').readlines()
        results = process(lines)
        for r in results:
            print r
        # </1>

It will initially appear as:

    import sys, re

    '''Find all duplicated words in an input file.'''

    ...4...

    ...2...

    if __name__ == '__main__':
        ...1: Write the main body of the program first using the read/process/write pattern we've seen before...

Clicking on the fold marked '1' expands it, and draws attention to fold #2 by putting its comment text inline:

    import sys, re

    '''Find all duplicated words in an input file.'''

    # ...4...

    # ...2: Process lines of text with a regular expression using the looping pattern we've seen before...

    if __name__ == '__main__':
        # Write the main body of the program first using the read/process/write pattern we've seen before.
        lines = open(sys.argv[1], 'r').readlines()
        results = process(lines)
        for r in results:
            print r

Clicking '2' expands it to show (and draw attention to) #3, et cetera. And there would be markers of some kind to re-fold an item, which would automatically re-fold all higher-numbered items at the same time. This would let us show readers how we created a solution, not just the solution itself; the marked-up code would be a bit ugly, but pretty easy to create (at least for small examples). So, any volunteers?
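To pin down the format for anyone thinking of volunteering, here's a rough Python sketch of the marker-parsing half of the job (the real version would be in Javascript, and the marker syntax above is only a strawman):

    import re

    FOLD_OPEN = re.compile(r'#\s*<(\d+)>\s*(.*)')
    FOLD_CLOSE = re.compile(r'#\s*</(\d+)>')

    def find_folds(source):
        '''Map each fold number to (comment, start line, end line).'''
        folds, pending = {}, {}
        for lineno, line in enumerate(source.splitlines()):
            closing = FOLD_CLOSE.search(line)
            if closing:
                comment, start = pending.pop(closing.group(1))
                folds[closing.group(1)] = (comment, start, lineno)
                continue
            opening = FOLD_OPEN.search(line)
            if opening:
                pending[opening.group(1)] = (opening.group(2), lineno)
        return folds

Given that table, the Javascript side only has to hide or reveal line ranges in the right order. Read More ›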

Two More Episodes on Version Control
Greg Wilson / 2010-07-29
The third and fourth episodes of our lecture on version control are now online. These explain how to handle conflicts from concurrent edits, and how to roll back changes. As with the second episode on basic workflow, they use a mix of slides, screen recording, and sound effects. The next episode is supposed to explain how to create a repository, but I'm still trying to figure out what to show people. A repo on the same machine that's being used for development is better than nothing, but that doesn't help people share work with colleagues. On the other hand, creating a repo on a server somewhere requires at least basic knowledge of the shell: even if someone is willing to type in a password for each interaction (so that they don't need to know about public/private keypairs), they'll need to know enough to SSH in to the server and run "svnadmin create reponame". There are web-based control panels for creating and managing repositories, and we could just require them to ask their friendly neighborhood sys admin to set that up, but it's just enough of a stumbling block to, well, be a stumbling block. Suggestions would be welcome... Read More ›

Survey Update
Greg Wilson / 2010-07-29
Here's an update on responses to the survey I posted a couple of weeks ago. 172 people have responded at this point; it's encouraging that priorities are relatively stable as numbers increase.

Education:
77.3% Graduate degree
22.1% Undergraduate degree
0.6% High school

Field:
41.0% Computer Science
30.1% Earth Sciences
28.9% Physics
25.4% Mathematics and Statistics
11.0% Microbiology
9.2% Biomedical Engineering
6.9% Macrobiology
5.2% Medicine and Health Care
5.2% Electrical Engineering
5.2% Astronomy
4.6% Mechanical Engineering
4.6% Aerospace Engineering
4.0% Chemical Engineering
2.9% Psychology
2.3% Economics
2.3% Business/Finance
1.2% Linguistics
1.2% Civil Engineering
0.6% Social Sciences
0.6% Arts and Humanities

Role:
44.8% Academic Researcher
32.8% Software Developer
16.7% Graduate Student
16.7% Government Research Scientist
10.3% Engineer
9.8% Manager/Supervisor
8.6% System Administrator
3.4% Teacher
2.9% Industrial Research Scientist
1.1% Undergraduate student
1.1% Laboratory Technician

Priorities:
2.51 Automating Repetitive Tasks
2.50 Reproducible Research
2.49 Data Visualization
2.46 Version Control
2.43 Performance Optimization
2.41 Data Structures
2.41 Coding Style
2.38 Basic Programming
2.37 Testing and Quality Assurance
2.35 Parallel Programming
2.34 Debugging with a Debugger
2.33 Using the Unix Shell
2.29 Computational Complexity
2.21 Object-Oriented Programming
2.21 Designing a Data Model
2.19 Working in Teams/on Large Projects
2.14 Refactoring
2.10 Static and Dynamic Code Analysis Tools
2.09 Matrix Algebra
2.06 Systems Programming
2.06 Integrating with C and Fortran
2.03 Design Patterns
2.01 Packaging Code for Release
1.95 Functional Languages
1.93 Handling Binary Data
1.80 Image Processing
1.77 Introduction
1.75 Build a Desktop User Interface
1.73 XML
1.64 Create a Web Service
1.39 Geographic Information Systems

Read More ›

Mark Guzdial on Software Carpentry
Greg Wilson / 2010-07-28
Mark Guzdial, a leading researcher in computing education, blogged a few days ago about the Texas Advanced Computing Center's training program for computational scientists, and asked, "Given the importance of computational science, what do all scientists and engineers need to know about high-performance computing?" As you might expect, I replied to say that the question was almost always premature: we should first ask what scientists and engineers need to know about computing in general before tackling HPC. Mark has responded with a post on the CACM blog that quotes me, and puts Software Carpentry in a larger context: ...by 2012, there will be about 3 million professional software developers in the United States, but there will also be about 13 million end-user programmers—people who program as part of their work, but who do not primarily develop software... these end-user programmers don't know a lot about computer science, and that lack of knowledge hurts them. He finds that they mostly learn to program through Google. In his most recent work, he is finding that not knowing much about computer science means that they're inefficient at searching. He then goes on to quote Alan Kay's "Triple Whammy" of core concepts:

1. Matter can be made to remember, discriminate, decide, and do.
2. Matter can remember descriptions and interpret and act on them.
3. Matter can hold and interpret and act on descriptions that describe anything that matter can do.

and asks, "How do we frame [this] in a way that fledgling scientists and engineers would find valuable and learnable?" I agree that these ideas are at the heart of computing, but trying to map them directly to "here's what you do on Tuesday morning" is a really big step. I hope that our concept map is one of the intermediate steps, but there have to be many, many more. Read More ›

Second Lecture on Version Control
Greg Wilson / 2010-07-26
Our second lecture on version control is now on the web. It combines screen recording with static slides; please let us know whether the format works for you; in particular, can you follow what's happening on the desktop? Read More ›

Introduction to Version Control
Greg Wilson / 2010-07-24
It took a lot longer to put together than I expected, but I'm pleased with the result—the newest screencast explains what version control is, and why you'd want to use it, in four minutes and four seconds, including a wolf howl and a maniacal laugh. I hope you like it... And in case you haven't been reading comments, I'd be very happy to include a parallel version control lecture based on a distributed version control system like Mercurial or Git if someone would like to create it. Software Carpentry is an open source project—if that's the itch you want to scratch, then please email me and we'll figure out how to make it happen. Read More ›

Strictly Speaking, This Isn't Part of Testing
Greg Wilson / 2010-07-23
The second episode of the lecture on testing is now up. It covers exceptions, which strictly speaking aren't part of testing, but it seemed like a natural place to introduce them. As always, feedback is welcome... Read More ›

First Episode of Testing Lecture
Greg Wilson / 2010-07-22
The first episode of the lecture on testing is now online. We're going to be showing people how to use Nose, so I've tried to motivate the idea of a unit testing framework—I'd appreciate feedback on whether it works or not. And yes, I am working on the version control lecture, but I've been stumbling over minor technical glitches, and I'm afraid that if I try recording today, I'll start using uncivilized language. Hopefully I'll get to it tomorrow... Read More ›
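For readers who haven't met Nose: it finds files and functions whose names start with test and runs them automatically, so a test suite is just a collection of plain functions. A minimal sketch of the kind of file it discovers follows; the rates module and growth_rate function are hypothetical stand-ins for the code under test:

    # test_rates.py: Nose collects this file because of the test_ prefix.
    from nose.tools import assert_equal, assert_raises

    from rates import growth_rate   # hypothetical module being tested

    def test_doubling():
        # Going from 10 to 20 should be a growth rate of 1.0.
        assert_equal(growth_rate([10, 20]), 1.0)

    def test_empty_input_rejected():
        # An empty series has no growth rate; expect a ValueError.
        assert_raises(ValueError, growth_rate, [])

Running nosetests in the directory then reports how many tests passed or failed, with no registration or boilerplate required.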

Popular, Fast, or Usable: Pick One
Greg Wilson / 2010-07-21
One of the core skills in any software developer's toolbox is automating repetitive tasks. We've been teaching this in Software Carpentry with Make for 13 years, and for 13 years, we've been looking for something better: something that has a less user-hostile syntax, requires less arcane knowledge of the shell, and at least pretends to do the same thing on different platforms. We briefly considered Ant (where "briefly" means "as long as it took to find the post by Ant's creator saying he would do it all differently next time"), but it's clear that tools like Rake and SCons, which embed build commands in a full-blown scripting language, are the future of build systems. Rake is based on Ruby, but SCons is based on Python (and actually has its roots in the original Software Carpentry project), so it should be a no-brainer, right? Well, not when you look at its performance, or rather, lack of performance. According to Eric Melski's numbers, SCons's runtime seems to grow as the square of the number of things being built, which is much worse than the behavior of other systems. We could teach it anyway, and tell students to switch to something else for real-world problems, but we really want to avoid that. Scimatic's Jim Graham pointed me at Vellum, another Python-based build tool. It looks interesting, but it also looks moribund: the last release was April 2008, the last bug report was July 2009, and the last check-in was a month ago. It therefore looks like whatever I choose is going to be hard to learn, slow to run, or have such a small user and developer base that it might well disappear during the lifetime of this course. On the bright side... Um... Err... OK, I don't know how to finish that sentence. Read More ›
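For readers who haven't seen SCons: its build files (named SConstruct) are ordinary Python scripts, which is exactly the appeal described above. A minimal sketch, with hypothetical source file names:

    # SConstruct: SCons runs this Python file when you type 'scons'.
    env = Environment()   # a construction environment with platform defaults
    # Build an executable from two C files; SCons works out the dependencies
    # itself and recompiles only what has changed.
    env.Program(target='analyze', source=['analyze.c', 'stats.c'])

Because the build file is Python, loops, conditionals, and functions come for free wherever a Makefile would need arcane syntax.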

Interview with The Hacker Within
Greg Wilson / 2010-07-20
Today's interview is with Nicholas Preston, Katy Huff, and Milad Fatenejad of The Hacker Within at the University of Wisconsin-Madison. Tell us a bit about your organization and its goals. The Hacker Within is a student-run, skill-sharing interest group dedicated to scientific software development at the University of Wisconsin-Madison. We typically meet every week or two to discuss programming and computation. The meetings are also a good place to ask questions, share ideas, and meet others who use computation in their research. Tell us a bit about the software your group uses. The software our group uses includes:

Languages: Fortran, IDL, Python, Perl, C/C++, R, XML (SVG), SQL
Editing: Vim, Emacs, Kate
Writing: LaTeX, Sphinx, reST
Web: Django, Web2Py, Trac
Version control: Subversion, Mercurial
e-Science: scipy, numpy, MATLAB, Mathematica, R
Specialized: agricultural modelling software (e.g., Agro IBIS, Pegasus), the Natural Language Toolkit, mesh tools (MOAB, MOOSE, etc.), EES (Engineering Equation Solver), Bayesian statistical tools (e.g., WinBUGS)

Tell us a bit about what software your group develops. We primarily develop software for use in-house (e.g., our departments and lab groups), but this varies among our members. Some are involved in open source projects as hobbyists, and some of the research tools are being spun into the public domain. Our members develop tools for a wide range of disciplines, including nuclear engineering, physics, medical imaging, geographic information systems, epidemiology, ecology, data visualization, and statistics. Who are you hoping Software Carpentry will help? Software Carpentry would help both our group members and colleagues. Our group members often have discipline-specific knowledge, but may not have exposure to computational tools which would improve their research efficiency. For example, some are physicists in large research groups without experience using version control, and some are Scipy/Numpy wizards who lack experience with data-driven web design (for data sharing) or statistics (e.g. R). Everyone would benefit from exposure to new tools, tips, hacks, and languages. As for our colleagues, the course would be useful for attracting new members to our organization, many of whom have basic scripting (hacking) experience with a specific research tool but have limited general programming literacy. Often potential members express an interest in contributing to open source projects, but they are unsure where to begin and are not familiar with the tools for collaborative code development. How do you hope the course will help them? The fundamentals of being a good programmer. Our colleagues are increasingly expected to program for their research, but most don't have formal computer science training. Rather than taking a general course, there is tremendous interest in learning specific and applied skills, such as how to use a text editor, how to do version control, or how to debug. It would be ideal if we could also communicate general "best practices" and concepts that are language agnostic. How will you tell what impact the course has had? We have been individually monitoring the development of the course materials, but have not yet used the materials for teaching. When classes, and our Hacker Within meetings, resume this fall (2010) we plan to work through some of the modules in our meetings and do some test teaching with our colleagues. Read More ›

Interview with STScI's Perry Greenfield
Greg Wilson / 2010-07-20
Today's interview is with Dr. Perry Greenfield of the Space Telescope Science Institute. Tell us a bit about your organization and its goals. We run the Hubble Space Telescope, starting with soliciting and gathering proposals from astronomers who want to use it, to scheduling the selected proposals, generating and uploading telescope commands, processing the downloaded data from raw telemetry to calibrated science images and spectra, and providing the results in an archive to the astronomical community. The general goal is to maximize the scientific potential of HST. We are also helping develop the systems needed to run the next large space telescope planned, the James Webb Space Telescope. STScI employs approximately 500 people and is located on the Johns Hopkins University Campus in Baltimore, MD. Tell us a bit about the software your group uses. Python and scientific packages for Python such as numpy and matplotlib. We use nose for testing, subversion for version control, sphinx for documentation. Tell us a bit about what software your group develops. We build software for in-house use, both for operations such as automatic calibration pipelines and for staff astronomer use. We also distribute our tools to the general astronomical community. We have a mix of tools, from modules to read and write standard data files to instrument-specific data reduction and analysis routines. Who are you hoping Software Carpentry will help? Our staff astronomers, who could benefit from use of software tools that they are not really aware of. How do you hope the course will help them? To learn some good practices that will make their scripts more reliable, easier to repeat, and less brittle. We frequently have to adapt the algorithms that they develop into more production-quality code, and the better they do the job the first time, the easier it will be for us. How will you tell what impact the course has had? We see better code in scripts given to us to include into production quality code. Read More ›

Five... Five... Five Scripts in One!
Greg Wilson / 2010-07-20
I've updated the script for the introductory episode on version control that I posted yesterday, and added four more covering basic operations, handling conflicts, rolling back changes (with a very brief mention of branching and merging), and setting up repositories for personal use (creating a shared repo requires more skills than students are likely to have, and should be left in the hands of sys admins). The five scripts are down below the break; comments before I start making up slides and recording would be very welcome.

Introduction

Hello, and welcome to the first episode of the Software Carpentry lecture on version control. In this episode, we will explain what version control is, how it works, and why you should use it. Suppose you and a friend are working together on a paper. You both want to edit the file at the same time—what should you do? You could take turns, but then each of you would spend half the time waiting for the other. Another option would be to go ahead and work simultaneously, then patch things up afterward. But somehow, stuff always winds up getting lost, overwritten, or duplicated. The right solution is to use a version control system. This keeps the master copy of the file in a central repository, usually located on a server—a computer that is never used directly by people, but only by the applications serving them. No-one ever edits the master copy directly. Instead, you and your friend each have a working copy on your own computers. You work independently, making whatever changes you want to your local copies. As soon as you are ready to share your changes, you commit them to the repository. Your friend can then update her working copy to get those changes. And of course, if your friend finishes her part first, she can commit, and then you can update. But what if you and your friend want to make changes to the same part of the paper? Old-fashioned version control systems prevented this from happening by locking the master copy. Everyone's working copy would normally be read-only. When someone wanted to start work on a file, the version control system would make her copy of that file writeable. When she was finished working, the version control system would copy her changes to the repository, then mark her copy as read-only once again. Only one person at a time could have a writeable copy. This guaranteed that two or more people could never accidentally make changes to the same file at the same time... ...but it also meant that only one person at a time could work on any given file. This is essentially the "one at a time" strategy from the start of this episode, but with the version control system acting as the referee to prevent accidents. In practice, locking like this isn't as restrictive as it sounds. If you and your friend repeatedly find that you're trying to edit the same file, the solution is to break your paper (or your program) into several smaller files, so that you can work simultaneously. However, most of today's version control systems use a different strategy, one based on the old saying that it's easier to get forgiveness than permission. In these systems, nothing is ever locked—everyone is always allowed to edit the files in their working copy. Sometimes, of course, you and your friend will make changes to the same part of the paper. If your friend commits first, her changes go into the repository as usual. If you try to commit something that would overwrite her changes, the version control system will stop you...
...and highlight the conflict by marking the overlapping regions in your working copy. It's up to you to edit the file to resolve the conflict. You can keep your changes, accept your friend's, or write something new that combines the two—it's up to you. Once you have fixed things, you can go ahead and commit. Experience shows that version control is better than mailing files back and forth for at least three reasons. First, it's hard (but not impossible) to accidentally overlook or overwrite someone's changes—the version control system highlights them for you automatically. Second, there are no arguments about whose copy is the most up to date—the master copy is. These features mean that version control is worth using even when you're the only person working on a particular set of files, because it's a more reliable way to move files between the computers you use than copying things onto a USB drive or emailing them to yourself. More importantly, whether you're working on your own or in a group, version control allows you to look at or undo changes you made weeks, months, or years ago. This works because the version control system never actually overwrites the master copy in the repository. Instead, every time someone commits a new version, the system saves it on top of the previous master copy, along with some information about when the change was made and who made it. This means that you can always see what the file looked like last week before someone rewrote the analysis section while you were on holiday. It also means that you can always fetch old versions of things, like the exact version of the program you used to produce the graph on page 5 of that paper that someone is now challenging. Version control systems do have one important shortcoming. If you are working with plain text files, it's easy for the version control system to find and display differences, and to help you merge them. Unfortunately, today's version control systems won't do this for images, MP3s, PDFs, and Microsoft Word or Excel files. These aren't stored as text—they use specialized binary data formats, and there usually aren't tools for finding, displaying, or merging the differences between them. In most cases, all the version control system can do is say, "These files are different." That's better than nothing, but not by much. Even with this limitation, version control is probably the most important concept in this entire course. It's not just because it facilitates sharing; version control also allows you to look at or undo changes you made weeks, months, or years ago. We'll talk more about using version control to make your research reproducible in a later lecture. In the next episode of this one, we'll look at the most popular open source version control system in use today, called Subversion.

Basic Use

Hello, and welcome to the second episode of the Software Carpentry lecture on version control. This episode introduces the basic workflow you'll use when working with version control. To keep things simple, we'll assume that someone has already set up a repository for you. A later episode will show you how to do this yourself. So, Dracula and Frankenstein have just joined the Universal Monsters project, and need to put some data together about where in the Solar System they should hide their secret lair. Their project's repository is on the software-carpentry.org server, and its full URL is http://www.software-carpentry.org/monsters.
Every repository has an address like this that uniquely identifies the location of the master copy. It's Monday night. Dracula sits down at his computer and runs SmartSVN. This is a Subversion client, i.e., a program that runs on your machine, and knows how to move files back and forth to a Subversion version control repository on a server computer. There are lots of other graphical clients out there, and many power users run Subversion commands from the shell, but we'll use SmartSVN in this lecture. In order to create a working copy on his computer, Dracula has to check out the repository. He only has to do this once per project; once he has a working copy, he can update it when he wants to get files from the repository. Using SmartSVN, Dracula goes to the Repository menu and selects Checkout.... The dialog that appears on his screen has two required fields. The first is the URL of the repository, which tells Subversion where to look for the master copy. The second specifies where Dracula wants the working copy put on his computer. After filling them both in, he clicks OK. SmartSVN opens a connection to the server, checks that Dracula is allowed to view the repository, then creates a new directory on his computer and copies files into it. Once the checkout is complete, SmartSVN adds an entry for the project in its bookmarks pane. As in a standard file browser, clicking on directories opens them and displays their contents. Dracula can find out more about the state of the project by using Subversion's log command. If he selects the root of the project in the bookmarks pane and clicks the Log button, SmartSVN displays a list of all the changes made to the project so far. This list includes the revision number, the name of the person who made the change, the date the change was made, and whatever comment the user provided when the change was made. As you can see, the monsters project is currently at revision 12. While we have this dialog open, notice how detailed the comments on the updates are. Good comments are as important in version control as they are in coding, because without them, it can be very difficult to figure out who did what, when, and why. You can use "Made changes" and "fixed it" if you want to, or even nothing at all, but you'll only be storing up work for yourself in the future. A couple of cubicles away, Frankenstein also runs SmartSVN to check out a working copy of the repository. He also gets Version 12, so the files on his machine are the same as the files on Dracula's. Unfortunately, the first time he leans back in his chair, it breaks, so he has to go and find a new one. While Frankenstein is looking for a new chair, Dracula decides to add some information to the repository about Jupiter's moons. Using his favorite editor, he creates a file in the jupiter directory called moons.txt, and fills it with information about Io, Europa, Ganymede, and Callisto. After double-checking his data, he wants to commit the file to the repository so that everyone else on the project can see it. The first step is to use SmartSVN to add the file to his working copy. This isn't the same as creating the file—Dracula has already done that. Instead, this step tells Subversion to start keeping track of changes to the file. It's quite common, particularly in programming projects, to have files in a directory that aren't worth storing in the repository. We'll see examples of these in later episodes. Once he has told Subversion to add the file, Dracula can commit his changes to the repository. 
Notice that the version number has changed from 12 to 13. This version number applies to the whole repository, not just to files that have changed. SmartSVN and other clients display it on a file-by-file basis because it's possible to have a mix of old and new versions of files in a working copy. You could, for example, decide to update your copy of a paper's bibliography, but not fetch the latest version of the conclusions section, because you're in the middle of making changes to it yourself. The next morning, after he has finally found a chair big enough for him, Frankenstein starts work once again. When he fires up SmartSVN, it shows him that there's a file in the repository that's not in his working copy, so he does an update to get it. Frankenstein's working copy is now up to date with Version 13 of the repository, which is the current head revision. Looking in jupiter/moons.txt, Frankenstein notices that Dracula has misspelled "Callisto"—it's supposed to have two L's. Frankenstein edits that line of the file. He then adds a line about Amalthea, which he thinks might be a good site for a secret lair despite its small size. Frankenstein then commits his changes to create Version 14 of the repository. Later that night, when Dracula wakes up and starts work again, the first thing he does is check to see what changes are in the repository that he doesn't have yet. This is a very common workflow: before updating his working copy, Dracula always takes a look to see what might be affected, in case he'd rather carry on working with his current version than deal with someone else's updates. He decides that he wants the changes, so he does an update. His copy of jupiter/moons.txt is now in sync with the master, and with Frankenstein's (unless Frankenstein has made more changes since committing). One thing that's worth noticing in this story is how important Frankenstein's comments about his changes were. It's hard to see the difference between 'Calisto' with one L and 'Callisto' with two L's, even if the line containing the difference has been highlighted. Without Frankenstein's comment, Dracula might have wasted time wondering if there actually was a difference or not. In fact, Frankenstein should probably have made two separate commits, since there's no logical connection between fixing a typo in Callisto's name and adding information about Amalthea to the same file. Just as a function or program should do one job, and one job only, a single commit to version control should have a single logical purpose so that it's easier to find, understand, and if necessary undo later on.

Managing Conflicts

Hello, and welcome to the third episode of the Software Carpentry lecture on version control. In this episode, we will look at what happens when there's a conflict during a commit, and how you can merge changes to fix it. At the end of our previous episode, Dracula and Frankenstein had both synchronized their working copies of the monsters repository with the master, where the head is Version 14. They're both working overnight, and both want to make changes to jupiter/moons.txt. Dracula edits his copy to change Amalthea's radius from a single number to a triple to reflect its irregular shape. As he's doing this, Frankenstein is editing his copy of the file to add information about two other minor moons, Himalia and Elara. Dracula commits first, creating Version 15 of moons.txt in the repository.
A few minutes later, Frankenstein updates his working copy—unlike Dracula, he's careful to update before trying to commit. Subversion tells him that moons.txt has changed behind his back. We can draw a timeline of these changes [diagram], and some clients for Subversion and other version control systems actually display diagrams like this automatically. None of the changes conflict with each other—Dracula edited a line that Frankenstein didn't change, while Frankenstein added two lines further down the file—so Subversion just goes ahead and changes Frankenstein's working copy. This does not mean that Frankenstein's changes have been committed to the repository—Subversion only does that when it's ordered to. Frankenstein's changes are still in his working copy, and only in his working copy. He has to do a commit to share them with anyone else. Doing that brings the two independent streams of editing back together to create Version 16. [diagram] Once Dracula does an update, both working copies and the master are in sync with each other. At this point, Frankenstein and Dracula both decide to add units to the column headers in the file. For once, Frankenstein is quicker off the mark; he adds these two lines to the file and commits to create Version 17. While he was making those changes, though, Dracula also added a header to create a different version of the file. He also changed the fifth column title from "Radius" to "Object Radius". If Dracula tries to do a commit without first doing an update, Subversion will tell him that he can't. It detects the conflict between his changes and Frankenstein's and refuses to allow Dracula to overwrite Frankenstein's work. Instead, Dracula must do an update to copy Frankenstein's changes into his own working copy. When he does this, Subversion modifies his copy of the conflicted file, and creates three temporary files beside it. The first of the temporary files is called moons.txt.r16. It is the file as it was in his local copy before he started making changes, i.e., the starting point for his work. The second file is moons.txt.r17. This is the most up-to-date version from the repository, which includes Frankenstein's changes. The third temporary file is called moons.txt.mine. This is the file as it was in Dracula's working copy before he did the Subversion update. Finally, Subversion modifies the file in question, moons.txt, to show his changes and the changes from the repository side by side. Wherever there is a conflict, Subversion inserts the line <<<<<<< .mine followed by the lines from his copy of the file. It then inserts the separator =======, followed by the lines from the repository's file that are in conflict with his, and puts >>>>>>> .r17 at the end. Some power users prefer to work with these interpolated changes directly, but for the rest of us, there are several tools for displaying diffs and helping to merge them. If Dracula launches the one that's built in to SmartSVN, it displays his file, the common base that he and Frankenstein were working from, and Frankenstein's file in a three-pane view. He can use the buttons to pull changes from either of the edited versions into the common ancestor to merge changes, or edit the merge file directly. In this case, the conflict is easy to resolve. After a moment's reflection, Dracula decides that Frankenstein's version is better than his, except that he prefers to use circumflex '^' instead of double-star '**' for exponents. He edits the conflicted line accordingly.
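Based on the description above, the conflicted region at the top of Dracula's moons.txt would look something like this before his edit (the header text is a plausible reconstruction rather than a transcript from the lecture):

    <<<<<<< .mine
    Name    Object Radius (km)    Mass (10^20 kg)
    =======
    Name    Radius (km)           Mass (10**20 kg)
    >>>>>>> .r17

Everything between <<<<<<< .mine and ======= comes from Dracula's working copy; everything between ======= and >>>>>>> .r17 comes from Version 17 in the repository.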
Once he is done, he saves his changes and exits the diff/merge tool. Subversion will now let him go ahead and commit the resolved file to create Version 18. The final timeline looks like this. [diagram] In this case, the conflict was small and easy to fix. However, if two or more people on a team are repeatedly creating conflicts for one another, it's usually a signal of deeper communication problems: either they aren't talking as often as they should, or their responsibilities overlap. If used properly, the version control system can help the team find and fix these issues so that it will be more productive in future.

Rolling Back Changes

Hello, and welcome to the fourth episode of the Software Carpentry lecture on version control. In this episode, we will show you how to undo changes so that you can get back earlier versions of your files. We'll start with the simplest case. Suppose that Wolfman made some changes to a program during a full moon. The next day, when he's back in human form, he looks at what he did and realizes that it's never going to work. How can he restore those files to the state they were in before he started editing? Without version control, his choices would be grim. He could ask his colleagues to send him their copies of the files... ...or try to edit them back into their original state by hand (which for some reason hardly ever seems to work). But he's using Subversion, and hasn't committed his work to the repository, so all he has to do is revert his local changes. The Subversion revert command simply throws away local changes to files and puts things back the way they were before those changes were made. If you've edited a file, your edits are discarded, and the file's contents are restored. If you use the remove command to get rid of files or directories, they are resurrected. And if you used add to add new files, reverting tells Subversion not to worry about them any more. It doesn't delete them, since "add" really just means "start paying attention to". But what if Wolfman actually had committed his changes to the repository? Things are a bit more complicated in this case, but only a bit. The trick is to realize that once a change is in the repository, it's there forever. There's no way to erase a commit—instead, what you have to do is copy the old version on top of the latest one, and then commit that change. The command that does this is merge. It can do a lot more than just recover old versions of files, but we'll start with that case. Our working copy is currently in sync with Version 25. We want to restore Version 24 of the file important.py. What we really mean when we say that is that we want Version 26 of the file to contain what Version 24 contained, because we can't just erase Version 25 from history. After all, we might decide later on that we were mistaken, and that parts of Version 25 really were worth keeping. When we run Subversion's merge command, we have to specify the two things we're merging. The first is the current version of important.py. The second is Version 24 of the same file, so we specify that revision. When the command runs, it creates the same three temporary files as an update with conflicts, and puts the same conflict markers in our working copy. Wolfman is old school, so instead of using the three-pane diff/merge tool, he decides to edit the merged file with a conventional editor. After he is done, he uses Subversion's resolved command to tell it that the conflict has been fixed...
...then commits the new file, which is really the old file in disguise. This same technique can be used to recover older revisions of files, not just the most recent. It can also be used to recover many files or directories at a time. But the most frequent use is to manage parallel streams of development in large projects. This is outside the scope of this lecture, but the basic idea is simple. Suppose that Universal Monsters has just released a new program for designing secret lairs. Wolfman and the Mummy are doing technical support: their job is to fix any bugs that users find. At the same time, Dracula and Frankenstein are supposed to start adding a few features that had to be left out of the first release because time ran short. All sorts of things could go wrong if both teams tried to work on the same code at the same time. For example, if Wolfman fixed a bug and sent a new copy of the program to a user in Greenland, it would be all too easy for him to accidentally include the half-completed shark tank control feature that Frankenstein was working on. The usual way to handle this situation is to create a branch in the repository for each major sub-project. Branches in version control repositories are often described as "parallel universes". Each branch starts off as a clone of the software at some moment in time—typically each time the software is released, or whenever work starts on a major new feature. Changes made to a branch only affect that branch, just as changes made to the files in one directory don't affect changes made to files in other directories. However, if someone decides that a bug fix in the "maintenance" branch should also be made in the "development" branch, all they have to do is merge the files in question. This is exactly like merging an old version of a file with the current one, but instead of going backward or forward in time, the change is brought sideways from one branch to another. Once any conflicts created by the merge have been resolved, the merged file or files can be committed as usual. Branching helps projects scale up by letting sub-teams work independently, but too many branches can cause as many problems as they solve. If you'd like to know more about branching and merging, see Karl Fogel's excellent book Producing Open Source Software, or Laura Wingerd and Christopher Seiwald's "High-level Best Practices in Software Configuration Management". Keep in mind, though, that branching and merging is a fairly advanced topic, and not something you should need until you have many developers or several active versions of your program.

Creating a Repository

Hello, and welcome to the fifth episode of the Software Carpentry lecture on version control. In this episode, we will show you how to set up a repository of your own. A word of warning: you will need to know a little bit about using a Unix shell to do this. If you've never used the shell, you should probably ask someone else to create your repository. Here's the simplified picture from our first episode of what we're trying to achieve. We want to keep the master copy of our work in a repository on a server that we can access from other machines on the internet. That master copy is a bunch of files and directories. Nobody ever edits them directly. Instead, a copy of Subversion runs on that machine, managing updates for us and watching for conflicts. Our working copy is a mirror image of the master sitting on our computer.
When our Subversion client needs to communicate with the master, it connects to the copy of Subversion running on the server to move data back and forth. In order for all of this to work, we need four things. First, we need the repository itself. It's not enough to create an empty directory and start filling it with files: Subversion needs to create a lot of other structure in order to keep track of old revisions, who made what changes, and so on. Second, we need to know the web address—the URL—of the server. In fact, we need even more than that: we need the full URL of the repository on that server, since a single server could host any number of Subversion repositories. The third thing we need is permission to read or write the master copy. Many open source projects give the whole world permission to read from their repository, but very few allow strangers to write to it: there are just too many possibilities for abuse. Somehow, we have to set up a password or something like it so that users can prove who they are. The fourth and final thing we need is a working copy of the repository on our computer. We saw how to create that in the second episode of this lecture by checking out a copy of the repository; please review that episode if you need a refresher. To keep things simple, we'll start by creating the repository on the machine that we're working on. This won't let us share the repository with other people, but it will allow us to save the history of our work as we go along. The command to create a repository is svnadmin create, followed by the path to the repository. If we want to create a repository called lair_repo directly under our home directory, we can just cd to that directory and run svnadmin create lair_repo. This command creates a directory called lair_repo to hold our repository. We should never edit anything in this repository directly: doing so probably won't tear our sanity to shreds and leave us gibbering mindlessly in horror, but it will almost certainly make the repository unusable. To get a working copy of this repository, we use Subversion's checkout command. If the path to our home directory is /Users/mummy, then the full path to the repository we just created is /Users/mummy/lair_repo, so we use svn checkout file:///Users/mummy/lair_repo lair_working. The first argument is the URL of our repository. file:// says that it's part of the local machine's filesystem, and /Users/mummy/lair_repo is the path to the repository directory. Notice that the protocol ends in two slashes, while the absolute path to the repository starts with a slash, making three in total. A very common mistake is to type only two, since that's what web URLs normally have. When we're doing a checkout, it is very important that we provide the second argument, which specifies the name of the directory we want the working copy to be put in. Without it, Subversion will try to use the name of the repository, lair_repo, as the name of the working copy. This means that Subversion will try to overwrite the repository with the working copy, since they have the same name. Again, there isn't much risk of our sanity being torn to shreds, but this would ruin our repository. To avoid these problems, most people create a sub-directory in their account called something like repositories or repos, and then create their repositories in that. For example, we could create our repository in /Users/mummy/repos/lair, then check out a working copy as /Users/mummy/lair. Now let's see how to create a repository on a server.
Let's assume we have a Unix shell account on a server called monstrous.monsters.org, and that our home directory is /u/mummy. To create a repository called lair, we log into that computer, then run the command svnadmin create lair. (Once again, we would probably actually create a sub-directory called something like repos and put our repository in there, but we'll skip that step here to keep our URLs short.) The URL for the repository we just created is monstrous.monsters.org/u/mummy/lair—except that's not a complete URL, because it doesn't specify the protocol we are going to use to connect to the repository. A protocol like HTTP or FTP defines how communication takes place between two computers: who talks first, how each party identifies itself, what data should be sent when, and so on. It's very common to use the HTTP protocol to communicate with Subversion, but setting that up requires some knowledge of how web servers work. We're going to use a combination of two protocols to access our repository. The first is called SSH, which stands for "Secure Shell". You probably used it to log in to the server to create the repository, though it might have been hidden inside a GUI client like Putty. SSH specifies rules for connecting to remote computers, providing passwords to prove your identity, and so on. The second protocol is SVN, which is a specialized protocol defined by Subversion for moving data back and forth, comparing different versions of files, and so on. Putting these together, the full URL for our repository is svn+ssh://mummy@monstrous.monsters.org/u/mummy/lair. Breaking this back into pieces: svn+ssh is the protocol. (It has to be spelled exactly this way: "ssh+svn" should work, but doesn't.) mummy@monstrous.monsters.org specifies who we are and what machine we're connecting to. /u/mummy/lair is where our repository is located. Every Subversion repository URL has these parts: a protocol, something to identify the server (which may optionally include a user ID if the repository isn't publicly readable), and a path. Let's switch back to our local machine and check out a copy of the repository. When we run Subversion's checkout command, our client makes a connection to the server monstrous.monsters.org and then prompts us for the password associated with the mummy account. By entering this password, we are proving to the server that we are the user mummy, or at least that we have the right to read and write the files that belong to mummy. Notice that our client gives us the option of saving our password locally, so that we don't have to re-enter it each time we update from or commit to this repository. A lot of people think this is a bad idea, since anyone who stole that password from our local machine would then be able to log into the server as us and do horrible things. We'll look more closely at the pros and cons of this in a future lecture. A bigger question is how to give other people access to the repository we have just created so that they can start working with us. Unfortunately, this really does require things that we are not going to cover in this course. If you need to do this, you can: ask your system administrator to set it up for you, use an open source hosting service like SourceForge or Google Code, or spend a few dollars a month on a commercial hosting service like DreamHost that provides web-based GUIs for creating and managing repositories. Read More ›
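Condensing the server-side walkthrough above into a single session, the whole round trip might look like this (a sketch: the account, host, and paths are the ones used in the script, and the responses shown are typical Subversion output rather than an exact transcript):

    $ ssh mummy@monstrous.monsters.org        # log in to the server...
    mummy@monstrous.monsters.org's password:
    $ svnadmin create /u/mummy/lair           # ...create the master copy...
    $ exit

    $ svn checkout svn+ssh://mummy@monstrous.monsters.org/u/mummy/lair lair
    mummy@monstrous.monsters.org's password:
    Checked out revision 0.

From this point on, the working copy in lair behaves exactly like one checked out from a file:// URL.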

A Note on Tools
Greg Wilson / 2010-07-20
After a bit of experimenting, I've decided to use SmartSVN as a GUI client for Subversion in the version control lectures, rather than RapidSVN or the command line. My reasoning is: Requiring people to learn the shell before they can start learning stuff they actually care about raises the drop-out rate noticeably. Requiring them to learn the shell before they can start learning something they don't yet realize they ought to use, like version control, raises it even more. That rules out the Subversion command line tools. SmartSVN is dual-op (the stripped-down version is free, the pro version is not; neither is open source). In contrast, RapidSVN is pure open source, so it would be my first choice (in fact, it was)... ...except RapidSVN doesn't have a built-in diff/merge tool. They recommend SourceGear's DiffMerge, which is free (that's good) but users have to configure RapidSVN to use it (that's bad—very bad). On the Mac, for example, users have to copy a shell script into /usr/bin, then go into RapidSVN's preferences, set the path to the shell script, and get eight (!) command-line arguments exactly right, or things will fail in strange ways (like you can merge files, but when it tries to save your work, it doubles the dirname of the path to the file and then complains about not being able to create /Users/swc/repo/Users/swc/repo/filename). If something can break for reasons that its intended users won't be able to understand, diagnose, and fix, then that something is itself fundamentally broken. In my opinion, the possibility of configuration hell with RapidSVN outweighs both its licensing and its support for "file://" URLs (which SmartSVN doesn't allow). And don't get me started on build tools—I am not looking forward to having to choose something for that lecture... :-( Read More ›

Script for Introduction to Version Control
Greg Wilson / 2010-07-19
While I'm waiting for artwork, I'd be grateful for feedback on the script for the introductory episode on version control.

Hello, and welcome to the first episode of the Software Carpentry lecture on version control. In this episode, we will explain what version control is, how it works, and why you should use it. Suppose you and a friend are working together on a paper. You both want to edit the file at the same time—what should you do? You could take turns, but then each of you would spend half the time waiting for the other. Another option would be to go ahead and work simultaneously, then patch things up afterward. But somehow, stuff always winds up getting lost, overwritten, or duplicated. The right solution is to use a version control system. This keeps the master copy of the file in a central repository, usually located on a server—a computer that is never used directly by people, but only by the applications serving them. No-one ever edits the master copy directly. Instead, you and your friend each have a working copy on your own computers. You work independently, making whatever changes you want to your local copies. As soon as you are ready to share your changes, you commit them to the repository. Your friend can then update her working copy to get those changes. And of course, if your friend finishes her part first, she can commit, and then you can update. But what if you and your friend want to make changes to the same part of the paper? Old-fashioned version control systems prevented this from happening by locking the master copy. Everyone's working copy would normally be read-only. When someone wanted to start work on a file, the version control system would make her copy of that file writeable. When she was finished working, the version control system would copy her changes to the repository, then mark her copy as read-only once again. Only one person at a time could have a writeable copy. This guaranteed that two or more people could never accidentally make changes to the same file at the same time... ...but it also meant that only one person at a time could work on any given file. This is essentially the "one at a time" strategy from the start of this episode, but with the version control system acting as the referee to prevent accidents. In practice, locking like this isn't as restrictive as it sounds. If you and your friend repeatedly find that you're trying to edit the same file, the solution is to break your paper (or your program) into several smaller files, so that you can work simultaneously. However, most of today's version control systems use a different strategy, one based on the old saying that it's easier to get forgiveness than permission. In these systems, nothing is ever locked—everyone is always allowed to edit the files in their working copy. Sometimes, of course, you and your friend will make changes to the same part of the paper. If your friend commits first, her changes go into the repository as usual. If you try to commit something that would overwrite her changes, the version control system will stop you... ...and highlight the conflict by marking the overlapping regions in your working copy. It's up to you to edit the file to resolve the conflict. You can keep your changes, accept your friend's, or write something new that combines the two—it's up to you. Once you have fixed things, you can go ahead and commit. Experience shows that version control is better than mailing files back and forth for at least three reasons.
First, it's hard (but not impossible) to accidentally overlook or overwrite someone's changes—the version control system highlights them for you automatically. Second, there are no arguments about whose copy is the most up to date—the master copy is. These features mean that version control is worth using even when you're the only person working on a particular set of files, because it's a more reliable way to move files between the computers you use than copying things onto a USB drive or emailing them to yourself. More importantly, whether you're working on your own or in a group, version control allows you to look at or undo changes you made weeks, months, or years ago. This works because the version control system never actually overwrites the master copy in the repository. Instead, every time someone commits a new version, the system saves it on top of the previous master copy, along with some information about when the change was made and who made it. This means that you can always see what the file looked like last week before someone rewrote the analysis section while you were on holiday. It also means that you can always fetch old versions of things, like the exact version of the program you used to produce the graph on page 5 of that paper that someone is now challenging. Version control systems do have one important shortcoming, though. If you are working with plain text files, it's easy for the version control system to find and display differences, and to help you merge them. Unfortunately, today's version control systems won't do this for images, MP3s, PDFs, and Microsoft Word or Excel files. These aren't stored as text—they use specialized binary data formats, and there usually aren't tools for finding, displaying, or merging the differences between them. In most cases, all the version control system can do is say, "These files are different." That's better than nothing, but not by much. Even with this limitation, version control is probably the most important concept in this entire course. It's not just because it facilitates sharing; version control also allows you to look at or undo changes you made weeks, months, or years ago. We'll talk more about using version control to make your research reproducible in a later lecture. In the next episode of this one, we'll look at the most popular open source version control system in use today, called Subversion. The four episodes after this introduction will cover: the basic update/edit/commit cycle, merging conflicts, setting up a repository, and reverting to old versions of files. Of these, #3 is the hardest, since we are not assuming people know how to use a shell or set up public/private keys. Not sure how to handle that yet; advice would be welcome. Read More ›

An Interview with Hans Petter Langtangen
Greg Wilson / 2010-07-18
Our latest interview is with Hans Petter Langtangen, author of two books for scientists about Python (and a lot more). Tell us a bit about your organization and its goals. I am on 80% leave from the computer science department at the University of Oslo, to work at Simula Research Laboratory. Simula is primarily funded by the government, with a commitment to do long-term fundamental research in three areas: software engineering, networks and distributed systems, and scientific computing. My work is in the last of these areas. The funding is dependent on a successful scientific evaluation by an international panel every five years. About 120 people work at Simula today. Education is the second focus point at Simula. Many researchers have appointments at the University of Oslo and do regular teaching there, like myself. A separate unit, The Simula School of Research and Innovation, organizes the education of master's, PhD, and postdoc candidates at Simula. The third focus point of Simula is innovation, in particular start-up companies based on promising research results. Tell us a bit about the software your group uses. Our main daily task is to solve partial differential equations and write papers about the results. We use commercial, open source, and in-house software. Examples from the first category are MATLAB, Star-CD, and Fluent. Widely used open source packages are VTK for visualization, VMTK for creating geometries of blood vessels from images, version control systems, LaTeX for mathematical text and slides, and almost all types of Python modules. Our simulations are to a large extent done with in-house software. This can be smaller, specialized, stand-alone programs, or more general programs written in a larger framework, primarily FEniCS. Most of us are very down to earth when it comes to computer tools: the main workhorse is a plain text editor, Emacs or Vim, in combination with a terminal window. Automation via scripting goes on all day long. Now we are heavy users of Python, whereas a decade ago we depended heavily on Perl, Bash, Sed, Awk, Make, Autotools, and that generation of software. For compiling and linking applications, SCons is a popular tool now, but Make or perhaps just some Python or Bash script may be sufficient. It goes without saying that we love command-based tools and hate GUIs. Integrated development environments, say Eclipse, are hardly used by anyone. We require people to use version control systems, mainly Mercurial, Subversion, or Bazaar, for software development as well as paper and book writing. Tell us a bit about what software your group develops. Our research group has a long history of developing frameworks for solving partial differential equations. In the 1980s, Fortran 77 and Bourne shell were the languages. In the 1990s, C++ and Perl dominated, especially in the Diffpack development. In the 2000s, the languages of choice in our group have been Python and C++. MATLAB is also popular, usually before new people discover that Python can do the same — and more. At the moment we are heavily involved in the FEniCS project, which consists of a dozen software components, written in C++ and Python. Several institutions and an international user community participate in the development of FEniCS components, applications, and documentation.
Most FEniCS simulators are written in Python, but the Python program generates C++ code tailored to the problem at hand, and links this C++ code to general libraries for finite element computations, linear algebra packages, etc. Simula has the primary responsibility now for distributing FEniCS as an open source system. Building and testing FEniCS, with all its dependencies, such as PETSc and Trilinos, can quickly become a nightmare. We have a dedicated scientific programmer working with a Buildbot system for FEniCS as well as packaging FEniCS for Debian and other binary distribution repositories. We also develop some other, smaller open source packages. I have recently been involved in three: scitools for my books, latexslides for Python-generated LaTeX slides, and ptex2tex for extending LaTeX. These are distributed through Google Code. Tell us about your course and your books. I guess you're interested in books related to scientific software? The Python Scripting for Computational Science book evolved from the need to teach my master's and PhD students what scripting is. In the mid 1990s I used Perl a lot, but the only Perl book, the famous Camel book, didn't give my students the vaguest idea how Perl could be used to do our science in a more effective and reliable way. Therefore, I started a course in 1999 at the University with the aim of teaching scripting and automation in science. Or, honestly speaking, the aim was to avoid reteaching this topic to every new master's or PhD student that entered the group. The course notes initially explained how to do scripting in Perl. Over a couple of years, however, we experienced increasing use of Python in-house, as it was much easier to maintain Python code than Perl code. Students also learned Python more quickly than Perl, and all the entertaining side effects and "smart behavior" of Perl were actually found disturbing in teaching, at least when you see Perl and Python in action side by side. The course notes evolved into a book in 2003, exclusively with Python-based material. At that time, there were very few Python books and little documentation of how to effectively do non-numerical administrative work in the context of science and high-performance computing. The interest in these topics exploded over the following years. The book was quickly sold out, and a demand for new editions arose. Now the book is a best-seller in its category. For the fun of it I like to mention that the publisher was not very fond of such a scripting book when I first suggested it in 2000; the market was simply considered too small. But I wrote the book, knowing that this material was important! So, on to my next book project, A Primer on Scientific Programming with Python. All over the world, you find computer science departments offering a first programming course with Java as the language. This is not optimal for science, and experience shows that most students need to relearn programming in the context of MATLAB, Fortran, C, C++, and scripting languages. At the University of Oslo we introduced a major reform in science education in 2003, with the aim of using programming and simulation as tools for exploring mathematical models throughout all courses, including at the Bachelor level. As part of this reform, we needed a programming course in the first semester that could target science problems, numerical methods, and programming styles for later science courses. The idea was to adopt Python as the first language and focus on MATLAB-style programs as well as object orientation.
Recall that OO was invented 300 meters from the building where this course is taught! We wanted an integrated approach so that programming could be learned via examples involving scientific applications, from physics, biology, and finance, combined with numerical approaches to handle the mathematics. No existing book offered an integrated approach, so it was again natural to develop a book along with the course. The "primer" book was published in 2009 and has been very well received. Although it targets newbies in programming, it seems that the easy-to-read style is also useful for experienced scientists and PhD students who want to do scientific computing with Python. The "primer" book aims at numerical computing, while the "scripting" book essentially deals with all the non-numerical tasks you need when doing math on the computer. Since there are few days without an email saying "thank you for your excellent book... I have a question..." from people all around the world, these books are evidently proving useful. The example-oriented writing style, with plenty of code that can be copied directly into the reader's own problem area, is probably the main reason for the books' popularity and large sales. It's also quite amusing that the "scripting" book is selling so well despite being available at various pirate sites. It seems that people still want hardcopies of books, not just PDF files. How do you tell what impact the course has had? We have educated over 1000 people in Python, both scientists and administrative software developers. When we started in 1999, Python was hardly used at all in Norwegian industry. Our candidates with Python knowledge have introduced the language at a lot of companies, and now there is a significant need for Python competence out there. And by "Python" I mean much more than the language: it's the way of working, automating manual operations for reliability, being more effective, knowing a lot of useful modules, seeing new ways to do things, etc. Most of these students applied their Python and scripting knowledge in their work on master's and PhD theses, which we believe has led to more effective and reliable research. Also, the courses we offer equip our own students with the right tools for doing a thesis in our group. However, when we recruit PhD students from elsewhere, without this education, we see a demand for a quick, to-the-point course on what you need to know about effective working habits in a "terminal window". This is where your Software Carpentry hopefully comes to the rescue! What are your plans for future work? The "scripting" course has been very popular for 11 years now. Unfortunately, nobody outside our own research group has shown any interest in maintaining and developing this course. The technology is evolving rapidly, and many of the tools in the first edition of the "scripting" book quickly became outdated. Since we are scientists with little time for teaching, it is hard to keep up with the technological improvements and incorporate them into new editions of the book and the course. We end up doing minimal updates, which is not satisfactory. However, we also have strong interest in and need for other, more science-oriented courses, so I anticipate that future book projects and courses will be on other subjects. Read More ›
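The everyday automation Langtangen describes is easy to picture with a small, self-contained example. The sketch below is ours, not code from the interview, and the file layout (results/run_*/energy.dat) is hypothetical; it shows the kind of manual chore that a few lines of Python replace:

    from pathlib import Path

    # Hypothetical layout: each simulation run writes results/run_*/energy.dat,
    # with one floating-point value per line.
    summary = []
    for datafile in sorted(Path("results").glob("run_*/energy.dat")):
        values = [float(line) for line in datafile.read_text().split()]
        summary.append((datafile.parent.name, sum(values) / len(values)))

    # Report a per-run mean instead of opening each file by hand.
    for run, mean in summary:
        print("%s: mean energy %.4f" % (run, mean))

A dozen lines like these are exactly the habit of "automating manual operations for reliability" that the interview advocates.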

A Gentle Introduction
Greg Wilson / 2010-07-18
Via several routes, I've been pointed at MIT Open CourseWare's "A Gentle Introduction to Programming Using Python", as taught by Sarina Canelake in January 2010. An hour of lecture a day for 10 days (two weeks), plus two hours of lab per day, it is... "a gentle introduction to programming using Python for highly motivated students with little or no prior experience in programming computers". The pace is brisk:

Day 1: Variables and operators
Day 2: Conditionals and iteration
Day 3: Functions
Day 4: Control flow with while
Day 5: Project 1: structuring larger programs
Day 6: Review, dictionaries
Day 7: Introduction to objects
Day 8: Objects and inheritance
Days 9-10: Project 2: working in a team

If I understand the list of readings correctly, everything we expect students to know before they start Software Carpentry is covered in the first half of MIT's course, i.e., they think that five full hour-long lectures and ten hours of practical work are enough to teach variables, operators, if/else, strings, lists, for loops, functions, and simple file I/O to people who've never programmed before ("highly motivated" people, anyway). By comparison, Version 3 of this course covered the same material in just three hours, though it assumed students had programmed before, just not in Python. Read More ›

Clip Art
Greg Wilson / 2010-07-16
As per your votes, I'm about to start writing the lecture on version control. Our first attempt, two months ago, did a pretty good job of covering the basics, but the slides were downright boring. In particular, we used standard Microsoft clip art to illustrate two people sharing files via a server: I'd like to use something more entertaining, like these animals from StudioFibonacci: or these Pacman-style monsters from Nicubunu: If we're going to do that, though, I want to do it right, which means having at least four different characters, and some computers and stuff, seen from various angles (preferably in the isometric style, because, you know, the eighties are back in style). If you happen to have some open licensed clip art lying around, or know where I could find some, please get in touch. Read More ›

Survey Results
Greg Wilson / 2010-07-15
Here are the results of the survey that we announced a couple of days ago. I'm a bit surprised that so many computer scientists responded, and equally surprised by the popularity of "biomedical engineering" — who knew? The scores for various topics hold a few surprises as well: I would have predicted that something with the word "web" in it would have scored near the top of the list, rather than at the bottom. But it's clear that version control has to be the next lecture we produce, followed by one on task automation. We're going to use Subversion for the former: Git and Mercurial and other distributed version control systems are clearly on the rise, but there isn't a clear winner yet, and integration with other tools still lags. Deciding what to use for task automation is harder: we've always used GNU Make in the past, but that requires knowledge of the shell, which many of our intended audience don't have. Ant is a non-starter; SCons or Rake would be better from a geek point of view, but again, there's the question of tool support. Your thoughts would be greatly appreciated...

Education
  Graduate degree: 69 (75.8%)
  Undergraduate degree: 22 (24.2%)

Area
  Computer Science: 52 (57.1%)
  Mathematics and Statistics: 22 (24.2%)
  Earth Sciences: 20 (22.0%)
  Physics: 17 (18.7%)
  Biomedical Engineering: 15 (16.5%)
  Microbiology: 13 (14.3%)
  Electrical Engineering: 8 (8.8%)
  Macrobiology: 7 (7.7%)
  Business/Finance: 4 (4.4%)
  Mechanical Engineering: 4 (4.4%)
  Medicine and Health Care: 4 (4.4%)
  Astronomy: 3 (3.3%)
  Economics: 3 (3.3%)
  Psychology: 3 (3.3%)
  Other: 8 (8.8%)

Job
  Academic Researcher: 40 (44.0%)
  Software Developer: 31 (34.1%)
  Graduate Student: 16 (17.6%)
  Engineer: 14 (15.4%)
  Government Research Scientist: 8 (8.8%)
  Manager/Supervisor: 8 (8.8%)
  System Administrator: 6 (6.6%)
  Industrial Research Scientist: 2 (2.2%)
  Teacher: 2 (2.2%)

Topics (score)
  Version Control: 2.64
  Automating Repetitive Tasks: 2.59
  Data Visualization: 2.53
  Reproducible Research: 2.51
  Testing and Quality Assurance: 2.51
  Coding Style: 2.44
  Data Structures: 2.44
  Debugging with a Debugger: 2.40
  Designing a Data Model: 2.36
  Object-Oriented Programming: 2.34
  Performance Optimization: 2.32
  Basic Programming: 2.31
  Using the Unix Shell: 2.31
  Refactoring: 2.25
  Parallel Programming: 2.23
  Working in Teams/on Large Projects: 2.22
  Computational Complexity: 2.18
  Packaging Code for Release: 2.16
  Static and Dynamic Code Analysis Tools: 2.12
  Design Patterns: 2.10
  Systems Programming: 2.03
  Integrating with C and Fortran: 1.98
  Matrix Algebra: 1.97
  Functional Languages: 1.94
  Handling Binary Data: 1.88
  Image Processing: 1.82
  XML: 1.80
  Build a Desktop User Interface: 1.76
  Create a Web Service: 1.75
  Introduction: 1.68
  Geographic Information Systems: 1.51

Read More ›

Two New Episodes on Dictionaries
Greg Wilson / 2010-07-14
We have just posted the last two episodes of the lecture on sets and dictionaries, which introduce dictionaries and give examples of their use. Please let us know what you think about the pace and level of detail. Read More ›
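For readers who want a taste of the material before watching: the classic motivating example for dictionaries is tallying how often each value occurs. The snippet below is our own minimal sketch, not necessarily the example the episodes use:

    # Count how many times each item appears in a list.
    atoms = ["Na", "K", "Cl", "Na", "Na", "Cl"]
    counts = {}
    for atom in atoms:
        if atom in counts:
            counts[atom] += 1
        else:
            counts[atom] = 1
    print(counts)  # {'Na': 3, 'K': 1, 'Cl': 2}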

Traffic
Greg Wilson / 2010-07-13
In case you're interested, we seem to be averaging around 280 views per blog post right now. I'll post stats on page views (i.e., how many people are looking at the lectures themselves) at the end of this month. Read More ›

Interview: Andrew Lumsdaine of Indiana University
Greg Wilson / 2010-07-13
Today's interview is with Indiana University's Professor Andrew Lumsdaine. Tell us a bit about your organization and its goals. One of the reasons the School of Informatics and Computing exists is that computing of various kinds has become part of almost all academic disciplines. The school's goal is to educate students in computer-related areas, as well as traditional computer science, and it's important that the principles of software development be taught. The same holds for my research group, which works in HPC (where the 'P' means both "performance" and "productivity"). There's a huge need for reproducibility in CS research, both for the sake of sound science and also so that people actually can build on each other's work. In order for that to happen, we need some guarantees about quality and reusability, and improving basic skills is a necessary prerequisite. Tell us a bit about the software your group uses. The first group (students in the school) uses every kind of off-the-shelf application you can think of. It's mostly closed source; they do relatively little development. My research group mostly builds its own tools, and tends to be pretty zealous about open source. Tell us a bit about what software your group develops. We started by building a version of MPI, and are now part of the Open MPI collaboration. Working with dozens of collaborators around the world requires the same skill set as other open source projects: having a software repository, nightly builds, regression tests, and proper licensing protocols is essential. We also contribute to the Boost C++ libraries, in particular a parallel version of the Boost Graph Library. How do you hope the course will help them? Bill Gropp once said, "Computers should be a labor saving device," but it often doesn't feel that way. We think that adopting basic development practices will allow people to do more and better science. We also think that organizing this material, instead of having grad students tutoring each other erratically, will give us a common base of knowledge that we can then rely on. It's important for us institutionally that the course is self-contained. Introducing a bit of computing here and there across the curriculum is an idea that comes up a lot in faculty meetings, but I don't know of any successful across-the-curriculum efforts. Putting this training in one place is more efficient, and makes it someone's job to ensure success. How do you plan to evaluate the impact the course has had? (laughs) That's a forbidden question in Computer Science. Read More ›

A Shorter Version of the Sets and Tuples Episode
Greg Wilson / 2010-07-13
One of the many advantages of online delivery of course material is that it allows us to present content in several ways, tailored to different audiences. For example, some people want to understand how sets are stored in hashtables; others just want to know that you cannot put a list in a set, and should use a tuple instead. We now have episodes catering to both: the longer and more detailed one posted last week, and a shorter and less detailed one posted today. We'd welcome feedback: is this confusing, helpful, or (as is so often the case) a bit of both? Read More ›
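The rule the shorter episode boils down to fits in a few lines. This is our own illustration rather than a transcript of the episode:

    tags = set()
    tags.add(("sunset", "beach"))      # a tuple is immutable, so it can go in a set
    try:
        tags.add(["sunset", "beach"])  # a list is mutable...
    except TypeError as err:
        print(err)                     # ...so Python refuses: unhashable type: 'list'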

Interview with Michigan State's Titus Brown
Greg Wilson / 2010-07-11
Today's interview is with Professor Titus Brown, from Michigan State University. Tell us a bit about your organization and its goals. Michigan State is one of the Big Ten and hence a really big place. We do a lot of research, in particular, and a lot of that research is in biology. The Gene Expression in Disease and Development focus group is a collection of molecular biologists at MSU that are interested in gene expression and methods (experimental, genomic, and bioinformatic) for understanding it. Tell us a bit about the software your group uses. Most molecular biologists either use pre-packaged analysis tools, or nothing at all. Even the local bioinformaticians have generally not picked up on version control or anything more complex than Perl analysis scripts. Tell us a bit about what software your group develops. My lab develops software for our own research, as well as working on reusable libraries for others to use, and eventually we hope to build GUIs or Web applications for even more general use. We're interested in soup-to-nuts—basic sequence analysis all the way through to database curation and genome-scale visualization. Who are you hoping Software Carpentry will help? Any biology graduate student that needs to do anything unoriginal, computationally speaking. Computational students should find it particularly useful: someone who has been through the normal CS curriculum, for example, but has never learned about SQL databases, version control, Web services, testing, etc. We get a pretty wide range of backgrounds in our interdisciplinary grad students, so it is impossible to identify a single training track that will serve even a majority. Hopefully the SWC material can backstop the material we are already developing on the subject of "being effective at computation." How do you hope the course will help them? Like many other biology research institutions, we're finding ourselves overwhelmed with genome-scale data; all the new sequencing platforms (along with tandem mass spec, and a host of other systems) deliver stunning amounts of data. We are not well prepared to deal with the data, and the old molecular biologist standby of loading everything into Excel doesn't scale at all. So molecular biologists are starting to have to learn to program in order to do pretty much anything with this data. But while we at least have courses that teach people how to program, we have basically no computational science curriculum, and what we do have is targeted less at being effective than at being minimally capable in a given field. I'm not a fan of big ideas. I would just like students to have the ability to improve their general scientific computation skills iteratively, without having to go through a class. How will you tell what impact the course has had? I will be happy the day one of my own students casually (and correctly) uses a technique that s/he could only have learned from the SWC material. I will be thrilled when somebody else's computational student name-drops SWC as the source of a technique that sped up their research. And I will be ecstatic when a previously purely experimental student tells me how great SWC has been for helping them learn how to do computational science better. We don't have any systematic way of assessing the impact of the course, however. Read More ›

Which Topics Are Most Important to You?
Greg Wilson / 2010-07-10
We have created a survey to find out which of the topics we're planning to cover are most important to you. Please take three and a half minutes to fill it in; we'd be grateful as well if you could re-post the link, forward it to relevant mailing lists, etc. http://www.surveymonkey.com/s/FM9YV9C Thanks for your help! Read More ›

HPC and Programmability
Greg Wilson / 2010-07-10
Via Andrew Lumsdaine, a pointer to an interesting article in Communications of the ACM by Eugene Loh titled "The Ideal HPC Programming Language". A few key quotes: These programmability studies began with a focus on programming languages, but the focus quickly shifted to other topics. Existing languages—notably Fortran...—proved remarkably adequate. Programming challenges stem mostly from other factors. The DARPA HPCS program...sponsored the development of new programming languages: Chapel from Cray, Fortress from Sun, and X10 from IBM. Proponents of those languages would show early on how rewriting familiar HPC benchmarks in the new languages could reduce source-code volume substantially—tenfold reductions were not surprising—but rewriting these benchmarks even in Fortran achieved similar source-code reductions and corresponding improvements in expressivity. In HPC, the mind-set is usually to program for performance rather than programmability even before establishing whether a particular section of code is performance sensitive or not. Repeating some of these...studies on larger HPC programs would be interesting. In particular, it would be nice to move from self-contained programs that are small enough for one person to have written...to larger pieces of software... The Cowichan Problems were designed to explore that last issue in an affordable way. If you're interested in helping us re-design them so that they better represent the full range of today's HPC applications, please let me know. Read More ›

Two Episodes on Sets
Greg Wilson / 2010-07-08
I've just posted an introduction to sets in Python, and a brief look at how they're implemented (which you need to know in order to understand why you can't put a list in a set). The second of these has more theory in it than most of our episodes; I'd be grateful for feedback from non-computer scientists about how much sense it makes. Read More ›
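As a reference point for non-programmers reading along, the core operations the introduction covers look like this in Python (our snippet, not a clip from the episode):

    vowels = set("aeiou")
    letters = set("bookkeeping")
    print(letters & vowels)   # intersection: the vowels actually present
    print(letters | vowels)   # union: everything in either set
    print("k" in letters)     # membership tests are fast because sets use hashing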

Interview: SciNet's Daniel Gruner
Greg Wilson / 2010-07-08
Today's interview is with Dr. Daniel Gruner of SciNet. Tell us a bit about your organization and its goals. SciNet is a High Performance Computing (HPC) consortium. We provide hardware and software platforms for researchers who require HPC in their work. We are also enabling this research in ways that had not even been thought of before. The enabling is partly due to our having the largest systems in Canada, and partly due to our helping people port their codes to run on these very large systems. Tell us a bit about the software your group uses. We are not a research group, but rather we provide help to researchers. We have expert HPC developers and practitioners, who use the traditional tools of HPC, namely Message Passing (MPI) or shared memory (OpenMP) multithreading techniques to scale out scientific codes. Tell us a bit about what software your group develops. Our system administrators mostly develop tools for systems management. Our parallel programming analysts assist in porting users' codes to the parallel HPC platforms, and being scientists themselves they write their own codes as well. The software developed here is mostly for scientific simulations and data analysis. How do you hope the course will help them? The course is not really geared to our group, but rather to scientists who are users of our systems. Some of our users are highly skilled software carpenters, but many of them are not. Some develop their own codes, while others use existing software for their simulations or data analysis. Those users who are not skilled carpenters should learn techniques ranging from workflow planning to data management, from software development tools to development lifecycle management, and from data analysis and documentation techniques to version management. We have found that most students in science and engineering who need to use HPC in their work lack the basic education that your course provides. Not to speak of those disciplines where HPC has not been traditionally used... How will you tell what impact the course has had? Hopefully we'll see it in better system utilization and smarter approaches to problem solving, because the tools will be available and known to the users. We should notice it in the way users approach us for help in coding and in running their jobs, and we should be able to interact with users at a higher level of understanding. We should see more and better parallelized codes, more efficient data management, and a better understanding of how HPC systems operate and how to optimize their utilization. Read More ›

Using Science to Design This Course
Greg Wilson / 2010-07-07
One of the many reasons I left the University of Toronto to work full-time revising this course was to explore two areas I've been curious about for years: educational technology, and the science of learning. With my daughter a year away from kindergarten, that curiosity is no longer academic [1]. In this post and some that follow, I'd like to talk about what I'm finding and how it's influencing the design of this course. If you want to know more, please subscribe to Mark Guzdial's excellent and thought-provoking blog about computer science education. Our experiments with screencasts and online tutoring are still in their early days (i.e., we have put less than three hours of the former online, and haven't started the latter), so I don't have a lot to report. By October or November, though, I hope to be able to speak more knowledgeably about how cost-effective asynchronous web-based instruction is compared to more traditional modes [2]. Nothing I've learned about ed tech so far has really surprised me. What I've been learning about learning definitely has: there's a lot more science there than I ever suspected [3]. For example, CS educators have spent years arguing "objects first" versus "objects later" as a teaching strategy. Turns out it doesn't matter: outcomes are the same either way [ES09]. Similarly, I've "known" for years that using a debugger helps people learn how to program. Turns out I was wrong, at least in introductory courses [BS10]. At a higher level, there's lots of evidence now showing that novices learn more from worked examples than from working through problems on their own. The theoretical basis for this comes from cognitive load theory, which can, with some work, be translated into concrete course design [CB07]. I'm still digesting this literature, but I would probably never have discovered ideas like fading worked examples without it. More importantly, I wouldn't have been able to distinguish those ideas from others that sound equally plausible, but aren't backed up by evidence [4]. How does this translate into course material? To be honest, I'm not sure yet: "lots of worked examples" is obvious, but other questions—in particular, self-assessment—are still pending. If you know of something with evidence behind it that I should read, I'd welcome a pointer. References [CB07] Michael E. Caspersen and Jens Bennedsen: "Instructional Design of a Programming Course—A Learning Theoretic Approach". ICER'07, 2007. [ES09] Albrecht Ehlert and Carsten Schulte: "Empirical Comparison of Objects-First and Objects-Later". ICER'09, 2009. [BS10] Jens Bennedsen and Carsten Schulte: "BlueJ Visual Debugger for Learning the Execution of Object-Oriented Programs". ACM Trans. Computing Education, 10(2)/8, June 2010. Footnotes [1] Academic (adj.): ...not expected to produce an immediate or practical result. [2] Did I really just use the word "mode"? Brr... [3] If you're interested, a good place to start is the CWSEI site, or again, Mark Guzdial's blog. [4] Ernst and Singh's Trick or Treatment is a great book about evidence-based thinking in medicine; I'd happily buy half a dozen copies of something similar about education to give to friends and family. Read More ›

That's, Uh, Pretty Ambitious
Greg Wilson / 2010-07-06
The title of this post is taken from the reaction of the first person I showed this course schedule to. If we're really going to teach Software Carpentry in one university term (12-13 weeks), this is the pace we'll have to move at. What do you think? Is it crazy, or just nuts? And what, if anything, can be done about it?

Notes: This course is designed for graduate students in science and engineering and their professional peers, which allows a faster pace than would otherwise be possible. All problem sets are due one week after being assigned; their due dates aren't marked to avoid cluttering the table. Based on past experience, each problem set is 4-6 hours of work. There is no midterm or final examination.

Date | Topics | Events
2010-09-07 | Introduction to course | Post single-paragraph professional biography to course web site
2010-09-09 | Associative data structures: sets; dictionaries | Problem set #1 (sets & dictionaries)
2010-09-14 | Tracking down bugs: visual debuggers; breakpoints; watchpoints; debugging strategies |
2010-09-16 | Reading text files: data processing patterns; incremental development | Problem set #2 (parsing); in-lab practical exam on debuggers and debugging
2010-09-21 | Quality assurance: data processing patterns; incremental development | Problem set #3 (designing tests)
2010-09-23 | Regular expressions: patterns; groups; operators; anchors; finite state machines | Problem set #4 (regular expressions)
2010-09-28 | Iterative program design: choosing data structures; refactoring; tuning | In-lab "live design" exercise
2010-09-30 | First-class functions: functions as objects; apply; map; reduce | Problem set #5 (functional programming)
2010-10-05 | Coding style: cognitive concerns; style rules; refactoring revisited | Problem set #6 (code reviews); in-lab code review
2010-10-07 | File systems programming: directories as files; permissions; properties | Problem set #7 (finding duplicate files)
2010-10-12 | Version control: motivating problems; update/commit; merge | In-lab setup and practice with Subversion or Mercurial
2010-10-14 | Image processing: image representation; operators; recursion | Problem set #8 (image transformations)
2010-10-19 | Data parallelism: array operations; shape operations; masking | Problem set #9 (numerical linear algebra and cellular automata)
2010-10-21 | Performance: algorithmic vs. actual efficiency; profiling; tuning | Problem set #10 (profiling and speeding up small programs)
2010-10-26 | Task automation: dependencies; dependency graphs; rules | Problem set #11 (using Make/SCons for data provenance)
2010-10-28 | Spreadsheets: data representation; unitary operations; aggregation; visualization | Problem set #12 (data analysis using Excel or Calc); in-lab discussion of data visualization
2010-11-02 | Databases: simple queries; filtering; aggregation; sub-queries | Problem set #13 (data analysis using SQL)
2010-11-04 | XML: elements vs. attributes; syntax; recursion | Problem set #14 (extracting information from XML documents)
2010-11-09 | Object-oriented programming: classes vs. instances; defining methods; constructors; data hiding |
2010-11-11 | Object-oriented programming (cont.): polymorphism; inheritance; operator overloading | Problem set #15 (rewriting previous examples using classes)
2010-11-16 | Information architecture: data modeling; entity-relationship analysis; class diagrams | Problem set #16 (designing data representations); in-lab design of data representations for simple problems
2010-11-18 | User interfaces: reactive programming; basic GUI components | In-lab demonstration of GUI toolkit
2010-11-23 | User interfaces (cont.): model-view-controller; design guidelines | Problem set #17 (constructing GUI controllers for previous example programs)
2010-11-25 | Web programming: HTTP request cycle; passing parameters; special characters; fetching data | Problem set #18 (downloading and analyzing data)
2010-11-30 | Web programming (cont.): providing data; maintaining state; security concerns | Problem set #19 (a simple data server); in-lab guest lecture on information security
2010-12-02 | Parallel programming: task and data decomposition; Amdahl's Law |
2010-12-07 | Parallel programming (cont.): map/reduce | Problem set #20 (map/reduce parallelization of previous example)
2010-12-09 | Teamwork: key empirical results in software engineering |
2010-12-14 | Teamwork (cont.): SCRUM; teamware | In-lab demonstration of Trac and Google Code
2010-12-16 | Review |

Read More ›

Hubs, Spokes, and Gonzo Programming Skills
Greg Wilson / 2010-07-06
In the aftermath of his (very successful) course on next-generation sequencing, Titus Brown has posted some thoughts on how to tie that material to Software Carpentry. To make a short story even shorter, he thinks Software Carpentry requires more knowledge on entry than most biologists have, and proposes a hub-and-spoke model as a solution. He also thinks we need a "Greg gets hit by a bus" plan. Please head over to his post and add comments there: April 2011 (the end date for the current round of funding) seems a long way off, but it isn't. Read More ›

The Violas of Programming
Greg Wilson / 2010-06-29
Orchestral musicians make jokes about violas and viola players. "What's a string quartet? A great violin player, a mediocre violin player, a bad violin player, and a cellist." Or, "What's the difference between a viola and a trampoline? You take off your shoes to jump on a trampoline." But violas are as essential as they are unglamorous: hardly anyone plays them as a lead or solo instrument, but string quartets just don't sound right without that third voice. I realized today that Python's sets [1] are sort of like the violas of programming. They come up quite naturally all over the place—just flip through any text on algorithms and count how often they're used. But they're rarely used alone, which makes it hard to come up with well-motivated examples when teaching them. Consider: "What vowels are present in this string?" Sure, but show me an application where that comes up: every one I can think of wants the frequency of the vowels, not just their presence or absence. "Find out whether these photos have some tags in common." Sure, but (a) intersection and union are built in, so it's a one-liner, and (b) you'd almost certainly use a dictionary with photo IDs for keys, and sets of tags as values. Anything with graphs: again, the nodes reachable from X are naturally stored as a set, but the graph as a whole will be a dictionary of nodes to reachable sets. I didn't worry about this too much in the Version 3 lecture on sets and dictionaries: I used a couple of completely abstract examples (including "which vowels"), and moved quickly into a discussion of how sets are stored and why their values have to be immutable. I'd like to do better in Version 4—I'd like every new tool or technique to be well motivated at the time of its introduction—but I'm damned if I can figure out how. [1] Disclaimer: As the author of the original Python Enhancement Proposal (PEP) on sets, I have certain biases. Read More ›
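To make the complaint concrete, here are the post's own examples in code; the photo IDs are made up:

    # "Which vowels are present?" is a one-liner...
    print(set("nanotechnology") & set("aeiou"))

    # ...and realistic tag data hides the set inside a dictionary,
    # with photo IDs as keys and sets of tags as values.
    tags = {
        "photo-001": {"holiday", "beach", "family"},
        "photo-002": {"beach", "sunset"},
    }
    print(tags["photo-001"] & tags["photo-002"])  # tags in common: {'beach'}

In both cases the set operation itself is trivial; the interesting structure lives in the dictionary around it, which is exactly the post's point.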

Last Episode on Sets and Dictionaries Posted First
Greg Wilson / 2010-06-28
For our lecture on sets & dictionaries, I decided to start at the end with the wrap-up example on nanotechnology. I'll double back tomorrow and start filling in the pieces that this example relies on. Read More ›

Four Down - What Next?
Greg Wilson / 2010-06-27
The regular expressions lecture has been promoted from the development area to the main site, where it joins the lectures on program design, databases, and spreadsheets. We have a lot of other topics still to cover: please let us know which you'd like to see first. Read More ›

Final Episode of Regular Expressions Lecture Now Online
Greg Wilson / 2010-06-25
The fifth and final episode of our lecture on regular expressions is now online. In this episode, we work through a new problem—extracting bibliographic citations from papers—and introduce a few new tools along the way. Please let us know what you think. Read More ›

SIAM News Article About Software Carpentry
Greg Wilson / 2010-06-24
The June 2010 issue of SIAM News (published by the Society for Industrial and Applied Mathematics) has an article I wrote about Software Carpentry called "Get More Done with Less Pain". Here's hoping the course can live up to that billing :-) Read More ›

Eric Lander on Genomics
Greg Wilson / 2010-06-24
This lecture by MIT's Eric Lander is a great overview of where genomics is, how we got here, and where we might be going. "Biology as an information science" is proving to be a very rich idea... Read More ›

Another Example of small-p Patterns
Greg Wilson / 2010-06-24
A couple of weeks ago, I asked whether people would find an exploration of ways to count things useful. The consensus was "yes", so I've started drawing up notes. While working on them, it occurred to me that "ways to persist things" might be just as interesting. Some of the approaches that could be discussed are:

- Saving a list of numbers (one per line).
- Saving a matrix using a header line to record dimensions (introducing the idea of metadata) and M values per line.
- Saving a matrix without metadata (which requires the length of the second and subsequent lines to be checked).
- Saving mixed atomic values (which gets hard when strings are allowed).
- Saving records whose structure is known in advance (the hardest of the bunch, since this is the point where aliasing appears).
- Saving in binary rather than ASCII.
- Creating self-describing data (save the format at the top, then the data).

What other persistence patterns do you think would be worth explaining, and why? Read More ›
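As a sketch of the second pattern in that list, saving a matrix with a one-line header of metadata might look like the following; the function names and format details are invented for illustration:

    def save_matrix(filename, matrix):
        # Write a rectangular list-of-lists with a "rows cols" header line.
        with open(filename, "w") as writer:
            writer.write("%d %d\n" % (len(matrix), len(matrix[0])))
            for row in matrix:
                writer.write(" ".join(str(x) for x in row) + "\n")

    def load_matrix(filename):
        # Read the header first, then exactly that many rows of values.
        with open(filename) as reader:
            rows, cols = [int(x) for x in reader.readline().split()]
            matrix = [[float(x) for x in reader.readline().split()]
                      for _ in range(rows)]
        assert all(len(row) == cols for row in matrix), "ragged row"
        return matrix

The header line is the metadata: the reader knows how much data to expect before reading any of it, which is what distinguishes this pattern from the headerless version that must check line lengths as it goes.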

Software Developer: Audio and Digital Music
Greg Wilson / 2010-06-23
The Centre for Digital Music (C4DM) at Queen Mary, University of London, is seeking an experienced Software Developer with a background and knowledge in Audio and Digital Music, to work on a new EPSRC-funded project "Sustainable Software for Digital Music and Audio Research". The aim of this project is to provide a Service to support the development and use of software, data and metadata to enable high quality research in the Audio and Digital Music research community. Please see the job posting for more information. Read More ›

Software Carpentry in Three and a Half Minutes
Greg Wilson / 2010-06-23
I had a chance to give a quick pitch for Software Carpentry yesterday, which I've since recorded and posted to our web site. I hope it's a useful overview of what we're trying to do and why; I'd be very grateful for ideas about improving it. Note: a newer version of the pitch, titled "Software Carpentry in Ninety Seconds", is now available. Read More ›

Episode 4 on Regular Expressions
Greg Wilson / 2010-06-22
The fourth episode of our lecture on regular expressions is now online. This one explores shortcuts and escape sequences; we'd be grateful for feedback on whether it works as a single episode, or ought to be split in two. Read More ›

Interview with Microsoft's David Rich
Greg Wilson / 2010-06-21
The latest in our series of interviews with Software Carpentry sponsors is with Microsoft's David Rich. Tell us a bit about your organization and its goals. Microsoft has very recently expanded the group working on HPC (clusters and similar) to address a broader "technical computing" solution set. This includes areas such as parallel programming, and making it easier for scientists and engineers to apply computational power to their work. Tell us a bit about the software your group uses. As a software vendor, we are concerned with both our own products and the myriad of applications that form the full solution set for our customers. Tell us a bit about what software your group develops. We build tools for others, but also use them ourselves! How are you hoping Software Carpentry will help? Today, there are few disciplines (any?) where some form of computer-based analysis, or at least reporting, is not an important part of success. Few individuals have the time to become expert in computer science as well as their primary field. A course participant might be a grad student looking at their first large dataset or an experienced scientist in industry who wishes to improve their efficiency. How do you hope the course will help them? Professional chefs keep their knives sharp. Musicians take care of their instruments. We hope that software carpentry students learn to think of their software and computing resources as a primary tool and develop the habit of thinking about how to use that tool more efficiently. How will you tell what impact the course has had? All the world's problems solved by a huge increase in productivity! Or, if not that, students who report increased efficiency in their work. Read More ›

A Little Bit of Theory
Greg Wilson / 2010-06-21
The third episode of our lecture on regular expressions is now on the web. This one describes how regular expressions are implemented using finite state machines: it's more theoretical (with a very small 't') than previous episodes, and we'd welcome feedback. Read More ›

Second Lecture on Regular Expressions
Greg Wilson / 2010-06-19
The second lecture on regular expressions has been posted to the development area. This one covers operators like '*' and '+', and introduces character sets. Please let us know what you think. Read More ›
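For readers who haven't met these operators yet, here is a minimal Python illustration of '*', '+', and a character set; this is our own snippet, and the episode's examples may differ:

    import re

    # '*' means "zero or more", '+' means "one or more",
    # and [...] is a character set matching any one of its members.
    print(re.findall(r"ab*c", "ac abc abbc"))    # ['ac', 'abc', 'abbc']
    print(re.findall(r"ab+c", "ac abc abbc"))    # ['abc', 'abbc']
    print(re.findall(r"[0-9]+", "pH 7.4, 25C"))  # ['7', '4', '25']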

People You Don't Want On Your Team
Greg Wilson / 2010-06-18
Study after study has shown that the biggest causes of software project failure are over-optimistic scheduling and unstable requirements. I think the only reason team dynamics isn't in the #1 spot is that it's hard for an outsider to judge after the fact. In my experience, though, how well people work together is a lot more important than how smart they are individually: everyone in the engineering group of the first start-up I joined was very good at their job, but when you put us together, our IQs somehow canceled out. I put together the profiles below to jump-start discussion in undergraduate project teams about the many ways in which someone can not contribute. If you have more to add, feel free to use the comment box :-) Anna knows more about every subject than everyone else on the team put together—at least, she thinks she does. No matter what you say, she'll correct you; no matter what you know, she knows better. Annas are pretty easy to spot: if you keep track in team meetings of how often people interrupt one another, her score is usually higher than everyone else's put together. Bao is a contrarian: no matter what anyone says, he'll take the opposite side. This is healthy in small doses, but when Bao does it, there's always another objection lurking behind the first half dozen. Caitlin has so little confidence in her own ability (despite her good grades) that she won't make any decision, no matter how small, until she has checked with someone else. Everything has to be spelled out in detail for her so that there's no possibility of her getting anything wrong. Frank believes that knowledge is power. He enjoys knowing things that other people don't—or to be more accurate, he enjoys it when people know he knows things they don't. Frank can actually make things work, but when asked how he did it, he'll grin and say, "Oh, I'm sure you can figure it out." Hediyeh is quiet. Very quiet. She never speaks up in meetings, even when she knows that what other people are saying is wrong. She might contribute to the mailing list, but she's very sensitive to criticism, and will always back down rather than defending her point of view. Hediyeh isn't a troublemaker, but rather a lost opportunity. Kenny is a hitchhiker. He has discovered that most people would rather shoulder some extra work than snitch, and he takes advantage of it at every turn. The frustrating thing is that he's so damn plausible when someone finally does confront him. "There have been mistakes on all sides," he says, or, "Well, I think you're nit-picking." The only way to deal with Kenny is to stand up to him: remember, if he's not doing his share, he's the bad guy, not you. Melissa would easily have made the varsity procrastination team if she'd bothered to show up to tryouts. She means well—she really does feel bad about letting people down—but somehow something always comes up, and her tasks are never finished until the last possible moment. Of course, that means that everyone who is depending on her can't do their work until after the last possible moment... Petra's favorite phrase is "why don't we". Why don't we write a GUI to help people edit the program's configuration files? Hey, why don't we invent our own little language for designing GUIs? Her energy and enthusiasm are hard to argue with, but argue you must. Otherwise, for every step you move forward, the project's goalposts will recede by two. This is called feature creep, and has ruined many projects that might otherwise have delivered something small, but useful. 
Raj is rude. "It's just the way I talk," he says, "If you can't hack it, maybe you should find another team." His favorite phrase is, "That's stupid," and he uses obscenity as casually as minor characters in Tarantino films. His only redeeming grace is that he can't dissemble in front of the instructor as well as Kenny, so he's easier to get rid of. Sergei is simply incompetent. He doesn't understand the problem, he hasn't bothered to master the tools and libraries he's supposed to be using, the code he checks in doesn't compile, and his thirty-second bug fixes introduce more problems than they solve. If he means well, try to re-partition the work so that he'll do less damage. If he doesn't, he should be treated like any other hitchhiker. Read More ›

Our First Few Exercises
Greg Wilson / 2010-06-18
Jon Pipitone has posted a few self-assessment exercises for the lecture episode on selecting data from a database. Please have a look and let us know what you think: would having little self-tests like this at the end of each episode help solidify your understanding of the material? Read More ›

For World Cup Fans (and Everyone Else)
Greg Wilson / 2010-06-18
Mike Knell has posted a funny piece titled "If Sports Got Reported Like Science" [accessed via Internet Archive]. Perhaps I should have put quotes around "funny"... Read More ›

First Half of Spreadsheets Lecture Now Online
Greg Wilson / 2010-06-18
The first four episodes of Jason Montojo's lecture on spreadsheets are now online: topics covered include basic calculations, aggregation, controlling the display, sorting, and debugging. Please have a look and let us know what you think. Read More ›

Let's Try That Again
Greg Wilson / 2010-06-17
Thanks to everyone for their feedback on our first attempt at a lecture on regular expressions. Our second attempt [no longer online] is now up for your viewing pleasure: it's about ten minutes long, and uses data scrubbing to motivate the use of REs. As always, we'd be grateful for suggestions. Read More ›

Is Live Coding Worth It?
Greg Wilson / 2010-06-16
I put together an introductory lecture on regular expressions yesterday, but I'm not happy with the results, and I'd like your feedback to help me make it better. One of the reasons we're moving from static HTML to videos is to show people how to program, rather than just the results of programming. As part of that, I want (or wanted—I may be changing my mind) to do all the programming examples live instead of showing snippets of code and snippets of output slide-style. Jon has already done that with the database lectures, but his "programs" were single-line SQL statements; this episode is the first time we've had longer examples. It was a pain to do: each time I made a typo inside a function definition, for example, I'd wind up with a dozen or more junk lines on screen (since the error wouldn't show up until the first time I called the function). I don't want to waste viewers' time on trivial errors, but there's no way in IDLE (or other IDEs) to erase the last few lines of interaction with the interpreter. I "solved" the problem by hitting return until the offending text had scrolled off the top of the screen, but it leaves a couple of ugly breaks in the video. If IDLE had a "clear screen" button, I might use that periodically, but (a) it doesn't, and (b) one of the reasons to code live is to keep recent context in view, which "clear screen" would erase. There's also an annoying IDLE-specific problem in this video: if I type too quickly—in particular, if I have saved program text in another window, switch there, copy it, and paste it into IDLE instead of typing it back in character by character—then syntax highlighting doesn't work properly. You'll see a few places where highlighting simply gave up partway through a keyword. It looks dumb. I could turn highlighting off, but I find it useful, and I think students do too. (Also, if it's on by default, our videos should show it on.) (And yes, I could switch to another IDE—I'm very fond of Wing 101—but we found that when Camtasia is in "record" mode, Wing's debugger won't launch. We think it's because recording slows my machine down just enough for Wing to time out when forking the subprocess responsible for managing the user's program, but that's just a theory for now.) (And while we're on the subject of Camtasia: its recorder on Windows has a very handy "pause" button, so I can pause recording, switch windows, copy text to be pasted, switch back, re-start recording, and not have to do a ton of slice-and-dice editing after the fact. The Mac version doesn't have this feature yet, which is just as annoying as the fact that the two versions can't read each other's project files. It's still a very useful tool, though—the editing interface on Windows is a snap.) Next up: I've used slides for notes about the code, such as an explanation of what regular expressions are. (I didn't tick the right box when exporting them from PowerPoint, so they're 3/4 the size they should be in the video, but that's a small problem compared to the ones I'm wrestling with now.) I think the transition from code to slide and back is awkward, but I don't know how else to present material that in class I would draw on the whiteboard. However, using slides at all raises a question: why don't I just paste my code snippets and their output into slides and show those? Do you, the viewer, really gain anything by watching me type? If so, why? Does the gain come from having a few lines of recent context above the example I'm typing in next?
If so, that would be easy to replicate in a slide deck. I could even (easily) do double slides for each example, with the code on the first slide, and the code and output together on the second, so that you don't have to read too much at once. I'm still convinced that live-in-the-IDE is the right way to show people how to use a debugger, but this experience has soured me a bit on doing it for code. Your thoughts would be very welcome... Read More ›

A Voice from the Back of the Room
Greg Wilson / 2010-06-16
Mark Guzdial recently posted another thought-provoking piece, this one about how teachers are biased toward assessing a class's progress by their interactions with its brightest students, and how important it is to assess understanding systematically to find out what's actually going on. We're still wrestling with this in Software Carpentry: what can we put on the web to help students determine whether they have understood a particular topic and are ready to move on? Multiple choice quizzes can only be done once: if you don't get the right answers the first time and go back to review the material, then when you try the quiz again all it's really testing is your ability to recall that the answer to #3 is 'c'. Small programming problems don't work either. Even in simple cases, there are always many right (or at least workable) answers. Novices are, almost by definition, not able to see whether their solution can be mapped onto the instructor's; all they see are the differences, and they come away thinking, "Oh, I didn't get that," rather than, "Huh, I guess there is more than one way to do it." This can be ameliorated a bit by providing sample input and output, so that they can test their programs, but that doesn't help them figure out why theirs is right or wrong. What else could we do? How else could we help you figure out whether you're ready to move on to the next topic, or ought to review this one again? Read More ›

Next-Gen Sequencing Course at MSU: It Went Well
Greg Wilson / 2010-06-15
Titus Brown has posted a summary of his course on next-generation sequencing data analysis for biologists with little or no previous training in computing. It went very well, and he's already trying to figure out how to do it again next year. If you can help, please give him a shout! Later: Titus and his students used Amazon's web services in the course, which earned them a post on Amazon's blog. Read More ›

Glossary and License Online
Greg Wilson / 2010-06-15
A glossary of terms used in this course is now online; corrections and suggestions for both additions and deletions would be very welcome. The licenses used for course content and code have also been posted—both are as open as we can make them. Read More ›

Interview: Mark Plumbley at Queen Mary University of London
Greg Wilson / 2010-06-14
Today's interview is with Prof. Mark Plumbley, of the Department of Electronic Engineering & Computer Science at Queen Mary University of London. Tell us a bit about your organization and its goals. We are a new project "Sustainable Software for Digital Music and Audio Research", funded by the UK Engineering and Physical Sciences Research Council (EPSRC), and based at the Centre for Digital Music (C4DM) at Queen Mary University of London. The aim of the project is to provide a Service to support the development and use of software, data and metadata to enable high quality research in the Audio and Digital Music research community. It's really about getting research results—including robust software to implement that research—out to the people who should be able to use them, and then keep it working. Tell us a bit about the software your group uses. We use some generic signal processing development tools like Matlab, as well as C/C++, Python, Prolog, etc. We also use some music-specific software, like Max/MSP, SuperCollider, and Ableton Live. Some people use Subversion for version control (even occasionally for writing joint research papers in LaTeX). Tell us a bit about what software your group develops. Some of the software we've already developed is for our own research, but we are increasingly making more and more available for others to use: see http://www.isophonics.net/ for a selection. Some of this includes: Sonic Visualiser: an application for viewing and analysing the contents of music audio files. SoundBite: an iTunes plugin to create great-sounding playlists. BeatFx: a suite of real-time musical audio effects which are automatically synchronized to the beat. Who are you hoping Software Carpentry will help? The new project includes software developers whose job it is to take flaky research software and turn it into robust software usable by other researchers. We hope that the Software Carpentry course will train up the next generation of PhD students and researchers that can create robust research software for themselves, so they don't have to rely on some other developers to come along later and clear up the mess. How do you hope the course will help them? We hope that the course will help researchers to think about robust software development from the outset of their research, not as an afterthought. Therefore when the paper is published, or the thesis is finished, the software that implements that research will be available for others to use in a robust and sustainable form. This should also benefit the researchers themselves, since people using their software will cite their research papers when acknowledging their software. How will you tell what impact the course has had? We would like to follow up the students who attended the course, and see how well they are producing well-written and well-documented research code, in comparison to what they would have done without it. Some of the impact will be more nebulous, about changing attitudes to the role of software in research. But if we get this right we should be pushing at an open door. "Research Impact" is the name of the game at the moment, and what better way to help your audio research have impact than to make the software for it available in a sustainable form, usable by the people who need to use it! Read More ›

The Cowichan Problems
Greg Wilson / 2010-06-12
Back in the 1990s, as the first wave of euphoria about parallel computing was topping out, I had a crazy idea: why don't we actually try to measure, or at least compare, the usability of different parallel programming systems? I left the field before taking the idea very far, but with talk about clouds and GPUs growing louder by the day, I think the idea is worth revisiting. My proposal is below the cut; I'd be interested in feedback.

Most programmers believe that parallel programming is harder than sequential programming. We are developing a suite of simple problems that can be used to assess the usability of different scientific programming languages (parallel or otherwise), and that captures modularization and integration concerns usually ignored by performance-oriented benchmark suites. This paper motivates and describes the suite.

Introduction

In the late 1980s and early 1990s, many people (including the first author of this paper) were convinced that programming was about to "go parallel" in a big way. Those same predictions are being made again today: fans of GPUs, multicore CPUs, and cloud computing all claim that programmers can no longer afford not to think about parallelism, and that we will have to rethink programming from the ground up to take advantage of all that shiny new hardware. But how? And how will we know if we're on the right track? Twenty years ago, a survey by Bal et al. listed more than 300 parallel programming systems [Bal 1989]. Only a handful have survived, all of them very conservative extensions to, or libraries for, mainstream languages. If Erlang, Haskell, ML, F#, or other exotic languages are to gain ground, their advocates must convince a sceptical audience that they actually can solve real problems more easily than existing alternatives. We propose using a suite of simple problems to compare the usability of parallel programming systems. Competent programmers, fluent in a specific system, will implement solutions to these problems and report their experiences in terms of development time, code clarity, and runtime performance. These problems are individually very simple, but together cover a broad range of scientific domains. One significant innovation in this work is that we also require implementors to chain applications together, so that the output of one is an input to another. This will allow us to assess how well each language or library supports modularization and re-use. It is also more realistic than having each application run in isolation, since large scientific codes are usually composed of modules that have different "best" data layouts and parallelization strategies.

History and Acknowledgments

The first author originally developed this suite in the mid-1990s, inspired by discussions with R. Bruce Irvin (then at the University of Wisconsin) and Henri Bal (Vrije Universiteit, Amsterdam), and by the work of Feo and others on the Salishan Problems [Feo 1992]. In recognition of the latter, this suite is called the Cowichan Problems (pronounced COW-i-chun); like Salishan, the word is a Northwest Coast native place name. Early experiences were described in [Wilson 1994] and [Bal 1996]. In 2009, Andrew Borzenko and Cameron Gorrie (then in the Department of Computer Science at the University of Toronto) re-wrote the earlier C implementation in C++ using several modern parallel programming systems.
Quantifying the Importance of Usability

Amdahl's Law states that if $t_p$ is an algorithm's running time on $p$ processors and $\sigma$ is the algorithm's serial fraction (i.e., the fraction of the algorithm that cannot be parallelized), then:

$$t_p = \sigma t_1 + (1 - \sigma) t_1 / p$$

The speedup on $p$ processors is therefore:

$$s(p) = \frac{t_1}{t_p} = \frac{t_1}{\sigma t_1 + (1 - \sigma) t_1 / p} = \frac{1}{\sigma + (1 - \sigma)/p}$$

As $p \to \infty$, the speedup is bounded by $1/\sigma$. This makes intuitive sense: if 10% of an algorithm can't be parallelized, then even with infinite hardware, the program can only be made to run 10 times faster. But software doesn't just happen: it has to be written and debugged, and these activities usually aren't made faster by parallel hardware. (In fact, most programmers believe that doing parallel programming is harder and slower than doing it sequentially.) If we let $T$ and $S$ be the equivalents of $t$ and $s$ over the program's whole lifetime, and $D$ be its total development time, then:

$$T_p = \sigma T_1 + (1 - \sigma) T_1 / p + D$$

The achievable lifetime speedup is then:

$$S_\infty = \frac{1}{\sigma + D/T_1}$$

(For example, with $\sigma = 0.1$ and $D = 0.2\,T_1$, $S_\infty = 1/(0.1 + 0.2) \approx 3.3$, no matter how much hardware is available.) The ratio of development time to run time therefore effectively increases the program's serial fraction. Unless parallelization time can be substantially reduced, parallel programming will only be cost-effective when: the application is trivially parallel (e.g., task-farming a rendering calculation); the expected total program runtime is very large (i.e., the program is a package used extensively in a field such as computational chemistry or vehicle crash simulation); or cost is not an obstacle (i.e., the customer is Disney or the NSA). The aim of this problem suite is to capture enough information about $D$ to allow competing parallel programming systems to be ranked and compared.

Previous Work

Two comparative efforts in parallel programming are [Babb 1988] and [Feo 1992]. The first presented implementations of a simple numerical quadrature program in more than a dozen different parallel languages used on mid-1980s hardware. The second presented implementations of the Salishan Problems—Hamming number generation, isomer enumeration, skyline matrix reduction, and a simple discrete event simulator—in C*, Occam, Ada, and several dataflow and functional languages. Both books conveyed the flavor of the languages they describe, but neither compared languages or problem implementations directly. More recently, Prechelt has explored quantitative comparison of the relative productivity of different programming languages [Prechelt 2000, Prechelt 2009]. Our aim is to support replication of his work for parallel programming.

Methodology

Our aim is to assess and compare the usability of parallel programming systems. Here, we discuss issues related to choosing the problems that make up our benchmark suite, how to assess particular implementations, and other purposes which implementations of these might serve. Recognizing that our problems are extremely simple, we refer to them as "toys".

Criteria for Selection

Our criteria for including toys are:

- Each toy should require no more than an hour to write and test in a well-supported sequential language. We feel that making individual problems more complicated will only discourage uptake.
- Correctness must be easy to verify. Toys whose output is easily visualized are therefore preferred, as are toys whose results are insensitive to floating-point effects.
- Speedup must similarly be easy to measure: while machine performance is not our focus, we are generally uninterested in parallel implementations that are slower than the sequential originals.
Previous Work

Two comparative efforts in parallel programming are [Babb 1988] and [Feo 1992]. The first presented implementations of a simple numerical quadrature program in more than a dozen different parallel languages used on mid-1980s hardware. The second presented implementations of the Salishan Problems—Hamming number generation, isomer enumeration, skyline matrix reduction, and a simple discrete event simulator—in C*, Occam, Ada, and several dataflow and functional languages. Both books conveyed the flavor of the languages they described, but neither compared languages or problem implementations directly. More recently, Prechelt has explored quantitative comparison of the relative productivity of different programming languages [Prechelt 2000, Prechelt 2009]. Our aim is to support replication of his work for parallel programming.

Methodology

Our aim is to assess and compare the usability of parallel programming systems. Here, we discuss issues related to choosing the problems that make up our benchmark suite, how to assess particular implementations, and other purposes which implementations of these problems might serve. Recognizing that our problems are extremely simple, we refer to them as "toys".

Criteria for Selection

Our criteria for including toys are:

Each toy should require no more than an hour to write and test in a well-supported sequential language. We feel that making individual problems more complicated will only discourage uptake.

Correctness must be easy to verify. Toys whose output is easily visualized are therefore preferred, as are toys whose results are insensitive to floating-point effects.

Speedup must similarly be easy to measure—while machine performance is not our focus, we are generally uninterested in parallel implementations that are slower than the sequential originals.

At least some toys should not be "infinitely scalable". Many real-world applications are not, and this suite should reflect such limitations.

At least some toys should require I/O, since this important aspect of real-world programming is often neglected by PPS designers.

There should be some overlap in the toys' implementations, so that implementors can demonstrate how well their chosen systems take advantage of opportunities for code re-use.

Together, the toys in the suite must exercise the parallel clichés discussed in the appendices.

In particular, toys should be specified by inputs and outputs rather than algorithmically, i.e., "sort N integers" rather than "parallelize quicksort", so that implementors can choose algorithms that are "natural" for their systems. Implementations that parallelize a grossly inefficient sequential algorithm should be criticized for doing so.

Software Engineering Issues

The "single algorithm per program" model of many benchmarks is not representative of real programs, which often contain several qualitatively different phases or modules. A full implementation of this suite will therefore have two parts. In the first, each toy will be implemented as a stand-alone program. In the second, toys will be chained together as shown below. This will test the ease with which heterogeneous parallelism can be mixed within a single program. It will also show how well the system supports code re-use and information hiding, which are crucial to the development of large programs. Where possible, chaining should execute toys concurrently. Some parallel programming systems impose extraneous constraints on programs, e.g., requiring all processes to participate in every barrier synchronization, or requiring the same executable to be loaded onto each processor. These constraints can limit the exploitation of potential concurrency. Permitting, but not requiring, concurrent execution of several toys should uncover such limitations.

Sizing

One crucial aspect of the specification of toys is the way in which the sizes of problems are determined. In a frozen model, the actual size of each problem and/or the number of processors available is compiled into each toy. A fully fluid implementation, by contrast, allows these sizes to be specified at run-time. If a system does not support fully fluid implementations, it must at least use an intermediate model (which might be called slushy), in which the maximum size of individual problems is specified during compilation, but the actual size of a problem is only determined when the toy begins to execute.

Assessing Usability

One way to compare the usability of parallel programming systems would be to measure the performance achieved by an "average" programmer as a function of time on each toy. By analogy with Hockney's n₁/₂ measure [Hockney 1991], we could in principle find the value of p₁/₂—the programming time required to achieve half of a machine's peak performance for a particular combination of programming system and problem type. (Note that if performance was measured as a fraction of the figures quoted by manufacturers for their machines, it is unlikely that the halfway mark would ever be reached.) However, the number of implementors required is much greater than the number we can reasonably expect. Alternatively, we could use code metrics to compare implementations of various toys.
This is also dubious, both because [El Emam 2001] has shown that most metrics do little more than measure lines of code, and because of the difficulty of comparing metric values between languages. The best we can hope for now is therefore qualitative consensus, e.g., to interview implementors about their experiences and to ask other programmers who are familiar with the base languages used to read and comment on the parallel versions of the toys. We will also compare the length of code in parallel and sequential implementations, though we realize that this can exaggerate the impact of parallelization.

Other Uses for Implementations

This suite is intended for assessing the usability of parallel programming systems, but we envisage other uses. First, this suite will indicate what a mature parallel programming system should be able to support. In particular, we will ask implementors to describe how they debugged correctness and performance problems, and in particular what tools they used. (The lack of useful debugging tools is a chronic complaint among parallel programmers.) This suite should also be useful for education, since toys will be small enough to be understood quickly, and parallelizing them should be a suitable classroom exercise in a senior undergraduate course on parallel computing.

The Problems

The chained implementation of these toys executes in the order shown below. Because of the choices in steps 1 and 3, there are four possible chained sequences.

1. An integer matrix I with r rows and c columns is created either by using the Mandelbrot Set algorithm (mandel) or by filling locations with random values (randmat).
2. The integer matrix I is shuffled in both dimensions (shuffle).
3. Either invasion percolation (invperc) or histogram thresholding (thresh) is used to generate a Boolean mask B from I in which the (minimum) percentage of True locations is P. Like I, B has r rows and c columns.
4. The Game of Life (life) is simulated for G generations, using B as an initial configuration. This step overwrites the Boolean matrix B.
5. A vector of L (m,x,y) points is created using the integer matrix I and the Boolean matrix B (winnow).
6. A series of convex hulls (hull) are obtained from the points produced above.
7. The coordinates of the points from the previous step are normalized (norm).
8. An L×L matrix A and an L-vector V are created using the normalized point locations from the previous step (outer).
9. The matrix equation AX=V is solved using Gaussian elimination (gauss) and successive over-relaxation (sor) to generate two solution vectors Xgauss and Xsor. These two toys should execute concurrently if possible.
10. The checking vectors Vgauss=AXgauss and Vsor=AXsor are calculated (product). These two toys should execute concurrently if possible.
11. The norm-1 distance between Vgauss and Vsor is calculated (vecdiff). This measures the agreement between the solutions found by the two methods.

The toys comprising the Cowichan Problems are sketched below.

gauss: Gaussian Elimination

This module solves a matrix equation AX=V for a dense, symmetric, diagonally dominant matrix A and an arbitrary non-zero vector V using explicit reduction. Input matrices are required to be symmetric and diagonally dominant in order to guarantee that there is a well-formed solution to the equation.

Inputs
matrix: the real matrix A.
target: the real vector V.

Outputs
solution: a real vector containing the solution X.
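For concreteness, here is a minimal sequential Python sketch of gauss, the sort of baseline an implementor might start from; since the specification guarantees a symmetric, diagonally dominant matrix, the sketch omits pivoting (a production implementation might still include it):

    def gauss(matrix, target):
        '''Solve A x = V by explicit reduction (forward elimination
        plus back substitution). Assumes A is dense, symmetric, and
        diagonally dominant, as the specification requires.'''
        n = len(target)
        a = [row[:] for row in matrix]   # work on copies
        v = target[:]
        # Forward elimination: zero out entries below the diagonal.
        for k in range(n):
            for i in range(k + 1, n):
                factor = a[i][k] / a[k][k]
                for j in range(k, n):
                    a[i][j] -= factor * a[k][j]
                v[i] -= factor * v[k]
        # Back substitution.
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(a[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (v[i] - s) / a[i][i]
        return x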
hull: Convex Hull

Given a set of (x,y) points, this toy finds those that lie on the convex hull, removes them, then finds the convex hull of the remainder. This process continues until no points are left. The output is a list of points in the order in which they were removed, i.e., the first section of the list is the points from the outermost convex hull, the second section is the points that lay on the next hull, and so on.

Inputs
original: the vector of input points.

Outputs
ordered: the vector of output points (a permutation of the input).

invperc: Invasion Percolation

Invasion percolation models the displacement of one fluid (such as oil) by another (such as water) in fractured rock. In two dimensions, this can be simulated by generating an N×N grid of random numbers in the range [1..R], and then marking the center cell of the grid as filled. In each iteration, one examines the four orthogonal neighbors of all filled cells, chooses the one with the lowest value (i.e., the one with the least resistance to filling), and fills it in. Filling begins at the central cell of the matrix (rounding down for even-sized axes). The simulation continues until some fixed percentage of cells have been filled, or until some other condition (such as the presence of trapped regions) is achieved. The fractal structure of the filled and unfilled regions is then examined to determine how much oil could be recovered. The naïve way to implement this is to repeatedly scan the array. A much faster technique is to maintain a priority queue of unfilled cells which are neighbors of filled cells. This latter technique is similar to the list-based methods used in some cellular automaton programs, and is very difficult to parallelize effectively.

Inputs
matrix: an integer matrix.
nfill: the number of points to fill.

Outputs
mask: a Boolean matrix filled with True (showing a filled cell) or False (showing an unfilled cell).

life: Game of Life

This module simulates the evolution of Conway's Game of Life, a two-dimensional cellular automaton. At each time step, this module must count the number of live (True) cells in the 8-neighborhood of each cell using circular boundary conditions. If a cell has 3 live neighbors, or has 2 live neighbors and is already alive, it is alive in the next generation. In any other situation, the cell becomes, or stays, dead.

Inputs
matrix: a Boolean matrix representing the Life world.
numgen: the number of generations to simulate.

Outputs
matrix: a Boolean matrix representing the world after simulation.

mandel: Mandelbrot Set Generation

This module generates the Mandelbrot Set for a specified region of the complex plane. Given initial coordinates (x0, y0), the Mandelbrot Set is generated by iterating the equations:

x' = x² − y² + x0
y' = 2xy + y0

until either an iteration limit is reached or the values diverge. The iteration limit used in this module is 150 steps; divergence occurs when x² + y² becomes 2.0 or greater. The integer value of each element of the matrix is the number of iterations done. The values produced should depend only on the size of the matrix and the region being generated, not on the number of processors or threads used.

Inputs
nrows, ncols: the number of rows and columns in the output matrix.
x0, y0: the real coordinates of the lower-left corner of the region to be generated.
dx, dy: the extent of the region to be generated.

Outputs
matrix: an integer matrix containing the iteration count at each point in the region.
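A minimal sequential Python sketch of mandel follows; the mapping of the extents dx and dy onto a per-cell step is our reading of the specification rather than something it pins down, and the helper names are our own:

    ITER_LIMIT = 150   # iteration limit from the specification above

    def mandel_count(cx, cy):
        '''Iterations before divergence for one point, using the
        specification's test (x**2 + y**2 >= 2.0).'''
        x = y = 0.0
        for count in range(ITER_LIMIT):
            if x * x + y * y >= 2.0:
                return count
            x, y = x * x - y * y + cx, 2.0 * x * y + cy
        return ITER_LIMIT

    def mandel(nrows, ncols, x0, y0, dx, dy):
        '''Fill an nrows x ncols integer matrix with iteration counts for
        the region with lower-left corner (x0, y0); the per-cell step of
        dx/ncols and dy/nrows is an assumption about how "extent" is meant.'''
        return [[mandel_count(x0 + c * dx / ncols, y0 + r * dy / nrows)
                 for c in range(ncols)]
                for r in range(nrows)]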
norm: Point Location Normalization

This module normalizes point coordinates so that all points lie within the unit square [0..1]×[0..1]. If xmin and xmax are the minimum and maximum x coordinate values in the input vector, then the normalization equation is:

xᵢ' = (xᵢ − xmin)/(xmax − xmin)

y coordinates are normalized in the same fashion.

Inputs
points: a vector of point locations.

Outputs
points: a vector of normalized point locations.

outer: Outer Product

This module turns a vector containing point positions into a dense, symmetric, diagonally dominant N×N matrix by calculating the distances between each pair of points. It also constructs a real vector whose values are the distance of each point from the origin. Each matrix element Mᵢⱼ such that i ≠ j is given the value dᵢⱼ, the Euclidean distance between point i and point j. The diagonal values Mᵢᵢ are then set to N times the maximum off-diagonal value to ensure that the matrix is diagonally dominant. The value of the vector element vᵢ is set to the distance of point i from the origin, which is given by √(xᵢ² + yᵢ²).

Inputs
points: a vector of (x,y) points, where x and y are the point's position.

Outputs
matrix: a real matrix, whose values are filled with inter-point distances.
vector: a real vector, whose values are filled with origin-to-point distances.

product: Matrix-Vector Product

Given a matrix A, a vector V, and an assumed solution X to the equation AX=V, this module calculates the actual product AX=V', and then finds the magnitude of the error.

Inputs
matrix: the real matrix A.
actual: the real vector V.
candidate: the real vector X.

Outputs
e: the largest absolute value in the element-wise difference of V and V'.

randmat: Random Number Generation

This module fills a matrix with pseudo-random integers. The values produced must depend only on the size of the matrix and the seed, not on the number of processors or threads used.

Inputs
nrows, ncols: the number of rows and columns in the output matrix.
s: the random number generation seed.

Outputs
matrix: an integer matrix filled with random values.

shuffle: Two-Dimensional Shuffle

This module divides the values in a rectangular two-dimensional integer matrix into two halves along one axis, shuffles them, and then repeats this operation along the other axis. Values in odd-numbered locations are collected at the low end of each row or column, while values in even-numbered locations are moved to the high end. An example transformation is:

a b c d        a c b d
e f g h   →    i k j l
i j k l        e g f h

Note that how an array element is moved depends only on whether its location index is odd or even, not on whether its value is odd or even.

Inputs
matrix: an integer matrix.

Outputs
matrix: an integer matrix containing shuffled values.

sor: Successive Over-Relaxation

This module solves a matrix equation AX=V for a dense, symmetric, diagonally dominant matrix A and an arbitrary non-zero vector V using successive over-relaxation.

Inputs
matrix: the square real matrix A.
target: the real vector V.
tolerance: the solution tolerance, e.g., 10⁻⁶.

Outputs
solution: a real vector containing the solution X.
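A compact sequential Python sketch of sor is shown below; the specification does not fix a relaxation factor, so the omega used here is an arbitrary illustrative choice:

    def sor(matrix, target, tolerance, omega=1.25):
        '''Solve A x = V by successive over-relaxation. Assumes A is
        diagonally dominant so the iteration converges; omega is an
        illustrative value, not part of the specification.'''
        n = len(target)
        x = [0.0] * n
        while True:
            max_diff = 0.0
            for i in range(n):
                s = sum(matrix[i][j] * x[j] for j in range(n) if j != i)
                new_x = (1.0 - omega) * x[i] + \
                        omega * (target[i] - s) / matrix[i][i]
                max_diff = max(max_diff, abs(new_x - x[i]))
                x[i] = new_x
            # Stop when no element changed by more than the tolerance.
            if max_diff < tolerance:
                return x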
thresh: Histogram Thresholding

This module performs histogram thresholding on an image. Given an integer image I and a target percentage p, it constructs a binary image B such that Bᵢⱼ is set if no more than p percent of the pixels in I are brighter than Iᵢⱼ. The general idea is that an image's histogram should have two peaks, one centered around the average foreground intensity and one centered around the average background intensity. This program attempts to set a threshold between the two peaks in the histogram and select the pixels above the threshold.

Inputs
matrix: the integer matrix to be thresholded.
percent: the minimum percentage of cells to retain.

Outputs
mask: a Boolean matrix whose values are True where the value of a cell in the input image is above the threshold, and False otherwise.

vecdiff: Vector Difference

This module finds the maximum absolute elementwise difference between two vectors of real numbers.

Inputs
left: the first vector.
right: the second vector.

Outputs
maxdiff: the largest absolute difference between any two corresponding vector elements.

winnow: Weighted Point Selection

This module converts a matrix of integers to a vector of points represented as x and y coordinates. Each location where mask is True becomes a candidate point, with a weight equal to the integer value in matrix at that location and x and y coordinates equal to its row and column indices. These candidate points are then sorted into increasing order by weight, and N evenly-spaced points are selected to create the result vector.

Inputs
matrix: an integer matrix whose values are used as weights.
mask: a Boolean matrix of the same size showing which points are eligible for consideration.
nelts: the number of points to select.

Outputs
points: an N-vector of (x,y) points.

Other Issues

Input and Output

I/O is an important part of programming, but is often treated as being of secondary importance by language designers. This suite requires all stand-alone toys to read input values from files and write results back; the chained version must be able to checkpoint intermediate results between toys. Finally, implementors are strongly encouraged to include some means of visualizing the output or evolution of individual toys. The file formats used in the Cowichan Problems are specified in an appendix. Files are required to be human-readable (i.e., to use ASCII text). Implementations may also include I/O using binary (non-ASCII) files in whatever file formats are convenient. This will allow programmers to demonstrate the "natural" I/O capabilities of particular systems, which would most probably be used for checkpointing intermediate results in real programs.

Reproducibility

Reproducibility is an important issue for parallel programming systems. While constraining the order of operations in a parallel system to guarantee reproducibility makes programs in that system easier to debug, it can also reduce the expressiveness or performance of the system. In this problem set, irreproducibility can appear in two forms: numerical and algorithmic. The first arises in toys such as gauss and sor, which use floating-point numbers. Precisely how round-off errors occur in these calculations can depend on the distribution of work among processors, or the order in which those processors carry out particular operations. Irreproducibility also arises in toys which only use exact numbers, such as invperc and randmat. In the former, the percolation region is grown by repeatedly filling the lowest-valued cell on its perimeter. If several cells have this value, implementations may choose one arbitrarily. Thus, different implementations may produce very different fractal shapes. In the case of random number generation, the simplest thing to do is to run the same generator independently on each processor, although the values in the resulting matrix then depend on the number of processors used.
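One way to make randmat's output independent of the processor count—our suggestion, not something the suite mandates—is to derive each cell's value directly from its indices and the seed, as in this Python sketch (the mixing constants are arbitrary illustrative choices):

    def cell_value(seed, row, col, value_range):
        '''Pseudo-random value in 1..value_range that depends only on
        (seed, row, col), so any distribution of rows or columns across
        processors yields the same matrix. The constants are arbitrary.'''
        h = (seed * 2654435761 + row * 40503 + col * 9973) & 0xFFFFFFFF
        h ^= h >> 16
        h = (h * 2246822519) & 0xFFFFFFFF
        h ^= h >> 13
        return 1 + h % value_range

    def randmat(nrows, ncols, seed, value_range):
        '''Fill a matrix reproducibly, one hashed value per cell.'''
        return [[cell_value(seed, r, c, value_range) for c in range(ncols)]
                for r in range(nrows)]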
Bibliography

[Babb 1988] Robert G. Babb: Programming Parallel Processors. Addison-Wesley, 1988.
[Bal 1989] Henri E. Bal, J. G. Steiner, and A. S. Tanenbaum: "Programming Languages for Distributed Computing Systems". ACM Computing Surveys, 21(3), 1989.
[Bal 1996] Henri E. Bal and Gregory V. Wilson: "Using the Cowichan Problems to Assess the Usability of Orca". IEEE Parallel & Distributed Technology, 4(3), 1996.
[Bennett 1990] John K. Bennett, John B. Carter, and Willy Zwaenepoel: "Munin: Distributed Shared Memory Based on Type-Specific Memory Coherence". Proc. 1990 Conference on Principles and Practice of Parallel Programming, 1990.
[El Emam 2001] Khaled El Emam, Saida Benlarbi, Nishith Goel, and Shesh N. Rai: "The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics". IEEE Transactions on Software Engineering, 27(7), 2001.
[Feo 1992] John T. Feo: A Comparative Study of Parallel Programming Languages: The Salishan Problems. North-Holland, 1992.
[Hockney 1991] Roger Hockney: "Performance Parameters and Benchmarking of Supercomputers". Parallel Computing, 17, 1991.
[Prechelt 2000] Lutz Prechelt: "An Empirical Comparison of Seven Programming Languages". IEEE Computer, 33(10), 2000.
[Prechelt 2009] Lutz Prechelt: "Plat_Forms: A Web Development Platform Comparison by an Exploratory Experiment Searching for Emergent Platform Properties". IEEE Transactions on Software Engineering, 2009.
[Wilson 1994] Gregory V. Wilson: "Assessing the Usability of Parallel Programming Systems: The Cowichan Problems". Proc. IFIP Working Conference on Programming Environments for Massively Parallel Distributed Systems, Birkhäuser, 1994.

File Formats

Vector

A vector file begins with a single positive integer N, which specifies the number of elements in the vector. This is then followed by N rows, each containing a single value.

Matrix

A file containing a matrix begins with a pair of positive integers, which specify the number of rows and columns in the matrix respectively. (Note that this means the first number is the Y extent, and the second number is the X extent.) Elements of the vector or matrix then appear one per line in order of increasing index, i.e., the element at (1,1) appears first, then the element at (1,2), and so on up to (1,N), which is followed by the element at index (2,1).

Basic Types

Vectors and matrices may contain Booleans, integers, or reals; vectors may also contain (x,y) points. The two Boolean values are represented by upper-case 'T' and 'F'. Integers and reals are represented in the usual way; points are represented as two numbers separated by a single space character.
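For concreteness, here is a minimal Python sketch of reading and writing integer matrices in the ASCII format just described; it assumes values may be separated by any whitespace, which the format description does not strictly pin down:

    def read_integer_matrix(path):
        '''Read a matrix of integers in the suite's ASCII format:
        the row and column counts, then one element per line in
        row-major order. Assumes whitespace-separated tokens.'''
        with open(path) as reader:
            tokens = reader.read().split()
        nrows, ncols = int(tokens[0]), int(tokens[1])
        values = [int(t) for t in tokens[2:2 + nrows * ncols]]
        return [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]

    def write_integer_matrix(path, matrix):
        '''Write a matrix in the same format: counts first, then one
        element per line in row-major order.'''
        with open(path, 'w') as writer:
            writer.write('%d %d\n' % (len(matrix), len(matrix[0])))
            for row in matrix:
                for value in row:
                    writer.write('%d\n' % value)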
Parallel Operations

This list details some operations which are supported by many parallel programming systems. The toys described above provide opportunities for exercising many of these, and implementors are encouraged to phrase discussion of their work in terms of these operations where appropriate.

elementwise operations on arrays (unary, binary, and scalar promotion)
cumulative operations on arrays (reduction and parallel prefix)
array re-shaping and re-sizing (e.g., sub-array extraction)
partial (conditionally masked) versions of all of the above
regular data motion (e.g., circular and planar shifting)
irregular data motion (e.g., 1-to-1, 1-to-many, and many-to-1 permutation)
scalar and non-scalar value location (e.g., finding a scalar or record value in an array)
differential local addressing (i.e., subscripting an array with an array)
full or partial replication of shared read-only values
full or partial replication of shared read-mostly values with automatic consistency management
structures with rotating ownership, suitable for migration
producer-consumer structures
partitionable structures
pre-determined run-time re-partitioning (i.e., re-distributing an array)
dynamic re-partitioning (e.g., for load balancing)
committed mutual exclusion (e.g., waiting on a semaphore)
uncommitted mutual exclusion (e.g., lock or fail)
barrier synchronization
multiple concurrent barriers used by non-overlapping groups of processes
fetch-and-add, and other fetch-and-operate functions
pre-scheduled receipt of messages of known composition
variable-length message receipt
message selection by type
message selection by source
message selection by contents
guarded message selection (e.g., CSP's alt construct)
broadcast and partial broadcast
split-phase (non-blocking) operations
data marshalling and unmarshalling of "flat" structures (e.g., arrays of scalars)
data marshalling and unmarshalling of nested or linked structures
heterogeneous parallelism (i.e., running different applications concurrently in one machine)
pipelining
distributed linked structures in systems without physically-shared memory
indexing of distributed shared arrays
collective I/O
uniform-size interleaved I/O
heterogeneous-size interleaved I/O
independent I/O operations on a single shared file

Memory Reference Patterns

These are inspired by the categorization originally used in the Munin system [Bennett 1990]. Again, implementors are encouraged to phrase discussion of their work in these terms where appropriate.

Write-once: variables which are assigned a value once, and only read thereafter. These can be supported through replication.

Write-many: variables which are written and read many times. If several processes write to the variable concurrently, they typically write to different portions of it.

Producer-consumer: variables which are written to by one or more objects, and read by one or more other objects. Entries in a shared queue, or a mailbox used for communication between two processes, are examples.

Private: variables which are potentially shared, but actually private. The interior points of a mesh in a program which uses geometric decomposition fall into this category, while boundary points belong to the previous class.

Migratory: variables which are read and written many times in succession by a single process before ownership passes to another process. The structures representing particles in an N-body simulation are the clearest example of this category.

Result: accumulators which are updated many times by different processes, but thereafter read.

Read-mostly: any variable which is read much more often than it is written to. The best score so far in a search problem is a typical instance of this class: while any process might update it, in practice processes read its value much more often than they change it.
Synchronization: variables used to force explicit synchronization in a program, such as semaphores and locks. These are typically ignored for long periods, and then the subject of intense bursts of access activity.

General read-write: any variable which cannot be put in one of the above categories. Read More ›

Thought for the Day
Greg Wilson / 2010-06-11
We talked of the education of children; and I asked him what he thought was the best way to teach them first. Johnson: "Sir, it is no matter what you teach them first, any more than what leg you shall put into your breeches first. Sir, you may stand disputing which is best to put in first, but in the mean time your breech is bare. Sir, while you are considering which of two things you should teach your child first, another boy has learnt them both." — Boswell, Life of Samuel Johnson Read More ›

Interview: David Jackson at the UK Met Office
Greg Wilson / 2010-06-11
The second of today's sponsor interviews is with David Jackson from the UK Met Office. Tell us a bit about your organization and its goals. You can find out more about us at http://www.metoffice.gov.uk, but broadly speaking, we: predict the weather for tomorrow, next week, next season and beyond; are a significant contributor to the global understanding of climate change; are leading researchers of weather science; provide forecasts for sporting events such as Wimbledon and Open Golf; stand shoulder to shoulder with Armed Forces around the world; help keep roads open and planes flying; inform the decisions and policies of businesses and governments across the world; help the National Health Service provide preventative healthcare. Our group is focussed on research. Tell us a bit about the software your group uses. We have a lot of home-grown scientific models and related codes around weather and climate prediction. The Unified Model is at the heart of this code base. Much of the compiled code is in FORTRAN. We use Perl and shell scripting to bind things together, and also some Python. Our MASS storage system is developed in Java. We use IDL for data processing and R for statistical analysis. There is a lot of user code that sits on top of these base packages and a lot of bespoke development. We use Subversion for version control and Trac for issue tracking. Tell us a bit about what software your group develops. The Unified Model and related software are used internationally by a number of groups. There is a whole range of software development, from internationally supported software like the UM, to code scientists write for their own purposes, with other codes supported within the Met Office. Who are you hoping Software Carpentry will help? A scientist who does some software development for their own needs. A scientific programmer who does not have a software engineering background. In general, people who can code but would benefit from not-too-heavy software engineering principles. How do you hope the course will help them? The benefits of software engineering principles and disciplines, such as testing and configuration management. Good code structure practice. How to make more code shareable and supportable. How will you tell what impact the course has had? Better discipline. More shared code. Better-tested code. More confidence. Read More ›

Interview: SHARCNET's Hugh Couchman
Greg Wilson / 2010-06-11
The first of our two sponsor interviews today is with SHARCNET's Prof. Hugh Couchman. Tell us a bit about your organization and its goals. SHARCNET provides high-performance computing (HPC) resources and services to researchers in Ontario and Canada. Our goal is to promote and facilitate the use of HPC techniques among researchers in all fields and to train a new generation of computationally-skilled individuals for the benefit of competitive research, the economy and society in Canada. Tell us a bit about the software your group uses. As a part of the National Platform provided by Compute Canada we support users from all disciplines needing HPC and, consequently, a huge range of software and tools that are used in the HPC environment. The software ranges over every conceivable type of commercial, public domain and home-written software. Who are you hoping Software Carpentry will help? Primarily students. SHARCNET provides advanced training in the use of HPC specifically, but does not provide extensive instruction in the basic techniques of good code design and project management, including code verification, debugging, profiling, data handling, etc. Many students do not have the basic skills to undertake computational projects in the most effective way, let alone be well placed to tackle the complexities associated with parallel applications. How will you tell what impact the course has had? I have found that most students who have expressed interest in working with me and an interest in computational techniques fairly readily pick up on best practices when exposed to them, and so there would be a benefit simply from receiving training in these methods/techniques. One would hope that SHARCNET staff might similarly notice improvements in the use of coding practices, version control, etc. Read More ›

Counting Things
Greg Wilson / 2010-06-11
I re-read Robins et al's "Learning and Teaching Programming: A Review and Discussion" [1] and Eccles and Stacey's "Understanding the Parallel Programmer" [2] this morning. One of the key takeaways in both is that novice programmers tend to classify problems according to problem domain or language features, while experienced programmers tend to classify them according to solution strategies. Where a beginner sees a problem in molecular biology or statistics, for example, an expert sees an opportunity to apply divide-and-conquer or accumulate-and-adjust. Knowing that, it's tempting to try a strategies-first approach when teaching programming. That usually doesn't work, though, because the strategies are shaped by what computers can do, and if you don't know that, those strategies seem to appear like rabbits out of a magician's hat. The idea is still appealing, so as part of the lecture on program design (or perhaps as a separate lecture), I'd like to explore strategies for solving variations on a simple problem as a way to sum up previous material on programming language features like loops and conditionals. The problem I have in mind can be summed up in one word: counting. The variations are:

1. How many things do I have? (Use len(list).)
2. How many things do I have that satisfy some condition? (Conditional inside loop.)
3. What if my things are in a file instead of in memory? (Read the file into a list, then count.)
4. What if I have so many things that they won't fit into memory? (Read and test one at a time, streamwise.)
5. What if I'm counting the number of items of each of a set of fixed types? (Use an array of accumulators.)
6. What if I don't know the types in advance? (Use a dictionary—which is also the better answer for #5 above.)
7. What if my classification scheme is likely to change frequently? (Pass the classifying function into the counting function—sketched at the end of this post.)
8. What if I only want things up to some stop sign? (Count to sentinel and break.)
9. What if my test depends on context, e.g., number of times A comes after B, or number of X's inside a region delimited by Y's? (Use flags to turn counting on and off.)
10. What if my things are in a tree rather than a list? (Recursion.)
11. What if my things are in a graph that might contain cycles? (Graph-traversal algorithms.)
12. What if I might have all of the above? (Iterator and visitor design patterns.)
13. What if my things are in a database? (Select, filter, and aggregate.)
14. What if I want to calculate an average? (Any of the above plus a count of items.)
15. What if I want to calculate a median? (Requires different strategies entirely because it depends on global knowledge.)

I don't think students could absorb graph traversal, design patterns, and select/filter/aggregate as asides in a single lecture. (It's already clear from feedback, by the way, that introducing big-oh as an aside in the lecture on tuning programs is asking too much—we'll fix that up soon.) What I'm wondering is, if we've introduced those topics earlier, would a single lecture that showed how changes in the problem determine solution strategy be useful as a summing up? Your feedback would be greatly appreciated.

[1] Anthony Robins, Janet Rountree, and Nathan Rountree: "Learning and Teaching Programming: A Review and Discussion". Computer Science Education, 13(2), 2003.
[2] Ryan Eccles and Deborah A. Stacey: "Understanding the Parallel Programmer". Proc. 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment (HPCS'06), 2006.
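As promised in variation 7, here is a minimal Python sketch in which the classification scheme is passed into the counting function, so the scheme can change without touching the loop; the function names are ours, chosen for illustration:

    def count_by(items, classify):
        '''Count items by the category a classifier function assigns them.'''
        counts = {}
        for item in items:
            key = classify(item)
            counts[key] = counts.get(key, 0) + 1
        return counts

    # The classifier is a parameter, so changing the scheme is a one-line change.
    print(count_by([1, 2, 3, 4, 5], lambda n: 'even' if n % 2 == 0 else 'odd'))
    # -> {'odd': 3, 'even': 2}

Read More ›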

The Big Picture (version 3)
Greg Wilson / 2010-06-10
Version 3 of the concept map describing what we mean by "computational thinking" incorporates feedback from several people—thanks for sending it. Please let us know what you think... Read More ›

Our Lecture on Databases is Now Online
Greg Wilson / 2010-06-10
Our introduction to the basics of SQL is now online. The episodes in this series include:

Introduction
Selecting Data
Filtering
Sorting
Aggregation
Joining Tables
Missing Data
Nested Queries

Please let us know what you think. And please also let us know what topic from the course outline you'd like us to tackle next. Read More ›

Interview: Jim Graham of Scimatic
Greg Wilson / 2010-06-10
Over the next few weeks, I'll be interviewing the people whose sponsorship has made this course possible. First up is Scimatic's Jim Graham. Tell us a bit about your organization and its goals. Scimatic (www.scimatic.com) is a company that writes software for scientists. Historically, we've written software on contract for manufacturers of scientific equipment. We have also written our own software product, Samples, to help scientists keep their experiments organized. We're available to help scientists by writing software. Tell us a bit about the software your group uses. We develop software mostly using Microsoft's development tools. Almost all of the projects that we are working on currently are written in C# using Microsoft's Visual Studio development environment. C# is a pretty rapidly evolving language that is assimilating nice features from dynamic languages. We've also done projects in C++, Java and Python. For tooling, we use:

Version control: hosted Subversion. We have some interest in DVCS systems, but until there are first-class integrations with Visual Studio and Windows, we'll stick with SVN.

Project management: Trac. It has per-project management, tickets, issues, a project wiki, email notification and tight integration with SVN. Pretty much, if it's not in Trac, it doesn't exist.

Unit testing: NUnit, the .NET port of JUnit. We have tried MSTest (the unit testing framework from Microsoft), but it was too verbose.

Continuous integration: CruiseControl.NET. It's good enough to run builds on every checkin. We run compilation and unit tests, generate documentation, and build the installers with one run of the build.

Scripting: I use Perl because I'm old school. We'll probably convert any "mission-critical" scripts to Python so everyone can own them.

Tell us a bit about what software your group develops. We build products for scientists. Traditionally we've worked on contract, and our client decides how they want to distribute the software. In addition, we are developing our own product that we will sell directly to scientific customers. How do you hope the course will help them—that is, what big ideas and what specific skills do you want students to learn from it? I hope that scientists realise that almost every project they work on will require some type of software, and that they will need to know how it works. After that, I hope they learn to write reusable, compartmentalized software stored in a version control system that they can access six months after their paper is published :) My experience has been that every piece of software that I thought was a "one-off" gets reused—best to write it as well as possible. I hope they also get a sense of how large and complicated software projects can be, and who to call for help if they need it. Read More ›

Reorganizing Content
Greg Wilson / 2010-06-09
In anticipation of posting more lecture content, we have reorganized the pages on this site. If you look at the side menu, you'll see that "About/" and "Program Design/" now have sub-pages; we'll organize each lecture the same way to make it easier to follow along. And if you're having trouble posting or viewing comments, please send us email—we think we've sorted it out, but hey, it's software... Read More ›

Episode 11: Making It Fast
Greg Wilson / 2010-06-08
Today's episode is the last in the series on program design using invasion percolation as an example. In it, we look at how to make the program we've developed so far run zillions of times faster. It's the longest episode so far, and probably the most difficult, but it makes two very important points:

The biggest performance gains come from improvements to algorithms and data structures, not from tweaking loops and rearranging conditionals.

If you want to write a fast program, you should start by writing a simple one, test it, and then rewrite it piece by piece, re-testing as you go along.

Here's the complete list of episodes:

Introduction
The Grid
Aliasing
Randomness
Finding Neighbors
Resolving Ties
Assembling the Program
Bugs
Refactoring
Testing
Tuning

Read More ›

The Big Picture (version 2)
Greg Wilson / 2010-06-07
Thanks to everyone who gave us feedback on the first version of a concept map for Software Carpentry. The second version is shown below, along with a list of terms that ought to be on it, but aren't. Where would you place them, and how would they connect to each other and to the concepts that are already linked? Read More ›

Testing Invasion Percolation
Greg Wilson / 2010-06-07
Finally, after all that refactoring, we get to test our invasion percolation program—although in fact, we spend more time discussing how to structure testing code than we do actually testing.

Introduction
The Grid
Aliasing
Randomness
Finding Neighbors
Resolving Ties
Assembling the Program
Bugs
Refactoring
Testing

Read More ›

Refactoring Invasion Percolation
Greg Wilson / 2010-06-04
Today's episode (the ninth in the lecture about designing an invasion percolation program) was supposed to be about testing, but wound up being about refactoring instead—which probably won't surprise anyone who's done any significant amount of programming. We'd be grateful for feedback on how easy or hard it is to follow along as the program is rearranged.

Introduction
The Grid
Aliasing
Randomness
Finding Neighbors
Resolving Ties
Assembling the Program
Bugs
Refactoring

Read More ›

Concept Map
Greg Wilson / 2010-06-04
While re-designing the Software Carpentry course, I have realized that we rely over and over again on some underlying concepts that are hard to capture as lecture topics. I think these concepts are the heart of any useful definition of "computational thinking". The diagram below is a first attempt to capture what these concepts are and how they're related. (The list below the diagram summarizes the relationships in textual form for easier reading.) Suggestions for improvements would be very welcome...

A model is implemented as a data structure
A model must account for missing or incomplete information
A data format conforms to a model
Instructions for a computer are abstracted as an algorithm
An algorithm operates on a model
Choice of algorithm determines machine performance
Data is abstracted as a model
Data almost always has missing or incomplete information
An archive stores data
An archive conforms to a data format
An archive is parsed to create a data structure
An abstract machine is implemented by a library
A program is a kind of data
A program operates on a data structure
A program conforms to a specification
A data structure is persisted to create an archive
Data structure choice helps determine machine performance
A specification can be defined by testing
Testing checks a program
Testing requires a specification
Tools are programs
Tools support use of software development techniques
Software development techniques support use of tools
Modularization is used to structure programs
Modularization aids testing
Modularization is used to create a library
Hardware is represented to programmers by an abstract machine
Hardware architecture changes machine performance
Machine performance can be traded off against human performance
A library can extend an abstract machine
Experience is captured in a library
Human performance is determined by software development techniques
Human performance can be traded off against machine performance
Human performance depends on experience

Read More ›

If You Want to Look Ahead...
Greg Wilson / 2010-06-03
Our previous post said that there was a bug in the first complete version of our invasion percolation program, and offered a Software Carpentry mug to the first person to find it. If you're the sort who prefers to look up the answers in the back, we've just posted the solution as episode 8 in the series on program design. For reference, here's the lecture so far:

Introduction
The Grid
Aliasing
Randomness
Finding Neighbors
Resolving Ties
Assembling the Program
Bugs

Read More ›

Assembling a Program
Greg Wilson / 2010-06-03
The seventh episode of our lecture on program design is now online. In this one, we actually assemble a complete version of the program step by step using the pieces designed earlier. The final program still has one nasty bug, though, and I'll award a Software Carpentry mug to the first person who can find it (source code attached to this post). For reference, the first six episodes are listed below, along with another link to this one.

Introduction
The Grid
Aliasing
Randomness
Finding Neighbors
Resolving Ties
Assembling the Program

#!/usr/bin/env python
'''Invasion Percolation Simulation

usage: invperc.py grid_size value_range random_seed

grid_size:   the width/height of the grid
             must be a positive odd integer
value_range: number of distinct values in grid
             must be a positive integer
             values will be selected randomly in 1..value_range
random_seed: random number generation seed
             must be a positive integer
'''

import sys, random

FILLED = -1   # Used to mark filled cells.

def fail(msg):
    print >> sys.stderr, msg
    sys.exit(1)

def create_random_grid(N, Z):
    '''Return an NxN grid of random values in 1..Z.
    Assumes the RNG has already been seeded.'''
    assert N > 0, 'Grid size must be positive'
    assert N%2 == 1, 'Grid size must be odd'
    grid = []
    for x in range(N):
        grid.append([])
        for y in range(N):
            grid[-1].append(random.randint(1, Z))
    return grid

def mark_filled(grid, x, y):
    '''Mark a grid cell as filled.'''
    assert 0 <= x < len(grid), \
           'X coordinate out of range (%d vs %d)' % (x, len(grid))
    assert 0 <= y < len(grid), \
           'Y coordinate out of range (%d vs %d)' % (y, len(grid))
    grid[x][y] = FILLED

def is_candidate(grid, x, y):
    '''Is a cell a candidate for filling?'''
    N = len(grid)
    return (x > 0) and (grid[x-1][y] == FILLED) \
        or (x < N-1) and (grid[x+1][y] == FILLED) \
        or (y > 0) and (grid[x][y-1] == FILLED) \
        or (y < N-1) and (grid[x][y+1] == FILLED)

def find_candidates(grid):
    '''Find low-valued neighbor cells.'''
    N = len(grid)
    min_val = sys.maxint
    min_set = set()
    for x in range(N):
        for y in range(N):
            if is_candidate(grid, x, y):
                if grid[x][y] == min_val:
                    min_set.add((x, y))
                elif grid[x][y] < min_val:
                    min_val = grid[x][y]
                    min_set = set([(x, y)])
    return min_set

def fill_grid(grid):
    '''Fill an NxN grid until filled region hits boundary.'''
    N, num_filled = len(grid), 0
    while True:
        candidates = find_candidates(grid)
        assert candidates, 'No fillable cells found!'
        x, y = random.choice(list(candidates))
        mark_filled(grid, x, y)
        num_filled += 1
        if x in (0, N-1) or y in (0, N-1):
            break
    return num_filled

# Main driver.
if __name__ == '__main__':

    # Get parameters from command line.
    arguments = sys.argv[1:]
    try:
        grid_size   = int(arguments[0])
        value_range = int(arguments[1])
        random_seed = int(arguments[2])
    except IndexError:
        fail('Expected 3 arguments, got %d' % len(arguments))
    except ValueError:
        fail('Expected integer arguments, got %s' % str(arguments))

    # Run simulation.
    random.seed(random_seed)
    grid = create_random_grid(grid_size, value_range)
    mark_filled(grid, grid_size/2, grid_size/2)
    num_filled_cells = fill_grid(grid) + 1
    print '%d cells filled' % num_filled_cells

Read More ›

Who Reports On The Other 97 Per Cent?
Greg Wilson / 2010-06-01
The BBC has posted a nice graphic of the latest Top 500 list of supercomputers around the world. It's pretty impressive, particularly if you're a Linux fan (check out the display by operating system). However, nobody would think that a list of the world's 500 richest people told them much about the state of the economy; I'd be much more interested in statistics on how much computing power is employed by the 97% or so of scientists who don't use a supercomputer. Read More ›

Program Design: the Second Instalment
Greg Wilson / 2010-06-01
Having recorded and posted four episodes of a lecture on program design yesterday, I've managed another two today (highlighted below):

Introduction
The Grid
Aliasing
Randomness
Finding Neighbors
Resolving Ties

The remaining episodes are much longer, and will show how to assemble the program from the pieces seen so far, how to refactor the result to make it more testable, how to test it, how to speed it up, and how to build a GUI for it. Slides for the first two of these episodes are done; I hope to have everything up by the end of this week or early next. As always, feedback is greatly appreciated. Read More ›

Program Design: the First Third
Greg Wilson / 2010-05-31
I spent last week making and re-making slides for a lecture on program design, and managed to record the first third of them today—the rest should follow tomorrow and Wednesday. We've taken on board your comments about pace and layout, and would be grateful for more feedback—please go ahead and post comments on this post to tell us what you think.

Introduction
The Grid
Aliasing
Randomness

Read More ›

Jim Graham on Reproducibility
Greg Wilson / 2010-05-29
In response to Titus Brown's not-really-joking spoof of how most scientists manage their data, Scimatic's Jim Graham has asked, "What is reproducibility, anyway?" His main point is that if I re-run your code on your data, and get the same result that you got, I haven't actually added to the sum total of human knowledge. If I get an equivalent answer using different code, on the other hand, then our confidence in the answer is usefully increased. What do you think? Read More ›

Teaching databases by example
Jon Pipitone / 2010-05-28
Over the last two weeks I've been spending most of my Software Carpentry time working on the database unit. It began as a fairly straightforward translation of the Software Carpentry 3.0 lecture notes with only a few changes to the sequencing of the topics. The plan for the unit is fairly simple: each screencast introduces only one or two new topics, and builds on the previous screencasts with as few forward references to later topics as possible. The topics themselves are really just the different language features (SELECT, WHERE, JOIN, etc.) presented as tasks you might want to perform, and illustrated by working with toy data. It might be clearer if I present the topic plan we had in mind:

1. An Introduction to Databases (What is a database and why/when would you use one?)
2. Getting data from a table, filtering, and sorting it (SELECT, WHERE, and ORDER BY)
3. Aggregating and grouping results (GROUP BY, and aggregation functions: SUM, MAX, etc.)
4. Dealing with empty or missing data (NULL)
5. Combining data from multiple tables (inner JOINs)
6. Advanced queries (subqueries)
7. An overview of other features (e.g., HAVING, expressions in SELECT, LIMIT, ...)

And then this week Greg suggested I take a look at this gem of a book (sadly, out of print): The Essence of SQL: A Guide to Learning Most of SQL in the Least Amount of Time by David Rozenshtein. It takes a completely different and, I think, much more useful approach. It begins with a list of typical questions you might ask about a database of students/courses/profs. For example, "What are the student numbers and names of students who take CS112?", "Who are the youngest students?", "Who does not take CS112?", or "Who takes a course which is not CS112?". These questions are meant as prototypes of the sorts of questions you would use any database to answer. The book proceeds through each question and explains how you'd use SQL to answer it, why it makes sense to do it that way, and why it even works in the first place. My intuition about this approach is that it makes for a great way to learn about databases. Structuring the book around the prototypical questions will serve as a really useful way to refer back to the course later when you have a real problem to solve, as well as being much more motivating to have the unit be problem-based. My concern is that by organising the database unit around these questions we'll be stuck mixing the language features throughout the units with all sorts of cross-referencing needed in case people just want to learn about using a WHERE clause. What do you think? Greg suggests that maybe the Rozenshtein approach is something we use in combination with our original approach: we first cover the language features in separate screencasts, and then use the Rozenshtein questions to pull it all together. (A small sketch of one such question appears at the end of this post.) There's another question I have that's separate from the question of which approach we choose. In my opinion, having real data to teach around is more exciting and educational than creating toy data that's tuned just to the specific topic. The Nobel Prize data, and the Experiments and Researchers data sets we've used in the first few lectures are okay, but wouldn't it just be so much more interesting if we started with, say, data about tobacco use and physical activity, and each screencast taught about a language feature by posing and answering questions about this dataset? This is something the book Data Analysis Using SQL and Excel by Gordon S. Linoff does really well. So, another question for us (and now for you!)
is, what real-world data should we use to teach? We need a dataset we can tell a good story around. The data also needs to be varied enough that query results fit within our small screencast frame without scrolling, and it should span several tables so that we can illustrate different join types and such.
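To give a flavor of the Rozenshtein-style questions mentioned above, here is a minimal sketch using Python's sqlite3 module; the schema and data are invented for illustration, not taken from the lecture materials:

    import sqlite3

    # Toy schema and data invented for illustration.
    db = sqlite3.connect(':memory:')
    db.executescript('''
        CREATE TABLE takes(student TEXT, course TEXT);
        INSERT INTO takes VALUES ('alice', 'CS112');
        INSERT INTO takes VALUES ('alice', 'CS101');
        INSERT INTO takes VALUES ('bob', 'CS101');
        INSERT INTO takes VALUES ('carol', 'CS112');
    ''')

    # "Who does not take CS112?" is not the same question as
    # "Who takes a course which is not CS112?"; the subquery
    # makes the difference explicit.
    query = '''
        SELECT DISTINCT student FROM takes
        WHERE student NOT IN
              (SELECT student FROM takes WHERE course = 'CS112')
    '''
    for (student,) in db.execute(query):
        print(student)   # prints: bob

Read More ›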

Badges and Stars
Greg Wilson / 2010-05-27
The Open Notebook Science folks have developed a set of badges to help people label their work, ranging from "All Content — Immediate Release" to "Selected Content — Delayed Release". Like the badges for various Creative Commons licenses, or Tim Berners-Lee's proposed five-star system for rating open data, the biggest benefit of this kind of categorization is that it encourages people to think more clearly about what they are (or aren't) doing, and why. Read More ›

Archiving Experiments to Raise Scientific Standards
Greg Wilson / 2010-05-25
An NSF Workshop on Archiving Experiments to Raise Scientific Standards begins today (May 25, 2010) at the University of Utah — the schedule has links to some of the presentations, and there's a live video feed as well. Topic areas range from networking and compilers to geophysics and the life sciences — it'll be interesting to see what comes out, and how it interacts with the Open Provenance work. Read More ›

Evaluating Methods and Protocols
Greg Wilson / 2010-05-19
From their home page: ScienceCheck.org is the only site dedicated specifically to sharing objective evaluations of published experimental methods/protocols, from researchers with real-world experience. Sadly, the "computer science" category (filed under "Other") is currently empty, and there isn't a "computational science" category as far as I can see. Still, it's a beginning... Read More ›

We'll Know We've Succeeded If...
Greg Wilson / 2010-05-18
We will know the students taking this course have learned something if: They understand why Titus Brown's Data Management Plan post is both funny and sad. They have the knowledge, skills, and tools they need to do something about it. Read More ›

Day 11: Slides
Greg Wilson / 2010-05-17
Today's screencast experiment [link no longer active] is narration over PowerPoint slides: it isn't animation per se, but we'd like your feedback on whether something like this is a good way to explain "big picture" concepts. Read More ›

Why Most Scientists Don't Like Computers
Greg Wilson / 2010-05-14
Psychology is fascinating: so much of what we think we know about people turns out not to be true, while so many everyday oddities turn out to have rational explanations (for some version of "rational"). For example, I've known for twenty-five years that most scientists dread the point in their work when they have to do something new with a computer. I've finally figured out why: it's because the yield-to-effort function is wildly discontinuous, and human beings hate that. Let's start with a result from social psychology. Suppose people have a choice between waiting 5 minutes for the bus every single time, or waiting 1 minute nine times out of ten, and 20 minutes the tenth time. On average, they're better off in the second case, but almost everyone prefers the first: they value predictability over pure economic yield. Now, suppose you're a scientist—or anyone else, for that matter—and you're sitting down to do something with a computer that you haven't done before. How long is it going to take? You don't know, and what's worse, no matter how much experience you have with computers, you still don't know. You're always in the situation of someone who's wondering whether the bus will arrive in 1 minute or 20 minutes. For example, we're using TechSmith's Camtasia to create our screencasts. Between us, we have six degrees in computer science and over forty years of experience, but we're still twisting our ankles in potholes every time we try to do something new. This morning's job was to add closed captioning to a screencast that Jon created discussing NULL in SQL. There's a video on the Camtasia site showing how to do this with the Windows version, but Jon did his recording on a Mac. Can I add captions there? Not as of last August, and I can't find anything more recent on the web or in Camtasia's help to indicate that the situation has changed. Can the Windows version open the Mac project? No — if I want to switch from one version of the company's flagship product to another, I have to export the content files (the audio and video) from the Mac version and load them into a new project on Windows. If I do that and add captions, will those captions survive me moving things back to the Mac? I have no idea... Scientists run into this kind of thing all the time. They know something ought to be doable, but they have no way of knowing how long it will take to straighten out the kinks and make it happen. This naturally—and rightly—makes them very conservative: they'd rather spend 10 hours doing something the "wrong" way than take a chance that some new technique might let them finish in 1 hour, because experience has taught them that switching might wind up taking 20 hours instead. I now believe that predictable effort matters more to most people than "pure" usability or overall capability: given a choice between knowing what they're getting themselves into, having an easy ride, or eventually being able to accomplish a lot, most people will choose the first. What I don't know is how to reflect that in the design of this course... Read More ›

Day 10: Closed Captioning
Greg Wilson / 2010-05-14
Our latest screencast, on NULL values in SQL, is now online. [Original link no longer active; more recent screencast on NULL values.] Unlike its predecessors, this one has closed captions (as well as a transcript in the enclosing page). Please let us know what you think: are the captions helpful, or do you find them distracting? Read More ›

Day 9: Programming
Greg Wilson / 2010-05-13
Our first screencast showing a bit of Python programming is now up for comments. [Link no longer active, more recent link to screencast on dictionaries.] It's deliberately low-tech: we used Emacs as an editor, and simply showed the program's textual output instead of stepping through it with a debugger. (Our next programming lectures will be IDE-based for comparison.) Please tell us what you think about the pace, watchability, etc. Read More ›

Day 8: Exercises (with a screencast)
Jon Pipitone / 2010-05-13
Today I experimented with creating a few simple exercises for the material covered in the first screencast on databases. The very last exercise includes a screencast that walks through the solution, and a potential pitfall along the way. My apologies for the crappy audio and choppiness in a few places — I was using my laptop microphone, and I tried to edit out a few clicks and pops. The exercises start easy and, well, stay easy, but this is the first lecture after all. These exercises are aimed at giving students the chance to test out what they've learned, and help them decide if they are ready to move on. What do you think? Read More ›

A Word (Or Three) From Our Sponsors
Greg Wilson / 2010-05-13
Next week, we're going to start posting interviews with the people who are sponsoring Software Carpentry. Questions we'll ask include:

What sort of work does your organization do?
What kind of people work there?
What backgrounds and skills do they typically have when they join you, and where do they need help?
What do they find most frustrating when doing computational work?
How do you hope this course will help them?
How will you tell if it actually has (i.e., how will you evaluate this course's success or failure)?

What else would you like us to ask them? Read More ›

Day 7: Mini-screencasts
Jon Pipitone / 2010-05-12
Yesterday Jason and Greg experimented with breaking apart a 10 minute screencast on database grouping and aggregation into even more bite-sized pieces. We have two flavours today: point form notes with links to the screencasts, or point form notes with screencasts embedded. [Links no longer active, latest screencast on database aggregation as of January 2012.] Feedback on this format would be really helpful. Are these chunks too small? Do you prefer embedding over linking, or vice versa? Also, we'd welcome any comments on the material itself, of course. Read More ›
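For anyone unfamiliar with the topic being chunked: "grouping and aggregation" means SQL's GROUP BY clause plus aggregate functions such as COUNT and SUM. A minimal sketch using Python's built-in sqlite3 module follows; the table and data are invented, since the lesson's own examples aren't reproduced here:

    import sqlite3

    # Invented data, just to show what GROUP BY does.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE experiments (scientist TEXT, hours REAL)")
    db.executemany("INSERT INTO experiments VALUES (?, ?)",
                   [("Mendel", 4.5), ("Mendel", 2.0), ("Lovelace", 8.0)])

    # One output row per scientist: number of experiments and total hours.
    query = """SELECT scientist, COUNT(*), SUM(hours)
               FROM experiments
               GROUP BY scientist
               ORDER BY scientist"""
    for row in db.execute(query):
        print(row)   # ('Lovelace', 1, 8.0) then ('Mendel', 2, 6.5)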

Why We're Self-Hosting
Greg Wilson / 2010-05-10
A couple of people have asked why we're planning to host and serve the content we're developing instead of using a combination of YouTube and Google Code, or one of the emerging online education services such as LearnHub or Supercool School. Part of the answer is longevity: while standards like SCORM are supposed to make e-learning content portable between different systems, in reality there is still a large degree of vendor lock-in, particularly with hosted services, and we don't want to find that our content has gone off-line because someone hasn't survived a market downturn. Hostable learning management systems like Moodle require more work to set up and administer, but are also more robust (at least for now). For another part of the answer, have a look at Matt McKeon's visualization of the evolution of privacy on Facebook. How many of the "free" services that you use are making a living by selling information on your browsing patterns to marketing firms? The odds are that you don't know, and even if a service isn't doing it now, there's no guarantee they won't in future. Read More ›

Day 6: Screencast With Point-Form Notes
Greg Wilson / 2010-05-10
Jason Montojo and I put together an introductory screencast about spreadsheets today. Unlike our previous screencasts, this one is accompanied by point-form slide-style notes instead of prose paragraphs. We'd be grateful for your feedback: are notes of this kind useful? Are they comprehensible without watching the screencast itself? If the notes were available in slide format (Keynote, PowerPoint, Impress, or plain old PDF) would they be enough for you to lecture with? Read More ›

Microsoft
Greg Wilson / 2010-05-09
We are very pleased to announce that Microsoft Corporation has come on board as a sponsor of this project. Many thanks to David Rich for making this happen! Read More ›

Day 5: A Different Kind of Screencast
Greg Wilson / 2010-05-07
I spent most of today creating another screencast, this one explaining what "and", "or", and "not" mean to programmers and how they work. There are no live examples this time: instead, I illustrated the talk with some simple images. Setting aside the question of whether Venn diagrams are the best way to explain Boolean logic, do you prefer this to the live-desktop style of the previous screencast? Do the jumps back and forth between sets and tables make sense, or are they confusing? Do the colors add value, or should we stick to black & white? And how about the pace—is it too fast, too slow, too repetitive, et cetera? Thanks in advance for your feedback. Read More ›
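For readers who haven't seen the screencast, the programmers' view of Boolean logic that it describes boils down to a few lines of Python. This sketch is mine, not a frame from the video, and the variable names are invented for illustration:

    # "and", "or", and "not" as programmers use them.
    have_data = True
    have_permission = False

    print(have_data and have_permission)   # False: both sides must be true
    print(have_data or have_permission)    # True: one true side is enough
    print(not have_permission)             # True: "not" flips the value

    # Python also short-circuits: the right side is never evaluated when
    # the left side already decides the answer, so this can't divide by zero.
    denominator = 0
    print(denominator != 0 and 1 / denominator > 1)  # False, and no error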

Day 4: First Preliminary Alpha Test Etc. Screencast
Greg Wilson / 2010-05-06
Jon Pipitone and I have created a screencast showing the basics (and I really mean "the basics") of pulling data out of a database using SQL. It took about 4.5 hours in total to produce 7 minutes of video:

Writing script: 1.5 hours
Doing 3 takes: 1.0 hours
Editing: 1.0 hours
Transcribing: 0.5 hours
Screenshots: 0.5 hours

This was the first time either of us had used Camtasia; I expect that we'll be able to cut the recording, editing, and screenshotting time in half with practice. Transcribing time is dictated by typing and editing speed, so it's unlikely to come down, and scripting time will probably be the same or greater for future topics (since some of them will require diagrams and other time-consuming prep). We're therefore looking at 3 hours of production for 5-7 minutes of screen time; if you figure a lecture is 50 minutes, and there are 25 of them in the course, that works out to... um... carry the six... 750 hours to get all of the existing material online. Pad by half to account for real-world effects, and that's 35-40 weeks — tight, but doable. What we need now from you, dear reader, is feedback. Is the pace right? Do you want larger chunks, or smaller ones? Is the transcript useful? (I know we have to re-do the screenshots to make them more readable — we're thinking about exactly how to do that.) What else could/should we change, and why? We look forward to your comments. Read More ›
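For anyone who wants to check the back-of-the-envelope arithmetic above, here it is spelled out in Python. The 30 hours of production per week is my own assumption, added so the 35-40 week figure comes out; everything else uses the numbers from the post:

    # Rough production estimate, using the same figures as the post.
    lectures = 25
    minutes_per_lecture = 50
    minutes_per_screencast = 5      # conservative end of the 5-7 minute range
    hours_per_screencast = 3        # hoped-for production time, with practice

    screencasts = lectures * minutes_per_lecture / minutes_per_screencast  # 250
    production_hours = screencasts * hours_per_screencast                  # 750
    padded_hours = production_hours * 1.5       # pad by half for real-world effects
    print(padded_hours / 30)                    # 37.5 weeks at ~30 hours/week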

A Question About Documentation
Greg Wilson / 2010-05-05
What kind of documentation do you use when you're programming? How useful do you find it? If you have three minutes to fill in a very short survey on the topic (it's literally half a dozen questions), we'd be very grateful for your feedback. Read More ›

Day 2: More Sticky Notes
Greg Wilson / 2010-05-04
Day 2. I feel I should say something like, "We traversed the lower ridge, and are camped in a small valley overlooked by the main peak—I can hear wallabies hooting in the distance," but (a) it hasn't been quite that dramatic and (b) I don't think wallabies hoot. In this reality, Jason, Jon, and I spent another afternoon mapping content: in particular, disentangling the mouse-clicking and data-analyzing aspects of the spreadsheet lecture. Tomorrow we hope to do our first pair of five-minute trial screencasts, just to see how well (or how poorly) our chosen tools work. Read More ›

Day 1: Shuffling Sticky Notes Around
Greg Wilson / 2010-05-04
Day 1 of Software Carpentry Version 4: Jason, Jon, and I spent the afternoon shuffling sticky notes around on a whiteboard, trying to map out key concepts for our databases lecture(s) and their dependencies. The result isn't all that different from what's in Version 3, but there was one important insight. J&J commented that many of the students in the winter run of the course had understood the material, but been unable to apply it to their own problems. Our tentative explanation is that in order to apply what they have learned about SQL to their own data, students have to (a) pre-process that data (e.g., read a CSV file, then parse the cells in certain columns to extract timing values embedded in other data) and (b) figure out how to store those values in a database (i.e., do some data modeling and schema design). Our lesson plan for Version 4 already has a slot for a lecture on modeling: maybe this should be given more emphasis? Tomorrow, we hope to go through the same exercise for spreadsheets. Read More ›
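To make step (a) concrete, here is the kind of pre-processing we have in mind. The file name, column name, and timing format below are all invented for illustration; real students' data will differ:

    import csv
    import re

    # Step (a): read a CSV file, then pull timing values out of a column
    # that mixes them with other text, e.g. "trial-07 (elapsed: 12.4s)".
    timings = []
    with open("results.csv", newline="") as handle:
        for row in csv.DictReader(handle):
            match = re.search(r"elapsed:\s*([\d.]+)s", row["notes"])
            if match:
                timings.append(float(match.group(1)))

    print(len(timings), "timing values extracted")

Step (b), deciding what tables and columns those values should go into, is exactly the data modeling that the lecture slot mentioned above would cover.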

Setting Up a New Windows Machine
Greg Wilson / 2010-05-03
Setting up a new machine is never fun, but it's always interesting to compare what different people have in their toolkits. Here's what I have installed so far on the desktop machine I'll be using to do Software Carpentry course development:

Windows XP (yes, I still use XP)
Cygwin (a collection of Unix emulation tools for Windows, configured with CYGWIN=binmode ntsec tty)
Adobe PDF Reader
Audacity (for sound editing)
MikTeX (an all-in-one LaTeX for Windows)
GIMP (the open source clone of Photoshop)
Google Talk (for chatting with my co-conspirators)
Inkscape (an open source vector drawing tool)
iTunes (this course brought to you by Beethoven, John Coltrane, and a variety of 80s bands)
MATLAB (I'm using R2008b)
MWSnap (for doing screen captures)
Microsoft Office 2007 (because most scientists who use anything, use Excel)
OpenOffice (but I'll mostly use MS Office)
Python 2.6 (because some of the packages I want don't exist for Python 3 yet)
R 2.11 (because after Python and MATLAB, R and Perl are our next target languages)
SciPy (which gives me almost everything I want that isn't in the standard Python install)
Skype (for talking to collaborators)
VLC (for viewing video files)
yEd (for creating and editing simple bubble-and-arrow diagrams)

What's missing (so far) are the science-specific tools that I'll be adding as needed for particular topics—I'll blog about those as I install them. What's interesting is how many of these tools are free (as in beer) and open (as in modifiable): I'll be blogging soon about the distinction, and why I think both are important to science, as well. Read More ›

T Minus One
Greg Wilson / 2010-05-02
Tomorrow (May 3, 2010) will be my first day of full-time work on Version 4 of this course (where "full-time" means "except for getting my daughter a new passport and taking her to the dentist for the first time"). I'm very pleased that Jason Montojo and Jon Pipitone will be joining me part-time for May and June — they helped organize and deliver the most recent run of the course at the University of Toronto, and with them on board, I'm confident that we'll be able to hit our first milestone. That milestone is to put two or three different versions of two lectures on the web by the first week of June. The topics we have chosen are databases and spreadsheets — more specifically, how to do simple data analysis using those two tools. We've decided to do this for several reasons:

This material will be immediately useful to many of our intended users (particularly those in life sciences).
There aren't many dependencies on other material — in particular, people don't have to learn anything about loops, object-oriented programming, and what-not before reaping rewards.
Getting this stuff up early will allow us to get feedback from users on what formats they prefer.

The last point is the most important one for us. Do our users want webified versions of conventional classroom lectures, like those offered by MIT and Google? Or would they (perhaps I should say "you"?) prefer lots of shorter screencasts, à la ShowMeDo? If so, how should they be arranged? What about examples and exercises: should they be presented in the browser somehow, or should we ask (require?) people to download code and run it on their own machine? Putting a few variations out there and asking people which they prefer seems like the best way to find out. Read More ›

Apologies for the Flurry of Re-Posts
Greg Wilson / 2010-04-19
Our apologies for the flood of re-posts that some of you may have seen over the weekend: apparently, adding a category to a post, or changing its existing category, makes some blog readers believe the whole post is new. We're sorry for any confusion or inconvenience the clutter may have caused. Read More ›

File Sharing for Scientists
Greg Wilson / 2010-04-16
A scientist I recently met in Toronto had a problem: how to share large files with colleagues. Each file is a couple of hundred megabytes; dozens are produced each week, but each is only interesting for a couple of months; and there are confidentiality issues, so some kind of password protection is needed. Conventional file-sharing services like Dropbox aren't designed for data that size, so in the end she bought a domain and set up secure FTP. But now there's this:

BioTorrents: A File Sharing Service for Scientific Data

The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at http://www.biotorrents.net.

It's a neat idea, and will become neater once scientists routinely put DOIs on data as well as papers. I'd be very interested in a usability study to see how easy or hard it is for the average grad student in botany to get this plugged in and turned on. Read More ›

Scimatic Sponsorship
Greg Wilson / 2010-04-15
We're very pleased to announce that Scimatic Software, a Toronto-based company that specializes in the development of software for the scientific community, has come on board as a sponsor of this project. Many thanks to Jamie McQuay and Jim Graham! Read More ›

Teaching Open Source
Greg Wilson / 2010-04-12
Over at opensource.com, Red Hat's Greg DeKoenigsberg has a post about a new collaboratively-authored textbook on open source software aimed squarely at undergrad courses. As Máirín Duffy points out in the first comment, it's very code-centric, but in my experience, that's the right approach: students won't be ready for discussion of design until they're proficient in coding [1]. I'm looking forward to borrowing lots from the book for Software Carpentry... [1] This is, by the way, why I believe that attempts to teach "computational thinking" without first teaching programming are doomed to fail, but that's a rant for another time. Read More ›

More on Instructional Design
Greg Wilson / 2010-04-12
Like many programmers, I've learned most of what I know by poking around and breaking things. Quite naturally, that has led me to believe that this is the best way to learn—after all, if it worked for me, it has to be pretty good, right? But research says otherwise. Kirschner, Sweller, and Clark's paper, "Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching", was published in Educational Psychologist in 2006, but the whole text is available online.

Although unguided or minimally guided instructional approaches are very popular and intuitively appealing...these approaches ignore both the structures that constitute human cognitive architecture and evidence from empirical studies over the past half-century that consistently indicate that minimally guided instruction is less effective and less efficient than...approaches that place a strong emphasis on guidance of the student learning process. The advantage of guidance begins to recede only when learners have sufficiently high prior knowledge to provide "internal" guidance.

A few selections from the main body:

Minimally guided instruction appears to proceed with no reference to the characteristics of working memory, long-term memory, or the intricate relations between them. The result is a series of recommendations that most educators find almost impossible to implement...because they require learners to engage in cognitive activities that are highly unlikely to result in effective learning. As a consequence, the most effective teachers may either ignore the recommendations or, at best, pay lip service to them. (pg. 76)

Inquiry-based instruction requires the learner to search a problem space for problem-relevant information. All problem-based searching makes heavy demands on working memory. Furthermore, that working memory load does not contribute to the accumulation of knowledge in long-term memory because while working memory is being used to search for problem solutions, it is not available and cannot be used to learn... The consequences of requiring novice learners to search for problem solutions using a limited working memory or the mechanisms by which unguided or minimally guided instruction might facilitate change in long-term memory appear to be routinely ignored. The result is a set of differently named but similar instructional approaches requiring minimal guidance that are disconnected from much that we know of human cognition. (pg. 77)

None of [this] would be important if there was a clear body of research...indicating that unguided or minimally guided instruction was more effective than guided instruction. In fact...the reverse is true. Controlled experiments almost uniformly indicate that when dealing with novel information, learners should be explicitly shown what to do and how to do it. (pg. 79)

After a half-century of advocacy associated with instruction using minimal guidance, it appears that there is no body of research supporting the technique. In so far as there is any evidence from controlled studies, it almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners. Even for students with considerable prior knowledge, strong guidance while learning is most often found to be equally effective as unguided approaches. Not only is unguided instruction normally less effective; there is also evidence that it may have negative results when students acquire misconceptions or incomplete or disorganized knowledge. (pg. 83)

There are well over a hundred references into the literature. If they're right (and I'm now convinced), then the material for this course should be presented in smaller chunks than I've used in the past, and each should be accompanied by several worked examples. Read More ›

Measuring Science
Greg Wilson / 2010-04-11
Julia Lane, the director of the Science of Science & Innovation Policy program at the National Science Foundation, wrote an article for Nature a couple of weeks ago titled "Let's make science metrics more scientific". As the summary at the start says:

Existing metrics have known flaws
A reliable, open, joined-up data infrastructure is needed
Data should be collected on the full range of scientists' work
Social scientists and economists should be involved

The same points could be made about evaluating software developers (or any other kind of knowledge worker). The devil, as always, is in the details, and unfortunately I have to start doing evaluations before those details are worked out. Several of the supporters for this course need me to demonstrate its impact on the productivity of the scientists who take it (so that they can in turn justify their contribution to their funders). It isn't enough to ask students who have completed the course whether they think they know more about programming than they used to: ignoring the obvious problems of survivor bias and self-assessment, I would still have to demonstrate that making people better programmers also makes them better scientists. I believe it does, but belief is not evidence, and doesn't convey scale. The best plan I've been able to come up with so far is to look at how scientists spend their time before and after taking the course, but that would require resources I don't have. If you're interested in studying scientists or software developers empirically, and would like some raw material, I'd like to hear from you. Read More ›

Software Carpentry for Economists in Mannheim This Autumn
Greg Wilson / 2010-04-08
Hans-Martin von Gaudecker is planning to teach a Software Carpentry-style course for economists at Universität Mannheim this autumn — as his announcement says, "I think it is amazing that a profession obsessed with efficiency affords a very obvious inefficiency: Most researchers nowadays spend a fair share of their time programming, but hardly anyone has been taught to do that well." I'll post updates here as he sends them. Read More ›

Platforms
Greg Wilson / 2010-04-08
After Thursday's post-mortem on the latest offering of Software Carpentry at the University of Toronto, I had a chance to talk further with Jon Pipitone, who was one of the tutors (and who is just wrapping up an M.Sc. looking at code quality in climate models). We got onto the topic of infrastructure for Version 4, which needs to be settled quickly. We need:

1. a way to deliver content to students, including text and images, audio/video, exercises (with solutions), sample data sets, and useful software;
2. a way for students to feed questions back to the course organizers (asynchronously through email and bulletin boards and/or synchronously through VoIP and desktop sharing);
3. a way for instructors (who may or may not be contributors) to respond to students;
4. a way for lay contributors (who may also be students) to offer new content, from pointing out typos to providing exercises or whole new lectures; and
5. a way for core contributors to manage and edit contributions, create some of their own, et cetera.

This description implies some social infrastructure, including:

some core contributors who are creating lots of course content, and probably teaching the course as well;
a second tier of instructors who are creating less content but also interacting with students; and
students, who may be registered in a course of some kind or working through the material on their own (and who may eventually move up to answering others' questions and eventually to creating content).

This sounds like what you'd find in an open source project (regular users, occasional testers or bug reporters, and contributors) at least as much as it sounds like a traditional college course (students, teaching assistants, and professors). The most important difference is that the divisions between the latter three roles are sharper and deeper than the divisions in open source projects: some undergrad students eventually go to grad school and become TAs, and some of those eventually become faculty, but it's almost unheard of for someone to be in two of those categories at once, or to "bubble up" from one to the next based solely on ability and enthusiasm. That must happen if Software Carpentry is to become self-sustaining: while over 140,000 people have looked at the existing material in the past five years, only three dozen have ever sent bug reports, and only four of those have contributed any substantial content. Whatever we use to accomplish tasks 1-5 above must therefore draw people in and make it easy for them to use, ask, answer, and contribute. With that out of the way, here are a few options for discussion:

Project: use SourceForge or Google Code as a host.
Retro: a classic turn-of-the-century web site with static HTML for content, bulletin boards and/or mailing lists for discussion, Trac with Subversion for project management.
Social: a WordPress blog with lectures and other content as posts (updated several times, with threaded comments for feedback).
Turnkey: a fully-fledged learning management system such as Moodle.
Wiki: like Retro, but with the content in a wiki.

So how do they stack up?

                                        Project   Retro   Social   Turnkey   Wiki
Easy to set up/administer/maintain?       +1        0       0         0       +1
Easy for people to contribute?            -1       -1       0        -1        0
Comes with everything?                     0       -1       0        +1        0
Flexible content delivery?                -1        0      -1        +1       -1
Overall                                   -1       -1      -1        -1       -1

A bit of explanation:

Project hosting services require little or no setup, but would force us to manage this as a software development project, rather than as a writing project: self-tests and "try this at home" examples aren't built in, and neither of the big open source project hosting sites would be happy if we used them as a video server.
The "retro" option would require us to "roll our own" on a lot of things, which would be fun (I like to program) but wouldn't directly deliver value to scientists.
WordPress is easy to set up, but doesn't have very many development-oriented or education-oriented plugins, and isn't designed to host video snippets or live examples. I think that building the course as a blog is a neat idea, but the novelty would soon wear off...
Learning management systems like Moodle have a lot we don't need (recording grades, for example), but a lot that we do (managing course bundles). I gave this category -1 for ease of setup because I'm unfamiliar with it, and 0 for "easy for people to contribute" because the LMSes I've looked at (ATutor, OLAT, Moodle, and Sakai) all seem to have significant learning overheads for creating content—they're a bigger hammer than I (think I) need. Of course, that could just be unfamiliarity again...
A wiki would be easy to set up and maintain, but in my experience, editing large volumes of material in a browser is unpleasant, and there's little support for managing updates, particularly by concurrent authors.

As always, I'd welcome your thoughts... Read More ›

Feedback and Boundaries
Greg Wilson / 2010-04-04
Thanks to the initiative of Dominique Vuvan (who took Software Carpentry last summer), we ran a semi-formal version of the course from last November through to this past week for grad students in Psychology, Linguistics, and a few other disciplines at the University of Toronto. Weekly tutorials were offered in both Python and MATLAB by graduate teaching assistants from Computer Science, covering roughly half of the existing material. The two things students liked least were the general disorganization of the course and the fact that a lot of the material felt like what we computer scientists thought they ought to know, rather than what they could see as being immediately useful. The disorganization reflects the grassroots nature of this round of the course, and the fact that it was our first time teaching in MATLAB. Next time around, we'll use a more natural order for material in MATLAB, rather than sticking to the order that makes sense for Python, but forces students to grapple with some of the more obscure features of MATLAB early on. The "eat your vegetables" tone of the material is going to be much harder to deal with. Software Carpentry is meant to be a second course in computing, not an introduction to programming in general: as the last of the user profiles says:

This course is probably too advanced for [a novice], as it assumes familiarity with basic programming concepts like loops, conditionals, arrays, and functions. [They] should probably audit a first-year introduction to programming or find an intensive two-week summer school course before tackling this one.

The problem is that if we'd actually applied that rule last November, we would have turned away more than half of the students, most of whom would never have acquired those basic concepts. So, do we:

1. ignore the problem and hope that these people will somehow pick up the basics on their own (despite the fact that most scientists never do), or
2. broaden the course's mission to include basic programming as well?

My phrasing makes my preference for the second option clear, but feature creep is the biggest risk this project faces. Teaching the basics of Python to people who already know a bit about programming takes 4-5 lectures out of the 25 budgeted; if they don't know how to program, that figure probably triples, leaving only 10 lecture hours to cover a much-reduced subset of the planned material. On the other hand, sticking to the plan means condemning the majority of potential students to wander lost and frustrated through a bewildering maze of seemingly inconsistent behavior, and to hour upon wasted hour of heartbreaking frustration. (That was a bit melodramatic, but not necessarily inaccurate.) Another argument against option #2 is pacing. Software Carpentry has been run four times at the University of Toronto (twice as non-credit tutorials and twice as a regular for-credit course). Each time, wide variation in students' prior experience levels meant that no matter how material was paced, one third of the class would be bored and another third bewildered. On balance, therefore, I think Software Carpentry has to continue to assume a more advanced starting point than most of its potential audience currently has. If things go well, I hope we'll be able to backfill with more accessible introductory material in a year's time. Read More ›

Simon Singh Wins (and So Does Science)
Greg Wilson / 2010-04-01
Simon Singh, the science journalist who was sued for libel by the British Chiropractic Association, has won the right to rely on the defense of "fair comment". (Full ruling linked from this Index on Censorship post.) Singh had pointed out that there's no evidence to back up BCA claims that their particular brand of pseudoscience could help with asthma and other ailments; it has taken him two years and £200,000 to get this far, and it may be another two years before the matter is finally settled, but this is an important victory for everyone who believes in rational inquiry. Read More ›

Models To Imitate
Greg Wilson / 2010-04-01
My father once told me that a week of hard work can sometimes save you an hour of thought. In that spirit, I've been looking for asynchronous online courses to imitate. I previously mentioned MIT's Open Courseware, CMU's Open Learning Initiative, and (closer to my scale) Salman Khan's Khan Academy. Google Code University's lessons on programming languages are also on my radar—I'll blog more about them once I finish the Python material—but another model that I'm looking at closely is Teaching Open Source, a collaborative effort to get more open source into college and university courses. I first encountered them through POSSE (Professors' Open Source Summer Experience), which they describe as:

...a weeklong bootcamp that will immerse professors in open source projects. Participants spend a week of intensive participation in selected open source projects, led by professors with experience in teaching open source development, in partnership with community members who have deep experience and insight. By the end of the session, participants should have a much better understanding of the workings of open source projects, and a strong network of contacts to lean on as they begin to bring students into the open source world.

I've also been watching in awe (with a small 'a', but awe nonetheless) as half a dozen contributors have pulled together a textbook called Practical Open Source Software Exploration: How to be Productively Lost, the Open Source Way. It's by no means complete, but I have already bookmarked it in a dozen places, and expect to add more. I always hoped that Software Carpentry would become a community project of this kind; here's hoping that Version 4 finally manages to. Read More ›

Periodic Table of Science Bloggers
Greg Wilson / 2010-03-31
David Bradley has created a periodic table of science bloggers that regular readers might enjoy. Read More ›

Formats
Greg Wilson / 2010-03-30
As I said in last week's announcement, and mentioned again in a later post, one of the main goals of this rewrite is to make it possible for students to do the course when and where they want to. That means recording audio and video, but much of the material will probably still be textual: code samples (obviously), lecture notes (for those who prefer skimming to viewing, or who want to teach the material locally), and exercises will still be words on a virtual page. And even the AV material will (probably) be accompanied by scripts or transcripts, depending on what turns out to work best. Which brings up a question everyone working with computers eventually faces: what format(s) should material be stored in? For images, audio, and video, the choices are straightforward: SVG for line drawings, PNG for images, MP3 for audio, and MP4, MPEG, or FLV for video (I don't know enough yet to choose). But there's a bewildering variety of options for text, each with its pros and cons:

Authoring tools: do authors need to use a specialized editor? If so, is it freely available for the three major platforms (Windows, Linux, and Mac)?
Composition: can authors "just type", or do they need to spend a lot of keystrokes on markup?
Diffing and merging: does the format play nicely with version control systems, i.e., if two or more people edit independently, can their changes easily be merged after the fact?
Formatting: does the format allow fine-grained control over layout? (My personal test here is how easy it is to create tables with irregular arrangements of rows and columns.)
Multiple output formats: can HTML pages, slides, PDFs, and what-not all be produced from a single source?
Referencing: does the format take care of section and figure numbering, cross-references, and bibliographic citations automatically?
WYSIWYG: does the raw content have to be compiled or transformed to produce something viewable, or is what you see what you get?

Here are the options as I see them (the columns are the criteria above: A = authoring tools, C = composition, D = diffing and merging, F = formatting, M = multiple output formats, R = referencing, W = WYSIWYG):

Format             A    C    D    F    M    R    W    Minimum
Microsoft Word    -1   +1   -1   +1   -1   +1   +1      -1
OpenOffice         0   +1   -1   +1   -1   +1   +1      -1
DocBook            0   -1    0    0   +1    0   -1      -1
Other XML          0   -1    0   -1    0   -1   -1      -1
Plain Old HTML     0   -1    0   -1    0   -1   +1      -1
S5 and its kin     0   -1    0   -1    0   -1   +1      -1
Wiki text         +1   +1   +1   -1   +1    0   -1      -1
LaTeX             +1    0    0   +1    0   +1    0       0

I use the minimum in evaluation, rather than the average or total score, because what you notice most when you're working with something is usually what's most annoying about it. Or maybe that's just me... But what do these numbers actually mean? In no particular order:

Binary file formats don't work well with version control systems, since the latter use textual differencing to reconcile changes between versions or by concurrent editors. This rules out the default formats used by Microsoft and OpenOffice.
Machine-generated XML doesn't fare any better, since the differencing tools used in version control systems ignore the semantics ("element inserted") and become confused by the representation ("18 lines changed"). This rules out various XML-based options for Word and OO.
In contrast, XML or HTML that has been written using a plain old text editor usually has line breaks in useful places (i.e., more of the semantics is reflected in the representation), so diff and merge work much better. On the other hand, if you're using a POTE, 20-40% of your keystrokes go into markup (all those angle brackets and attributes) rather than content. WYSIWYG XML/HTML editors help a bit (I'm using the one built into WordPress right now), but most generate the same tangled diff-hostile output as the options dismissed above.

With respect to particular formats:

"Real" DocBook is a lot of work to produce. O'Reilly's DocBook Lite (a subset of the official format) is less effort, but there are still a lot of angle brackets to type in—I haven't yet found an editor that will let me type Ctrl-B and switch to DocBook-compliant bolding, for example.
Homebrew XML markups, like the one used by Pragmatic, all seem to converge on the features of DocBook Lite. There's also the problem of finding (or building), tweaking, and maintaining tools to produce the end result. (I created my own format, and built my own tools, for Version 2 of the course; I won't make that mistake again.)
Plain old HTML has all the disadvantages of homebrew XML markup, but does have the advantage of being viewable without a compilation step—so long as you don't care about numbering, cross-references, etc. For that, you need tools, which need to be created, maintained, and tweaked.
Various HTML-based slideshow formats, like S5, add some semantic information to plain old HTML that a bit of in-browser Javascript can use to produce PowerPoint-style effects. Numbering and cross-referencing still need tools, though, and S5 and various follow-ons are mostly orphaned these days.
Wiki text is easy to type in (that's the whole point), and plays well with version control, but (a) it needs processing tools (again), and (b) the degree of control over markup is usually fairly limited. That said, Wiki Creole and reStructured Text are appealing: there are lots of compilation/conversion tools for both. The downside is that both actually require compilation: so far as I can tell, there isn't a WYSIWYG editor for either that is still being maintained. (Update: there may be one for reST; I'd welcome input from anyone who has used it.)
LaTeX: ah, LaTeX, my old nemesis—it has been a while, hasn't it? It plays nicely with version control; it handles cross-referencing, gives users fine control over layout—very fine control, if you want it—and there is even a WYSIWYG editor. On the downside, its syntax is complicated, but I've already mastered it, and so have many other scientists. More importantly, though, my past attempts to produce pretty HTML from LaTeX using Latex2Html and Plastex have been frustrating.

So, does that mean LaTeX is the right answer? My scoring says so—what do you think? Read More ›
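As a reminder of what that cross-referencing support actually buys you, here is a minimal, generic LaTeX fragment (not taken from the course materials): section numbers are generated at compilation time, so reordering material never breaks the references.

    \documentclass{article}
    \begin{document}

    \section{Databases}\label{sec:databases}

    Section~\ref{sec:databases} is numbered automatically, and the same
    mechanism handles figures, tables, and bibliographic citations.

    \end{document}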

What's Not on the Reading List
Greg Wilson / 2010-03-29
I mentioned yesterday that I maintain a list of books that haven't been written yet. Partly it's an exercise in sympathetic magic—if the reviews exist, maybe the books will follow—but it's also useful for organizing my thoughts about what a programmer's education should look like. Looking at the books I've matched to various topics in the Software Carpentry course outline, there are some distressing gaps:

Given that programmers spend upwards of 40% of their time debugging, there are very few books about it, and only one collection of exercises (Barr's Find the Bug).
There's a lot on higher-level programming techniques, but it's scattered across dozens of books as disparate as The Seasoned Schemer, Effective C++, and The Practice of Programming. I haven't read Perrotta's Metaprogramming Ruby yet, but it looks like it will be another rich source of ideas.
Material on systems programming—manipulating files and directories, running sub-processes, etc.—is equally scattered. The Art of Unix Programming includes all the right topics, but covers too much, in too much detail, at too low a level. Gift & Jones' Python for Unix and Linux System Administration has the same two faults (from Software Carpentry's point of view—I think both are excellent books in general), but uses a scripting language for examples, so it made the list.
Mark Guzdial and others have done excellent research showing the benefits of teaching programming using multimedia, i.e., showing students how to manipulate images, sound, and video as a way of explaining loops and conditionals. That's half of why the revised course outline includes image processing early on (the other halves being "it's fun" and "it's useful"). Once again, most of what I'm familiar with is either documentation for specific libraries, or textbooks on the theory of computer vision, but there are some promising titles in the MATLAB world that I need to explore further.
Performance. It's been 15 years since I first grumbled about this, and the situation hasn't improved. Most books on computer systems performance are really textbooks on queueing theory; of that family, Jain's Art of Computer Systems Performance Analysis is still head and shoulders above the crowd. Souders' High Performance Web Sites is the closest modern equivalent I've found to Bentley's classic Writing Efficient Programs, but neither is really appropriate for scientists, who need to think about disk I/O (biologists and their databases), pipelining and caching (climatologists with their differential equations), and garbage collection (everybody using a VM-based language). I had hoped that High Performance Python would fill this gap, but it seems to have been delayed indefinitely. (And yes, I've looked at Writing Efficient Ruby Code; it has some of what our students want, but not nearly enough.)
There are lots of books about data modeling, but all the ones I know focus exclusively on either the relational approach or object-oriented design, with a smattering that talk about XML, RDF, and so on. I haven't yet found something that compares and contrasts the three approaches; pointers would be welcome.
Web programming. There are (literally) thousands of books on the subject, but that's the problem: almost all treatments are book-length, and this course only has room for one or two lectures. It is possible to build a simple web service in that time, but only by (a) using a cookbook approach, rather than teaching students how things actually work, and (b) ignoring security issues completely. I'm not comfortable with the first, and flat-out refuse to do the second: if this course shows people how to write a simple CGI script that's vulnerable to SQL injection and cross-site scripting, then it's our fault when the students' machines are hacked. This gap is as much in the available libraries as in the books, but that doesn't make it any less pressing.

Given these gaps, I may drop one or two topics (such as performance and web programming) and either swap in one of the discarded topics or spend more time on some of the core material. I'm hoping neither will be necessary; as I said above, pointers to books in any language that are at the right level, and cover the right areas, would be very welcome. Read More ›

Recommended Reading
Greg Wilson / 2010-03-28
I'm slightly obsessed with reading lists. (I even maintain a list of books that haven't been written yet, in the hope that it will inspire people to turn some of the entries from fantasy into reality.) Partly to give credit to all the people whose work inspired Software Carpentry, and partly to guide students who want to learn more than we can fit into a double dozen lectures, I have started a bibliography, and added links to relevant books to the lecture descriptions in the course outline. Pointers to other material would be very welcome; I will blog soon about areas that I feel are particularly lacking. Read More ›

Online Delivery
Greg Wilson / 2010-03-26
As the announcement of Version 4 said, Software Carpentry is being redesigned so that it can be delivered in several ways. I want to support:

1. traditional classroom lectures, with someone at the front of the room talking over a series of slides and/or coding sessions to a captive audience;
2. students reading/viewing material on their own time, at their own pace, when and as they need it; and
3. hybrid models, in which students work through as much as they can on their own, then get help (face-to-face or over the web) when they hit roadblocks.

#1 isn't easy to do well, but the challenges involved are well understood. #2 and #3 are going to be a lot harder: it's new ground for me, and despite the fact that the Internet is older than many of my students, most of the educational establishment still thinks of it as "new" as well. There are hundreds of books and web sites devoted to e-learning, but the majority just recycle the same handful of inactionable truisms. ("When designing online material, try to make it as engaging as possible." Well, duh.) Most of the high-quality material focuses on research about e-learning, rather than instructional design itself. For example, Richard Mayer's Multimedia Learning says a lot of interesting things about whether people learn more deeply when ideas are expressed in words and pictures rather than in words alone, and the principles he derives from his research are good general guidelines, but again, there's little help offered in translating the general into the specific. If there isn't much explicit guidance available, what about prior art? MIT's Open Courseware got a lot of attention when it was launched, but its "talking heads" approach reminds me of early automobiles that looked like horse-drawn carriages with motors bolted on. Carnegie-Mellon's Open Learning Initiative (which advertises itself as "open courses backed by learning research") is more interesting, but what has really caught my eye is Salman Khan's Khan Academy, which I first encountered through one of Jon Udell's interviews. Khan has created hundreds of short videos on topics ranging from basic addition to mitosis and Laplace transforms by recording himself sketching on a tablet. The results are just as digestible as Hollywood-quality material I've viewed elsewhere, and with 25 lectures to do in less than 50 weeks, his low-ceremony approach appeals to me for practical reasons as well. Of course, any believer in agile development would tell me that there's only one right way to tackle this problem (and in fact, one did just an hour ago). By the end of May, I plan to put one lecture—probably the intro to relational databases and SQL—up on the web in two or three formats, and then ask for feedback. Is one 50-minute video better or worse than five 10-minute vignettes? Do people prefer PowerPoint slides with voiceover, live sketching/coding sessions (complete with erasures and typos), or some mix of the two? How important is it to close-caption the videos? If classroom-style slides are available as well as the video, how many people look at each? I know how to do these kinds of usability studies, and hopefully enough people will volunteer opinions to help me choose the right path. Read More ›

Instructional Design
Greg Wilson / 2010-03-26
As well as deciding on the format of the course, I have to re-shape its content. In contrast to e-learning, there seems to be a lot of solid material available on instructional design. The most useful guide I've found so far is Wiggins & McTighe's Understanding by Design. I was initially a bit put off by the micro-industry the authors have built around the book, but its step-by-step approach immediately felt right:

1. What are students supposed to understand at the end of the lesson?
2. How is that going to be determined, i.e., what questions will they be able to answer that they couldn't answer before, or what will they be able to do that they couldn't do before?
3. What lessons and activities are going to help them acquire that knowledge and those skills?

The whole thing is a lot more detailed than that, but you get the gist. And note that the last point says "help them acquire", not "teach them": while the latter focuses on what the instructor says, the former focuses on helping students construct understanding, which is both more accurate and a better fit for the level of students this course targets. I've already used their ideas in reshaping the course outline. If the right way to deliver the course turns out to be 200 vignettes rather than 25 lectures, I will need to do some chopping and rearranging, but I think that what I have is a good starting point. Once I know what format I'm going to choose, I will rework the outline in accordance with the three-step approach summarized above and ask for feedback. Read More ›

Summer Course: Analyzing Next-Generation Sequencing Data
Greg Wilson / 2010-03-25
Analyzing Next-Generation Sequencing Data
May 31 — June 11th, 2010
Kellogg Biological Station, Michigan State University
CSE 891 s431 / MMG 890 s433, 2 cr
http://bioinformatics.msu.edu/ngs-summer-course-2010
Applications are due by midnight EST, April 9th, 2010.
Course sponsor: Gene Expression in Disease and Development Focus Group at Michigan State University.
Instructors: Dr. C. Titus Brown and Dr. Gregory V. Wilson

This intensive two-week summer course will introduce students with a strong biology background to the practice of analyzing short-read sequencing data from the Illumina GA2 and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq. No prior programming experience is required, although familiarity with some programming concepts is suggested, and bravery in the face of the unknown is necessary. 2 years or more of graduate school in a biological science is strongly suggested. Read More ›

Software Carpentry Version 4 is a Go!
Greg Wilson / 2010-03-25
I am very excited to announce that I am going to work full-time on revising the Software Carpentry course from May 2010 to May 2011. This work has been made possible by the generosity of our members:

The School of Informatics and Computing at Indiana University
The Gene Expression in Disease and Development Focus Group at Michigan State University
MITACS
The Centre for Digital Music at Queen Mary University of London
SciNet
SHARCNET
The UK Met Office

I would also like to thank The MathWorks, the University of Toronto, the Python Software Foundation, and Los Alamos National Laboratory, whose support over the past 13 years has allowed us to help scientists use computers more productively.

Version 4 of Software Carpentry will improve on its predecessors in three significant ways. First, the content will be reorganized and updated to better meet scientists' needs. As with Version 3, a typical graduate student or research scientist should be able to cover all of the material in a regular one-term course with approximately 25 hours of lecture and 100-150 hours of exercises.

Second, we intend to provide parallel versions of the material in MATLAB and Python, so that scientists who already know numerical computing's most popular scripting language can dive right into the parts that interest them most. We have been testing a MATLAB translation of the Version 3 notes this winter with good results, and are grateful to the students at the University of Toronto who have tried them out and given us feedback.

Third, and most importantly, Version 4 of the course will be designed so that students can work through most or all of the material on their own, at their own pace, when they need it. To do this, we will make video recordings of the lectures available, along with screencasts and interactive examples, and provide over-the-web support via email, Skype, and desktop sharing to help early adopters when they run into roadblocks. We hope that this will allow us to reach, and help, many more people than would otherwise be possible.

Software Carpentry is an open project: all of the materials are available under the Creative Commons Attribution license, and can be freely shared and remixed provided you include a citation. If you would like to help us help scientists be more productive, please contact us by email at team@carpentries.org or as swcarpentry on Skype. Read More ›

Now on Twitter
Greg Wilson / 2010-03-23
You can now follow our progress at 'swcarpentry' on Twitter. Read More ›

How Much Of This Should Scientists Understand?
Greg Wilson / 2010-03-11
Let's start with the problem description: All of the Software Carpentry course material (including lecture notes, code samples, data files, and images) is stored in a Subversion repository. That's currently hosted at the University of Toronto, but I'd like to move it to the software-carpentry.org domain (along with this blog). However, software-carpentry.org is hosted with site5.com, who only provide one shell account per domain for cheap accounts like the one I bought. Why is this a problem? Because when someone wants to commit to the repository, they have to authenticate themselves. I could let everyone who's writing material for the course share a single user ID and password, but that would be an administration nightmare (as well as a security risk). Site5 does have a workaround based on public/private keys, but it's fairly complicated—i.e., it could break in lots of hard-to-diagnose ways. Another option would be to use the mod_dav_svn plugin for Apache, but Site5 doesn't support per-domain Apache modules either. Dreamhost.com does, so I may be switching hosts in a few weeks. So: how much of this should the average research scientist be expected to understand? If the answer is "none", then how are they supposed to make sensible decisions about moving their work online? If the answer is "all", where does the time come from? (It takes me 30 seconds to read the two paragraphs above; it would take many hours of instruction to teach people enough to do the analysis themselves.) And if the answer is "some", then which parts? To what depth? And who takes care of the rest on scientists' behalf? Read More ›

Panton Principles
Greg Wilson / 2010-02-28
Via Cameron Neylon: the Panton Principles are guidelines for open data in science. In full:

Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open. By open data in science we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain. Formally, we recommend adopting and acting on the following principles:

1. Where data or collections of data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual data elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license. When publishing data make an explicit and robust statement of your wishes.

2. Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc are NOT appropriate for data and their use is STRONGLY discouraged. Use a recognized waiver or license that is appropriate for data.

3. The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation. If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition — in particular non-commercial and other restrictive clauses should not be used.

4. Furthermore, in science it is STRONGLY recommended that data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This is in keeping with the public funding of much scientific research and the general ethos of sharing and re-use within the scientific community. Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.

Read More ›

Eighty Per Cent!
Greg Wilson / 2010-02-25
As of this morning, I have signed commitments for 4/5 of the money I need to spend a year working full-time on updating the Software Carpentry course. If you, or someone you know, would like to help me help scientists be more productive, please get in touch: I have only 64 days in which to find the $30K I still need. Read More ›

BEACON Funded!
Greg Wilson / 2010-02-22
Congratulations to Titus Brown and others on the NSF's announcement that it will fund the BEACON (Bio/computational Evolution in Action Consortium) Science and Technology Center. In their words, "...BEACON is focused on studying the evolution of organization across multiple scales—from genomic and cellular, to multicellular, to inter-multicellular (a.k.a. social)—using techniques from experimental evolution, modeling, and digital life systems." Long story short, this means that Michigan State University and its partner institutions "...[have] money explicitly for supporting students doing really sexy interdisciplinary work combining computation and biology." Read More ›

Two Views
Greg Wilson / 2010-02-12
Darrell Ince (who, as far as I know, hasn't worked directly with climate scientists or their software) wrote about ClimateGate for The Guardian. In response, Steve Easterbrook (who has) wrote a detailed rebuttal. Here's hoping the latter is as widely read as the former. Read More ›

It Seems That Everyone Cares
Greg Wilson / 2010-01-24
Ars Technica isn't primarily a science site, but even they are now worried about reproducibility in computational science. I think it no longer matters how important this "crisis" actually is—sooner or later, major funding agencies are going to mandate adoption of something like the Open Provenance Model. Problem is, given the current skill set of the average scientist, that will almost certainly translate into burden without benefit. Read More ›

Big Science == Big Skills Gap
Greg Wilson / 2010-01-20
Over on Nature News, Eric Hand's article "'Big science' spurs collaborative trend" is subtitled, "Complicated projects mean that science is becoming more globalized." It talks about the benefits of international collaboration, but what it doesn't say is that sharing ideas, results, procedures, and software requires skills that aren't part of the standard curriculum. One of the main goals of the rewrite of Software Carpentry is to teach scientists some of what they need to know in order to do what Hand describes without heroic effort. I'd be grateful for suggestions about topics and tools that ought to be on the list, but aren't. Read More ›

Was Designed To, But Didn't
Greg Wilson / 2010-01-18
Michael Clarke has written a thoughtful post exploring why the web hasn't disrupted scientific publishing, even though it was designed to do exactly that. Read More ›

Whatcha Gonna Do When They Come For You?
Greg Wilson / 2010-01-13
First it was pharma companies withholding "unhelpful" data, then it was ClimateGate, and now there's this: One of the founders of the controversial 'Baby Einstein' range of products is taking the University of Washington to court in an attempt to force the institution's scientists to release their raw data to him...William Clark...wants records relating to two studies published in 2004 and 2007. The latter found an "association between early viewing of baby DVDs/videos and poor language development" while the former suggested "efforts to limit television viewing in early childhood may be warranted". If someone challenged your results, could you reassemble the programs and data you'd used to produce them? And what would happen if you couldn't? Software Carpentry isn't just about making scientists more productive; the skills that will help them do more, faster, will also make their work more traceable and reproducible. Read More ›

Podcast with Jon Udell
Greg Wilson / 2010-01-13
Jon Udell recently interviewed me about Software Carpentry and related topics—the podcast is now on his web site. Read More ›

How We Got Here, and Where We Are
Greg Wilson / 2010-01-10
I gave a talk in Santa Fe early in 1997 describing a set of articles I'd organized for the Summer 1996 and Fall 1996 issues of IEEE Computational Science and Engineering (now Computing in Science & Engineering) on the subject, "What should computer scientists teach physical scientists and engineers?" After the talk, John Reynders (then the director of the Advanced Computing Lab at Los Alamos National Laboratory) challenged me to put my money where my mouth was and actually teach basic software development skills to working scientists. Brent Gorda and I ran the course for the first time in July 1997. We used Perl as a programming language, and covered topics such as CVS, regular expressions, and a little bit of web client programming. Our part of the course was three days long, and was followed by a two-day consulting visit from Steve McConnell (whose books Code Complete and Rapid Development were at the top of the charts). We ran the course in various forms another five or six times in the next three years, during which time we switched to Python and expanded it to five days. All told, about 120 LANL technical staff went through the course, most of them under 35. In 2004, after I'd taught the course for the Space Telescope Science Institute and the US Navy, the Python Software Foundation gave me a grant to reorganize, update, and expand the material. That version is the core of what's now online; when I last checked, the site was getting 10-12,000 distinct visitors a month, and the material was being used in whole or in part at Caltech, Indiana, several schools in the UK and Germany, Chile, South Korea, and of course here in Toronto. Based on follow-ups with alumni, I'd guess that it has no effect at all on 20-25%, who take the course because their manager or supervisor told them they had to, and get little out of it. The rest routinely describe it as game changing: a PhD candidate in Psychology who did the course with us in July 2009 told me a few days ago that what she learned probably saved her six months on her current project, and that without it, a second project would simply not have occurred to her to try. As another data point, one of the other alumni of that offering came to me early in October to say that several of her labmates wanted to take the course, and was I planning to offer it again any time soon? I told her that I wasn't, but that I could arrange for a CS grad student to teach it. Three weeks later, 65 students from Psychology and Linguistics had signed up to do it as a non-credit course, roughly 45 of whom have stuck with it so far. While I don't have data to back this up, I believe very strongly that what most students get out of the course isn't specific knowledge about relational databases, regular expressions, or object-oriented programming, but rather a mental map of the computing landscape, so that they know what's supposed to be easy, what else is supposed to be possible, and where to go looking for more information. Another student from the July 2009 offering said that the biggest thing the course did for him was turn "unknown unknowns" into "known unknowns". I'm supposed to conduct a follow-up survey with those students later this month to see how much they're using what they learned, and what impact it has had; I hope to have results up on the web by Easter.
And as regular readers will know, I'm presently trying to raise money to update the material: this post explains the background, while this plan incorporates what I've learned from students and instructors on four continents about what material, sequence, and presentation will actually "reach" scientists. Sadly, though, funding agencies and companies mostly still seem to think that only HPC-related training is worth funding, which I feel is asking scientists to run before they can walk. This CiSE paper talks about this particular frustration, while our survey results put weight behind the claim that the overwhelming majority of scientists will benefit much more from being helped with development issues than from anything to do with big iron. Read More ›

New Challenges
Greg Wilson / 2010-01-07
As some of you already know, my contract with the University of Toronto runs out this spring, and I have decided not to seek renewal. I've learned a lot in this job, and had a chance to work with some great people, but it's time for new challenges. What I'd most like to do next is spend a year working full-time on the Software Carpentry course: of all the things I've done, it's the one that I think has the most potential to make scientists' lives better. My goal is to raise approximately CDN$25,000 from each of half a dozen sponsors so that I can reorganize and revamp the content, add screencasts and video lectures, and generally drag it into the 21st Century. An abbreviated proposal is included below the cut; if you or anyone you know would be interested in discussing possibilities, please give me a shout.

Computers are as important to modern science as telescopes and test tubes. From analyzing climate data to modeling the internals of cells, they allow scientists to study problems that are too big, too small, too fast, too slow, too expensive, or too dangerous to tackle in the lab. Unfortunately, most scientists are never taught how to use computers effectively. After a generic first-year programming course, and possibly a numerical methods or statistics course later on, graduate students and working scientists are expected to figure out for themselves how to build, validate, maintain, and share complex programs. This is about as fair as teaching someone arithmetic and then expecting them to figure out calculus on their own, and about as likely to succeed.

It doesn't have to be like this. Since 1997, the Software Carpentry course has taught scientists the concepts and skills they need to use computers more effectively in their research. This training has consistently had an immediate impact on participants' productivity by making their current work less onerous, and new kinds of work feasible. The materials [1], which are available under an open license, have been viewed by over 140,000 people from 70 countries, and have been used at Caltech, the Space Telescope Science Institute, and other universities, labs, and companies around the world.

Despite its popularity, some of the material is now out of date (and users' expectations are higher than they used to be). Our goal is therefore to upgrade the course to bring this training to the widest possible audience. Using lessons learned in the July 2009 offering sponsored by MITACS [2] and Cybera [3], we will create a self-paced version of this material that students can use independently, while also offering them somewhere to turn when they have questions or problems. As described in [4], the revised course will cover the things that working scientists most need to know [5], including:
Program design
Version control
Task automation
Agile development
Provenance and reproducibility
Maintenance and integration
User interface construction
Testing and validation
Working with text, XML, binary, and relational data
We expect the revised course will reach thousands of graduate students and working scientists, and will increase their productivity in direct and measurable ways. It will also prepare them to tackle the challenges of large-scale parallelism, cloud computing, and reproducible research. We are currently seeking contributions of $20-25K toward the $130K needed to realize this goal. By helping us, you will help current and future staff be more productive and associate yourself publicly with best practices.
If you would like to help, please contact Greg Wilson at team@carpentries.org.

Biography: Greg Wilson (http://pyre.third-bit.com/blog/cv) holds a Ph.D. in Computer Science from the University of Edinburgh, and has worked on high-performance scientific computing, data visualization, and computer security. He is currently an Assistant Professor in Computer Science at the University of Toronto, where his primary interests are lightweight software engineering tools and education. Greg has served on the editorial boards of Dr. Dobb's Journal and Computing in Science and Engineering; his most recent books are Data Crunching (Pragmatic, 2005), Beautiful Code (O'Reilly, 2007), and Practical Programming (Pragmatic, 2009).

Links:
[1] The current course materials are at http://software-carpentry.org.
[2] http://www.mitacs.ca.
[3] http://www.cybera.ca.
[4] This page describes the revised course, and this one describes its target audience.
[5] In 2009, we conducted the largest survey ever done of how scientists actually use computers. The results are reported in this article, a shorter and more readable version of which is here. This and this explain why scientists need to learn these skills before tackling parallelism, cloud computing, and other leading-edge technologies.

Read More ›

Osmosis is Just a Fancy Name for Failure
Greg Wilson / 2009-12-30
My last post linked to a PLoS paper by Dudley and Butte on developing effective bioinformatics programming skills. I asked, "How many hours do the authors think are needed to acquire these skills?" In response, Atul Butte said, "I think the ideal scenario is when one's research projects enable one to learn these skills, so that these skills get learned in a practical way outside the classroom too, while doing science," while Luis Pedro Coelho asked, "Does it matter over the long (or even medium) term? Isn't improving your skills even if you aren't being immediately productive what school is for?" To which I can only respond, "Yeah, but that doesn't work." People have been doing computational science for almost seventy years, and have been calling it the third branch of science since (at least) the mid-1980s. If picking things up by osmosis were going to work as an educational strategy, we'd know by now. Instead, what we actually see hasn't changed in 25 years: a small minority working wonders, and the vast majority not even knowing where they ought to start. We don't expect grad students to pick up all the math and stats they need by osmosis, on their own, without any structured guidance, so why should we expect them to become proficient computationalists that way? Read More ›

Dudley and Butte on Software Skills
Greg Wilson / 2009-12-27
Via Titus Brown, a new PLoS paper titled "A Quick Guide for Developing Effective Bioinformatics Programming Skills" by Joel Dudley and Atul Butte. Their recommendations are:
Programming languages
Embracing open source
Unix command-line skills
Keeping projects documented and manageable
Preserving source code with version control
Embracing parallel computing paradigms
Structuring data for speed and scalability
Understanding the capabilities of hardware
Embracing standards and interoperability
Put a high value on your time
I think all these things matter, but:
How many hours do the authors think are needed to acquire these skills? We've tried very hard to fit Software Carpentry into 25 hours of lecture and 50-100 hours of practical work because we recognize that every one of those hours is time students aren't spending doing science.
Shouldn't testing be in the top 10? Or the top 5, or 3? These days, I care a lot more about how (and how well) someone tests than I do about their mastery of any particular programming language.
Read More ›

NSF Programs
Greg Wilson / 2009-12-19
I'd be interested in hearing from anyone who has enough direct experience of the following NSF programs to know whether they might be willing to support Software Carpentry:
Course, Curriculum, and Laboratory Improvement
Integrative Graduate Education and Research Traineeship
CISE Pathways to Revitalized Undergraduate Computing Education
Innovations in Engineering Education, Curriculum, and Infrastructure
Read More ›

Double Standards
Greg Wilson / 2009-12-18
Nicola Scafetta is refusing to release the software on which he bases his claims that the sun is responsible for much of terrestrial warming during the last century. I obviously think that scientists should be required to do this as a condition of publication; coming as this does on the heels of Climategate, it will be interesting to see if journals finally start pushing in that direction. It also highlights the need to add more material to this course to cover packaging for release and data provenance. Read More ›

Why Opening Up (Probably) Wouldn't Help
Greg Wilson / 2009-12-11
Good post from Steve Easterbrook on why open-sourcing climate models probably wouldn't make a difference. Read More ›

Thanks, Jamie
Greg Wilson / 2009-11-28
The warmup tutorials for our grassroots Software Carpentry course started this week, and we'd like to send a "thank you" to Jamie Winter at The MathWorks, who has provided students with temporary licenses for MATLAB. It's all been very last minute, and we're grateful to Jamie for pulling this off on such short notice. Read More ›

Caesar's Wife
Greg Wilson / 2009-11-26
Improving the way scientists use computers isn't just about making them more productive: it's also essential to defending the integrity of their work. Stories like this one about a researcher struggling in vain for three years to replicate someone else's results can only undermine public confidence at a time when we need to make a lot of hard decisions in a hurry. Sadly, we have no one to blame but ourselves... Later: see also Victoria Stodden's post. Read More ›

Tutorials Start This Week
Greg Wilson / 2009-11-24
After a lot of hard work from Dominique and Jon, we're kicking off warmup tutorials for Software Carpentry this week. 65 students from Psychology, Linguistics, Chemical Engineering, and a couple of other departments will get three weeks of review on basic programming, then start the regular material in January. Our thanks to MITACS, the MathWorks, SciNet, and DCS for their support. Read More ›

Serendipitous and Unexpected
Greg Wilson / 2009-11-22
Via Ryan Lilien: Most research effort does not produce what is thought of as a traditionally publishable result. That doesn't mean, however, that nothing was gained by conducting the research. These results, whether they are failures or merely perplexing, can provide valuable insights into open problems and prevent other researchers from duplicating work. We started a journal that focuses on serendipitous (I have no idea why this worked) and unexpected (it seems like this technique should work on this problem but it doesn't) results. The goal of the journal is to provide a venue where ideas can flow and be debated. The Journal of Serendipitous and Unexpected Results (JSUR) is an open-access forum for researchers seeking to further scientific discovery by sharing surprising or unexpected results. These results should provide guidance toward the verification (or negation) of extant hypotheses. JSUR has two branches, one focusing on the Computational Sciences and the other on the Life Sciences. JSUR submissions include, but are not limited to, short communications of recent research results, full-length papers, review articles, and opinion pieces. Recently, we launched the beta version of the journal site at http://jsur.org. We would love to get your feedback and even better, a submission for the first issue. To get the journal started, we're looking to collect a large number of short (2-4 page) reports. I know you have something to publish. Please help us spread the word and forward this information to interested colleagues. Sincerely, The JSUR Editorial Board Read More ›

Special Issue of Computing in Science and Engineering
Greg Wilson / 2009-11-18
A special issue of Computing in Science & Engineering that Andy Lumsdaine and I edited, devoted to software engineering in computational science, is now available. We'd like to thank everyone who contributed:
Report on the Second International Workshop on Software Engineering for CSE, by Jeffrey Carver (University of Alabama)
Managing Chaos: Lessons Learned Developing Software in the Life Sciences, by Sarah Killcoyne and John Boyle (Institute for Systems Biology)
Scientific Computing's Productivity Gridlock: How Software Engineering Can Help, by Stuart Faulk (University of Oregon), Eugene Loh and Michael L. Van De Vanter (Sun Microsystems), Susan Squires (Tactics), and Lawrence G. Votta (Brincos)
Mutation Sensitivity Testing, by Daniel Hook (Engineering Seismology Group Solutions) and Diane Kelly (Royal Military College of Canada)
Automated Software Testing for MATLAB, by Steve Eddins (The MathWorks)
The libflame Library for Dense Matrix Computations, by Field G. Van Zee, Ernie Chan, and Robert A. van de Geijn (University of Texas at Austin), and Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí (Universidad Jaime I de Castellón)
Engineering the Software for Understanding Climate Change, by Steve Easterbrook (University of Toronto) and Timothy Johns (Hadley Centre for Climate Prediction and Research)
Read More ›

Cloud Computing for Beginners
Greg Wilson / 2009-11-15
Ana Nelson has posted step-by-step instructions showing how to use Amazon's EC2 cloud computing platform to run simulations. There are still a lot of fiddly details, but the barriers to entry are getting lower all the time... Read More ›

Packaging
Greg Wilson / 2009-11-13
Martijn Faassen has posted a nice history of packaging and distribution tools for Python. Yes, it's a topic only a geek could love, but anyone who wants to distribute software to other people needs to grapple with these issues. The question is, should these tools, the problems that motivate them, and the technology underlying them be part of this course? Or should something equivalent be covered instead (and if so, what)? Read More ›

Python in Science
Greg Wilson / 2009-11-06
Guido van Rossum just posted a description of a variety of scientific Python projects. The diversity is pretty impressive... Read More ›

Our Target Audience
Greg Wilson / 2009-11-01
Some graduate students at the University of Toronto have asked us to run the course for them later this fall or during the winter. There's an obvious selection bias (if they were expert programmers, they wouldn't need this course), but I think they're pretty representative of scientists at their level:
01. Name: 39/39 100%
02. Email address: 39/39 100%
03. Level of study: MSc 12/39 30%; PhD 27/39 70%
04. Primary programming language: MATLAB 16/39 41%; Python 2/39 5%; Other 6/39 15%; None 15/39 39%
05. Knowledge of primary language: Don't know how to use it 28/39 72%; Understand basic commands 10/39 26%; Can program competently 1/39 2%; Expert 0/39 0%
06. What other languages do you know? HTML 11/27 41%; R 4/27 15%; Other (VB, Java, Perl, etc.) 10/27 37%; No answer 12/39
07. Would you like a pre-class tutorial on programming basics (loops, files, if/else)? Yes 36/39 92%; No 3/39 8%
08. Do you have a laptop? Yes 39/39 100%; No 0/39 0%
09. Preferred OS: Windows XP 14/39 36%; Windows Vista 12/39 31%; Mac OS X 7/39 18%; Linux/Unix 9/39 23%
10. Do you have a MATLAB license? Yes 9/39 23%; No 30/39 77%
11. Which topics are you interested in? Databases 16/39 47%; Functions and Modules 14/39 41%; Debugging 10/39 29%; Image Processing 10/39 30%; Object-Oriented Programming 10/39 30%; Web Application Programming 9/39 26%; GUI Programming 8/39 23%; Web Client Programming 7/39 21%; Computational Complexity 6/39 18%; How Web Servers Work 6/39 18%; Regular Expressions 6/39 18%; XML 6/39 18%; Automated Builds 5/39 15%; Sets and Dictionaries 5/39 15%; Unix Shell Scripting 5/39 15%; Binary Data 3/39 9%; Empirical Software Engineering 3/39 9%; Quality Assurance 3/39 9%; Unit Testing 3/39 9%; Version Control 3/39 9%; Software Development Lifecycles 1/39 3%; Other (please specify) 10/39 30%
Read More ›

By Popular Request...
Greg Wilson / 2009-10-30
...I have added a lecture on high performance computing to the revised outline for the course. Several people suggested it, and what's the point of asking for feedback if I don't listen? Read More ›

Cryptography Isn't Security
Greg Wilson / 2009-10-23
One topic that I've tried to include in this course a couple of times, without success, is security. I feel irresponsible not saying something about how to share safely, but I've never found something that (a) would fit into one hour, (b) wasn't platitudes, and (c) gave listeners something they could act on. One reviewer suggested talking about public/private key pairs (to help people set up SSH), signing things digitally, and the like. I'm leery of going down that road, though, since it could easily leave people with a misplaced faith in technical solutions to security problems. As always, suggestions would be welcome... Read More ›

Should Modeling Be Part of This Course?
Greg Wilson / 2009-10-21
Jon Pipitone has a good description on his blog of work the grad students in our department are doing to translate work in climate change into software engineering terms. Their first step is to represent the ideas in MacKay's excellent Sustainable Energy Without the Hot Air in two of the graphical notations that computer scientists use for system design. I was initially very skeptical, but looking at their work so far, I'm quite impressed. My question is, would it be useful for scientists to know how to do this themselves? More specifically, is the lecture on data modeling that I've planned to include in Version 4.0 worthwhile or not? Read More ›

Creating New Niches
Greg Wilson / 2009-10-21
"Publish or perish" is the central credo of academic life: despite all the hoopla about the blogosphere and online what-not, the reality for most of us is that if our work doesn't get into a respected journal or conference, it doesn't count. But what do you do if there isn't a home for your kind of work? People working in scientific computing have been struggling with this for at least a quarter century: while there are many places to submit the results of programs, there are very few places where you can publish a description of the program itself, even if building it took years and required one intellectual breakthrough after another. In contrast, if you design a new telescope, there are at least half a dozen places you could turn. (This isn't just a problem in scientific computing, by the way: Software: Practice & Experience and The Journal of Systems & Software are the only academic venues I know for descriptions of real systems, which may be one of the reasons why so much of the software written in academia is crap—there's just no payoff for doing it right.) I don't know if this situation is going to change, but one hopeful sign is a new journal called Geoscientific Model Development (which I found via Jon Pipitone). It's still early days, but I hope that giving people some kind of credit for talking about how they do things will encourage them to do those things better, and allow newcomers (like us) to get up to speed more quickly. Read More ›

Revised Plan
Greg Wilson / 2009-10-16
I've posted updates to the revised course outline. In particular, I have:
Moved testing earlier.
Clarified intent in a couple of places.
Made a list of things we're leaving out.
As always, feedback would be welcome. I'd also be grateful for pointers to places that might fund this work: as I've found in the past, many people think the course is a good idea, but it doesn't quite fit into their funding mandates :-( Read More ›

Videos from Symposium Are Now Online
Greg Wilson / 2009-10-08
I have put video recordings of the guest talks given at our July 29 symposium on Science 2.0 online; please click the titles of the talks on the index page. Thanks again to all of our speakers, to the MaRS Centre technicians for the raw recordings, and to Tanya Murdoch for editing. Read More ›

Comments on Course Reorganization
Greg Wilson / 2009-10-06
I'm grateful to Lorin Hochstein for sending detailed feedback on my proposal to reorganize the course. His comments are below, with my replies and his counter-replies interspersed; more comments would be very welcome.

Content I think you could drop if you wanted to save time:

Read Data Directly From Hardware. I suspect that this would be relevant to only a small minority of your audience, especially if you're teaching the course mostly in Python, because this is the sort of thing you should really do in C. Greg: Agreed; it's mostly to motivate a discussion of binary data handling, which I guess isn't that important to most people either.

Vectorization: I think you could drop this, especially since you have the general "Make a Program Go Faster" section. (Then again, I don't know that much about vectorization...) Greg: Would a title change make it clearer? This is where I wanted to introduce whole-array manipulations (MATLAB-style operations), which I think many scientists do care about. Lorin: Ah, I didn't realize this was about MATLAB vectorization (I thought it was related to using an optimizing compiler to take advantage of SIMD instructions). You're right, this is worth teaching. Back when I was a grad student, I was amazed at the orders-of-magnitude performance improvement you can get in MATLAB by getting rid of loops and recasting your problems as linear algebra operations. There was a grad student I knew at Boston University who was amazing at turning loops into matrix multiplications.
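To make the idea concrete, here is a minimal sketch of the same loop-versus-whole-array contrast in Python with NumPy (the course's current language); the function names and data are invented for illustration:

```python
import numpy as np

def norms_loop(points):
    """Distance of each (x, y) point from the origin, one point at a time."""
    result = []
    for x, y in points:
        result.append((x * x + y * y) ** 0.5)
    return result

def norms_vectorized(points):
    """The same calculation as a single whole-array expression."""
    return np.sqrt((points ** 2).sum(axis=1))

points = np.random.rand(100000, 2)
# Both produce the same values; the vectorized form pushes the loop
# down into compiled code, which is where the large speedup comes from.
assert np.allclose(norms_loop(points), norms_vectorized(points))
```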
You have "Using the Unix Shell" as a topic in the course announcement, but I don't see it show up as its own topic. Greg: I'm planning to take the shell out—while I use it all the time, and think most power users do likewise, it didn't make the cut when the number of lectures was restricted. (And it's hard to convince someone who's used to GUIs that the shell is worth learning: the payoff takes a long time to arrive...) If I cut binary data handling and/or vectorization, this is a strong candidate to go back in. Lorin: That makes sense... It does take a long time before you're more productive in the shell than the GUI. It's a shame, though. XML. You could probably drop XHTML safely. I don't think it's that popular in practice, and since most HTML out there is not valid XML, if they tried to use XML-based approaches to do HTML scraping, it would fail pretty quickly. (You really need something like Beautiful Soup to do HTML parsing, but I wouldn't use that to teach XML!). Greg: Agreed. Some of the topics I would call "paradigms", these are going to be hard to fit into a single lecture, such as: Object-Oriented Programming. I'm torn about this. It's hard for me to imagine teaching the OOP concepts in a single lecture. I think the Liskov Substitution Principle could probably be dropped (how often does it really come up in practice?) I'm also a little fearful because inheritance tends to be overused in practice. I'd also drop the design patterns (I don't think they'll understand OO well enough to observe that at this point), and possibly even the overloading operators. Greg: I agree that it's impossible, but everyone asks for it every time the course is taught. Represent Information. This is a lot of concepts to squeeze into a lecture. If you were to prioritize this, I think database design (and ERD) are more importance in practice than some of the UML stuff. RDF can be safely dropped. Greg: Good points. Build a Desktop User Interface. Event-driven programming is a big conceptual leap. I'd probably put state diagrams or statecharts in here. Plus, it's always very tough to pick a GUI toolkit. Greg: I was going to use Tkinter—yes, it's broken, but if the main goal is to teach event-driven programming, it'll get the idea across without students having to install anything else. Lorin: Yeah, that sounds reasonable. Tkinter is nice and simple, and it's a great example of the application of first-class functions. It's too bad Python doesn't come with a drag-and-drop GUI builder. When you're starting out with GUI building, it's hard to see the advantage of programmatically defining a GUI layout. Other comments: Maybe have some content about online resources: where to go to ask a question when you try to apply these and get stuck. StackOverflow, IRC channels, "How to ask questions the smart way", pastebin.com/pastie.com, showmedo.com, etc. (This really wouldn't be a full lecture, maybe just a web page on this?) Personally, I'm bored to tears sitting in a lecture when there's source code in the slides. I think your ultimate idea of having a self-paced web-based course is a good one. There's lots of reference material out there on these concepts, but finding worked out examples is rarer. I think the biggest challenge for someone trying these things will be when their personal problem diverges for the example problem in some way and they don't know how to proceed. Final question: Have you followed up on previous SC students to see what techniques/practices they adopt after attending the course? 
Other comments: Maybe have some content about online resources: where to go to ask a question when you try to apply these ideas and get stuck. StackOverflow, IRC channels, "How to ask questions the smart way", pastebin.com/pastie.com, showmedo.com, etc. (This really wouldn't be a full lecture, maybe just a web page on this?) Personally, I'm bored to tears sitting in a lecture when there's source code in the slides. I think your ultimate idea of having a self-paced web-based course is a good one. There's lots of reference material out there on these concepts, but finding worked-out examples is rarer. I think the biggest challenge for someone trying these things will be when their personal problem diverges from the example problem in some way and they don't know how to proceed.

Final question: Have you followed up on previous SC students to see what techniques/practices they adopt after attending the course? Greg: I did once, but can't use the data (long story); I'll be following up with the students from this past July at Christmas to see what's stuck and what hasn't. Wish I'd been more systematic in the past, but 20/20 hindsight... Read More ›

The Hacker Within
Greg Wilson / 2009-10-05
I did a web search for "software carpentry" a couple of weeks ago to update my impressions of how the material was being used, and stumbled across The Hacker Within, a self-help group at the University of Wisconsin for grad students in science who want to improve their computing skills. I had a great chat with the organizers by phone on Thursday, and hope to get there in the spring to meet them in person. It's always great to meet the people the course is designed to help; I'd welcome pointers to other groups if you have them. Read More ›

A Strange Obsession
Greg Wilson / 2009-09-29
I just spent a few minutes browsing these slides from a talk given in Karlsruhe in September by Microsoft's Fabrizio Gagliardi. The talk's title was "Cloud Computing for Scientific Research", and it's chock-full of big: mega-this, peta-that, and isn't it all exciting? The only mention of anything at the desktop scale is on slide 19, which mentions a plugin to allow MATLAB to talk to the cloud, and Excel views of Azure data. Once again, I'm puzzled (and a bit disappointed) that the world's premier desktop software company has decided to ignore what most scientists care about most. I'm equally disappointed that there was nothing at all in these slides about improving scientists' skills, especially since Microsoft has invested so heavily in improving its own processes. Oh well... Read More ›

Presentation, Presentation, Presentation
Greg Wilson / 2009-09-24
Right now, the Software Carpentry material is basically printed pages on the web. Each lecture is a linear HTML page: bullet point follows bullet point, interrupted only by code snippets, tables, and diagrams. If I'm going to update the content, I'd also like to update the presentation; the question is, "To what?" An audio recording of me talking over the slides would add some value, though I think that typing in what I would say would probably be more useful, since most people can read faster than I can speak, and audio still isn't googleable. I've also thought about recording screencasts (audio on top of a video recording of my computer desktop). That would allow me to show live coding sessions, which I think many students would find valuable. Flipping that around, I could embed small snippets of video in the HTML pages. Then there are tools like Crunchy that allow you to create tutorials by embedding snippets of Python in web pages. That could help with the programming parts of the course, but not with version control, Make (if we stick to Make, which I hope we don't), or many other parts. So: what's the best online tutorial you've ever seen? What made it the best? Do you know how much effort it took to build the first time? How much effort would it take to build once the authors were experts in [name of tutorial-building technology goes here]? Pointers would be very welcome... Read More ›

Grant Proposal
Greg Wilson / 2009-09-22
Rebuilding the Software Carpentry course is too big to do in anyone's spare time, so I've submitted a small funding application (included below). If you know any venues that might welcome a similar proposal, I'd be grateful for pointers.

A Self-Paced Software Skills Bootcamp for Research Scientists

Proposed Dates and Venue: Trial run to take place online in July 2010.

Type of Activity: Online learning community built around a web-based self-paced course.

Executive Summary: The aim of this work is to convert a highly successful three-week training course into an online learning community where graduate students and researchers in science and engineering can learn, improve, and share the fundamental software development skills that are directly relevant to their work. Participants will learn how to create, use, and share software that does what they need reliably and efficiently, so that they can spend more time doing leading-edge science.

Academic/Scientific Objectives

Background: It is commonplace to observe that computers are as important to modern science as test tubes and whiteboards. What is pointed out less often is how poorly scientists use them. From high school onward, scientists are expected to calibrate their equipment and take careful notes when doing experiments. When those same scientists use computers for simulations and data analysis, however, they have very different standards: many do not keep track of software and parameters accurately enough to be able to reproduce their results, and have no idea how reliable their software is. To date, scientists have not worried about these issues because there has been no incentive for them to do so. Journal or conference reviewers rarely ask how (or whether) code was tested, and grant reviewers rarely ask how much of the time spent writing software was well spent. As the pace of discovery accelerates, however, there is increasing pressure for scientists to build things right and quickly. A wealth of empirical software engineering research over the past 30 years has demonstrated that the best (in fact, the only) way to improve quality and productivity is to improve the way in which software is built. As in manufacturing and medicine, investments here repay themselves several times over because mistakes are more expensive to fix than to prevent. This realization is at the heart of modern software development processes, but to date, these have only been adopted by a small minority of scientists.

Prior Work: Since 1997, I have been teaching software engineering to scientists and engineers at national laboratories, companies, and universities in the US and Canada. My aim has not been to turn them into professional programmers, but rather to equip them with the skills they need to design, build, maintain, and use software effectively in their research. The materials for my course, which have been available under an open license since August 2006, have been viewed by over 135,000 distinct visitors from 70 countries, and have been used at universities and companies in both Americas, Europe, and the Far East. Topics covered include:
Version control
Basic object-oriented design
Automated builds
Unit testing and test-driven development
Basic scripting
Agile development processes
Maintenance and integration
Working with plain text, XML, and relational data
Thanks to sponsorship from MITACS and Cybera, this material was offered to graduate students from several Canadian universities as a condensed three-week "crash course" in July 2009.
For the first time, the course was run in a distributed fashion: half the students taking part were in Toronto, while the other half were in Edmonton, and lectures were webcast in both directions. Students did not collaborate directly on programming projects during the course, but all took part in the same interactive question and answer sessions after each lecture. While there were a few technical hiccups, response from participants was extremely positive. As the letters of support attached to this proposal show, both they and their supervisors felt that this training would make them significantly more productive, and allow them to tackle problems that were previously out of reach.

Proposal: My goal is to bring this kind of training to the widest possible audience. Building on this past summer's success, and on the experiences of colleagues who have begun to teach online, I believe the time has come to create a self-paced version of this material that would use video lectures and screencasts to present the material, and Web 2.0 collaboration tools to foster an online learning community around it. This would allow students to focus on the topics most directly relevant to their needs, and to absorb material at whatever pace suited them best, while still giving them somewhere to turn with questions. It would also indirectly help foster ties between young Canadian researchers in science and engineering without requiring them to take three weeks out of their lives (something that participants singled out as their major complaint about the course in the post mortem held on July 31). The proposed schedule for this work is:
May'10-Jun'10: Update the existing course material and create the first six video lectures. Based on colleagues' experiences, I estimate between a 20:1 and 50:1 ratio of production time to content length during this trial period.
Jul'10: Make these initial lectures available online to students who agree to participate in follow-up interviews to identify areas for improvement, and to help bootstrap the online learning community; as many studies have shown, such communities are far more likely to take off if they are seeded with some initial content.
Aug'10-Sep'10: Analyze interviews, make improvements to initial lectures, and draw up a detailed proposal for full implementation to submit to MITACS and other agencies.
To jumpstart the online learning community, the lecture notes and examples will be converted to MediaWiki format (the same one used in Wikipedia). The first wave of students will correct, clarify, and extend these notes under the supervision of a course instructor; they, and the instructor, will then provide feedback on changes proposed by students doing the course remotely at their own pace. This will in turn be combined with user-contributed video content of the kind hosted at ShowMeDo.com, which already features some screencasts of Software Carpentry material, and with the by-now-usual mix of online forums and collaborative link curation. Read More ›

Another Reason to Care About Provenance
Greg Wilson / 2009-09-21
A vice president at ETH is resigning today because of accusations of fraud. A key lab notebook is missing, retractions have been retracted, and according to ETH President Ralph Eichler, "there is now no legal way of finding out for sure who was responsible for the falsifications." Question: if someone accused you of falsifying results, how well and how easily could you defend yourself? How long would it take you to pull together all the notes, data, and programs you used three years ago to produce the paper being challenged? And would your career survive? Even if you "won", you would probably lose weeks or months of research time. I think this is one of the strongest arguments in favor of using data provenance systems. Young scientists' lives are difficult enough (see for example Peter Lawrence's recent PLoS Biology article); an accusation of fraud, well-intentioned or otherwise, could effectively destroy someone's career even if it was unfounded. Using computers to create an audit trail is just good insurance... Read More ›

Updated Outline for Revised Course
Greg Wilson / 2009-09-18
I have updated the description [no longer online] of how I plan/hope to reorganize the course. My thanks to everyone who commented on the earlier draft; I'd be very grateful for feedback on this one as well—I realize that some of the lectures are still hopelessly ambitious, but I hope it's at least a target to shoot at. Read More ›

Partial Outline of New Version of Course
Greg Wilson / 2009-09-15
I have put a mostly-complete draft of my ideas for a rewrite of the course online. This version would start with specific problems, then backfill the tools and skills needed to solve them, but there are some big outstanding issues:
There still isn't anything on the computational thinking skills that Jon Udell identified. I don't know how to tackle these without relying more than I want to on closed-source offerings like Yahoo! Pipes that could disappear at a moment's notice.
I just can't see how to fit some topics (particularly object-oriented programming and web programming) into the available space: a maximum of two hours of lecture and four hours of lab for each. What is the minimum useful problem/lecture for each?
There are a few others as well (just search for 'TODO'), but those are the big ones. Any help you can give would be very welcome. Read More ›

Two Links
Greg Wilson / 2009-09-11
Mendeley has been getting some good press; anyone have first-hand experiences to share?
A Nature special on data sharing: Bryn Nelson's piece on empty archives is particularly worth reading.
Read More ›

Job Opening: MITACS Scientific Coordinator
Greg Wilson / 2009-09-11
MITACS (which provided funding for this summer's offering of the course, and which funds a lot of other mathematically-oriented work in Canada) has an opening for a scientific coordinator. Details are in the job posting; specific responsibilities include:
Promote MITACS programs within the scientific community and provide a point of contact for feedback and questions
Ensure quality, fairness and timeliness of scientific peer-review processes
Program management tasks such as:
  Establish program strategy and action plans for new MITACS programs
  Develop and monitor program budgets
  Coordinate with other MITACS staff on program implementation and ongoing monitoring of program effectiveness
Work closely with the Communications and Government Relations Department on program promotion and feedback
Provide regular status updates, reports and program evaluation
Other duties as assigned from time to time
If you'd like to help shape the development of mathematics and science in Canada, this might be the job for you... Read More ›

R for Programmers?
Greg Wilson / 2009-09-05
What's the best introduction to the statistical language R for experienced programmers who are more interested in its support for objects, systems programming, and the like than they are in statistics? Read More ›

Is The Future Waving At You?
Greg Wilson / 2009-08-30
Cameron Neylon has been playing with Google Wave, and he likes it. His presentation at Science Online in London in August explains why (you can also watch the video, though sadly there's no soundtrack). He's even writing robots to automate some scientifically interesting tasks. Nature News liked Wave too, which (a) reminds me yet again of how prescient Jon Udell's "Internet Groupware for Scientific Collaboration" was ten years ago, and (b) makes me wonder (also again) how much this course should be re-thought. The Unix shell philosophy of creating lots of simple single-purpose tools and then combining them in rich ways has clearly found its second wind on the web. Just look at the options:
Ad hoc services built with something like Django or Rails
Drag-and-drop GUIs like Yahoo! Pipes
Special-purpose frameworks like Galaxy
Workflow tools like Taverna
Next-generation scripting with something like PowerShell
and on and on and on. Each has its own opinion on what the problem to be solved actually is; each requires different skills; and with the exception of Taverna and Galaxy, they regard scientific computing as one niche interest among many. The problem, of course, is that with so many different ways to do it, no matter which one(s) the course covers, students will probably be faced with something else when they go back to the lab. Read More ›

How Important is Geospatial Data to You?
Greg Wilson / 2009-08-26
Software Carpentry currently teaches students how to manipulate text (using regular expressions), XML (using DOM), relational data (with SQL), and binary data. A decade ago, when we first put the course together, that covered everything I'd ever seen more than one or two scientists use. Today, though, an increasing number are using geospatial (map) data as well. How important is this to your work? If the answer is "very", what data do you work with, what do you do with it, and what would you like to be able to do? Read More ›

Who Owns Your Data?
Greg Wilson / 2009-08-24
Nature's "Great Beyond" blog reports another attempt to force climate scientists to release their data. I'm of at least two minds on this: I believe openness is absolutely crucial to science (and society as a whole), but I understand scientists' concern about being scooped, and equally their concern about having their work misrepresented or quoted out of context. Mostly, I come down on the side of openness—how about you? Read More ›

Science and JoVE
Greg Wilson / 2009-08-24
Science and JoVE, the Journal of Visualized Experiments, have partnered to produce and publish scientific videos online. The aim is to enhance scientific articles published in Science through video demonstrations of experimental techniques. See the announcement for a link to the first joint work. Read More ›

Playing Safe
Greg Wilson / 2009-08-24
This thoughtful article from the New York Times asks whether the current grant system for funding research discourages researchers from taking risks. My personal experience undoubtedly biases me, but I tend to agree—the problem is coming up with something better. Read More ›

Bad News and Good News
Greg Wilson / 2009-08-24
The bad news is, retractions of scientific papers have risen tenfold since 1990. The good news is, the rate has gone from 0.0007% to 0.007%. Going back to bad news, though, some estimates of how many papers ought to be retracted are around 1%, so we still have a long way to go. Read More ›

The Delight Is In The Details, Too
Greg Wilson / 2009-08-23
They say the devil is in the details, but so's the delight, because it's the details that determine whether something works or doesn't. So let's take a look at how to translate the last post's "big picture" into actual course content. Every competent developer uses some kind of tool to automate tasks that involve dependencies. The best known is still Make, which compares the last-modified times on files to see which ones are stale, and runs shell commands to bring them up to date. Ant, Rake, and whatever's built into your IDE all work basically the same way, and can all be used (with greater or lesser ease) to recompile software, re-run tests, prepare distributions, update web sites, and what have you. Dependency managers are an example of the kind of tool scientists are willing to spend an hour learning (more if they're working with C++ or Java, less if they're working with a scripting language). Understanding how they work, though, requires at least some familiarity with:
automation (obviously)
declarative programming (the user declares the "what", the computer figures out the "how")
graphs (which is how these tools figure out what order to do things in)
queries (since rules are often best expressed using pattern matching)
programs as data (since dependency managers are programs that run other programs)
(A minimal sketch of the core idea appears at the end of this post.) So, can we use Make to teach these concepts? Or teach these concepts using Make as an example? I thought so back in 2003 when I put together the first version of "CSC207: Software Design" for the University of Toronto. In their first two programming exercises, students worked with graphs and wrote simple text parsers using regular expressions. They then had to put the two together to create a very (very) simple version of Make. I thought it worked well, but over the years the exercises were cut back until eventually this one disappeared entirely. There was just too much material in the course, and the various bits weren't connected strongly enough. While it might work in theory, it didn't in practice, and would probably fare even less well if crammed into two days of an intensive two-week course. It's still a good example of how I'd like to tie the practical and conceptual parts of the course together, though; the trick is finding a way to make it work.
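Here is that sketch: a minimal, illustrative Python version of what a tool like Make does, with a declarative rule table, a depth-first walk of the dependency graph, and a timestamp comparison to decide what is stale. The file names and commands are invented, and a real tool adds pattern rules, cycle detection, and much more.

```python
import os
import subprocess

# Declarative rules: target -> (prerequisites, shell command).
# The user says *what* depends on what; the tool works out *how*
# and *when* to rebuild. File names and commands are invented examples.
RULES = {
    "paper.pdf": (["paper.tex", "figure.png"], "pdflatex paper.tex"),
    "figure.png": (["plot.py", "data.csv"], "python plot.py data.csv figure.png"),
}

def mtime(path):
    """Last-modified time, or -1 if the file doesn't exist yet."""
    return os.path.getmtime(path) if os.path.exists(path) else -1.0

def update(target):
    """Depth-first walk of the dependency graph: bring prerequisites
    up to date first, then rebuild the target if it is stale."""
    if target not in RULES:
        return  # a plain source file; nothing to build
    prerequisites, command = RULES[target]
    for p in prerequisites:
        update(p)
    if any(mtime(p) > mtime(target) for p in prerequisites):
        print(command)
        subprocess.check_call(command, shell=True)

update("paper.pdf")
```

Read More ›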

The Big Picture
Greg Wilson / 2009-08-21
One of the lessons we learned at Los Alamos National Laboratory in the 1990s and early 2000s is that most scientists don't actually want to learn how to program: they want to solve scientific problems. To many, programming is a tax they have to pay in order to do their research. To the rest, it's something they really would find interesting, but they have a grant deadline coming up and a paper to finish. Getting scientists to make time to learn fundamental ideas that aren't directly relevant to the problems in front of them is an even harder sell. Partly it's those pesky deadlines again, but it's also often the case that the big picture doesn't make sense until you have first-hand experience with the details. Take abstraction, for instance, or the difference between interface and implementation: if you haven't written or modified software where those ideas saved you time and heartache, no amount of handwaving is going to get the idea across. The problem, of course, is that it's impossible to program well without understanding those bigger concepts. Software Carpentry therefore has to:
Give scientists programming skills that have a high likelihood of paying large dividends in the short term.
Convey the fundamental ideas needed to make sensible decisions about software without explicitly appearing to do so.
Based on our experiences in the last 12 years, the skills that students need are fairly settled:
Clean coding (both micro-level readability and macro-level modularity)
Version control
Process automation for building, testing, and deploying software
How to package software for distribution and deployment
Managing information and workflow (from bug trackers to blogs)
Consuming data:
  Text (line-oriented parsing with regular expressions; a small sketch appears at the end of this post)
  Hierarchical (XML)
  Binary
  Relational
Building desktop GUIs and visualizing data
Basic security: public/private keys, digital signatures, identity management
Publishing data and providing services on the web
As Karen Reid and others have pointed out, doing all of that properly would earn you at least a minor in Computer Science at most universities. Cramming it into two weeks is simply not possible. The bigger-picture stuff isn't as clear yet, but is starting to come into focus. The buzzword du jour, computational thinking, means different things to different people, but Jon Udell's definition is a good starting point. For him, computational thinking includes:
Abstraction: ignoring details in order to take advantage of similarities. A key concept is the difference between interface and implementation.
Querying: understanding how fuzzy matching, Boolean operations, and aggregate/filter dataflow work. This depends somewhat on understanding how to think in sets.
Structured data: including hierarchical structure, the notion of meta-data (such as tagging and schemas), and so on. Equally important is understanding that programs work best with structured data, so structure improves findability and automation.
Automation: having the computer do routine tasks so that people don't have to.
Indirection: giving someone a reference to data, rather than a copy of the data, so their view of it is always fresh.
Syndication: publishing data for general use, rather than sending it directly to a restricted set of people. The inverse is provenance: where did this data come from, and what was done to it?
I would like to add all of the following, though I realize that doing so gets us back into "B.Sc. in a week" problems:
Name spaces, call stacks, and recursion
Computational complexity: why some algorithms are intrinsically faster than others
How data is organized:
  Values vs. references and the notion of aliasing
  By-location structures (lists, vectors, and arrays)
  By-name structures (dictionaries and records)
  By-containment structures (trees)
  By-traversal structures (graphs)
Programming models:
  Procedural
  Aggregate (whole-array, whole-list, etc.)
  Object-oriented
  Declarative
  Event-driven (which brings in the difference between frameworks and libraries)
Programs as data:
  Functions as objects (another form of abstraction)
  Programs that operate on programs (Make, drivers for legacy programs)
Quality, including:
  What makes good code better than bad code (psychological underpinnings)
  Testing (including the economics of testing)
  Debugging (the scientific method applied to software)
  The difference between verification ("have we done the thing right?") and validation ("have we done the right thing?")
  Continuous improvement via reflection on root causes of errors
Basic concurrency:
  Transactions vs. race conditions
  Deadlock (much less important in practice)
  Handling failures
Bricolage: how to find/adapt/combine odds and ends (these days, on the web) to solve a problem
I call on all of this knowledge routinely even when solving trivial problems. This morning, for example, I:
did a search to find a wiki markup processor I could run from the command line,
downloaded and installed it,
changed five lines in the main routine to insert some extra text in its output,
added a ten-line filter function to overwrite the inserted text with some command-line parameter values, and
added fourteen lines to a Makefile to turn the wiki text into HTML whenever it's stale.
It took roughly 15 minutes, and will save me hours in the weeks to come. However, it only took 15 minutes because I've spent 29 years mastering the skills and ideas listed earlier. The challenge in creating Version 4.0 of this course will be to figure out how to convey as many of those skills and ideas as can be squeezed into two weeks.

You Can Do a Lot Without Programming
Greg Wilson / 2009-08-15
From Cameron Neylon, a short video showing how to embed molecules from the ChemSpider service into a wiki page. I was surprised and impressed to discover during his visit to Toronto just how little programming Cameron does: mostly, he leverages his understanding of how information moves around the Internet to plumb existing tools and services together. This is part of (or dependent on) what Jeannette Wing calls "computational thinking", and one of the goals for the next revision of this course is to focus more on those kinds of skills. Read More ›

It's Like Not Wearing Your Cleats in the House
Greg Wilson / 2009-08-15
Carl Zimmer, one of my favorite science writers, recently posted about three new books aimed at scientists: Unscientific America, Am I Making Myself Clear?, and Don't Be Such a Scientist. All three are aimed squarely at the biggest problem modern science faces—the inability of most scientists to explain themselves to non-specialists—and all three are now on my read-soon list. I don't think communication skills will ever be part of this course, but given the problems our planet faces, they damn well need to be part of every scientist's education. Read More ›

American Scientist Article on How Scientists Use Computers
Greg Wilson / 2009-08-06
American Scientist has just published a short article summarizing the results from the survey we did last year of how scientists actually use computers. Read More ›

The Ice Cream Test
Greg Wilson / 2009-08-04
Extensive experimentation has uncovered a foolproof way to tell Aussies from Brits: simply observe their reaction to ice cream. [The side-by-side "Aussie" and "Brit" reaction photos are no longer available.] Read More ›

What *Is* Open Science?
Greg Wilson / 2009-08-03
Over at the OpenScience Project, Dan Gezelter is trying to define what "open science" means. He thinks there are four key points: Transparency in experimental methodology, observation, and collection of data. Public availability and reusability of scientific data. Public accessibility and transparency of scientific communication. Using web-based tools to facilitate scientific collaboration. What changes need to be made to this course to prepare people to do these things effectively? Read More ›

Guest Speakers' Slides Now Available
Greg Wilson / 2009-08-03
Our guest speakers' slides are now on the web; video will follow soon. Read More ›

Next Steps
Greg Wilson / 2009-08-02
It's clear from Friday's end-of-course review that the course needs shaking up. Before that starts, though, there's a higher-level question to answer: should the course notes be converted to a wiki to encourage contributions from others? It was always my hope that other people would contribute material, but in four years, only five people ever have; perhaps wikification would change that. Right now, the notes are stored as HTML pages in a Subversion repository and compiled by a little Python script to resolve cross-references, insert code samples, and so on. The advantages of this approach are: People can work locally and push coordinated changes when ready. Slide format can be skinned by changing a flag in the Makefile to select different CSS. (For example, I'm still hoping to get S5 or S5R working.) The build step can also insert code fragments, ensure that bibliography references resolve, etc. Advantages of a wiki are: Easier collaboration: people can make small fixes in place without doing an "svn checkout" or running Make. As a programmer, the first three advantages weigh more heavily in my mind than the last one, but again, only five people have contributed material in four years, which isn't sustainable. What do you think? Would switching to a wiki make you more likely to add material or not? Read More ›
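For readers wondering what that build step actually does, here is a minimal sketch, not the actual script: it assumes a made-up [[ref:key]] marker in the HTML source and replaces each occurrence with a link.

    import re
    import sys

    # Map cross-reference keys to (url, title). In a real build this table
    # would be collected by scanning every page; hard-coding keeps the
    # sketch short. The keys and pages here are invented.
    REFS = {
        "shell-intro": ("shell.html", "The Unix Shell"),
        "vc-basics": ("svn.html", "Version Control"),
    }

    REF_PATTERN = re.compile(r"\[\[ref:(\w[-\w]*)\]\]")

    def resolve(match):
        key = match.group(1)
        url, title = REFS[key]  # a missing key fails loudly, as a broken link should
        return '<a href="%s">%s</a>' % (url, title)

    def main():
        for filename in sys.argv[1:]:
            with open(filename) as reader:
                text = reader.read()
            with open(filename.replace(".html", ".out.html"), "w") as writer:
                writer.write(REF_PATTERN.sub(resolve, text))

    if __name__ == "__main__":
        main()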

Post-Mortem
Greg Wilson / 2009-08-01
Today was the last day of the course, so we spent the morning talking about what had gone well and what had not. The high and low points were: Good The course was fun. The TAs were fantastic. The format (one hour of lecture plus two hours of lab, twice a day) worked well. Enjoyed the parts where the instructors programmed live. Liked the emphasis on working practices that complement coding. Liked the spread of topics, and the variability of things that are useful in all the different fields. Liked the pair programming. Welcomed exposure to standard libraries that weren't necessarily covered in the course. Liked the pre-arrival questions about what people knew, were doing, and wanted from the course. The examples were good. So were the donuts. Bad Three weeks is too long. Some of the later topics were not as useful. Would have preferred to use standard libraries for the image processing lecture and exercises instead of simplified libraries. Too little coverage of too many subjects. The formatting of the slides leaves much to be desired. Too many lectures ran over time (which was particularly hard in afternoon sessions). Divided attention in FriendFeed is a problem. The less applied stuff (e.g., computational complexity) wasn't as useful or as interesting. Students weren't given enough time to work on their own projects. Didn't feel encouraged to make suggestions or provide feedback. Not enough on shell programming. Too much shell programming. A/V between Toronto and Edmonton was crude by modern standards. Changes More on object-oriented programming. More feedback on the students' solutions to the exercises—they didn't get the equivalent of grading. Put the exercises up before the class, so that students know what the lecture's going to be leading them to. It's been a good three weeks—I enjoyed getting to know the students, and look forward to seeing what they do with what they've learned. Read More ›

Day[-2]
Greg Wilson / 2009-07-31
Today (Thursday) was the second-to-last day of the course. It's been a long haul, but we hope a rewarding one. In the morning, the students had an hour-long overview of results from empirical studies of real-world software engineering; in the afternoon, we looked at how traditional and agile development processes are responses to those facts. Tomorrow morning, we'll spend an hour talking about what's gone right and what's gone wrong, then head out for a farewell lunch. And in late-breaking news, one of the students, Mark Tovey, has started a blog on open source cognitive science. Thinking about it now, we should have required all of the students to start and maintain blogs during the course; here's hoping some will do it now of their own accord (hint, hint). Read More ›

A Good Afternoon
Greg Wilson / 2009-07-31
Yesterday afternoon, the students and ninety other guests were treated to six engaging talks about Science 2.0 from Titus Brown, Cameron Neylon, Victoria Stodden, David Rich, Michael Nielsen, and Jon Udell. We'll post slides and video here as soon as we get them; until then, you can catch up on what happened in the FriendFeed room or by reading Steve Easterbrook's real-time blog of the event. Our thanks once again to everyone who made the day possible: MaRS for the space, MITACS and Cybera for funding, SciNet, Steve Easterbrook, and an anonymous donor for additional sponsorship, our student volunteers for taking care of all the little things, and most especially Jennifer Dodd for organizing it all. Later: Andrew Louis has posted some pictures; we'd be grateful for pointers to more. Joey deVilla's notes and photos from Titus Brown's opening talk. ...and from Cameron Neylon's talk on open notebook science (which are echoed at the MSDN Developer Connection blog)... ...and from Victoria Stodden's (which are ditto). Jon Udell has some thoughts about LaTeX-in-the-web and user innovation. Cameron Neylon discusses both the undergrad student demos he saw in the morning, and the Science 2.0 talks from the afternoon. Titus Brown's impressions. Andrew Petersen thinks it'll be a long time coming. Read More ›

Every Day Is a Big Day...
Greg Wilson / 2009-07-29
...but today is bigger than most: after lectures yesterday on server-side web programming and building GUIs, students will spend this morning exploring ways to apply what they've learned in the course to their own research problems. This afternoon, we have a stellar line up of speakers from 1:00 to 6:00 pm at the MaRS Centre to talk about how the web is changing the way science is done. We'll post video of the talks as soon as we can, but if you'd like to follow along in real time, we are: Twitter tag: #tosci20 Friendfeed: Toronto Science 2.0 (http://friendfeed.com/toronto-science-2-0) Read More ›

Day 11 and Day 12
Greg Wilson / 2009-07-28
Yesterday (Monday) morning we covered the basics of handling binary data, including bit twiddling and the use of Python's struct module to pack and unpack binary representations of objects. The afternoon was a lightning introduction to how the web works: a simple socket example (just to show students the plumbing) was followed by a description of HTTP's requests and responses, then a look at urllib. Today is going to be devoted to the absolute bare bones of server-side programming, with GUI programming using Tkinter as a follow-up. "Fast paced" doesn't even touch it, but I hope students will come away with an idea of what's possible, and where to look for more information. Read More ›
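For anyone who hasn't met struct, here's the flavor of what it does; this is a generic illustration rather than the classroom example. It converts between Python values and their fixed-size binary representations:

    import struct

    # Pack a record: a 4-byte integer, an 8-byte float, and a 4-character
    # tag, using little-endian byte order ('<').
    record = struct.pack("<id4s", 42, 3.14, b"ATOM")
    print(len(record))          # 16 bytes: 4 + 8 + 4

    # Unpack reverses the process; the format string must match exactly.
    number, value, tag = struct.unpack("<id4s", record)
    print(number, value, tag)   # 42 3.14 b'ATOM'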

Where This Course Came From
Greg Wilson / 2009-07-27
I have created an Amazon.com Listmania list [no longer online] for the books that most directly influenced this course. Some topics aren't directly represented—there's nothing devoted to handling XML or binary data, for example, or on big-Oh complexity or web programming—but I hope what's there will be of use. Read More ›

Martin Fenner on SciBarCamp
Greg Wilson / 2009-07-26
Nature's Martin Fenner has blogged a summary of what he heard and saw at SciBarCamp'09 in Palo Alto a couple of weeks ago. Cameron Neylon was there too—in fact, I'd be willing to bet that a healthy number of attendees posted their experiences in some form or another. Maybe the next challenge for scientific publishing is something that will aggregate and summarize disparate reports of such events? What I'm most interested in, though, is figuring out what's needed to make that kind of reporting happen. Mark Tovey has been reporting on the Software Carpentry course in near-real time at FriendFeed, but there have been very few comments or contributions from other students. Is his record complete enough that no one can think of anything to add? As quickly as he types, that still seems unlikely, and even if it was true, I'd expect people to have questions or to want to add more detail. Is it the classroom setting, and all the behavioral baggage that brings with it? I had the students in my software engineering class last term write up lectures as wiki pages to earn 10% of their course grade; perhaps that kind of stick has to accompany the carrot of social good? Read More ›

Day 9
Greg Wilson / 2009-07-24
I meant to write this post yesterday, but we're watching Torchwood: Children of Earth, and, well, you know how it goes. We spent two sessions looking at the basics of SQL and relational databases, which almost every scientist is going to encounter at some point in their career. The morning exercises were obviously too easy—students completed them in a little over an hour—but the afternoon seemed to go at the right pace. This morning we're going to look at XML, and this afternoon at text processing with regular expressions. Read More ›
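For those following along at home, Python's built-in sqlite3 module is enough to practice the basic SQL verbs; the table and values below are invented for illustration:

    import sqlite3

    # An in-memory database is enough for practice.
    connection = sqlite3.connect(":memory:")
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE experiments (name TEXT, temperature REAL)")
    cursor.executemany(
        "INSERT INTO experiments VALUES (?, ?)",
        [("run-1", 35.2), ("run-2", 41.7), ("run-3", 38.9)],
    )

    # Select, filter, and aggregate: the operations scientists meet first.
    cursor.execute("SELECT name FROM experiments WHERE temperature > 37 ORDER BY name")
    print(cursor.fetchall())    # [('run-2',), ('run-3',)]

    cursor.execute("SELECT AVG(temperature) FROM experiments")
    print(cursor.fetchone())    # (38.6,), give or take float rounding
    connection.close()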

Day 10 Done - and With It, Week 2
Greg Wilson / 2009-07-24
XML in the morning, regular expressions in the afternoon—it's been a long week, but a productive one (I hope). We're going to shuffle some of the material around so that we can do binary data processing, web client and server programming, and GUI programming next week. The highlight, of course, will be the guest speakers on Wednesday afternoon. I hope the students have a restful weekend; I'm looking forward to the last lap. Read More ›
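As a taste of the regular expressions half, here's the kind of line-oriented extraction involved; the log format is invented for illustration:

    import re

    # Pull the date and reading out of lines like "2009-07-24 pH=7.4",
    # skipping anything that doesn't match the pattern.
    PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+pH=(\d+\.\d+)$")

    lines = [
        "2009-07-24 pH=7.4",
        "calibration check",
        "2009-07-25 pH=6.9",
    ]

    for line in lines:
        match = PATTERN.match(line)
        if match:
            date, reading = match.groups()
            print(date, float(reading))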

Day 8: Getting It Right
Greg Wilson / 2009-07-22
This morning's lecture was a high-level look at software testing, with a long-ish detour into exceptions; this afternoon was an introduction to Python's unittest framework, with some handwaving about testing that functions throw the right kinds of exceptions, replacing files and other slow objects with mock equivalents, and how you decide what tests to write first. It felt like it went better than yesterday's lectures on object-oriented programming, which I think left some students a little bewildered. We're spending tomorrow looking at databases, which means I have to brush up on my SQL tonight... Read More ›
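Here's a minimal example in the spirit of that afternoon: it uses unittest to check a normal case and to check that the right kind of exception is raised. The averaging function is a stand-in, not the classroom code:

    import unittest

    def average(values):
        """Return the mean of a non-empty list of numbers."""
        if not values:
            raise ValueError("cannot average an empty list")
        return sum(values) / len(values)

    class TestAverage(unittest.TestCase):

        def test_typical_values(self):
            self.assertEqual(average([1.0, 2.0, 3.0]), 2.0)

        def test_empty_list_raises(self):
            # The test passes only if the right kind of exception is thrown.
            with self.assertRaises(ValueError):
                average([])

    if __name__ == "__main__":
        unittest.main()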

Day 7: Lots More Objects
Greg Wilson / 2009-07-22
We took a detour from the notes yesterday morning and built some classes to represent matrices. One reason was to show students how to break a problem down into pieces and solve them one by one; another was to introduce the notions of encapsulation, polymorphism, and inheritance. Aran Donohue (one of the TAs here in Toronto) thinks that inheritance is over-taught, and taught too early—he may be right, but it's also hard to avoid, since so many libraries and frameworks (and tutorials, but that's a circular argument) expect familiarity with it. We looked at operator overloading and (very briefly) at design patterns in the afternoon, then did a modeling exercise. What classes would you create if you were writing a program to simulate solar systems? What would those classes' responsibilities be, and which classes would collaborate with which others? The most creative solution included Santa Claus; the most complete took collisions and explosions into account. Day 8 will be quality assurance and unit testing. By the time we're done, we'll be halfway through... Read More ›
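To give a sense of the operator overloading part, here is a toy 2x2 matrix class in the same spirit as, but much smaller than, what we built in class:

    class Matrix2:
        """A toy 2x2 matrix, just enough to show operator overloading."""

        def __init__(self, a, b, c, d):
            self.values = (a, b, c, d)

        def __add__(self, other):
            # 'self + other' calls this method under the hood.
            return Matrix2(*(x + y for x, y in zip(self.values, other.values)))

        def __repr__(self):
            a, b, c, d = self.values
            return "Matrix2(%r, %r, %r, %r)" % (a, b, c, d)

    print(Matrix2(1, 2, 3, 4) + Matrix2(10, 20, 30, 40))
    # Matrix2(11, 22, 33, 44)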

Elsevier's Future, Version 0.1
Greg Wilson / 2009-07-21
Elsevier (a big scientific publisher) have unveiled their guess of what the future of scientific publishing looks like. Tabs? Got 'em. Audio? Yup. RSS feeds for updates, corrections, and comments? The raw data, complete with provenance information so other researchers can try to reproduce the results? Still waiting... Read More ›

Day 6: Theory and Practice
Greg Wilson / 2009-07-21
We started Week 2 with a morning of theory: Francois Pitt, a senior lecturer in U of T's Computer Science department, gave students a crash course in algorithmic complexity. It's not something they'll use every day (or even every week), but if they ever have a conversation with a computer scientist about making something go faster, the odds are good it'll come up. The afternoon's material—the basics of classes and objects in Python—was more immediately practical. As always, I struggled with the fact that you have to know several things before any of them are useful: constructors don't make sense until you know what objects are, but you can't build objects without constructors (at least, not cleanly). The students spent the afternoon building simple classes to represent molecules; we'll spend tomorrow building more classes, then move on to testing and QA on Wednesday. Read More ›
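Here's roughly the shape of the exercise, with the details invented: the constructor is what breaks the circularity, because it's the one piece of code that runs the moment an object comes into being:

    class Molecule:
        """A molecule is just a named collection of atoms."""

        def __init__(self, name):
            # The constructor runs when a Molecule is created, setting up
            # the state every other method relies on.
            self.name = name
            self.atoms = []

        def add_atom(self, symbol):
            self.atoms.append(symbol)

        def formula(self):
            return "".join(sorted(self.atoms))

    water = Molecule("water")
    for symbol in ["H", "H", "O"]:
        water.add_atom(symbol)
    print(water.name, water.formula())   # water HHO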

Quantum to Cosmos: October 15-25 in Waterloo
Greg Wilson / 2009-07-19
The Perimeter Institute is organizing a ten-day science festival in Waterloo this October called "Quantum to Cosmos". It promises to take you from the strange subatomic world of the quantum to the outer reaches of the cosmic frontier. And for those who can't make it, all events will be streamed online live and on demand 24 hours a day. Hope to see you there! Read More ›

Day 5
Greg Wilson / 2009-07-19
Day 5 of the course: Paul Lu gave a lecture on Make in the morning, and students had a chance to work on problems of their choice in the afternoon. Of all the tools we teach in this course, Make is the least satisfying (at least to me): the basic concept is simple, and I wouldn't run a project without something to automate repetitive tasks, but Make's syntax and limitations are repellent. The problem is, the alternatives available today are equally unsatisfying. Ant requires human beings to write XML (the assembly code of the internet), and extending it requires serious skillz. Its creator, James Duncan Davidson, said back in 2004 that if he had it to do all over again, he'd have used a general-purpose scripting language as a base instead. Lots of build systems do, including CONS, SCons, and Rake, but they're all still niche products with small user bases, weak IDE integration, and even weaker documentation. Most tellingly, none of these tools has a native debugger that's as useful as breakpoint-based debuggers for conventional programming languages. Rocky Bernstein's "remake" for GNU Make comes closest, but figuring out why something didn't get updated, or why hundreds of commands are executing when they're not supposed to, is still hard. This course is now 12 years old, and for 11 of those 12 years, I've been wishing for something better to offer students so that they could see just how much work task automation could save them. I'm still wishing... Tomorrow (Monday) is Day 6 of the course; Francois Pitt will talk about computational complexity in the morning, and we'll start object-oriented programming in the afternoon. I'm enjoying it so far — hope the students are too. Read More ›

Day 4
Greg Wilson / 2009-07-17
Today was all about the Unix shell: Karen Reid covered a wide range of topics, often using live examples. Some of the students had never used a command-line prompt before, so all of the material was new; others already knew their way around pretty well, but Karen's explanations filled in some gaps and made sense out of what had, for some of them, seemed like arbitrary magic. Tomorrow morning, Paul Lu is going to introduce them to Make. We're going to spend the afternoon working on the students' own problems, which range from automating workflows to visualization with VTK to manipulating geographical data with GRASS. There may also be some ice cream... Read More ›

Day 3
Greg Wilson / 2009-07-16
Day 3 of the course—we did sets and dictionaries in the morning (glossing over the handwaving in the notes about computational complexity, since Francois Pitt is going to cover that properly later), and then Paul Gries showed students some basic image processing. Students found the morning exercise challenging, but seemed to be getting lots out of it. The visit to the pub this evening was still quite welcome, though :-). Tomorrow, Karen Reid is introducing them to the Unix shell; we'll let you know how it goes. Read More ›

Day 2
Greg Wilson / 2009-07-15
After three Python lectures yesterday, we slowed down a bit today and looked at a pair of tools every professional programmer relies on: symbolic debuggers and version control systems. Paul Lu covered the former topic using WingIDE; Ken Bauer covered the latter using the SmartSVN GUI for Subversion. Working in pairs seems to agree with the students, though as is so often the case, installing and configuring tools is more work than actually using them. We'll look at Python's sets and dictionaries tomorrow morning, then Paul Gries will give a quick introduction to image processing in the afternoon. Read More ›

Aaaand They're Off!
Greg Wilson / 2009-07-13
We had our first lecture this morning, and the students are working on their first exercises right now in pairs. You can follow progress at http://friendfeed.com/softwarecarpentryjuly2009. Read More ›

See You Monday!
Greg Wilson / 2009-07-10
We're all set up and ready to roll! We decided in the end not to record audio or video for this set of lectures, but we'll be posting updates to the lecture notes as we go along. I look forward to meeting our students on Monday... Read More ›

Registration for July 29 Talks is Now Open
Greg Wilson / 2009-07-04
As part of this summer's run of the course, we have organized an afternoon of public talks from a stellar lineup of speakers. The event is free, but registration is required, and you can now sign up at http://science20.eventbrite.com/. The talks will run 1-6 pm on Wednesday, July 29, and will be followed by a wine and cheese. For details, including speaker bios and abstracts for their talks, see the "Guest Speakers" page. Note: the guest speakers page has been retired. Read More ›

The Environmental e-Science Revolution
Greg Wilson / 2009-06-29
Steve Easterbrook has summarized some papers from a recent Royal Society workshop on how the web is changing the way environmental science is done. We'll be asking students in the Software Carpentry course how much of this they're doing, and how much they would like to. Read More ›

Ready for Proofreading
Greg Wilson / 2009-06-29
We're ready for feedback — if you check out the new slides, you'll see a little feedback bubble at the bottom of each topic. Clicking that will give you a chance to send us email to tell us about formatting glitches, factual errors, or anything else you'd like fixed. Please let us know what you think... Read More ›

Quality Control and Traceability
Greg Wilson / 2009-06-29
Nature News recently reported work by Gert Vriend and others to clean up records in the widely-used Protein Data Bank (PDB). This is great news, but what's missing is a way to track forward from entries to see what already-published papers have relied on data that's now known to be incorrect. One of the many goals of this course is to give scientists the understanding they need to tackle this problem. Read More ›

Updating the License
Greg Wilson / 2009-06-25
In response to several requests, we have updated the license on the course material: the course content is now covered by the Creative Commons Attribution license, while the example code is (still) covered by an open source MIT license. In plain English, this means that you can re-use course content however you want, as long as you give us credit. Read More ›

Topics and Schedule
Greg Wilson / 2009-06-24
A draft schedule for the July 13-31 offering of the course is now available. We'd welcome your feedback: does this order make sense, are there topics we've included that you don't care about, have we left out anything really important (and if so, what should we drop to make room for it), etc. Note: the draft schedule has been retired. Please see the course outline instead. Read More ›

Another New Version of the Slides
Greg Wilson / 2009-06-23
A new(er) version of the slides has been posted at http://software-carpentry.org that includes styling changes courtesy of Ryan Feeley. There are still many minor formatting glitches; we'll fix them in the coming week, and post a schedule showing which lectures are going to be given when. We've also updated the Guest Speakers page with bios and abstracts for the people who'll be talking at the MaRS Centre in Toronto on July 29. Talks will run 1-6 pm, and will be followed by a wine and cheese. The event is free, but will require advance registration—we'll post details here as soon as we have them. Note: the guest speakers page has been retired. Read More ›

Sightings
Greg Wilson / 2009-06-15
Peter Saffrey sent email from Glasgow last week to say that he'd run a one-day course based on Software Carpentry. If you see other uses, please let us know—we'd be happy to link to them. Read More ›

Neylon's Head in the Clouds
Greg Wilson / 2009-06-15
Cameron Neylon (a guest speaker at this summer's offering of the course) is putting together a paper for the new BMC journal Automated Experimentation titled "Head in the Clouds: Re-imagining the experimental laboratory record for the web-based networked world". It's an excellent description of what a web-native lab notebook could/should look like, and much more besides. Read More ›

And Speaking of Sightings...
Greg Wilson / 2009-06-15
A new (but very rough) version of the slides for the course is now up at http://software-carpentry.org. This uses Eric Meyer's S5 package for formatting and pagination (along with a short Python script to insert cross-references). We were planning to use LaTeX, but after messing around with packages, dependencies, and the like, HTML started to look a lot simpler. We still have a lot of reformatting to do (particularly with tables and code inclusions); if you'd like to help, or if you are a CSS expert and can help make the slides look good with Internet Explorer (right now they're styled for Firefox), please let us know. Read More ›

Two Spots Left in Toronto
Greg Wilson / 2009-06-02
There are only two spots left for the Toronto offering of the course July 13-31. As announced earlier, graduate students from anywhere in Canada are welcome to apply, and we can provide up to $1500 in support for travel and accommodation costs for out-of-town students. If you are interested, please contact us. Read More ›

Software Carpentry in Edmonton July 13-31
Greg Wilson / 2009-06-02
Registration is now open for students wishing to do Software Carpentry in Edmonton July 13-31. The course will be co-taught with its counterpart in Toronto, though the Edmonton edition is only open to graduate students and post-docs at Alberta institutions. For more information, please see the full announcement. Read More ›

SECSE Workshop
Greg Wilson / 2009-06-01
The latest in a series of workshops on "Software Engineering for Computational Science and Engineering" was held in Vancouver on May 23, just after ICSE'09. Steve Easterbrook has written a good summary of what was discussed, and Jeffrey Carver's longer summary will appear in a future issue of Computing in Science and Engineering. Read More ›

Big Code vs. Science 2.0
Greg Wilson / 2009-06-01
The original goal of this course was to give scientists and engineers the skills they needed to build large pieces of software without heroic effort. It's increasingly clear, though, that another goal is equally important: to help them take part in what's sometimes called "Science 2.0": the sharing of information through the Internet. Cameron Neylon's slides on capturing process are, like Jon Udell's on computational thinking, a good summary of the skills required—skills which overlap with, but are distinct from, those needed by software developers. One interesting wrinkle on this is that Quantiki (a web site devoted to quantum information science) is now hosting video abstracts for papers. As their site says, "The abstracts provide a 'teaser' for the paper and should guide the audience into your work, emphasizing what you think is the most important result." This is very cool, as is the video that accompanied the paper in Cell about the effects of adding the human FOXp2 gene to mice. With teenagers routinely creating and sharing how-to videos for skateboarding, it's time scientists started documenting their lab procedures and popularizing their work the same way. The question is, what can we teach so that these archives will be findable and searchable? Another interesting development is Mendeley (which decloaked a couple of weeks ago). Its goal is to make tagging, sharing, and discovery of scientific information so easy that it becomes ubiquitous. This will require new social skills (not to mention better legal and institutional frameworks); to be really effective, it will also require people to pick up new technical skills so that they can customize, blend, and filter information. Again, what can we teach, and how should we teach it? Read More ›

Error Handling
Greg Wilson / 2009-05-12
One topic that isn't currently in the curriculum that I'd really like to add is detecting, handling, reporting, and recovering from errors. Error handling makes up 10-30% of the code in real applications, but is almost always omitted from textbook examples and tutorials for the sake of clarity (Tanenbaum's Minix book being a laudable exception). I have asked elsewhere for someone to write an entire book on the subject; if anyone wants to take a crack at an hour-long lecture, please get in touch. Read More ›

Links for Summer Interns
Greg Wilson / 2009-05-11
Our summer interns started today; our first job is to define exactly what they'll be working on this summer, so it seems like a good time to round up a few links on interesting topics. My apologies for those hidden behind paywalls... Steve's Project Ideas Social network analysis for scientists Electronic lab notebooks Reproducible Research If I said, "I just got a really interesting result in the lab, but I didn't record the steps I took or the settings on the machine," no reputable journal would publish my paper. If I said, "I just got a really interesting computational result," most reviewers and editors wouldn't even ask if I'd archived my code and the parameters I used, or whether that code would run on someone else's machine. Reproducible research (RR) is the idea of making computational science as trustworthy as experimental science by creating tools and working practices that will allow scientists to re-create past results. WaveLab and Reproducible Research The Madagascar project The Sweave project Special issue of Computing in Science & Engineering on reproducibility Data Provenance The "provenance" of an object is the history of where it came from, and how it got here. The provenance of a piece of data is similar: what raw values is it derived from, and what processing was done to create it? Ideally, every piece of scientific software should track this automatically; in practice, very few do, and most scientists don't take advantage of the capability when it's there. That's changing, though, particularly as emphasis on reproducibility grows. The Provenance Challenge: a series of competitions to benchmark provenance tools against one another. Special issue of Concurrency and Computation: Practice & Experience reporting the results of the first challenge Science 2.0 Also called "computer-supported collaborative science", this is the idea of leveraging modern web-based collaboration tools to better connect scientists, their experiments, and their results. It encompasses a broad range of ideas, but "social networking for scientists" based on their interests is near the core, as is "open science" (the idea of making scientific results public in the same way as open source software or Creative Commons publications). Overview article in Scientific American Jon Udell's Internet Groupware for Scientific Collaboration may be several years old, but it's still prescient Jean Claude Bradley's blog Cameron Neylon's personal blog (see for example his post on "FriendFeed for Scientists") and lab blog Scientific Programming Environments Compared to professional software developers, most scientists use fairly primitive programming environments, in part because they've been too busy learning quantum chemistry to learn distributed version control, and in part because software developers seem to go out of their way to make tools hard to set up and learn. Lots of people have tackled this from a variety of angles. Unfortunately, a lot of work to date has focused on supercomputing, which is sort of like studying modern medicine by focusing on heart surgeons... Greg Wilson's "Where's the Real Bottleneck in Scientific Computing?" Carver, Kendall, Squires, and Post's "Software Development Environments for Scientific and Engineering Software: A Series of Case Studies" Matthews, Wilson, and Easterbrook's "Configuration Management for Large-Scale Scientific Computing at the UK Met Office" is an example of tools done right Read More ›

How Scientists Use Computers: Survey Part 2
Greg Wilson / 2009-05-09
Thank you once again for taking part in our Fall 2008 survey of how scientists use computers in their research. We will present a paper describing our findings at ICSE'09 in Vancouver on May 23, and will make the results public as soon after that as possible. There will also be an article in American Scientist magazine this summer discussing what you've told us. Our next step is to figure out what makes some scientific computer users so much more productive than others. We would therefore be grateful if you would take a few minutes to answer the questions below and email the result to team@carpentries.org: If you think that you use computers more effectively in your work than some of your peers: explain why you think so describe what you do or know that they don't If you can think of someone in your research area who uses computers more effectively in their work than you do: explain why you think so describe as best you can what they do or know that you don't If you answered either question, we would be very grateful if you could pass this email on to the colleague or colleagues you were thinking of and ask them to answer it as well—we believe we will learn a great deal by comparing responses, as well as from the responses themselves. If they wish to remain anonymous, please ask them to return their response to you for forwarding to us. Otherwise, please have them reply directly to us. (It would be very helpful in the second case for them to mention your name, so that we can pair their response with yours.) As with the original survey, only the researchers directly involved in this study will have access to respondents' contact information and/or identities. This information will not be shared with any third party in any way. Thanks in advance for your help—we hope you'll find the results useful. Prof. Greg Wilson Dept. of Computer Science University of Toronto team@carpentries.org http://www.cs.toronto.edu/~gvwilson Read More ›

Topics and Schedule Posted
Greg Wilson / 2009-05-06
We've put up a list of the topics we intend to cover, and the order in which we intend to cover them. It's very provisional; we'll update it regularly, and your comments would be very welcome. Note: the topics page has been retired. Please see the course outline instead. Read More ›

Entrance Requirements
Greg Wilson / 2009-05-04
Our rough guide to what students should know before taking this course is now on the prerequisites page. If you don't feel confident you know this material, but still want to take the course in July, please let us know: we're organizing some tutorial sessions in May and June. Note: the prerequisites page has been retired. Please see the target audience page instead. Read More ›

What If Scientists Didn't Compete?
Greg Wilson / 2009-05-01
Interesting summary in the New York Times of some collaborative science done by Dr. Sean Cutler. One of the goals of this course is to give scientists the skills they need to do things like this routinely. This raises a few questions: Should one of the "graduation exercises" for the course be to write some kind of search tool (or a wrapper around several search tools)? Finding potential collaborators is sometimes the biggest challenge. Should more emphasis be placed on sharing and merging data sets? Or on looking for inconsistencies in them? What else do you need to be able to do in order to collaborate more effectively with your colleagues? Read More ›

Empirical Software Engineering and Scientific Computing
Greg Wilson / 2009-04-28
The slides for my talk at the National Research Council on empirical software engineering and how scientists actually use computers are now up on SlideShare. The colors in some of the embedded images were messed up during upload, but the result should still be readable. Read More ›

Madagascar Course in Delft June 12-13
Greg Wilson / 2009-04-27
Via Victoria Stodden, a link to a course on reproducible research with Madagascar being run in Delft on June 12-13. Focus will be seismic research, but the ideas are more generally applicable. Read More ›

Firming Up Course Goals
Greg Wilson / 2009-04-27
The best way to design a course is to describe the things students will be able to do when it's over; the best way to do that is to specify graduation exercises. Ours are listed in the Goals page on this blog. We would be grateful for feedback: are these the things you want to be able to do? What did we forget? What could we take out to make room for things you care about more? Please leave your comments on the page itself; we'll update it regularly based on what you say. Note: the goals page has been retired. See the course outline page instead. Read More ›

What Supervisors Need To Know
Greg Wilson / 2009-04-23
I received an interesting email yesterday from a grad student who took this course the last time it was offered at the University of Toronto. It said in part: My supervisor could better advise students doing computational work if they had more background knowledge. They are routinely faced with questions like: Is a project possible, given the background of the student and the difficulty of the tasks? How long should a project take, and what can be considered good progress? What training should a student have? How to manage collaboration between students, data archives, etc? How to make sense of and build upon work done by previous students? On a more personal note—I would enjoy my supervisor having a clearer idea of what I do. It's an interesting list, and quite different from a grad student's. What else do you think people directing computational research, rather than doing it themselves, need to know? Read More ›

We've Started a FAQ
Greg Wilson / 2009-04-08
We have started a FAQ [no longer online] for the July 2009 offerings of the course in Edmonton and Toronto. Please let us know if you have any questions that it doesn't answer yet. Read More ›

Software Carpentry in Alberta
Greg Wilson / 2009-04-08
I'm very pleased to announce that thanks to generous support from Cybera, Software Carpentry will be offered at the University of Alberta in Edmonton this summer. The course will be co-taught with the offering at the University of Toronto from July 13 to 31. For more information, or to enrol, please contact Professor Paul Lu. Read More ›

Cameron Neylon on the Three Opens
Greg Wilson / 2009-04-03
Cameron Neylon has another good post up, this one on open data, open source, and open process. Like many advocates of open science, he feels he has to choose between using open source software on one hand, and getting more science done on the other. I sympathize, especially since my colleagues and I have to choose what to use and not use in the July 2009 run of the Software Carpentry course. Read More ›

Software Carpentry in Toronto July 13-31 2009
Greg Wilson / 2009-04-01
Thanks to a grant from MITACS, the University of Toronto will offer the Software Carpentry course as a condensed three-week bootcamp this summer from July 13-31, 2009. This course is an accelerated introduction to software development aimed at graduate students in science and engineering; its goal is to give them the tools and skills they need to use computers more effectively in their research. 16 spaces are available to students registered in full-time graduate programs in Canada; the fee for the course is $500, but grants of up to $1500 for students from outside the Greater Toronto Area are available to help offset travel and accommodation costs. If you wish to attend the course, or would like more information on content, schedule, prerequisites, eligibility, or other details, please contact Greg Wilson by email at team@carpentries.org. Please also subscribe to the new Software Carpentry blog at http://software-carpentry.org/blog/ for updates. Overview Many scientists and engineers spend much of their lives programming, but only a handful have ever been taught how to do this well. As a result, they spend their time wrestling with software, instead of doing research, but have no idea how reliable or efficient their programs are. Software Carpentry is an intensive introduction to basic software development practices for scientists and engineers that can reduce the time they spend programming by 20-25%. All of the material is open source: it may be used freely by anyone for educational or commercial purposes, and research groups in academia and industry are actively encouraged to adapt it to their needs. Originally developed for Los Alamos National Laboratory, the course has been used at research labs and universities on four continents. Topics include: Using the Unix Shell Version Control Automated Builds Basic Scripting with Python Testing and Quality Assurance Systematic Debugging Object-Oriented Design Data Crunching with Regular Expressions, XML, and SQL Basic Web Programming and Security Agile Software Development Process The course will be structured as an hour-long lecture and a two-hour lab session twice daily. Students are strongly encouraged to co-apply with peers so that they can work together on projects relevant to their research during the latter half of the course. Guest lecturers will discuss computer-supported collaborative science, grid computing, and legal issues related to sharing scientific data and software. Instructor Greg Wilson holds a Ph.D. in Computer Science from the University of Edinburgh, and has worked on high-performance scientific computing, data visualization, and computer security. He is now an Assistant Professor in Computer Science at the University of Toronto, where his primary research interest is software engineering for computational science. Greg is on the editorial board of Computing in Science and Engineering; his most recent books are Data Crunching, Beautiful Code, and Practical Programming. Read More ›

User Stories
Greg Wilson / 2009-03-30
One of the tricks I teach my undergraduates is to create fictional personas to describe the intended users of a system—or in this case, a course. Here are three of the "people" I've had in mind while developing Software Carpentry to date; my goal is to update these stories to better reflect how scientists work today. Bhargan Basepair Bhargan Basepair received a B.Sc. in biochemistry five years ago. He has been working since then for Genes'R'Us, a biotech firm with labs in four countries. He did a Java programming course as a freshman, and a bioinformatics course using Perl as a senior. Bhargan and his colleagues are developing fuzzy pattern-matching algorithms for finding similarities between DNA records in standard databases. To help other Genes'R'Us researchers, and to test his group's heuristics, Bhargan runs an overnight sequence query service. Researchers email sequences in a variety of formats (in-line, attachments, URLs to pages behind the company firewall, etc.). Bhargan saves them in files called search/a.in, search/b.in, and so on, then edits them to add query directives. He is very conscientious, and almost never accidentally overwrites one query with another. Before leaving at night, he runs a Perl script that processes these inputs to create output files with matching names like search/a.out. When Bhargan comes in the next morning, he pages through his mail again, sending .out files to the appropriate people. (He almost never sends the wrong file to the wrong person.) He then uses another Perl script to copy all the input and output files to a directory with a name corresponding to the date, such as 2009-07-23. He and his colleagues would like to do statistics on these saved queries and results to see how well their algorithms are doing, but have never found the time. This course will teach Bhargan how to automate his overnight service by writing simple scripts to retrieve, process, and reply to email queries. Those scripts will automatically record queries, results, and other data, and produce a daily summary of the performance of the pattern-matching algorithms. Helen Helmet Helen Helmet, a Ph.D. student in mechanical engineering, is currently doing a six-month internship at an engineering firm designing carbon-fiber helmets for firefighters and other emergency service personnel. Her undergraduate courses included an introduction to scientific computing using MATLAB, a robotics course using C, and a numerical methods course that also used MATLAB. She taught herself Fortran during a co-op placement between her junior and senior years, and used it again in a graduate course on finite elements. Helen's task is to model the non-combustive thermal degradation (otherwise known as "melting") of candidate materials. Her starting point is a 14,000-line program her supervisor wrote a decade ago. After deciding that there isn't time to re-write it in C++ (which she would like to learn), she comments out the calls to the mesh deformation routine in the main loop and begins to write a replacement. She sometimes deletes what she has written and starts over three or four times before she is satisfied. Helen tests her program by writing the total heat content of the mesh at each time step to a file. She then loads this data into MATLAB to graph the percentage differences between these values and the ones produced by the original program for six sample problems. In one case, the difference grew as large as 30% by the end of the simulation.
Helen added write statements to her program to display values until she managed to convince herself that the difference was due to a bug in the original subroutines. Helen keeps a to-do list on her home page. Every two or three days, she updates this list to show the progress she has made. She keeps completed tasks on the page until the end of the month, when she writes a short status report for her supervisor. This course will teach Helen to design software before she starts typing, and that there are better ways to manage code evolution than commenting out one section, and replacing it with another. She will also learn more effective testing and debugging procedures, and how to use a version control system to ensure that she can go back to an old version of her code when she needs to. Finally, she will be shown how to use an issue-tracking system to manage her to-do list, and how to write a small script to generate her monthly progress report automatically. Stefan Synthesis Stefan Synthesis is a graduate student in chemistry who is working as a lab technician to help cover his costs. His only programming experience is a general first-year introduction to computational science using Python. Stefan's supervisor is studying the production of fullerenes (also known as "buckyballs"). Each set of experiments involves 100 different reactant mixtures, 20 different temperature regimes, and 5 different pressures. Using a machine built by a collaborating lab, Stefan can run all the mixture and temperature combinations at once, so that the output of each experiment is five files containing 2000 lines of data each. The controller for the experimental machine writes these files to Stefan's workstation approximately an hour after the experiment begins. To analyze them, Stefan opens them with Excel, copies and pastes to merge the data into one spreadsheet, then creates a chart using the chart wizard. He saves the chart as a PNG file on the group's web site, along with the original data file. Two or three times a week, Stefan receives results from his supervisor's collaborators. He creates charts for each, which he uploads to the web site, then merges summary statistics into a master spreadsheet. This course will teach Stefan how to automate the process described above. More importantly, it will teach him how to track the provenance of the data he is working with, so that scientists in his group and others can trace backward from the final charts to the raw data they represent. Read More ›
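As a concrete taste of what "automate the process" could mean for someone like Stefan, here is a minimal sketch (the file names and format are invented) that merges a batch of output files into one CSV while recording which file each row came from, a first crude step toward provenance:

    import csv
    import glob

    # Merge the per-experiment output files into one CSV, keeping track
    # of which file each row came from so the final chart can be traced
    # back to its raw data.
    with open("merged.csv", "w", newline="") as output:
        writer = csv.writer(output)
        writer.writerow(["source_file", "line"])
        for filename in sorted(glob.glob("run-*.dat")):
            with open(filename) as reader:
                for line in reader:
                    writer.writerow([filename, line.strip()])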

Open Notebook Science Badges
Greg Wilson / 2009-03-25
I blogged last summer about creating a badging scheme for open science. Turns out it's been done: ONS Claims has badges for four flavors of open science. Two sets are available in various sizes, all variations on the themes below: All content Selected content Immediate release Delayed release Here's hoping they're widely adopted. Read More ›

Inference for R
Greg Wilson / 2009-03-25
Inference for R lets users nest the R statistical language in Word and Excel. It's a neat idea, and another example of the kind of bottom-up innovation that I predict will eventually lead to fully-fledged extensible programming systems. (If Bespin [merged into Ace, as of Jan 2011] made it easier to do things like this, I might understand all the excitement...) Read More ›

Legal Frameworks for Reproducible Research
Greg Wilson / 2009-03-17
My grad students and I met Victoria Stodden for the first time yesterday, and had a great time talking about reproducible research, science 2.0, and most particularly the kind of legal/copyright frameworks needed to move science forward. She has two papers up that anyone interested in the subject should read: a short one that appeared in Computing in Science and Engineering titled The Legal Framework for Reproducible Research in the Sciences: Licensing and Copyright, and a longer one due out soon called Enabling Reproducible Research: Open Licensing For Scientific Innovation. If you have thoughts on the subject, I'm sure she'd enjoy hearing from you. Read More ›

Open Science and Autism's False Prophets
Greg Wilson / 2009-03-02
Paul Offit has a new book out called Autism's False Prophets, which looks at how the myth about vaccination causing autism arose and spread. As his condensed essay on the subject says, it's a far from simple story, but one that has echoes in areas such as climate change where science and public policy must dine at the same table. I still haven't decided whether open science will make a difference to this or not: putting data, calculations, and papers freely available online in real time will help scientists talk to one another, but I don't think it will help them communicate with the other 99.9% of our species. Thoughts? Read More ›

Das Kapital, Computational Thinking, and Productivity
Greg Wilson / 2009-02-23
Somewhere in The Age of Uncertainty, Galbraith wrote that what made Das Kapital and the Bible great books was that they were so large, and so full of contradictions, that everyone could find support in them for anything they wanted. I have felt the same way about the phrase "computational thinking" ever since I attended a workshop at Microsoft Research in September 2007. In one of the breakout sessions, six of us tried to operationalize our understanding of the term by coming up with questions for a quiz that could be given to someone to determine if he or she was thinking computationally. It quickly became clear that we meant very different things when we used those two words. It was also clear (to me at least) that this ambiguity was socially very useful, since (to switch metaphors) it allowed people to attend the same church while disagreeing on the nature of salvation. It's not a polite fiction per se, but rather a—um, damn, I don't know the word—a thing that no one looks at closely because doing so would cause discomfort or friction. Eventually, though, things do have to be looked at closely. In this case, it's the productivity of scientific programmers. Based on feedback from people who've taken it, I believe that Software Carpentry significantly increases how much scientists can do with computers, but I don't have anything that would pass muster as "proof". I'm actually not even sure what form such proof would take, since I don't know how to measure the productivity of programmers of any other kind either—not in any reasonable amount of time, anyway. (Waiting to see if alumni produce more papers would take at least a couple of years, maybe more.) If someone could figure out how to measure computational thinking ability, on the other hand, before-and-after testing might be good enough. Any thoughts? Read More ›

Open Science Panel at Columbia
Greg Wilson / 2009-02-18
Via Jon Pipitone: there's a panel discussion tomorrow at Columbia titled "Open Science: Good For Research, Good For Researchers?" Jean-Claude Bradley, Barry Canton, and Bora Zivkovic are all going to be there, and yes, video will be distributed. I'm looking forward to it—it'll be a lot of thinking on computer supported collaborative science in one place. Read More ›

Computer Supported Collaborative Science
Greg Wilson / 2009-02-18
I've used the term "CSCS" a few times now; time to start groping toward a definition. "Computer supported collaborative science" (CSCS) is a specialization of computer supported collaborative work, which is the study of "how collaborative activities and their coordination can be supported by means of computer systems". Insert the word "scientific", and you have CSCS. More specifically, CSCS includes science 2.0, open notebook science, reproducible research, workflow & provenance, and other things modern computing technology can do to help scientists find and share information. Another way to look at CSCS is "areas where typical researchers in software engineering and/or HCI can directly help scientists". The word "typical" rules out HPC, numerical methods, very large databases, and a whole bunch of other "computational science 1.0" topics, since most SE/HCI people don't have the background for those. The stuff that falls under "e-science" or "grid science" (depending on which side of the Atlantic you're on, and which grant agency you're trying to seduce) might or might not be included, depending on which part you're looking at—there's certainly overlap. The same goes for the semantic web, data visualization, and a bunch of other things. Ironically, it's not clear whether traditional software engineering research falls under the CSCS heading either, at least not if you define SE as the study of software construction—it takes a lot of SE skill to build the kinds of things CSCS is about, but I don't see where CSCS requires the invention or study of new ways of building things. On the other hand, if your definition of SE includes end-user programming or the study of how to do empirical studies of tools and techniques in action, then there's definitely overlap with CSCS. So that's my opening shot: anyone want to volley it back? Read More ›

Enough Players to Hand Out Medals
Greg Wilson / 2009-02-16
I blogged last August about the first and second Provenance Challenge, in which the creators of systems for tracking scientific data and workflows were given sample problems, then asked to have their tools answer a variety of questions. (Results from the first were reported in Concurrency and Computation, but ironically, those articles are not openly available; the third challenge will kick off soon.) Chasing down one of those references again, I came across the Open Notebook Science Challenge, which "...calls upon people with access to materials and equipment to measure the solubility of compounds (aldehydes, amines and carboxylic acids are a priority) in organic solvents and report their findings using Open Notebook Science". This isn't quite the same thing as automatically tracking data provenance; instead, it is "...the practice of making the entire primary record of a research project publicly available online as it is recorded". There are lots of interesting research questions for computer scientists here, ranging from privacy and security issues to notification, peripheral awareness, ontological engineering, and more — for example, see Cameron Neylon's latest post synthesizing discussion about using OpenID to identify scientific researchers and their contributions. Read More ›

Python Textbooks for Biotech
Greg Wilson / 2009-02-11
Jan Erik Moström recently posted a request for textbook recommendations for teaching programming (using Python) to biotechnology students. He has now posted the responses [no longer online] he received — might be interesting to some readers of this blog. Read More ›

MTEST
Greg Wilson / 2009-02-11
Steve Eddins has posted an xUnit-style testing harness for MATLAB called MTEST on the MATLAB Central File Exchange. It's a nice piece of work, and I hope numerical programmers will make heavy use of it. Read More ›

Carl Zimmer's Readers' Reading List
Greg Wilson / 2009-02-11
Carl Zimmer (prolific and talented writer on biology and evolution) has posted a crowd-sourced reading list of great science writing. Lots of good stuff... Read More ›

Sharing Data Isn't That Easy
Greg Wilson / 2009-02-06
Interesting post from Systeme D called "ShareAlike considered harmful" (for geo data, anyway). I should give this to my students and ask them to think it through... Read More ›

Cameron Neylon Says Interesting Things
Greg Wilson / 2009-02-04
This time, he has blogged about best practices for making scientific data available. I think this kind of thing will have a much bigger impact on scientists' productivity than any amount of parallel supercomputing, and that computer scientists could have a lot of impact by helping "real" scientists figure out how to do it better. Read More ›

Communicate First, Standardize Second
Greg Wilson / 2009-01-30
That quote from Jean-Claude Bradley is on slide 34 of Cameron Neylon's presentation "Open Access, Open Data. Open Research?" Very worthwhile... Read More ›

Web Native Lab Notebooks
Greg Wilson / 2009-01-27
Good post from Cameron Neylon about the lab notebooks of the future: "The traditional paper notebook is to the fully integrated web based lab record as a card index is to Google." Read More ›

A New Kind of Big Science
Greg Wilson / 2009-01-23
Via Michael Nielsen, a guest column by Aaron Hirsh at the NY Times on "A New Kind of Big Science": There is another way to extend our scientific reach, and I believe it can also restore some of what is lost in the process of centralization. It has been called Citizen Science, and it involves the enlistment of large numbers of relatively untrained individuals in the collection of scientific data. To return to our architectural metaphor, if Big Science builds the high-rise yet higher, Citizen Science extends outward the community of villages. Read More ›

I *Want* To Be A Number
Greg Wilson / 2009-01-10
Via Science in the Open's summary of the workshop on open science at PSB'09, a link to a paper explaining why scientists will want to identify themselves with unique serial numbers. Read More ›

Time to Freshen It Up
Greg Wilson / 2008-12-31
I tell the students in my software engineering classes that the absolute value of code coverage in testing isn't as important as the trend: if you're testing a smaller percentage of your software as time goes by, you're headed for trouble. The same is true of site stats: I don't care much about the absolute number of visitors, but if the curve is down and to the right, it's time to give the site a polish, or retire it. Which brings us to this year's stats for Software Carpentry: Looking at visits (upper right), things aren't too bad. Looking at pages, files, and hits (on the left), it's clear the web is losing interest. And looking at the total downloads (bottom right), I'm just confused: we didn't do any major surgery on the site, or retire any heavyweight content, so I don't understand why traffic weight would be cut by two thirds. So that's another of my New Year's resolutions for 2009 that will be a repeat of one made in 2008: give the site a makeover. If you have 50-100 hours of quality time to donate to help, please let me know... Read More ›
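To make the trend-beats-absolute-value point concrete, here's the sort of five-minute check I have in mind — a least-squares slope over monthly coverage figures (the numbers below are invented, not from any real project):

```python
# Hypothetical monthly statement-coverage percentages, oldest first.
coverage = [71.0, 70.2, 68.5, 66.9, 64.0]

# Least-squares slope: by how many points per month is coverage changing?
n = len(coverage)
mean_t = (n - 1) / 2.0
mean_c = sum(coverage) / n
slope = sum((t - mean_t) * (c - mean_c) for t, c in enumerate(coverage)) \
        / sum((t - mean_t) ** 2 for t in range(n))

if slope < 0:
    print("coverage falling %.1f points/month - headed for trouble" % -slope)
```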

Things I'd Like To Finish In the Next 489 Days
Greg Wilson / 2008-12-26
One of the things I teach my students is that the real purpose of a schedule is to tell you when to start cutting corners and dropping features. The ticker on my web site tells me I have 489 days left in my contract with the university; I signed up hoping to study ways of teaching second-stage novices [1] how to be better programmers, but after four failed attempts to get NSERC funding [2], it's time to lower my sights. Here are the things I'd like to finish off before my stint at U of T is over:

- Help Samira Abdi, Jeremy Handcock, and Carolyn MacLeod finish their Master's theses, and get Aran Donohue, Alecia Fowler, Alicia Grubb, Zachary Kincaid, Jason Montojo, and Rory Tulk through theirs.
- Publish Practical Programming (the "CS-1 in Python" book that Jennifer Campbell, Paul Gries, Jason Montojo, and I have been writing). It's currently in beta, and due for release in a month or so; we'd like to do a Python 3 update in a year or so, but that's likely to slip.
- Finish the study of how scientists actually use computers. Data from the initial survey is now being processed; we'll put together a follow-up survey in the next couple of months, write a "popular science" paper for American Scientist in the spring, present results at the SECSE workshop in Vancouver in May, and submit a paper by year's end.
- Co-edit a special issue of Computing in Science & Engineering on "Software Engineering and Computational Science". Andy Lumsdaine and I have four articles lined up, and are looking for two more—if you'd like to volunteer, please give me a shout.
- Submit a proposal for a professional master's degree in Computer Science to U of T's School of Graduate Studies. This is mostly a matter of filling in forms, but that's kind of like looking at Everest and saying, "It's mostly a matter of going uphill."
- "Finish" a much-improved DrProject. I originally planned to use it as a platform for research, as well as teaching; there isn't enough time left for that, but I still hope to make it easier for software engineering instructors to introduce students to modern tools.
- Rewrite Software Carpentry. Tina Yee has translated some of the lectures into MATLAB; the next step is to make the whole thing look like it was written in the 21st Century [3].

Everything else has to go by the boards. In particular:

- I have resigned from my contributing editor post at Doctor Dobb's Journal. It was a lot of fun, and I really enjoyed working with Jon Erickson, but as I said back in October, I'd rather not do it than do it badly.
- The software developers' reading group I'd planned to start this January isn't going to happen. I'd really like something to pick up the slack now that DemoCamp seems to have stalled (if only to provide an excuse to get together with former students on a regular basis), but someone else is going to have to organize it.
- After this term, I'm going to stop supervising student projects (except those directly relevant to DrProject and/or Software Carpentry). Next to 10:00 am coffee breaks with the lecturers, this is the part of university life I enjoy the most, but there just isn't time... :-(
- The Software Project Coloring Book (my attempt to write down everything I try to teach undergraduates about real-world software development) is being put back on the shelf. I have written 35,000 words, but those were the easy bits: conservatively, I'd need 4-6 months of full-time work to finish it off.

On the upside, Sadie got me some biking gear for Christmas, so now I'll have to shed the twenty pounds I've picked up in the last couple of years, and I get to start taking our daughter to music classes every week. To quote a friend, it isn't what I planned—it's better.

[1] People who already know how to write programs, but not how to develop applications. I'm specifically interested in undergraduate Computer Science students, and graduate students in other disciplines.
[2] Companies like Nitido, the Jonah Group, Idee, and Rogers have kindly donated a few thousand dollars each to keep things like DrProject going, as have several of my fellow professors, but a $24K grant from The MathWorks is the only "research" funding I've been able to raise.
[3] As I said yesterday, I'm looking for a mentor in the Toronto area who can show me how to do this. Read More ›

A Healthy Dose of Scepticism
Greg Wilson / 2008-12-24
Titus Brown's latest post (which opens with, "The latest hot idea for making a protein-protein interaction database leaves me lukewarm") should be read by every computer scientist who's "just trying to help": ...while tools can be helpful, the fundamental problem is much more, well, fundamental: science is hard. Connecting the dots is hard. Thinking clearly about the problem and separating the wheat from the chaff, so to speak, is hard. I worry that for the majority of biologists, new tools are going to be more distracting than helpful. We need to build simpler, easier-to-use tools, not more complicated tools; we need to keep our focus on the goal (solving biological problems) and not just on intermediate stages like improving databases and building better prediction tools. Read More ›

The National Academy Would Like to Hear From You
Greg Wilson / 2008-12-19
Via Carl Zimmer: the US National Academy of Sciences would like you to fill in a two-minute survey about what science topics you care about most. Read More ›

Google Pulls the Plug on Scientific Data Sharing Project
Greg Wilson / 2008-12-19
Google has decided not to launch its scientific data sharing service — another victim of the recession, I suppose. Bummer :-( Read More ›

Three Reasons to Distrust Microarray Results
Greg Wilson / 2008-12-10
Interesting post: ...the paper actually demonstrated that it is possible to distinguish microarray experiments conducted on one day from experiments conducted another day. That is, batch effects from the lab were much larger than differences between patients who did and did not respond to therapy... As is so often the case, data were mislabeled. In fact, 3/4 of the samples were mislabeled. Read More ›

Igor, Connect the Electrodes!
Greg Wilson / 2008-11-30
The Software Carpentry course site is still getting a fair bit of traffic, although readership is definitely tailing off. I'm hoping to run an intensive three-week version of the course in June 2009 in Toronto (details to follow); hope I can find time between now and then to finish wikifying the course notes, get the MATLAB material online, and generally freshen the site up. Read More ›

SECSE'09 Call for Papers
Greg Wilson / 2008-11-21
Second International Workshop on Software Engineering for Computational Science and Engineering
Saturday, May 23, 2009
Co-located with ICSE 2009 — Vancouver, Canada
http://www.cs.ua.edu/~SECSE09

Overview

This workshop is concerned with the development of:

- Scientific software applications, where the focus is on directly solving scientific problems. These applications include, but are not limited to, large parallel models/simulations of the physical world (high performance computing systems).
- Applications that support scientific endeavors. Such applications include, but are not limited to, systems for managing and/or manipulating large amounts of data.

A particular software application might fit into both categories (for example, a weather forecasting system might both run climatology models and produce visualisations of big data sets) or just one (for example, nuclear simulations fit into the first category and laboratory information management software into the second). For brevity, we refer to both categories under the umbrella title of "Computational Science and Engineering (CS&E)". Despite its importance in our everyday lives, CS&E has historically attracted little attention from the software engineering community. Indeed, the development of CS&E software differs significantly from the development of business information systems, from which many of the software engineering best practices, tools and techniques have been drawn. These differences include, for example:

- CS&E projects are often exploring unknown science, making it difficult to determine a concrete set of requirements a priori. For the same reason, a test oracle may not exist (for example, the physical data needed to validate a simulation may not exist). The lack of an oracle clearly poses challenges to the development of a testing strategy.
- The software development process for CS&E application development may differ profoundly from traditional software engineering processes. For example, one scientific computing workflow, dubbed the "lone researcher", involves a single scientist developing a system to test a hypothesis. Once the system runs correctly and returns its results, the scientist has no further need of the system. This approach contrasts with more typical software engineering lifecycle models, in which the useful life of the software is expected to begin, not end, after the first correct execution.
- CS&E applications often require more computing resources than are available on a typical workstation. Existing solutions for providing more computational resources (e.g., clusters, supercomputers, grids) can be difficult to use, resulting in additional software engineering challenges.
- CS&E developers may have no formal knowledge of software engineering tools and techniques, and may be developing software in a very isolated fashion. For example, it is common for a single scientist in a lab to take on the (formal or informal) role of software developer and to have to rely solely on web resources to acquire the relevant development knowledge.

Recent endeavors to bring the software engineering and CS&E communities together include two special issues of IEEE Software (July/August 2008 and January 2009) and this current ICSE workshop series.
The 2008 workshop brought together computational scientists, software engineering researchers and software developers to explore issues such as:

- those characteristics of CS&E which distinguish it from general business software development;
- the different contexts in which CS&E developments take place;
- the quality goals of CS&E;
- how the perceived chasm between the CS&E and software engineering communities might be bridged.

This 2009 workshop will build on the results of the previous workshop. Similar to the format of the 2008 workshop, in addition to presentation and discussion of the accepted position papers, significant time during the 2009 workshop will be devoted to the continuation of discussions from previous workshops and to general open discussion.

Submission Instructions

We encourage submission of position papers or statements of interest from members of the software engineering and CS&E communities. Position papers of at most eight pages are solicited to address issues including but not limited to:

- Case studies of software development processes used in CS&E applications.
- Measures of software development productivity appropriate to CS&E applications.
- Lessons learned from the development of CS&E applications.
- Software engineering metrics and tool support for CS&E applications.
- The use of empirical studies to better understand the environment, tools, languages, and processes used in CS&E application development and how they might be improved.

The organizing committee hopes for participation from a broad range of stakeholders from across the software engineering, computational science/engineering, and grid computing communities. We especially encourage members of the CS&E application community to submit practical experience papers. Papers on related topics are also welcome. Please contact the organizers with any questions about the relevance of particular topics. Accepted position papers will appear in the ICSE workshop proceedings and in the IEEE Xplore Digital Library. Please observe the following:

- Position papers should be at most 8 pages.
- Format your paper according to the ICSE 2009 paper guidelines.
- Submit your paper in PDF format to carver@cs.ua.edu.
- Deadline for submission: January 19, 2009.
- Submission notification: February 6, 2009.

Organizing Committee:

- Jeffrey Carver, University of Alabama, USA (chair of the organizing committee)
- Steve Easterbrook, University of Toronto, Canada
- Tom Epperly, Lawrence Livermore National Laboratory, USA
- Michael Heroux, Sandia National Laboratories, USA
- Lorin Hochstein, USC-ISI, USA
- Diane Kelly, Royal Military College of Canada
- Chris Morris, Daresbury Laboratory, UK
- Judith Segal, The Open University, UK
- Greg Wilson, University of Toronto, Canada

Read More ›

Getting the Science Right-Or At Least, Less Wrong
Greg Wilson / 2008-11-20
Via The Great Beyond: The US National Academy of Sciences has created an initiative that will link TV and movie directors with scientists and engineers to incorporate more accurate science content into entertainment. Press release here, web site here. That would be a cool job... Read More ›

Science Lessons for MPs
Greg Wilson / 2008-11-17
Via Nature: politicians from the UK Conservative Party will be required to take science lessons. On the one hand, kind of sad that they didn't learn the basics in grade school. On the other hand, yay!, and when will Canadian parties require the same? Read More ›

What Sciences Are There?
Greg Wilson / 2008-11-16
Over 1900 people have already responded to our survey of how scientists use computers, and it still has two weeks left to run. Our next task will be to analyze the data we've collected, which (among other things) means coding people's free-form descriptions of their specialties so that we can talk about physicists and chemists as opposed to "this one person who's doing N-brane quantum foam approximations to multiversal steady-state thingummies". Except: are "physics" and "chemistry" too broad? At that level, there are only a handful of sciences: astronomy, geology, biology, mathematics, psychology, um, computing, er, Curly, Larry, and Moe. Or maybe you'd distinguish "ecology" from "biology". Or "oceanography" from something else, or — you see the problem. Rather than making up our own classification scheme, I'd like to adopt one that's widely used and generally intelligible, but I'm having trouble finding one. Yahoo!, Wikipedia, and other web sites have incompatible (and idiosyncratic) divisions; the Dewey Decimal System and other library schemes have a very 19th Century view of science, and the ACM/IEEE publication codes are domain-specific. If anyone can point me at something else (ideally, something with about two dozen categories — that feels like it ought to be about right, just from eyeballing the data we have so far), I'd be grateful. Read More ›
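For what it's worth, our first pass at coding the free-form answers will probably be nothing fancier than keyword matching, with everything unmatched kicked out to a human. A sketch — the rules below are invented for illustration, not our actual scheme:

```python
# Invented keyword rules for collapsing free-form specialty descriptions
# into coarse disciplines; unmatched answers are flagged for manual coding.
RULES = {
    "physics":       ["quantum", "particle", "optics", "plasma", "brane"],
    "biology":       ["gene", "protein", "cell", "ecolog"],
    "earth science": ["climate", "ocean", "seismic", "geolog"],
}

def code_specialty(description):
    text = description.lower()
    for discipline, keywords in RULES.items():
        if any(keyword in text for keyword in keywords):
            return discipline
    return "UNCLASSIFIED"  # send to a human coder

print(code_specialty("N-brane quantum foam approximations"))  # physics
```

The hard part, of course, is the category list itself, which is exactly what I'm asking for help with.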

One Good Survey Deserves Another
Greg Wilson / 2008-11-04
While we're running our survey of how scientists use computers [link no longer active], the folks at The MathWorks are asking their MATLAB users a few questions too. If you use any MathWorks products, and have a few minutes, they'd be grateful for your help. Read More ›

1731 People
Greg Wilson / 2008-11-02
1731 people have completed our survey of how scientists use computers since it went online three weeks ago. That's pretty cool, but I'd like to double the number (at least). If you consider yourself a working scientist, and haven't taken the survey yet, please take a moment and do so. If you aren't a scientist, but know some, please pass on the link: http://softwareresearch.ca/seg/SCS/scientific-computing-survey.html [link no longer active] Thanks! Read More ›

Finding and Re-using Open Scientific Resources
Greg Wilson / 2008-10-27
Via Cameron Neylon, a workshop in London in November on finding and re-using open scientific resources. Wish I could go... Read More ›

Surveying Scientists' Use of Computers
Greg Wilson / 2008-10-15
Computers are as important to modern scientists as test tubes, but we know surprisingly little about how scientists develop and use software in their research. To find out, the University of Toronto, Simula Research Laboratory, and the National Research Council of Canada have launched an online survey in conjunction with American Scientist magazine. If you have 20 minutes to take part, please go to: http://softwareresearch.ca/seg/SCS/scientific-computing-survey.html [link no longer active] We'd also be grateful if you'd spread the word through any mailing lists, blogs, or bulletin boards you have access to. Thanks for your help! Jo Hannay (Simula Research Laboratory) Hans Petter Langtangen (Simula Research Laboratory) Dietmar Pfahl (Simula Research Laboratory) Janice Singer (National Research Council of Canada) Greg Wilson (University of Toronto) Read More ›

Science in the 21st Century
Greg Wilson / 2008-09-11
I'm at the "Science in the 21st Century" conference at the Perimeter Institute today. There are 32 people in the room right now: 23 are male and 9 are female, but only one is non-Caucasian, which pretty much matches the numbers in the picture from the conference dinner last night. That's about the same M/F ratio I see in science grad courses at U of T, but definitely not the ethnic distribution—wonder why? It can't just be a "seniority effect" — this is a pretty young crowd. We see the same thing at DemoCamp: non-Caucasians are often a majority n sci/tech classes and companies in the Greater Toronto Area, but definitely a minority on Tuesday nights. Thoughts? Michael Nielsen says that SciBarCamp was 50/50... Beth Noveck: "Designing Digital Institutions: Science in Government 2.0". Talked about crowdsourcing patent review; wonder if U of T would run a grad course for sci/eng students to teach them how to do this (and as a side effect, get them to do some useful patent reviewing)? Might be a good central theme for a tech reading/writing course. Eric Weinstein: "Sheldon Glashow Owes Me a Dollar". His main point seemed to be that radical thinkers need to find wealthy benefactors (Medicis or Gates) in order to have the freedom to pursue really wild ideas. What I took away from it was how fundamentally the influx of physicists into banking is reshaping the language used by the latter. You can follow the others in real-time on FriendFeed, or better yet, watch videos of the talks on the Perimeter Institute's site. Read More ›

Science 2.0: the Future of Online Tools for Scientists
Greg Wilson / 2008-09-04
A pub night and panel with Timo Hannay, Cameron Neylon, and Michael Nielsen, hosted by Nature Network Toronto. What does the future hold for the way we do science? Are online repositories such as GenBank and the physics preprint ArXiv, or social tools such as Nature Network, about to change science profoundly? To find out, join Nature Network Toronto for an interactive panel discussion over drinks at the pub.

Date: Sunday September 7 at 7:30pm
Place: Fionn MacCool's (181 University Avenue, near corner with Adelaide)

About the panelists:

- Timo Hannay is Publishing Director of Nature.com at the Nature Publishing Group, publishers of Nature and over seventy other scientific journals, plus numerous online resources for scientists. He is responsible for new online initiatives in social software, databases and audio-visual content. Timo trained as a neurophysiologist at the University of Oxford and worked as a journalist and a management consultant before becoming a publisher.
- Cameron Neylon is a biophysicist working in molecular biology, biophysics, and high throughput methods. He has a joint appointment as a Lecturer in Combinatorial Chemistry at the University of Southampton and as a Senior Scientist in Biomolecular Sciences at the ISIS Pulsed Neutron and Muon Facility. He is developing an electronic notebook for biochemistry labs, which has led to his involvement in the Open Research movement and to his group moving to an Open Notebook.
- Michael Nielsen is a writer living just outside Toronto, Canada. He is currently working on a book about The Future of Science. One of the pioneers of quantum computation, he coauthored the standard text on quantum computation, which is the most highly cited physics publication of the last 25 years. He is the author of more than fifty scientific papers, including invited contributions to Nature and Scientific American.

For more information visit Nature Network Toronto (http://network.nature.com/group/toronto), or contact Eva Amsen (eva.amsen@gmail.com) or Jen Dodd (jen@jendodd.com, 519 572 2275). Read More ›

Bil Lewis Works With Biologists...
Greg Wilson / 2008-08-22
...and occasionally finds it frustrating. Read More ›

Data Provenance Challenge
Greg Wilson / 2008-08-13
John's summary of our discussion about what to teach scientists about reproducible research if they already believe it's a good thing, and want to start doing it, reminded me that I never posted about the Provenance Challenge. It has been run twice so far; each time, the authors of tools that track the provenance (or lineage) of scientific data had to implement some workflows, then answer questions about where data came from, what was done to it, and so on. The results of the first challenge are described system-by-system in these papers (sorry, but they're behind a wall — if you google for combinations of the authors' names, you can find PDF preprints). This is a very cool research area, and I hope one of my incoming grad students will want to do something with it. Read More ›
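The core idea is simple even if the real systems aren't: interpose on every processing step and record what went in and what came out. A toy sketch in Python — all the names here are mine, invented for illustration, not any challenge entrant's design:

```python
import functools

LINEAGE = []  # append-only record: what ran, on what inputs, producing what

def track(func):
    """Record every call so we can later answer 'where did this value
    come from, and what was done to it?'"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        LINEAGE.append((func.__name__, args, kwargs, result))
        return result
    return wrapper

@track
def drop_missing(values):
    return [v for v in values if v is not None]

@track
def mean(values):
    return sum(values) / len(values)

answer = mean(drop_missing([1.0, None, 3.0]))
for step, args, kwargs, result in LINEAGE:
    print(step, args, "->", result)   # the lineage of 'answer', step by step
```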

SciFoo, eGY, and Splitting
Greg Wilson / 2008-08-11
OpenWetware has posted notes from SciFoo. I'm sorry I missed it; looking forward to "Science in the 21st Century" even more than before. Those notes pointed me at the Electronic Geophysical Year declaration. I agree in principle, but think that something like the Open Source Initiative's "certified open" badge would be more useful than UN-ish statements like, "Effort should be made to identify and rescue critical Earth system data and ensure persistent access to them." We're very close to re-launching the Software Carpentry site as a wiki, and I'm wondering if I should move the science-and-computing thread out of this blog into a separate one at that site. Thoughts? Read More ›

They're Breeding Like Rabbits
Greg Wilson / 2008-08-01
Cameron Neylon complains about the proliferation of networking sites, aggregators, and what-not for scientists. I think he's right: none of them will succeed until there's massive consolidation. Maybe LinkedIn or someone like that could offer a cheap-but-not-free service customized to scientists' needs on top of its existing infrastructure? Read More ›

Next Lecture?
Greg Wilson / 2008-07-28
The Software Carpentry course currently contains the following lectures:

- Introduction
- The Unix Shell (2 lectures)
- Version Control
- Automated Builds
- Basic Scripting (bool/int/float, for/while/if)
- Strings, Lists, and Files
- Functions and Libraries
- Programming Style
- Quality Assurance (basic testing)
- Sets, Dictionaries, and Complexity
- Debugging
- Object-Oriented Programming (2 lectures)
- Unit Testing (unittest — should switch this to nose; see the sketch below)
- Regular Expressions
- Binary Data
- XML
- Relational Databases
- Spreadsheets
- Numerical Programming (the basics of NumPy)
- Integration (subprocess+pipes and wrapping C functions)
- Web Client Programming (HTTP request/response, URL encoding)
- Web Server Programming (basic CGI processing)
- Security (the weakest lecture of the bunch)
- The Development Process (a mish-mash of sturdy and agile)
- Teamware (introduces portals like DrProject)
- Conclusion (various "where to look next" suggestions)

Between now and Christmas, I want to tidy them up, duplicate the examples in MATLAB, and add some of the content I wrote for "CSC301: Introduction to Software Engineering". Since I won't have time to do everything, I'd like your help prioritizing. Which of the following topics do you think is most important to add? And what have I forgotten entirely?

- Lifecycle: should I split the existing "Development Process" lecture into two, and cover agile methods (focusing on Scrum) and sturdy methods (i.e., longer release cycles, more up-front planning, legacy code)? Neither exactly fits scientists' "exploratory programming" paradigm, but they're all we've got...
- Quality: this would expand the "Programming Style" lecture with material from Spinellis's Code Reading and Code Quality to describe what makes good software good.
- Deployment: currently based on the patterns in Nygard's Release It!, which focus on designing scalable fault-tolerant applications. Should I instead cover the creation and distribution of packages (e.g., RPMs, Distutils, Ruby Gems, etc.)?
- Refactoring: a combination of Fowler's original Refactoring and Feathers' Working Effectively with Legacy Code.
- UML: I devote three lectures to this in CSC301; I don't see any reason to inflict it on scientists.
- Reproducible Research: it's already important, and likely to become more so; it also ties in with "open science", though I'm not sure what I could say about either that wouldn't just be rah-rah and handwaving—tools like Sweave are interesting, but I don't think people would be willing to learn R just to use it, and there don't seem to be equivalents (yet) in other languages. The same goes for data lineage: it's an important idea, and there are plenty of research prototypes, but nothing has reached the "used by default" level of (for example) Subversion.
- GUI Construction: people still use desktop GUIs, and it's worth learning how to build them (if only because it forces you to come to grips with MVC and event-driven programming), but what everyone really wants these days is a rich browser-based interface, and I don't think it'd be possible to fit that into this course.
- High Performance Blah Blah Blah: this one keeps coming up, but (a) one of the motivations for Software Carpentry is the belief that there's too much emphasis on this in scientific computing anyway, and (b) what would it include? GPU programming? MPI? Grid computing? Some other flavor-of-the-week distraction from the hard grind of creating trustable code and reproducible results without heroic effort? Oh, wait, are my biases showing?

Read More ›
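On the unittest-to-nose note above: the attraction is mostly less ceremony. The same check written both ways (a made-up example):

```python
import unittest

class TestMean(unittest.TestCase):          # unittest: class-plus-method boilerplate
    def test_mean(self):
        self.assertEqual(sum([1, 2, 3]) / 3.0, 2.0)

def test_mean_nose():                       # nose: any test_* function, plain assert,
    assert sum([1, 2, 3]) / 3.0 == 2.0      # found and run by the 'nosetests' command
```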

Quick Quiz to Measure What Scientists Know
Greg Wilson / 2008-07-23
Suppose you have a room full of scientists—hundreds of 'em—and want to find out how they actually use computers in their work. There isn't time to interview them individually, or to record their desktops during a typical working week, so you've decided to ask them to self-assess their understanding of some key terms on a scale of:

1. No idea what it is.
2. Use it/have used it infrequently.
3. Use it regularly.
4. Couldn't get through the day without it.

My list is below; what have I forgotten, and (more importantly) how would you criticize this assessment method?

- A command-line shell
- Shell scripts
- Version control system (e.g., CVS, Subversion)
- Bug tracker
- Build system (e.g., Make, Ant)
- Debugger (e.g., GDB)
- Integrated Development Environment (e.g., Eclipse, Visual Studio)
- Numerical Computing Environment (e.g., MATLAB, Mathematica)
- Inverse analyzer (e.g., Inane)
- Spreadsheet (e.g., Excel)
- Relational database (e.g., SQLite, MySQL, Oracle)
- Layout-based document formatting (e.g., LaTeX, HTML)
- WYSIWYG document formatting (e.g., Word, PowerPoint, OpenOffice)

Now, you have the same room full of scientists, and you want to find out how much they know about software development. There still isn't time to interview them or have them solve some programming problems, so again you're falling back on self-assessment. This time, the scale is:

1. No idea what it means.
2. Have heard the term but couldn't explain it.
3. Could explain it correctly to a junior colleague.
4. Expert-level understanding.

and the terms themselves are:

- Nested loop
- Switch statement
- Stable sort
- Depth-first traversal
- Polymorphism
- Singleton
- Regular expression
- Inner join
- Version control
- Branch and merge
- Unit test
- Variant digression
- Build and smoke test
- Code coverage
- Breakpoint
- Defensive programming
- Test-driven development
- Release manifest
- Agile development
- UML
- Traceability matrix
- User story

Once again, my questions are (a) what have I forgotten, and (b) how "fair" is this as an assessment method? Read More ›

Badge of Reproducibility
Greg Wilson / 2008-07-23
Coming back to the badge meme from earlier this week, John Cook's new Reproducible Research blog pointed me at this page on the EPFL site advertising a paper called "What, Why and How of Reproducible Research in Signal Processing". Notice the "Reproducible Research" badge? The "add your evaluation" link takes you to a formlet that lets you choose between:

- I have tested this code and it works
- I have tested this code and it does not work (on my computer)
- I have tested this code and was able to reproduce the results from the paper
- I have tested this code and was unable to reproduce the results from the paper

It's a good start... Read More ›

Reviving the Software Carpentry Mailing List
Greg Wilson / 2008-07-22
Luke Petrolekas and I are thiiiiis close to having the Software Carpentry notes converted to a wiki. Once they are, I'm going to be working with Tina Yee to update them, do the examples in MATLAB as well as Python, and fix some longstanding bugs. I'm also going to resurrect the project's two mailing lists (one for occasional announcements, the other for people interested in developing new material and/or teaching the course). If you'd like to be on either or both, please let us know. Read More ›

Badge of Honor?
Greg Wilson / 2008-07-19
I met up with Shirley Wu, Michael Nielsen, and a few other ISMB attendees yesterday to talk about what's variously called Science 2.0 or Open Science. It was pretty rushed (and not helped by the bar we wound up in), but it got me thinking about creating an "open science" badge that scientists could apply to their work. Right now, people are using a variety of terms in inconsistent ways; it sometimes takes a very close reading to figure out exactly what they mean. I'd really like to see the PSB workshop (or some other meeting like it) put a peg in the ground and say, "If you do the following things, you can put this 'open science' badge on your lab's web site, and put, 'This research is certified open.' in your papers." The W3C's familiar badges and the Open Source Initiative's certification of software licenses have done a lot to clarify discussion, and have given people standards to aspire to. Nine years after the "Open Source/Open Science" workshop at Brookhaven National Laboratory, maybe it's time to borrow those ideas and put them into practice. Read More ›

Kevin's Been Busy
Greg Wilson / 2008-07-01
Kevin Brown has been busy — he's been coordinating, installing, maintaining, fixing, and figuring out how to use a new $20 million supercomputer for cancer research. No word on how much money will be spent training people how to use it effectively, but hey, I'm easy to reach... :-) Read More ›

What a Proposal Looks Like
Greg Wilson / 2008-06-13
I got word earlier this week that The MathWorks (makers of MATLAB) had approved my request for funding to spruce up the Software Carpentry notes, and find out how scientists are actually using computers. I faxed a signed copy of the paperwork down to them today—with luck, work will start in a couple of weeks, and I'm very excited to have a chance to work with the NRC's Janice Singer on the survey. And since students (graduate and undergraduate alike) occasionally ask about how academic funding works, the text of the proposal is below the cut. It isn't quite the same as programming, but in the end it might be more useful...

In 200 words or less, describe your proposed project.

Software Carpentry is a widely-used open source introduction to basic software engineering skills for computational scientists. Originally developed for Los Alamos National Laboratory, it has been used by universities, companies, and government labs on four continents, and its site has had over 160,000 distinct visitors since August 2006. This project will (a) translate the code examples from Python to MATLAB to make them more accessible; (b) transform the course site into a wiki to encourage community contribution; and (c) survey professional MATLAB users to determine what development processes and tools they currently use so that the course materials can more directly address their needs. All material will be released under a Creative Commons license and/or published in peer-reviewed venues. The bulk of the project will be completed within six months of receipt of funding (ideally, before the end of 2008); follow-up work to assess the impact of the work will complete six months later.

What is the primary goal of your project and what educational problem or opportunity does your project address?

The problem that Software Carpentry addresses is that most scientific programmers don't know how to develop medium-to-large applications efficiently, primarily because they have never been introduced to modern development tools and techniques in a systematic way. We have repeatedly seen the existing course increase productivity by 20-25%, but there are still significant barriers to entry. One of the most important is the use of Python as an example language: while it is easier to learn than many alternatives, it is yet another learning curve for the course's audience to climb. Surveys indicate that 50-75% of the course's target audience are already familiar with MATLAB; translating the examples in the course material will therefore make it more accessible, which in turn should help computational scientists and engineers accomplish more with less pain.

What is your step-by-step plan to accomplish the goal outlined above?

1. Familiarization with the latest version of MATLAB (so that examples will not use deprecated or out-of-date features).
2. Updates to course content management tools to support parallel examples.
3. Translation of existing examples to static text.
4. Conversion of course content from static text to MediaWiki or Markdown wiki format.
5. (In parallel) Interview intermediate and expert MATLAB users to determine the scope and structure of the survey.
6. Prepare the survey, disseminate it through public forums and mailing lists, then collate and analyze the results.
7. Publish and disseminate the results, and feed them into further revisions or extensions of the course.
8. Track uptake and impact of the course material via activity on the web site, discussion in existing MATLAB user forums, etc., over a six-month period.
How will you use MathWorks products in your project?

All examples will be run and checked using the latest release of MATLAB. More importantly for this project, we will need to consult with trainers and educators, and to distribute notice of a survey to current users through mailing lists and other forums. Read More ›

Faking Results
Greg Wilson / 2008-06-06
Via BoingBoing, a story about scientists Photoshopping experimental results. Sometimes it's outright fakery; sometimes they're just "cleaning up" or "correcting". Either way, it raises an interesting question: how often are people doing this with computational results? Without scientists' code, or any other way to reproduce their work, we'll probably never know. Read More ›

Three Weeks and Change
Greg Wilson / 2008-06-03
Everyone's making good progress:

- Ming and Bing have posted their second demo — next step is to do some serious design of the final product.
- After correcting an earlier post of mine, Xuan has blogged a fuller description of what she and Edward are building.
- Zeev Lieber posted a brief summary of what Dmitri and William did last term.
- Dmitri's still wrestling with GWT exceptions.
- Jeff Balogh shortens his build.
- Matthew Basset sent his first data.
- Joseph Yeung is converting an EXE to an MMC.
- Victoria Mui completes a subtask.
- Daniel Servos likes Moodle plugins.
- Qiyu Zhu can edit roles.
- Nick Jamil provides more details.
- Qi Yang bursts into song.

Oh, and it looks like I'm going to get a grant from The MathWorks to upgrade the Software Carpentry course and find out what scientists are actually doing with computers — more on that once I know whether Sadie and I have bought a house or not. Read More ›

Programming and Scientific Education on Slashdot
Greg Wilson / 2008-05-30
Via Adam Goucher, a Slashdot thread about whether programming should be part of science education. 300+ comments and counting, almost all relating personal experiences. Read More ›

Reminded of the Difference Once Again
Greg Wilson / 2008-05-27
Via Irving Reid, a great quote from Henrik Kniberg about how to tell when you're done: So when a team member says that a story is Done and moves the story card into the Done column, a customer could run into the room at that moment and say "Great! Let's go live now", and nobody in the team will say "no, but wait". Ironically, Kniberg's post is called "Version Control for Multiple Agile Teams". Why ironic? Because I was talking yesterday to a professor in my department — young guy, smart, already has a better research and publication record than I'll ever have. I brought up version control (as in, "Why do less than 10% of faculty and grad students use it?"); his reply was, "Well, we're just going to back everything up once a day." Read More ›

Interviewed by Jon Udell
Greg Wilson / 2008-05-25
Jon Udell has posted an interview he did with me last week at IT Conversations. The title is "High Performance Computing Considered Harmful", and slides from a talk of the same name that I gave in Austin are available. Later: here's Jon's summary of the interview. Curmudgeonly? Moi? :-) Read More ›

Why Don't We Do This?
Greg Wilson / 2008-05-21
Source Code for Biology and Medicine is a peer-reviewed journal from BioMed Central devoted to, well, source code for biology and medicine. It's been around for at least a couple of years; based on a quick scan of four of their most popular papers, they seem to cover everything from getting and installing software to its numerical properties, user interface, and typical applications. This is very cool—but why isn't there something like this for computer science? All I can think of are Software: Practice & Experience and The Journal of Systems and Software, both of which are one meta removed from SCBM. Read More ›

But I Was Gone Less than 48 Hours!
Greg Wilson / 2008-05-16
I left Toronto for Austin mid-day Wednesday, and got back at midnight last night. Lots happened in the interim, so here's a linkandthoughtdump (which I bet actually is one word in German):

- Gave a talk about Beautiful Code to the Austin Python Users' Group Wednesday at Enthought's swanky offices. (They're the kind folks who provide web hosting for the Software Carpentry course.) About 27 people in attendance, and good discussion afterward; was grateful to Travis Vaught and Sergey Fomel for rides from the airport and to the hotel respectively.
- Gave another talk titled "HPC Considered Harmful" at the Texas Advanced Computing Center's Second Annual Scientific Software Days. I was a bit nervous about telling people at a supercomputing center that focusing on massive parallelism and peak performance is wrongheaded, but there were a lot of nodding heads. I made lots of notes from two other talks that I want to follow up on at some point:
  - Robert van de Geijn's FLAME system lets you draw matrix operations, then automatically generates the corresponding high-performance code. It's a great example of a real high-level programming tool for scientists (and yet another special case of what a real extensible programming system would support).
  - Eric Jones (also from Enthought) talked about a tool they're building that watches changes to variables in Python programs, and automatically generates interactive plots of their values. It sounds simpler and less impressive than it actually is; I've asked him to put together a screencast, and I think you'll be wowed—I was. (Later: Steve Eddins from The MathWorks sent me a link about data linking in MATLAB, complete with a video tutorial.)
- At roughly the same time, half a world away, Diomidis Spinellis presented a study comparing the code quality of Linux, Windows, OpenSolaris, and FreeBSD. Very cool work; wish I'd been at ICSE'08 to ask questions.
- Meanwhile, Dmitri Vassiliev, who is continuing his work on SlashID this summer, has discovered that generated code is next-to-impossible to debug. Not to be a one-note symphony or anything, but I said in that same article about extensible programming systems that the real challenge is not extending notation, but creating extensible debugging tools so that those notations and high-level representations can be fixed when they break. Robert van de Geijn doesn't think FLAME needs a debugger; respectfully, I disagree.
- Science in the Open has a plea to scientists to make their raw data available, motivated by yet another irreproducible result.
- Kosta Zabashta has posted early thoughts about integrating IRC into DrProject. (Gray on black? Kosta...your design skills rival mine...) I need to tell him that DrProject's RPC module doesn't handle tickets because Jeff Balogh is going to replace the entire ticketing system with an extensible one this summer, using his Dojo Form Editor as a front end...
- Elisabeth Hendrickson has thoughts on automating tests for legacy web applications. Students, take note.
- Thanks to Nick Jamil and others, we have instructions for installing DrProject on Windows. Yay!
- Everything old is new again, including Ada and the Bletchley Park Colossus.

Thanks again to Sergey Fomel for inviting me down, and for introducing me to the reproducible research community—I'm looking forward to many more discussions. Read More ›

SE-CSE Workshop
Greg Wilson / 2008-05-15
One of the downsides of being in Texas is that I couldn't attend the First International Workshop on Software Engineering for Computational Science and Engineering, which was held at ICSE'08 in Leipzig this week. Papers are here (I'll be reading them on the flight home); they look interesting, but the biggest thing for me is the change in the workshop's name—it used to be "Software Engineering for High-Performance Computing Applications" (2004, 2005, 2006, 2007), and I'm hoping the change of name reflects a genuine broadening of focus. Read More ›

Those Who Will Not Learn From History...
Greg Wilson / 2008-05-05
My article "Those Who Will Not Learn From History..." is in this month's Computing in Science and Engineering (and mirrored here). Read More ›

SPOC
Greg Wilson / 2008-04-14
Regarding the idea of reproducible research, I stumbled over I-SPOC while looking through Google Summer of Code stuff. From their pitch: The overall goal of this project is to build computational and social infrastructure to support the use of a new form of scientific communication called a SPOC (Scientific Paper with Open Communication). A SPOC combines a standard academic paper with open source computational models written in any publicly accessible computer language. SPOCs will (i) link computational results with the models that produce them, allowing independent verification and validation (ii) create incentives for cleaner, more transparent code and for the sharing of code (iii) enable others to extend and improve existing computational models and to verify model robustness (iv) bring computational models to life allowing faculty, students, and other scholars to see dynamic phenomena emerge and (v) have an enormous effect on the teaching of science. The reality isn't (yet) as impressive as the vision, but it's still intriguing. I think there's some great work in requirements engineering waiting to be done here: is reproducibility both necessary and sufficient for scientists to regard their peers' computational work as science? If so, what must a tool do or provide in order to satisfy that need? If not, what are the requirements, and why? Read More ›

Three Studies (Maybe Four)
Greg Wilson / 2008-04-10
We're in the thick of picking students and projects for Google Summer of Code, which has inspired some less-random-than-usual thoughts. Here are two studies I'd like to do (or see done):

- What has happened to previous students? How many are still involved in open source? How many have gone on to {start a company, grad school, prison}? What do they think they learned from the program? How much of the software they wrote is still in use? Etc.
- Every one of the 175 organizations blessed by Google this year is using the same web application for collecting and voting on projects. From what I can tell, they're all using it in different ways: +4 means something very different to the Python Software Foundation than it does to Eclipse or SWIG. They're also using a bewildering variety of other channels for communication: wikis, IRC, Skype chat sessions, mailing lists (the most popular), and so on. Why? Is this another reflection of Jorge Aranda's finding that every small development group evolves a different process, but all those processes "work" in some sense, or is it—actually, I don't have any competing hypotheses right now, but I'm sure there are some.

And while we're on the subject of studies, I just read Hochstein et al's paper "Experiments to Understand HPC Time to Development" (CT Watch Quarterly, 2(4A), November 2006). They watched a bunch of grad students at different universities develop some simple parallel applications using a variety of tools, and measured productivity as (relative speedup)/(relative effort), where relative speedup is (reference execution time)/(parallel execution time), and relative effort is (parallel effort)/(reference effort). The speedup measure is unproblematic, but as far as I can tell, they don't explain where their "reference effort" measure comes from. I suspect it's the effort required to build a serial solution to the problem, and that "parallel effort" is then the additional time required to parallelize; I've mailed the authors to ask, but haven't heard back yet.

I wasn't surprised when I realized that the authors hadn't done the other half of the study, i.e., they hadn't benchmarked the productivity of a QDE (quantitative development environment) like MATLAB—many people talk and think as if scientific computing and high-performance computing were the same thing. At first glance, it doesn't seem like it would be hard to do—you could use the performance of the MATLAB or NumPy code over the performance of a functionally equivalent C or Fortran program for the numerator. You have to be careful about the denominator, though: if my guess is right, then if things were done in real-world order, you'd be comparing:

(time to write parallel code after writing serial code) / (time to write serial code from scratch)

vs.

(time to write MATLAB from scratch) / (time to write serial code having written MATLAB)

Even with that, I strongly suspect that MATLAB (or any other full-featured QDE) would come out well ahead of any parallel programming environment currently in existence on problems of this size. Yes, you need big iron to simulate global climate change over the course of centuries, but that's not what most scientists do, and the needs of that minority shouldn't dominate the needs of the desktop majority. I'd also be interested in re-doing this study using MATLAB parallelized with Interactive Supercomputing's tools.
I have no idea what the performance would be, but the parallelization effort would be so low that I suspect it would once again leave today's mainstream HPC tools in the dust. And now let's double back for a moment. I used the phrase "desktop majority" a couple of paragraphs ago, but is that really the case? What do most computational scientists use? What if we include scientists who don't think of themselves as computationalists, but find themselves doing a lot of programming anyway, just because they have to? If you plotted rank vs. frequency, would you get a power law distribution, i.e., does Zipf's Law hold in scientific computing? Last term, I calculated a Gini coefficient for each team in my undergraduate software engineering class using lines of code instead of income as a raw metric; what's the Gini coefficient for the distribution of computing cycles used by scientists (i.e., how evenly or unevenly is computing power distributed)? And how should the answers to these questions shape research directions, the development of new tools, and what we teach in courses like Software Carpentry? Read More ›
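For concreteness, here are both of the metrics from this post in executable form. The numbers are invented, and the "reference effort" reading is still my guess about what Hochstein et al. meant:

```python
def productivity(ref_time, par_time, ref_effort, par_effort):
    """Hochstein et al.: (relative speedup) / (relative effort)."""
    relative_speedup = ref_time / par_time     # reference time / parallel time
    relative_effort = par_effort / ref_effort  # parallel effort / reference effort
    return relative_speedup / relative_effort

# Made-up example: a 6x speedup that cost 3x the development effort.
print(productivity(ref_time=60.0, par_time=10.0,
                   ref_effort=40.0, par_effort=120.0))   # 2.0

def gini(values):
    """Gini coefficient: 0.0 means perfectly even, near 1.0 means one
    contributor accounts for almost everything."""
    xs = sorted(values)
    n, total = len(xs), float(sum(xs))
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Lines of code per team member (invented numbers).
print(gini([100, 100, 100, 100]))   # 0.0 - evenly spread
print(gini([10, 20, 70, 300]))      # ~0.57 - one person wrote most of it
```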

The Retractions Just Keep Coming In
Greg Wilson / 2008-04-02
Via Titus Brown, yet another published result retracted because of a bug in code. Read More ›

Summer Plans for Software Carpentry
Greg Wilson / 2008-04-02
The Software Carpentry site is still getting a lot of traffic, despite my neglect. This summer, I'd like to:

- Convert the site from static HTML pages to a wiki to make it easier for people to contribute content and fix bugs.
- Translate the examples into MATLAB to make them accessible to a larger audience. (Yes, Python is still my favorite language, and yes, the Python versions will remain—I just want it to be possible for the average mechanical engineer to follow the discussion of testing without first having to learn a new programming language).
- Add some of the material that I developed for CSC301: Introduction to Software Engineering, and some of what Titus Brown wrote for Intermediate and Advanced Software Carpentry.

The odds of all three happening are close to zero: my grad students are going to be in the middle of real research, we're hoping to have half a dozen or ten undergraduate interns as well, we have a "CS-1 in Python" book to finish for Pragmatic, and oh yeah, I'm getting married twice. If you want to help out, now would be a good time to raise your hand... :-) Read More ›

Meet the New Flaw
Greg Wilson / 2008-03-31
I was pretty excited when I heard that Microsoft was getting into scientific computing. As the world's biggest desktop software company, I figured they might understand that scientific computing and high-performance computing are not automatically the same thing, and that reliability and reproducibility are more important than peak performance. Turns out I was wrong: the workshop I attended last September was dominated by discussion of topics like GPU programming and computational grids that are still bleeding-edge computer science, rather than the nuts and bolts that would actually help most scientists be productive day-to-day, Microsoft's new HPC++ Computational Finance lab's site [since closed] has a lot on speed but nothing on correctness, et cetera. So where should they be spending their time? If I ran the world, they'd start by reading Buckheit and Donoho on reproducible research, double back to Jon Claerbout's notes on the same, check out the Madagascar project, and then try to figure out how to scale up those ideas to hundreds of thousands of scientists and publications in as diverse a range of fields as possible. It won't give the senator something to stand beside on opening day, but it'll do science a lot more good. Read More ›

Nice Quote
Greg Wilson / 2008-03-26
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. —David L. Donoho, WaveLab and Reproducible Research, 1995, p. 5. (via Andrew Lumsdaine) Read More ›

Survey: Silent Errors in Scientific Code
Greg Wilson / 2008-03-07
Posted on behalf of Daniel Hook and Diane Kelly: We are members of a software research group from Queen's University who are investigating ideas and tools to assist with the development of scientific software. We are starting a project focused on finding silent or hidden errors in scientific code. (Silent errors are errors that don't result in a crash, an error message, or another obvious indicator of a problem.) To create a catalogue of common silent errors, we would like to hear debugging "war stories" from computational scientists. Using these stories we hope to provide improved code testing techniques specifically for scientists. To conduct our study, we are looking for scientific software debugging stories: just a few lines explaining what the problem was and how you managed to solve it. You can contribute to our study by sending us a story and/or by passing this email along to colleagues who might be able to help with a story. Please navigate to http://www.cs.queensu.ca/~hook/err_intro.html for more details if you are interested in contributing an error story to this study. Thank you for your interest. Sincerely, Daniel Hook and Diane Kelly Read More ›
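If you're wondering what a silent error looks like, here's a small classic — single-precision accumulation quietly discarding every increment. (This is my illustration, not one of Hook and Kelly's collected stories.)

```python
import numpy as np

total = np.float32(1.0e8)
for _ in range(10000):
    total += np.float32(1.0)   # each 1.0 is below float32's resolution at 1e8

print(total)          # still 1e+08: no crash, no warning, just a wrong answer
print(1.0e8 + 10000)  # what it should have been: 100010000.0
```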

LearnHub Launches with Software Carpentry Front and Center
Greg Wilson / 2008-03-06
Toronto-area startup LearnHub.com has launched their new e-learning site with part of the Software Carpentry course as their featured offering. I'm looking forward to meeting a whole new batch of students... ;-) Read More ›

Scientific Groupware Revisited
Greg Wilson / 2008-02-26
Nature would like "to hear from physicists what kind of tools would help in managing the ever-growing tide of information from, and the exciting possibilities of, the internet." (I presume scientists in lesser disciplines will be asked to chip in at a later date ;-). It will be interesting to see how much (if anything) has changed since Jon Udell wrote "Internet Groupware for Scientific Collaboration" eight years ago. Read More ›

O'Reilly Creating a Web Version of Mathematica
Greg Wilson / 2008-02-20
According to this press release, O'Reilly is partnering with Wolfram Research to create a Web 2.0 version of Mathematica that will "be browser accessible and, utilizing AJAX technologies, will emulate the desktop version of the software with remarkable fidelity". I'll be very interested to see how they handle equations (MathML isn't well supported) and graphics (IE7 doesn't do SVG, so what, Flash? Silverlight?). Read More ›

Grumpy Minds Think Alike
Greg Wilson / 2008-02-14
Ned Gulley pointed me at this talk by fellow-Mathworkser Steve Eddins that hits a lot of the same core ideas as Software Carpentry. Time to convert SC to a wiki, I think, and start pushing it again... Read More ›

SciBarCamp in Toronto March 15-16
Greg Wilson / 2008-02-02
SciBarCamp is a weekend unconference for scientists, artists, and technologists. The goal is to create connections between science, entrepreneurs and local businesses, and arts and culture. I think this is a great idea, and wish I could attend (I already have a plane ticket for SIGCSE in Portland); it's free, but there is only space for around 100 people, so please register by sending an email to Jen Dodd. Feb 17: Tim O'Reilly comments. Read More ›

Python Supercomputing Statistics
Greg Wilson / 2007-12-09
I have a major grant application and (hopefully final) revisions to my next children's book due on Friday, so of course I'm reading white papers about Python-friendly supercomputing from Interactive Supercomputing, a Boston-area firm that's about three years old. IS offers several kinds of parallelism for MATLAB, Python, R, SAS, and other high-level languages; I don't know if their tools are any easier to use than anyone else's, but they have an impressive team (including Russ Barbour, ex-Apollo, and Steve Reinhardt, ex-Cray). What's more immediately interesting to me are two of their papers (free, but registration required). The first, "Python Technical Computing End-User Study", was prepared by Fletcher Spaght, Inc.; based on 604 responses to a survey, it concludes that:
- significantly increased performance of Python codes would cause large or revolutionary improvements for 35% of technical users (8% would experience revolutionary benefits from a 10X performance boost);
- most (52%) organizations using Python for technical applications consider their codes to be important to accomplishing their mission;
- technical Python program run times are long (31% typically over 1 hour);
- Python data sets are large (41% GBs or larger) for technical applications;
- large amounts of time are spent optimizing codes to run them productively on desktop workstations;
- in organizations using Python, tools such as C (91%), MATLAB (49%), and Fortran (32%) are also widely used for developing technical applications;
- most (63%) organizations surveyed are interested in running Python on HPC resources, and at least 65% of Python technical users have access to such systems; and
- half of survey respondents have ported their technical Python codes [doesn't say to what], but only 17% do so with any frequency.
Some of the details in the paper are interesting too. 36% use Python for test & measurement, 29% for communications [presumably communications applications, rather than inter-application communication and coordination, but this is not clear], and 24% each for signal/image processing and physical design. 33% describe their use of Python as "glue language", while 42% use the numerical libraries, and 24% use external libraries. 91% of users also use C/C++, 49% use MATLAB, 32% use Fortran, and 22% each use Mathematica and R. The other paper was prepared by the Simon Management Group. Its conclusions are more motherhood-and-apple-pie-ish: for example, "HPC software development environments vary widely by factors such as size and focus." There are still a few interesting items, though: the median team size is 4-6 developers, 50% of respondents report that their organization works on 1-5 projects at a time (and 11.5% report working on more than 30 at a time), the expected median data set within three years ranges from 200 to 600 GB, and 42% indicated that projects typically last 6 months, while 23.1% describe their projects as open-ended. I'm not sure what it all means just yet, but they're good numbers to know... Read More ›

The Burning Man of HPC
Greg Wilson / 2007-10-26
It's a small world: Andy Oram, my co-editor on Beautiful Code, just interviewed Brent Gorda, co-author of the first version of the Software Carpentry course, about the cluster challenge that's running at Supercomputing'07. Don't be fooled by the purple-on-black titles: this is a very cool idea. Student teams get a fixed amount of time (and a fixed electrical power budget) to put together a cluster and run some benchmarks. Doing this earned you a PhD twenty years ago; ten years ago, it would have been a line item in a department's budget. It's now a contest for students, and I look forward to finding a cluster in the bottom of a cereal box before I die... ;-) Later: Brent has started blogging about the Cluster Challenge at O'Reilly's ONLamp site. I'm looking forward to hearing how the teams do. Read More ›

Doomed to Repeat It
Greg Wilson / 2007-10-02
Those who cannot learn from history are doomed to repeat it. (Santayana)
I spent a day and a half last week at a workshop on computational science education. There were lots of smart people in attendance, all very passionate about the need to teach scientists how best to use the awesome power of a fully functional mothership—sorry, of modern hardware and software—but as the discussion went on, I grew more and more pessimistic. I'm old enough to remember the first wave of excitement about computational science in the 1980s; dozens of university-level programs were set up (mostly by physicists), and everyone involved was confident that the revolution would be unstoppable. So there I was, twenty years later, hearing people say almost exactly the same things. "A third kind of science": check. "Need to weave it into the curriculum, rather than tack it on at the end": check. "Encourage interdisciplinarity, and break down the fusty old departmental walls": check. "Revise the tenure process, or even eliminate it": yup, heard that one too. What I didn't hear was, "Here's why the first wave failed, and here's what we're going to do differently." There was some mention of the fact that we'll either have to drop stuff from existing core curricula to make room for computing, or introduce five- or six-year programs, but it was muted: as soon as you say it out loud, you realize how hard it's going to be, and how unlikely it is to happen.
Many participants also fell into the trap of identifying computational science with high-performance computing, or of thinking that the latter was intrinsic to the former. In fact, that's not the case: most computational results are produced on workstations by sequential code, and focusing on the needs of people working at that scale would pay much greater dividends than trying to get them to work on the bleeding edge of massively-parallel GPU-based functional 3D virtual reality splaff. I was particularly disappointed by how little attention was paid to what I believe are the two biggest problems in computational science: making sure that programs actually do what they're supposed to, and getting them built without heroic effort. I've preached on these topics before, but it wasn't until this workshop that Brent Gorda, Andy Lumsdaine, and I summed it all up. The position paper we wrote for the workshop is included below; I'd be interested to hear what you think.
We have been teaching software development to computational scientists since 1997. Starting with a training course at Los Alamos National Laboratory, our work has grown into an open source course called Software Carpentry that has been used by companies, universities, and the US national labs, and has had over 100,000 visitors since going live in August 2006. Based on our experiences, we believe the following:
- People cannot think computationally until they are proficient at programming. (Non-trivial coding is also a prerequisite for thinking sensibly about software architecture, performance, and other issues, but that's a matter for another position paper...)
- Like other "knowing how" skills, such proficiency takes so much time to acquire that it cannot be squeezed into existing curricula without either displacing other significant content, or pushing out completion dates. (Saying "we will put a little into every course" is a fudge: five minutes per lecture adds up to three or four courses in a four-year degree, and that time has to come from somewhere.) Most universities are not willing to do either of these things. The goal stated in the workshop announcement, "Create better scientists not by increasing the number of required credits," is therefore unachievable.
- In contrast with experimentalists, most computational scientists care little about the reproducibility or reliability of their results. The main reason for this is that journal editors and reviewers rarely require evidence that programs have been validated, or that good practices have been followed—in fact, most would not know what to look for.
- Most scientists will not change their working habits unless the changes are presented in measured steps, each of which yields an immediate increase in productivity. In particular, the senior scientists who control research laboratories and university departments have seen so many bandwagons roll by over the years that they are unlikely to get excited about new ones unless the up-front costs are low, and the rewards are quickly visible.
- The most effective way to introduce new tools and techniques is over an extended period, in a staged fashion, in parallel with actual use—intensive short courses are much less effective (or compelling) than mentorship. Peer pressure helps: if training is offered repeatedly, and scientists see that it is making their colleagues more productive, they will be more likely to take it as well. Team learning also helps: non-trivial scientific programming is a team sport, and mentorship is one of the best training methods we know. In particular, team members with distinct roles rallying around the science is the most successful strategy of all, but does not translate into a classroom setting.
- A mix of traditional and agile methodologies is more appropriate for the majority of scientific developers than either approach on its own: scientists are question-led (which encourages incremental development), but the need for high performance often mandates careful up-front design. Scientists cannot therefore simply adopt commercial software engineering practices, but will have to tailor them to their needs.
- Raising general proficiency levels is the only way to raise standards for computational work, and raising standards is necessary if we are to avoid some sort of "computational thalidomide" in the near future.
Later: see also this rant by Victor Ng, and this piece by Kyle Wilson (no relation). Read More ›

Another Sighting of Software Carpentry
Greg Wilson / 2007-09-25
Tiziano Zito is teaching Software Carpentry in Berlin. I admit I'm biased, but I think that's pretty cool ;-) Traffic on the site is holding steady as well — wish I had time to go back and update the notes... Read More ›

Openness and (the promise of) XML
Greg Wilson / 2007-09-05
Over at O'Reilly, people are gloating about ISO's decision not to ratify Microsoft's OOXML document "standard". At the same time, Jon Udell is saying much more interesting things about what smart document formats could do for the sciences. To quote him quoting Clifford Lynch, "Scientific literature that is computed upon, not merely read by humans." Read More ›

Random Survey about HPC
Greg Wilson / 2007-08-31
This was just forwarded by a friend: The High Performance Computing (HPC) community is initiating a study to develop education and training pathways in order to enable scientists and engineers to advance the pace of discovery by taking advantage of high performance computing (HPC) and grids. In order to inform our study, we invite you to complete a survey about your experiences. We would like you to evaluate a list of categories of expertise according to how relevant they are for your work. We also would like to collect additional information about your training experiences. The survey is very short and should only take a few minutes to complete. Please go to the following link to complete the survey: http://www.surveymonkey.com/s.aspx?sm=_2f7D2YUJfkH9Dua0PP_2f30_2fQ_3d_3d. The survey will be open for input through September 11, 2007. Your input is very valuable to us and will help us understand your training needs. Thank you for your participation in this study!
Scott Lathrop
TeraGrid Director of Education, Outreach and Training — www.teragrid.org
SC07 Education Program Chair — www.sc-education.org
The questions seemed pretty random/subject to broad interpretation, but I filled it in anyway—I'll be interested to see what, if anything, comes of it. Read More ›

How I'm Doing
Greg Wilson / 2007-08-07
I wasn't happy with the two courses I taught this past winter — too many distractions, too little preparation. The feedback on the Software Carpentry course was therefore a pleasant surprise: I've heard second-hand that several of the Computer Science grad students were disappointed by its slow pace, but overall I did better than I expected. On a 1-5 scale:
- Background required to successfully complete the course: 2.0
- How easy to obtain details/background to supplement lecture material: 2.0
- Did term work increase understanding: 3.7
- Material was presented too slow/fast: 2.0
- Material was too broad/specialized: 2.9
- Workload was too light/heavy: 2.9
- How well organized was the lecturer: 4.0 (no idea whose class they were in...)
- How satisfied: 4.5
- Overall rating: 4.2
The most common positive comments were that the course was practical and pragmatic, and that the collaborative projects were worthwhile. Negatives include the assignment being distributed and marked very late, not enough examples of what good programs actually look like, the course being slow for CS students, a lack of depth in some areas (particularly security), and my jokes being corny. Read More ›

How Not to Collaborate
Greg Wilson / 2007-07-31
I posted a note a while back about an upcoming workshop at Microsoft Research on computational education for scientists. If you read the call for papers, you'll discover that there aren't any instructions on how to submit material; nor is there any contact information, other than the generic "contact us" link in the bottom corner, which gives you nothing more than Microsoft's generic 1-800 number ("press 1 for Windows Vista support..."). After bouncing around for five minutes, I got a human being who told me that she couldn't give out any phone numbers, but that I was welcome to fax my question to them... On the bright side, Microsoft is running another computational science conference this fall (in North Carolina). Read More ›

Computational Education for Scientists
Greg Wilson / 2007-07-18
Via Zachary Dodds: Microsoft is hosting a workshop on computational education for scientists in September. I'm going to try to be there—anyone else? Read More ›

Win a Trip to Reno!
Greg Wilson / 2007-07-05
Francis Sullivan is running a contest: describe what you'd do if someone gave you a desktop petaflop machine, and you could win a trip to Supercomputing '07 in Reno. 1000 words, due by August 10. Read More ›

Another Sighting of Software Carpentry
Greg Wilson / 2007-07-04
Rick Wagner, a computational astrophysicist at UCSD, is teaching a course based on Software Carpentry. Yay! Read More ›

Two Studies of ASCI (and no, that's not a typo)
Greg Wilson / 2007-06-27
After spending ten years helping scientists write programs for massively-parallel computers, I realized that what scientists really needed was to learn how to program, full stop. It took me another eight years to (a) get up to speed with the theory and practice of modern software engineering, (b) realize how big the gap between the two was, and (c) accept that the only people who are entitled to have an opinion about how we ought to be building software are the ones who are studying how well their favorite tools and practices actually work in the field. I no longer care what you're pushing—formal methods and cleanroom development or agile adhocracy and pair programming—unless you've gone the extra mile and collected data to show what effect it's actually having. That's why I was so pleased to come across these two papers:
- Post and Kendall: "Software Project Management and Quality Engineering Practices for Complex, Coupled Multiphysics, Massively Parallel Computational Simulations: Lessons Learned from ASCI". Intl. Journal of High Performance Computing Applications, 18(4), Winter 2004, pp. 399-416.
- Carver, Kendall, Squires, and Post: "Software Development Environments for Scientific and Engineering Software: A Series of Case Studies". Proc. ICSE 2007, May 2007, pp. 550-559, ISBN 0-7695-2828-7.
"ASCI" is the Accelerated Strategic Computing Initiative. Launched in the mid-1990s, its mission was to produce a new generation of software for the US nuclear weapons program. These are big pieces of code: millions of lines, with lifespans measured in decades, doing some of the most complicated math ever devised by human beings. Hundreds of millions of dollars have been spent, and thousands of programmer-years, so it's worth asking, "How's it going? And what could be done better?" According to the authors of these papers, who have spent a lot of time studying the major projects within ASCI, the answers are "So-so" and "Lots" respectively. Some parts of ASCI are considered outright failures: the Blanca project, for example, was so much in love with funky technology like template metaprogramming that it never delivered the science it was supposed to. Other parts, though, have come through, though many have taken longer to do so than originally envisioned. As the authors point out, though, management was "aggressive" in setting ASCI's specs, schedule, and resourcing levels, so it's not surprising that mere mortals couldn't live up to them. A lot of what these papers say is standard project management dogma. For me, the most important point was, "Emphasize 'best practices' instead of 'processes'." Every successful development team I've ever seen worried more about "doing the right thing" than about following the steps in a flowchart. Knowing what the "right thing" is, now, that's the tricky part, but it's what I'd like most to impart to my students. If you come back in ten years and ask me how I'm doing, I ought to have some data for you. Read More ›

Software Carpentry at LLNL
Greg Wilson / 2007-06-26
Titus Brown has posted about the first run of his Advanced Software Carpentry course at Lawrence Livermore National Laboratory. It was a bit bumpy, but seems to have gone well overall, and he has put a ton of useful material on the web. Read More ›

Software Carpentry Screencasts by Chris Lasher
Greg Wilson / 2007-06-20
Chris Lasher, a grad student at the University of Virginia Virginia Tech (sorry, sorry), has posted some screencasts about version control based on the Software Carpentry notes at ShowMeDo.com; a second series about using the shell should be up shortly. It's just plain cool to see the material picked up and carried forward in ways I wouldn't even have thought of a couple of years ago. Later: now there's Bioscreencast.com, the Journal of Visual Experiments, and others — the world moves on. Read More ›

Inspirational Videos
Greg Wilson / 2007-06-20
This, on scientific computing and visualization, is pretty fricken cool. Read More ›

Nature Precedings
Greg Wilson / 2007-06-18
Tim O'Reilly posted a note from Timo Hannay, of Nature magazine, about their new "Precedings", which: giv[es] researchers a place to post documents such as preprints and presentations in a way that makes them globally visible and citable. Submissions are filtered by a team of curators to weed out obviously inappropriate material, but there's no peer-review so accepted contributions appear online very quickly — usually within a couple of hours. Read More ›

Praising the Good
Greg Wilson / 2007-06-11
I frequently gripe about how backward scientific software development is, so it's only fair that I give praise where praise is due. Steve Easterbrook pointed me at the UK Met Office's Flexible Configuration Management system, which combines code management using Subversion and Trac with a GNU Make-based build and an extraction system that pulls data out of the repository and builds it for particular platforms. This image shows how the three parts fit together; it ain't rocket science, but it's very professional, and a good model for others to follow. Read More ›

Computational Scientists Still Don't Get It
Greg Wilson / 2007-05-05
A workshop called "Software Issues in Computational Science and Engineering" is running in Uppsala, Sweden, this August. Here's their blurb: Software for numerical computations faces multiple challenges. The software should be easy to use. Ideally, adaptation to new applications should be flexible, and extension to incorporate new numerical techniques straightforward. At the same time the software should execute extremely efficiently on various high-performance computing platforms. Accuracy and robustness are other key features. The overall challenge is to find ways to construct numerical software so that all these different goals are met simultaneously. Once again, there's no mention of making the programs right—nothing about testing, nothing about tracking results so that when a bug does appear you know what you should retract, nothing. I'm sure the organizers would say, "Oh, that's part of accuracy," but I've been part of enough discussions to know that when numerical scientists say "accuracy and robustness", they're talking about algorithms, not about coding bugs. Given stories like this one, it's a revealing oversight. Read More ›
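Later: for the record, here is a minimal sketch of the kind of test I have in mind — a toy numerical routine checked against a known analytic answer, with an explicit tolerance. The integrator is my own example, nothing from the workshop:

    import math

    def trapezoid(f, a, b, n):
        """Integrate f over [a, b] using n trapezoids."""
        h = (b - a) / float(n)
        total = 0.5 * (f(a) + f(b))
        for i in range(1, n):
            total += f(a + i * h)
        return total * h

    # The integral of sin(x) over [0, pi] is exactly 2, so the test has a
    # known right answer and an explicit error bound.
    approx = trapezoid(math.sin, 0.0, math.pi, 1000)
    assert abs(approx - 2.0) < 1e-5, approx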

Titus Brown Teaching Software Carpentry
Greg Wilson / 2007-04-02
Titus Brown will be teaching "Intermediate and Advanced Software Carpentry with Python" at Lawrence Livermore National Laboratory this spring, and is open to teaching it elsewhere after that. Meanwhile, Chris Lasher has put together some more debugging material for the course. As mentioned previously, they're planning to run a writing spree at SciPy'07; hope to see lots of people there. Read More ›

Sign Error: Five Papers Retracted
Greg Wilson / 2007-03-19
Via Genome Biology (free registration for trial access required), news that scientists from the Scripps Institute have to retract five papers published in various prestigious journals because of a sign error in a computer program. As Gregory Petsko says in the article: Their mistake has consequences beyond the damage to the unfortunate young investigator and his team. For five years, other labs have been interpreting their biophysical and biochemical data in terms of the wrong structures. A number of scientists have been unable to publish their results because they seemed to contradict the published X-ray structures. I personally know of at least one investigator whose grant application was turned down for funding because his biochemical data did not agree with the structures. One could argue that an entire sub-field has been held back for years... If I were a twenty-something working toward my PhD, I'd be thinking very hard about how I was going to validate the programs I was writing—the odds are growing steadily that journal editors and granting agencies are going to start demanding some sort of due diligence, sooner rather than later. Read More ›
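Later: a sign error is exactly the kind of bug a cheap invariant test can catch. A toy illustration (nothing to do with the Scripps code, which I haven't seen): mirroring every coordinate must flip the sign of a dipole moment, so a misplaced minus sign breaks the assertion.

    # Toy 1-D model: net dipole moment of a set of point charges.
    def dipole_moment(charges, positions):
        return sum(q * x for q, x in zip(charges, positions))

    charges = [1.0, -1.0, 0.5]
    positions = [0.2, 1.5, -0.7]
    mirrored = [-x for x in positions]
    # Invariant: reflecting the coordinates negates the moment.
    assert abs(dipole_moment(charges, mirrored) +
               dipole_moment(charges, positions)) < 1e-12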

SciPy'07 Dates Announced
Greg Wilson / 2007-03-11
According to www.scipy.org, the SciPy 2007 Conference will be on August 16-17 this year; tutorials and sprints will run on the 14th, 15th, and 18th. I won't be able to attend (new baby), but I'd like to organize a half-day or one-day session to update and extend the Software Carpentry notes. Lots of modules need writing, both on Python-specific stuff and on general software engineering skills for scientists and engineers. I'd particularly like to see:
- A lecture or two on NumPy (used to have one, it fell behind Travis Oliphant's coding, and it's probably now the biggest gap in the lectures)
- A whole lecture on the subprocess module, job control, and remote execution (see the sketch below)
- A second lecture on security
- Some screencasts on Python IDEs (Wing 101, IDLE, Eclipse, and Komodo)
- A lecture on connecting to C and Fortran
- A lecture on design patterns
- A lecture on professional ethics and responsibilities
- And stuff on requirements, traceability, data lineage, and, oh, what else do you want?
If you're interested, please let me know... Read More ›
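Later: as a down-payment on the subprocess lecture, a minimal sketch of the core idea — run an external command, capture its output, and check the exit status. The command itself is just an illustration:

    import subprocess

    # Run a command, capturing both of its output streams.
    proc = subprocess.Popen(["ls", "-l"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err)
    print(out.decode())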

Reproducibility of Computational Results
Greg Wilson / 2007-03-10
Via Titus Brown, a link to the Insight Journal, an open access online publication covering medical image processing. They have a very interesting process requirement: your source code must compile & be verifiable by an automatic system. I've been expecting something like this for a long time; glad to see it happening. Later: and, via Gary Bader, Source Code for Biology and Medicine. Anyone know of a journal or journals like this for physics, chemistry, geology, and other non-life-science areas? Or (wistfully) computer science? Read More ›

Software Carpentry Screencasts
Greg Wilson / 2007-02-07
Chris Lasher, who audited the Software Carpentry course long-distance last time around, has put together some screencasts to go with the first few lectures:
- The shell (screencast)
- Subversion (screencast)
I think this is very cool — if you have feedback, or want to praise him (hint, hint), please drop him a line. I would also be interested in hearing how useful you find these — if there were twenty or thirty covering the whole course, would you actually watch them? And would they help you understand the course material? Read More ›

Software Carpentry Usage in December
Greg Wilson / 2007-01-18
Visits to the Software Carpentry site were down quite a bit in December; I had expected a drop because of the holiday, but not one this large. On the bright side, 47 students have signed up for the course at U of T this term; about 1/3 are from Computer Science, while the rest span Civil Engineering, Ecology, Physics, and the life sciences. Read More ›

YouTube for Data
Greg Wilson / 2006-12-05
TechCrunch is running an interesting report on Swivel, which bills itself as "YouTube for data". Anyone and everyone can upload data sets to share with the rest of the world, link together, and what have you. It's going to be a great resource for teaching... ...but I'm less enthusiastic about it for science, unless its creators have some verification and validation magic up their sleeves that they're not yet talking about. Still, it's another sign of what Richard Dawkins called "the evolution of evolvability": every new kind of thing (segmented bodies, eyes, web services) gives evolution a new affordance on which to act. Taggable public upload is turning out to be a very mutable affordance... Read More ›

Software Carpentry article in CiSE
Greg Wilson / 2006-11-28
The November 2006 issue of Computing in Science and Engineering includes an article I wrote titled "Software Carpentry: Getting Scientists to Write Better Code by Making Them More Productive". It's available to subscribers only on the magazine's web site, but they have kindly given me permission to post it here. I think it's out too late to bring November's stats up above October's, but I could be surprised. Read More ›

Software Carpentry continues to grow
Greg Wilson / 2006-11-02
Traffic on the Software Carpentry site was up again in October, after a dip in September: The sys admin at Enthought also cleaned the comment spam out of the Trac they gave me to manage the project. I have two dozen more minor tickets to file from email I've received in the last few weeks (much of it from Germany). Still waiting for people to start contributing content — any volunteers? Read More ›

Computational Result Retracted
Greg Wilson / 2006-10-31
From the latest Nature (Vol 443, No 7114, p 1013): When a new, independent code is used for the calculations on which the conclusions of this Letter were based, the results reported for the evolution of obliquity cannot be reproduced. This code was written in the inertial frame and is more reliable than the one used in the Letter. In most runs, the obliquities can change by only a few degrees and attain large values in only a very few cases. In addition, the obliquity variation shown in the Supplementary Information, although correct, originates from changes to orbital inclination of the planet, and close encounters are not effective in causing large obliquities. In other words, the code was flaky, so the results we published were wrong. Kudos to the author for trying to verify his result with another program—I'm sure a lot of computational scientists would be very embarrassed if they had to do the same. But I wonder: why does he believe the "new code" is more reliable? Surely not just because it uses a more sophisticated method, or gives a less surprising answer... (Thanks to Andrew Straw for the pointer.) Read More ›
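Later: for those asking what "verify with another program" can look like in miniature, here is a toy version of the idea — two independently written routines for the same quantity, cross-checked against each other. This is my own example, obviously, not the planetary-obliquity code:

    import random

    def var_two_pass(xs):
        """Textbook two-pass sample variance."""
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    def var_welford(xs):
        """Independent implementation: Welford's one-pass algorithm."""
        mean = m2 = 0.0
        for k, x in enumerate(xs, 1):
            delta = x - mean
            mean += delta / k
            m2 += delta * (x - mean)
        return m2 / (len(xs) - 1)

    random.seed(1)
    data = [random.gauss(10.0, 2.0) for _ in range(10000)]
    a, b = var_two_pass(data), var_welford(data)
    # Agreement to within rounding error is evidence (not proof!) that
    # neither implementation is badly wrong.
    assert abs(a - b) <= 1e-9 * abs(a), (a, b)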

German Version of 'Bottleneck'
Greg Wilson / 2006-10-26
A German version of my article "Where's the Real Bottleneck in Scientific Computing?" has just appeared in Spektrum magazine. Pay-per-view, unfortunately, but the Software Carpentry site has had a flurry of hits from .de domains. Dec 1: I just received my copy in the mail—I sound so much...sterner...in German ;-) Gregory V. Wilson ist Professor für Computerwissenschaft an der University of Toronto. Sein Kurs ist erhältlich unter www.swc.scipy.org/. ("Gregory V. Wilson is a professor of computer science at the University of Toronto. His course is available at www.swc.scipy.org/.") Read More ›

SciPy'06: First Morning
Greg Wilson / 2006-08-17
Guido van Rossum's Keynote
- Python 2.5 coming Real Soon (Sept 12)
- Python 3000 is a brand-new revision of the language
  - Name chosen as a dig at Windows 2000, and so that it couldn't possibly be late
  - Fix design bugs dating from 1990-91 + get rid of deprecated features
  - First time Guido has allowed himself to be backward incompatible
  - Need process, but don't want to become C++ or the next Perl 6
  - Alpha early 2007, final a year later (early 2008)
  - Cares a lot about bringing users with him
  - Will go as far as 2.9 (run out of digits)
- Changes:
  - New keywords allowed
  - dict.keys(), range(), zip() won't return lists
  - All strings Unicode; mutable 'bytes' data type
  - Binary file I/O redesign
  - Drop <> as an alias for !=
  - Etc. — see PEP 3099 for things that won't happen (e.g., programmable syntax)
- Can't do perfect mechanical translation (dynamic languages)
  - Use pychecker-like tool to handle 80% of cases
  - Create instrumented Python 2.x that warns about "doomed" constructs
  - See PEP 3100 for the laundry list
- Small points:
  - Kill classic classes
  - Exceptions must derive from BaseException
  - int/int will return a float
  - Remove last differences between int and long
  - Absolute import by default
  - Kill sys.exc_type and friends
  - Kill dict.has_key, file.xreadlines()
  - Kill apply(), input(), buffer(), coerce()
  - Kill ancient library modules; more stdlib cleanup
  - exec becomes a function again
  - Kill `x` in favor of repr(x)
  - Change except clause syntax to except E1, E2, E3 as err — means "as" becomes a keyword
  - [f(x) for x in S] becomes sugar for list(f(x) for x in S) — general trend in Python away from lists toward more abstract structures
  - Kill raise E, arg in favor of raise E(arg)
  - zip becomes izip
  - lambda lives!
- String types reform (bytes and str instead of str and unicode)
  - All data is either binary or text (conversions happen at I/O time)
  - Different APIs for binary and text streams
- New standard I/O stack
  - C stdio has too many problems
  - Borrow from Java streams API (bleah)
- Print becomes a function (boo) — see mailing list thread for justification, but I think that putting the output file at the end in print(x, y, file=z) is going to trip people up
- Dict views instead of lists
  - dict.keys() and dict.items() return a set view
  - dict.views() will return a bag (multiset) view
  - Can delete from (but not add to) a view — modifies the dict accordingly
- Drop default implementations of comparison operators
  - <, <=, etc., currently compare by address — will raise TypeError
  - == and != should remain (useful)
- Generic and overloaded functions (see his blog — running out of time)
- Python sprints coming up (Aug 21-24)
- Q&A:
  - Py3K team is smaller than Perl 6's — GvR optimistic that people will get the work done
  - Taking advantage of multicore? GvR not a big fan of threads; prefers loose coupling (one process per core); last attempt to get rid of the GIL slowed Python down by 2X, but neither Jython nor IronPython have a GIL
  - Will the C-Python API change much? Yup — just like the language
  - PyPy/type inference? Python 4.0 or a sibling language
Travis Oliphant on the State of NumPy
- Chair thanked him for everything he's done to fix numerical Python — standing ovation (well deserved)
- NumPy 1.0 rc1 will be out in a few weeks
- Walked through design — tradeoffs between flexibility, portability, and performance very well thought through
- One part I enjoyed was the way he flipped back and forth between PowerPoint and the interpreter; clear that for him, Python is a tool for thinking with
- Showed off weave, an Enthought tool for embedding C in Python for array programming
- Also showed off Pyrex (another tool for the same purpose)
Fernando Perez: Python for Modern Scientific Algorithm Development
- "Why is Python more than 'free MATLAB'?"
- Power of built-in datatypes, higher-level programming, etc.
Michael Aivazis: "Building a Distributed Component Framework"
- Described a medium-sized framework called pyre: 1200 classes, 75K lines of Python, 30K lines of C++
- Has been running in various incarnations for almost ten years
- Good discussion of architectural issues — perfect example of the kind of researcher I'd like Software Carpentry to produce
Read More ›
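Later (much later): for anyone reading this in the archive, here is how a few of those keynote items actually look in a Python 3 interpreter — print as a function, 'as' in except clauses, true division, unified ints, and dict views. (The dict.views() bag idea, on the other hand, never shipped.)

    # Run under Python 3.
    print(1 / 2)            # 0.5 -- int/int returns a float
    print(type(2 ** 100))   # <class 'int'> -- int and long are unified

    try:
        1 / 0
    except (ZeroDivisionError, ArithmeticError) as err:  # the comma syntax is gone
        print("caught:", err)

    d = {"a": 1, "b": 2}
    print(d.keys())         # dict_keys(['a', 'b']) -- a view, not a list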

Oh My God It's Django!
Greg Wilson / 2006-08-17
Guido just pronounced: Django is the web framework. It won't be part of the core, but will be as "standard" as PIL or NumPy. This was not what I expected the outcome of my talk would be, but hey, I'll take it ;-) He hopes that Django and TurboGears will converge.
Eric Jones: Enthought Tool Suite
- Enthought does open source software and consulting for scientific computing with Python
- One of the sponsors of this conference; providing hosting for Software Carpentry
- Walked through traits and other offerings
Chris Mueller: Synthetic Programming with Python
- A Python library for generating assembly code for the PowerPC
- Get rid of the many layers between Python and the hardware
- Great performance for great effort (up-front about that)
- Q: How do you debug? A: insert an illegal instruction, back up a few bytes, single-step in GDB
- Interested in a multi-language debugger, but interested in a lot of other things as well
Prabhu Ramachandran: 3D Visualization
- Author of MayaVi, a better (more Pythonic) wrapper for VTK
- Impressive — hides as much of the guff in complex 3D scientific visualization as it can
Andrew Straw: Realtime Computing with Python
- The Grand Unified Fly: a computational simulation of a fruit fly
- Use Python for high-level stuff, and real-time for sub-millisecond control of motors, etc.
- Program $20 microcontrollers with Python
- E.g., Flydra is a multi-headed camera+FireWire system to track fly motion
Lightning Talks
- Mike Ressler: Prototyping Mid-Infrared Detector Data Processing Algorithms — classic data crunching with NumPy
- Brian Granger: The State of IPython — sales pitch
- Travis Oliphant: Array Interface BOF — plea for people to help put together a PEP for arrays in Python
- Travis Vaught: Enstaller — enhancements to the Python Egg system (with GUI); worth tracking
- Michel Sanner: The Current State of Vision — update on a visual builder for image processing pipelines; very cool, but lots of overlap (it seems) with MayaVi
- Peter Wang: Quick Overview of Chaco — 2D plotting library; repeat of slides from yesterday's tutorial
- William Stein: Software for Algebra and Geometry Experimentation — SAGE bundles together lots of other packages used for algebra and exact computation; very particular about getting his whole half hour, despite the late hour ;-)
- Alex Clemesha: Mathematica-like Plotting for SAGE — slow cruise through SAGE's graphics; as with web frameworks, Python has too many plotting packages for its own good
- Diane Trout: BioHub — there's a lot of sequence data out there, and collection is accelerating rapidly; BioHub is a Python interface for large-scale genomic analysis, a database to link diverse annotation sources; I didn't know that genes have version numbers... ;-)
- Greg Wilson: Software Carpentry — last talk of the day; some locals had already headed home, but there were about 70 people present; went well, but no one's offering to teach the course at their institution this fall; slides available online
Read More ›

SciPy and Software Carpentry
Greg Wilson / 2006-08-16
I'm flying down to CalTech this evening to give a talk on Software Carpentry at SciPy'06. There's been a fair bit of traffic on the web site in the last couple of weeks, and I'm looking forward to hearing how else we can help scientists and engineers become more productive programmers. Total visits: 4354 in 15 days, from 731 distinct domains (excluding obvious robots). Read More ›

HPCWire Interview on Software Carpentry
Greg Wilson / 2006-08-04
HPCWire recently interviewed me about the Software Carpentry course. Coincidentally, I've started getting patches from new contributors (especially Ralph Corderoy, who did a very thorough review of the shell programming lectures). More would be welcome... Read More ›

Design Patterns in Scientific Software
Greg Wilson / 2006-07-30
This paper, via the OpenScience Project, is a couple of years old, but a good step in an interesting direction. Read More ›

The Parallel Tools Platform
Greg Wilson / 2006-07-20
The Parallel Tools Platform is an open source extension to Eclipse for writing, running, and debugging large parallel programs. This tutorial gives an overview of what it can do (with lots of pretty pictures); this article from CiSE goes into more detail. The lack of decent development tools (particularly debuggers) was one of the reasons I left parallel computing a decade ago; PTP almost looks nice enough to tempt me back into the pool... Read More ›

Software Carpentry 3.0
Greg Wilson / 2006-07-14
I am pleased to announce the release of Software Carpentry 3.0 (Final). The web site is up, the mailing lists have been populated, all the tickets have been moved into the project's Trac, and I've submitted a proposal for a wrap-up talk to SciPy'06. It's taken 18 months to complete instead of the 10 I originally planned, but I have a green light to teach the course again this winter at the University of Toronto, and hope that others will pick the course up elsewhere. Read More ›

Software Carpentry's new home
Greg Wilson / 2006-06-25
Thanks to the folks at Enthought, the Software Carpentry course notes have a new home at http://www.swc.scipy.org. I'll move the wiki, bug tracker, and mailing lists there over the next couple of days as well. I hope the community will find the material useful — it's certainly been a lot of fun putting it together. Read More ›

Revised Lecture on Teamware
Greg Wilson / 2006-05-05
I've revised the Software Carpentry lecture on using team tools. I'd be grateful for feedback. Read More ›

Software Carpentry 1111
Greg Wilson / 2006-05-03
Revision 1111 of Software Carpentry just went into the repository. All the images are now there (thanks, Nick), along with code fragments and exercises (not as many as I'd like, but enough to get people started). Printed, it comes to 346 pages, but don't do this at home—the supposedly-transparent PNGs are still solid black when printed. Things I'd like to do (or would like volunteers to contribute in the usual open source way) include:
- #5: complete the description of how to use the subprocess module.
- #14: add a lecture on numerical programming, and another one on how to test numerical code.
- #24, #25, #115, and #120: put material on eval, exec, code coverage, profiling, and other reflective ideas back in.
- #28: fix the markers around regular expressions so that they display on all platforms.
- #40: add a lecture on object-oriented analysis and design using the ICONIX process.
- #65: come up with a better way to display the evolution of code fragments on-line using JavaScript.
- #67: document the XML markup used in slides so that other people can easily contribute.
- #93: fix image backgrounds so that they print properly.
- Several: clean up the build process used to produce the notes.
- #105: automatically check that all Python source examples conform to style guidelines.
- #116: put material on time/date handling back into lectures.
- #121: add a second lecture on style that focuses on what makes a good (or bad) class.
Any volunteers? Read More ›

Corrections Done
Greg Wilson / 2006-04-28
All the outstanding minor corrections to the Software Carpentry notes have now been made; there are still 21 diagrams outstanding, but they should be in by Sunday. My thanks to everyone who provided feedback—especially Adam Goucher and his very sharp eyes. Read More ›

Zipf's Law of Feedback
Greg Wilson / 2006-04-17
Zipf's Law says that frequency is inversely proportional to rank, i.e., the second most common word in a large body of text will occur half as many times as the most common. I have observed an even steeper curve for Software Carpentry feedback: of the 336 corrections I've received, 212 are from one person (Adam Goucher), 21 from Matthew Moelter, 12 from the next two people, and then we're down into the curve's long tail. Has anyone ever done similar stats on the volume or frequency of contributions to software projects? Read More ›
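Later: a quick arithmetic check of just how much steeper the curve is. Under Zipf's law, count × rank should be roughly constant:

    # Corrections from the top contributors, in rank order (numbers from this post).
    counts = [212, 21, 12, 12]
    for rank, count in enumerate(counts, 1):
        print(rank, count, rank * count)
    # rank * count gives 212, 42, 36, 48 -- Zipf would predict a constant
    # ~212, so the drop-off is far steeper than 1/rank.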

341 Words
Greg Wilson / 2006-04-09
The glossary for the Software Carpentry course now defines 341 terms. What may be more interesting (for those of you who have been following the course's development) is what I've taken out:
- Code coverage and execution profiling: they really should be in the course, but don't fit into any of the existing lectures.
- Date and time manipulation: it isn't part of software engineering per se, but like Unicode, floating-point roundoff, and a dozen other things, this is one of the subjects that everybody just ought to know about. Again, it doesn't fit neatly into any of the existing lectures.
- Cross-site scripting, and a few other security-related terms: the security lecture has been completely revamped. It's much less ambitious, but (I hope) more informative.
- Everything to do with UML: I've never used it outside of class, and have only ever worked with one person who did. I therefore feel like a bit of a fraud including it in a course on practical software development.
Things that I want to add (eventually):
- Building desktop GUIs: yes, people still do this, and it's a great way to introduce some more OO concepts. Now that there's a book on wxPython, maybe I'll finally do this.
- User interface design, because I agree with Catherine Letondal (who has provided some very useful feedback): you shouldn't show someone how to build a GUI unless you show them how to build a good one.
- Numerical programming, because I agree with Tom Fairgrieve: people ought to need a license in order to use floating-point numbers. I've actually written this one a couple of times, but (a) Python's Numeric module is still in flux, and (b) I don't want to dive into this unless I have something concrete to say about how you test floating-point code.
- Extended examples: I'd like to write at least three or four mini-projects, each taking about an hour to describe, because I believe there are things you can only learn from examples.
For now, though, I'm going to concentrate on getting this release out the door... Read More ›

New Security Lecture Up
Greg Wilson / 2006-04-05
A new lecture on security is up. It has changed a lot from the hash I presented in the fall; feedback would be very, very welcome. Read More ›

Integration and XML Lectures
Greg Wilson / 2006-04-04
Hot on the heels of yesterday's lecture on integration (which desperately needs iron-willed editing) comes a rewrite of the XML lecture. This is a mix of HTML formatting rules and tags, and DOM; I'd appreciate knowing if you think it hangs together, and whether there needs to be more on how to process XML programmatically. Read More ›
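Later: by "process XML programmatically" I mean things like the following DOM sketch (the element names and values are invented for illustration):

    from xml.dom import minidom

    doc = minidom.parseString(
        "<experiment><run id='1'>4.2</run><run id='2'>3.9</run></experiment>"
    )
    # Walk the document tree, pulling out attributes and text content.
    for run in doc.getElementsByTagName("run"):
        print(run.getAttribute("id"), float(run.firstChild.data))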

2020 Hype
Greg Wilson / 2006-03-26
A report from Microsoft Research called 2020 Science got a lot of press this week: Nature seems to think it's the biggest story of the year so far, and The Economist gave it three full columns. Sadly, amidst the gush about how computers are revolutionizing science, no one mentions that most scientists have no idea how reliable their programs are—in fact, most scientists don't even know how they would figure that out [1,2]. If someone submitting a paper to Nature said, "We didn't calibrate the equipment, we didn't write down the settings, and we have no idea what the error bars on our graphs should be," their work would be bounced without a second thought. Unless computational scientists decide to live up to those standards, the "revolution" that 2020 Science describes will be a long time coming. [1] "Where's the Real Bottleneck in Scientific Computing?" [2] "Computational Science Demands a New Paradigm". Read More ›

Web Server Programming Lecture Is Up
Greg Wilson / 2006-03-06
The lecture on server-side programming is up: if you think it should be split into two parts, please let me know. That leaves only three to go (XML, security, and integration). And by this time on Thursday, we'll be in Lima, Peru... Read More ›

Client-Side Web Programming Lecture
Greg Wilson / 2006-03-03
The Software Carpentry lecture on client-side web programming is now up (sans diagrams). Comments and corrections welcome. Read More ›

Last Two Lectures Are Up
Greg Wilson / 2006-03-02
The last two lectures in the Software Carpentry course are up: Teamware and Summary (where "last" means "last in delivery order", not "last to be revised"). They're both fairly rough right now, so high-level feedback would be more useful than pointers to typos. Read More ›

Database Lecture is Up
Greg Wilson / 2006-02-23
The lecture on databases (actually an introduction to SQL) is now up. Comments and corrections welcome. Read More ›
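Later: for anyone wondering what "an introduction to SQL" covers, the flavor is roughly this, using Python's sqlite3 module (the table and numbers are made up):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (site TEXT, value REAL)")
    db.executemany("INSERT INTO readings VALUES (?, ?)",
                   [("north", 4.2), ("south", 3.9), ("north", 4.8)])
    # SELECT with grouping and aggregation: the heart of the lecture.
    for site, avg in db.execute("SELECT site, AVG(value) FROM readings "
                                "GROUP BY site ORDER BY site"):
        print(site, avg)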

Second Lecture on Testing Now Online
Greg Wilson / 2006-02-22
The second lecture on testing is now online. As always, comments and corrections are appreciated. Read More ›

What Else for Software Carpentry?
Greg Wilson / 2006-02-21
16 lectures are now in place (more or less), which means I have 8 more to do. The syllabus shows what I've covered already; my current plans include:
- unit testing
- XML
- SQL
- more SQL
- small-team development process
What do you think the other three should cover (keeping in mind that this is supposed to be a course on basic software engineering, rather than scientific programming)? Options include:
- Basic web programming, with much-revised versions of:
  - http://www.third-bit.com/swc1/www/client.html
  - http://www.third-bit.com/swc1/www/server.html
  - http://www.third-bit.com/swc1/www/security.html
- Integration, including:
  - wrapping C code so that it can be called from Python
  - using popen() and its ilk to run external programs
  - (probably) something on refactoring to make code more testable (as per Feathers' excellent Working Effectively with Legacy Code)
- Three lecture-length examples, building very simple versions of core tools that haven't been covered elsewhere:
  - data lineage
  - continuous integration
  - data consistency checking
- Give in, and do the scientific programming stuff anyway:
  - floating-point arithmetic
  - Python's Numeric package
  - data visualization
- Scrap the single lecture on development process, and put in four full lectures on the subject:
  - XP
  - UML-based processes (probably ICONIX)
  - something else (not entirely sure what)
- Something else entirely — suggestions would be very welcome.
Please let me know what you think. Read More ›

Second Lecture on Object-Oriented Programming
Greg Wilson / 2006-02-21
The second lecture on object-oriented programming is now on the web. This describes operator overloading and static methods, and includes the design patterns material that was in the old design lecture (which has been removed—the general consensus was that it didn't work). As always, comments are welcome. Read More ›

AAAS Annual Meeting 2006
Greg Wilson / 2006-02-20
Wednesday, 11:10 p.m.: phone call from Air Can'tada saying that my Thursday morning flight to St Louis has been cancelled because of bad weather. Next available is 4:00 p.m. Friday afternoon—two and a half hours after my workshop is due to end. No, they can't help me find an alternative carrier. Expedia can, though, and by midnight, I have a ticket on Delta, via Cincinnati.
Thursday, oh dark hundred: the cab's tires crunch through eight centimeters of fresh snow on the way to the airport. We're late getting off the ground, and even later leaving Cincinnati, but at least we're airborne. Tornado warnings over St Louis, though, so after circling over a spinning mass of clouds with a lightning-filled depression in the middle for about an hour, we head for Evansville, Indiana. I finally get to my hotel at 7:30 p.m., fifteen hours after starting my day.
Friday: the Annual Meeting of the AAAS isn't really a scientific conference—it's a place for science advocates to gather and plot, stirred together with an extended series of press cuddles dolled up as seminars. (This is not a criticism: if the cosmetics industry, fast food vendors, and the military-industrial complex are smart enough to plot and cuddle, scientists should be too.) Some of the talks (particularly the medical ones) are Mojave-dry, but others are pretty cool:
- "The Demography of Black Holes" (with pictures!)
- "In Search of Genes that Influence Language" (without, but still interesting)
- "New Approaches to Paleontological Investigation" (use a CT scan of a fossil to drive a 3D lithography machine, and you can photocopy dinosaur bones at sub-millimeter resolution—oh, and check out www.digimorph.org)
Friday noon: Andy Lumsdaine and Peter Gottschling arrive from Indiana University for our workshop on Essential Software Development Skills for Research Scientists. We covered the usual topics:
- Computational scientists don't pay as much attention to quality and reproducibility as experimental scientists (in fact, many of them don't pay any attention to these issues).
- Most scientific programmers are woefully inefficient compared to their industrial counterparts, largely because no one has ever taught them basic software engineering skills.
- A handful of tools and techniques can reliably improve scientific programmers' productivity by 20-25%: version control, test-driven development, continuous integration, issue tracking, use of a debugger, enforcing style, traceability, and behind them all, automation.
- There are many personal and institutional obstacles (ranging from "I have a degree in physics, so programming must be easy" to "journals and tenure committees don't care, so I can't afford to").
- We either fix this ourselves, proactively, or someone else will legislate bad rules in the wake of a very public disaster.
Randy Heiland's picture shows the three of us on stage; there weren't as many lab managers or funding directors as I'd hoped for, but lots of good questions and discussion.
Friday evening: a recap of the 2005 Ig Nobel Prize awards for science that cannot, or should not, be repeated, including:
- Physics: John Mainstone and the late Thomas Parnell, for patiently conducting an experiment that began in the year 1927, in which a glob of congealed black tar has been slowly, slowly dripping through a funnel, at a rate of approximately one drop every nine years.
- Medicine: Gregg A. Miller, for inventing Neuticles—artificial replacement testicles for dogs, which are available in three sizes, and three degrees of firmness.
- Literature: the Internet entrepreneurs of Nigeria, for creating and then using e-mail to distribute a bold series of short stories, thus introducing millions of readers to a cast of rich characters, including General Sani Abacha, Mrs. Mariam Sanni Abacha, Barrister Jon A Mbeki Esq., and others.
- Peace: Claire Rind and Peter Simmons, for electrically monitoring the activity of a brain cell in a locust while that locust was watching selected highlights from the movie Star Wars.
- Economics: Gauri Nanda, for inventing an alarm clock that runs away and hides, repeatedly, thus ensuring that people DO get out of bed, and thus theoretically adding many productive hours to the workday.
- Biology: Benjamin Smith and others, for painstakingly smelling and cataloging the peculiar odors produced by 131 different species of frogs when the frogs were feeling stressed.
- Fluid Dynamics: Victor Benno Meyer-Rochow and Jozsef Gal, for using basic principles of physics to calculate the pressure that builds up inside a penguin, as detailed in their report "Pressures Produced When Penguins Pooh—Calculations on Avian Defaecation."
Saturday: I smorgasboarded the seminars. The best was Latanya Sweeney's talk about information privacy—she was kind enough to chat with me for 45 minutes afterward about undergraduate curriculum reform, and the obstacles to it (did you know there isn't an undergrad course on software engineering at CMU?). The worst was an unrelated seminar on "Information Security in Public Databases". Aaron Emigh, of Radix Labs, did a great job of explaining the issues. Kevin Fu, of UMass, was also engaging, but Mike Szydlo (RSA) gave us a technical sales talk that I'm sure went over the heads of most of the audience. And then there was Markus Jakobsson, of Indiana University. He's the guy who conducted phishing attacks on IU students last year, without their prior consent (informed or otherwise), in order to get material for a paper. I think this was irresponsible: one of the obstacles to better security is that the public doesn't trust us (the professionals) to look out for their interests. Some of that is Hollywood's fault (how many positive portrayals of computer geeks have you seen recently? and how many portrayals of what hackers can and can't do are half as accurate as the average episode of a medical soap opera?), but conducting experiments on people who don't know they're being experimented on sure doesn't help. One telling moment came after the presentations, when Jakobsson asked the audience which of two "solutions" they thought would work better: educating the public, or better technology. I pointed out that what he was really offering users was a choice between paying more (hours) or paying more (dollars, to technology vendors). I then asked why he hadn't mentioned the third option, which is to shift the financial pain to the vendors (which is what brought the problem of credit card fraud under control). He dodged, but Aaron Emigh didn't, so I'm going to see if I can get Aaron's slide set, and post it here.
Saturday afternoon: discover that there are no bookstores in downtown St Louis. I don't mean, "there are no good ones". I mean, "there are no bookstores in the downtown core of St Louis, at all". The nearest (according to both hotel staff and conference organizers) is a 15-minute drive away—in another county.
Sunday: up at quarter to five to get to the airport for a 7:15 flight that didn't take off until 8:45, which meant that I missed my 10:59 in Cincinnati, and had to get the 1:10 instead, so I didn't get home until 3:20. Very happy to walk through the door; very happy to have someone else happy that I was walking through the door. Read More ›

Lecture on Binary Data
Greg Wilson / 2006-02-14
The Software Carpentry lecture on binary data is now up on the web. The content of this one has been fairly stable for a while, but that just means that all the bugs will be in the details—comments and corrections are greatly appreciated. Read More ›
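Later: a taste of what the lecture covers — packing values into raw bytes and unpacking them again with the struct module (format and values invented for illustration):

    import struct

    # "<id": little-endian, one 4-byte int followed by one 8-byte double.
    record = struct.pack("<id", 42, 3.14)
    print(len(record))                    # 12 bytes
    number, value = struct.unpack("<id", record)
    print(number, value)                  # 42 3.14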

Data Lineage
Greg Wilson / 2006-02-14
The January 2005 issue of ACM Computing Surveys (vol. 37, no. 1, if you prefer) has a good review by Rajendra Bose and James Frew titled "Lineage Retrieval for Scientific Data Processing: A Survey". In it, they look at what scientists do to keep track of what data they have, where it came from, and what has been done to it. Some of my students last term were worrying about the same issues in the context of HL7 medical data. It seems like an ideal place for software engineers to apply their skills: I'd be interested in hearing from people who have home-grown or small-scale systems I could use as a starting point for a lecture in Software Carpentry. Read More ›

Regular Expressions Lecture is Up
Greg Wilson / 2006-02-12
The Software Carpentry lecture on regular expressions is now on the web. It's a bit of a jump from the previous lecture on design, but given how shaky the latter was, I needed the cuddly warm blanket of working on something I'm sure I understand. As always, comments and corrections to me, please. Read More ›
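Later: the flavor of the lecture in a few lines — pulling structured values out of messy text (the log line here is invented):

    import re

    line = "2006-02-12 14:31:07 temperature=21.4"
    # Capture the date and the reading; ignore the time of day.
    m = re.search(r"(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} temperature=([\d.]+)", line)
    if m:
        print(m.group(1), float(m.group(2)))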

Software Carpentry Design Lecture
Greg Wilson / 2006-02-10
I've just posted a completely new lecture on object-oriented design ideas. I'd be very grateful for high-level comments: is it useful material? Do you think that relatively inexperienced programmers will get it? Etc. Read More ›

First Lecture on Object-Oriented Programming Is Up
Greg Wilson / 2006-02-06
I've split the lecture on object-oriented programming in two; the first one is now on the web (without diagrams). This ties with the rewrite of the style lecture for "most changes since the fall", so comments and corrections would be very welcome. Eleven days 'til the AAAS workshop... Read More ›

Debugging Lecture
Greg Wilson / 2006-02-02
The debugging lecture is now up for comments and corrections. It still has the old artwork, the code sample for logging is pending, and it still feels like I'm waving my hands more than in other lectures, so suggestions would be particularly welcome. Read More ›

Fourth Python Lecture for Software Carpentry
Greg Wilson / 2006-01-29
The fourth lecture on Python for the Software Carpentry course is now up. It covers sets, dictionaries, and a little bit of algorithmic complexity. Comments and criticisms are very welcome. And if that's too serious for you, check out the web site for the Waterfall 2006 conference ;-).
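For anyone wondering what "a little bit of algorithmic complexity" means in practice, here is a tiny example (mine, not the lecture's) of why membership tests on sets and dictionaries beat the same tests on lists:

    import timeit

    values = list(range(100000))
    as_list = values
    as_set = set(values)

    # Looking for a missing element forces the list to scan all 100,000
    # entries; the set finds out in roughly constant time via hashing.
    print(timeit.timeit(lambda: -1 in as_list, number=100))  # O(n) per test
    print(timeit.timeit(lambda: -1 in as_set, number=100))   # O(1) per test

Read More ›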

Quality Assurance Lecture Now Available
Greg Wilson / 2006-01-24
It used to be the introductory lecture on testing; now it's more of an introduction to quality assurance from a programmer's point of view. It's changed even more than the style lecture; comments and corrections would be very welcome. (And yes, I know, the diagrams are missing...) Read More ›

Third Software Carpentry Python Lecture on the Web
Greg Wilson / 2006-01-23
The third lecture on Python for the Software Carpentry course has been posted; comments and criticisms are invited and appreciated. Read More ›

Programming Style Lecture Has Been Revised
Greg Wilson / 2006-01-23
The Software Carpentry lecture on programming style has been revised. Comments on this would be particularly welcome, as it's substantially different from what I delivered last fall. Read More ›

Second Python Lecture Now on the Web
Greg Wilson / 2006-01-18
The second revised lecture on Python for Software Carpentry is now on the web. As always, comments and corrections are welcome. Read More ›

Intro Python Lecture Available
Greg Wilson / 2006-01-15
The first Software Carpentry lecture on Python has been revised, and is up on the web. Comments and criticisms are very welcome. Read More ›

Build Lecture Is Up
Greg Wilson / 2006-01-11
The revised lecture on Make is now up for comments and feedback. Still no diagrams, I'm afraid... Read More ›

Two More Revised Software Carpentry Lectures
Greg Wilson / 2006-01-09
I have revised the second shell lecture and the lecture on version control for the Software Carpentry course, and added the necessary glossary definitions. As always, comments are greatly appreciated. Read More ›

First Shell Lecture for Software Carpentry is Up
Greg Wilson / 2006-01-04
The first lecture on using the shell is up on the new and improved Software Carpentry web site. Comments and feedback would be greatly appreciated. Read More ›

Software Carpentry Introduction revised and on the web
Greg Wilson / 2006-01-02
I have revised the Introduction to the Software Carpentry course. This was the easy one: no diagrams, no code fragments, no glossary entries; the rest will be harder. I'd be very grateful for feedback, comments, and corrections. (And yes, audio will follow, but not today—my nephews are playing Zombies downstairs, and juvenile shouts of, "Eat lead you undead scum!" aren't quite the background I have in mind.) Read More ›

$67 million a year
Greg Wilson / 2005-12-28
The US Dept. of Energy has just announced the next round of funding for SciDAC, its flagship supercomputing program. US$67 million per year for three to five years. Supercomputing Online reports: Research proposals funded under the SciDAC program will help create a comprehensive, scientific computing software infrastructure that integrates applied mathematics, computer science and computational science in the physical, biological and environmental sciences for scientific discovery on petascale computers. My bet is that, once again, most projects will depend on heroic effort, rather than good development techniques, to reach their goals. I'm also willing to bet that anyone who wants to use most of the software these projects create will have to put in heroic effort of their own to get it built and deployed. I (obviously) believe that a little bit of training would go a long way, but I'm not optimistic that the people who need it most will listen: as is so often the case, those who know they need it are already halfway home, while those who need it most don't even know what they're missing. Read More ›

New Year's Schedule for Software Carpentry
Greg Wilson / 2005-12-27
I'm teaching a cut-down version of Software Carpentry at the IASSE in two and a half weeks. I'll have students half days for the weeks of January 16 and 23, and full days for the week of February 6. That's only 20 lectures (rather than 26), so the question is, what to cut? The answer has wider implications, since this will be the version of the course I take to the AAAS workshop. My plan is:

Jan 16: Introduction (revised to be a forward summary of the whole course)
Jan 17: Shell 1
Jan 18: Shell 2
Jan 19: Version Control
Jan 20: Make (revised so that it doesn't depend on Python)
Jan 23: Python 1 (basic features)
Jan 24: Python 2 (strings and lists)
Jan 25: Python 3 (functions and libraries)
Jan 26: Testing (basic concepts)
Jan 27: Mini-Project 1 (build something useful with Python)
Feb 06: Python 4 (dictionaries and exceptions); Debugging (deepened to include material from Zeller)
Feb 07: Python 5 (object-oriented programming); Unit Testing (use the unit test framework to show what good OO design looks like)
Feb 08: Coding Style (updated to include an actual Python style guide); Reflection (complete rewrite: exec, eval, sub-processes, etc.)
Feb 09: Regular Expressions; XML and DOM
Feb 10: Development Process (describe how a good shop actually works, with nods to XP and RUP); Teamware (based on Trac)

Client-side and CGI web programming, security, and databases have disappeared completely; the three lectures on process have been folded into one; and there's no end-of-course summary. I'm comfortable with those changes; what I don't like is the amount of time spent teaching Python-the-language. I'd rather spend those hours showing them how to use Python to automate development activities, but you can't cut trees 'til you have an ax.

Second, there's no place in this new scheme for a lecture based on Paul Dubois's CiSE article on maintaining correctness. There really ought to be: it shows the jigsaw puzzle of which many good practices are pieces. Third, I'd like a second project lecture, showing students part of the build system for the course notes. This would let them see regular expressions and DOM in action, and would tie together many of the earlier ideas on automation. It's this or teamware, though, and I think the latter is more important. Having made that decision, I'm wavering on whether to pull out the material on regular expressions and DOM. Finally, everything I have to say about the development process is now squeezed into a single hour. It makes sense in this case, since IASSE students will get several more courses on the subject, but it's definitely underweight for the AAAS workshop.

So: in order to pull this off, I'm going to have to revise one lecture per day from January 2 onward (including diagrams). I'll post the new materials here until they're polished, at which point I'll swap them into the standard location. I'll blog each time a lecture goes up: timely feedback would be greatly appreciated. Read More ›

Procrastination: One of the Few Things in Life Nicer Than Toast
Greg Wilson / 2005-12-23
I finished rewriting the build system for the Software Carpentry course notes yesterday. Doing so was an extended form of procrastination: the system I built over the summer and used through the fall was adequate, but I wanted to clean a few things up, and then, well, I might as well make it easier for other instructors to add site-specific content, and turn tables into inclusions instead of inlining them, and mumble mumble mumble type type type... Of course, none of this has actually advanced the content of the course one whit. I have over seventy tickets to close, ranging in size from making sure that a particular Make example does what I claim, to rewriting the lecture on security. And diagrams: no one was happy with the isometric ones created this term (not least because they're kind of fuzzy), so I have over a hundred diagrams to re-do. In a perfect world, they'd be ready before I teach at the IASSE in mid-January. In this universe, I'll be happy if they're in place for the Essential Software Skills for Research Scientists workshop at the AAAS Annual Meeting on February 17.

We all do this. We all fold laundry instead of paying bills, or invent an antigravity drive when we're supposed to be studying for an Economics final. (OK, maybe that was just me.) But it seems particularly common among software developers, many of whom would rather spend two hours creating a new (not better, just new) serialization class hierarchy than take five minutes to center-align the titles at the top of the product's help page. One of the characters in Mark Costello's Big If is a prime example: his company desperately needs him to add some new monsters to a video game, so he spends a week adding shadows to clouds.

But back to the build system... What I have is a set of XML files marked up with a homegrown tag set, and what I want is some HTML pages. The files are organized into several directories: the main page is in the root, while all of the lectures are in lec/, and site-specific content is in sub-directories underneath sites/. Each directory that contains source XML files may also contain img/, inc/, and tbl/ sub-directories; in turn, each of those has one sub-directory for each of the source files, which holds images, sample code inclusions, and tables.

The build system consists of the following tools:

- A 500-line Makefile in the root directory that drives everything else. Roughly half of those lines are comments (which can be extracted and formatted as a wiki page to create on-line documentation). This Makefile includes another file called config.mk, in which users must specify the lectures they want to include in the course.

- A Python script called linkages.py that scans the source files and builds a data structure that records such things as the order of lectures, where glossary terms are defined, the two-part numerical IDs of figures and tables, and so on. linkages.py writes this data structure directly to a file called tmp/linkages.tmp.py, which other tools then import. Persisting the data structure directly saved me from having to mess around with parsers or serializers. The clever bit (ahem) is that I only write it out if (a) the file doesn't already exist, or (b) the contents have changed. That way, if I change a source file in a way that doesn't affect cross-linkages, Make doesn't do a lot of unnecessary rebuilding. The sketch below shows the idea.
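A minimal version of the write-only-if-changed trick; the real linkages.py records much more, and its details may differ:

    import os

    def persist_if_changed(path, data):
        """Write repr(data) as an importable module, but only if it differs
        from what is already on disk, so Make sees a stable timestamp."""
        new_text = "linkages = " + repr(data) + "\n"
        if os.path.exists(path):
            with open(path) as reader:
                if reader.read() == new_text:
                    return              # unchanged: leave the timestamp alone
        with open(path, "w") as writer:
            writer.write(new_text)

    # The real system writes tmp/linkages.tmp.py; a flat name keeps this runnable.
    persist_if_changed("linkages_demo.py", {"lectures": ["shell01", "python01"]})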
Once the linkages file is up to date, preprocess.py kicks in. This script creates copies of the source files under the tmp/ directory (preserving the directory structure), and adds information to those copies to make XSLT's job easier. Among other things, it:

- adds a unique file ID, and the path to the root of the build, to the lecture's root element;
- copies content from table files into the lectures;
- adds citation information to bibliography references;
- does multi-column layout of lengthy tables;
- inserts figure and table counter values (the "4.2" in "Figure 4.2");
- fills in cross-references between source files;
- replaces the <lecturelist/> element with a point-form list of links to lectures;
- fills in the <figlist> and <tbllist> tags with lists of figures and tables respectively;
- links terms in the glossary back to their first uses;
- inserts included program source files;
- links to external references;
- adds "previous" and "next" linkage information to lectures;
- generates a syllabus; and
- adds tracing information, such as file version numbers and the time the files were processed.

Each stage ought to be a filter of its own, and in fact I wrote them all that way to begin with. However, launching fifteen or more copies of the Python interpreter for each source file made the build rather slow; doing the piping internally reduced the time per source file from eight or nine seconds to less than a second.
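In outline, the in-process version looks something like this; the stage names and transformations here are invented stand-ins, and the real stages are the ones listed above:

    def add_file_id(doc):
        """Stamp a (made-up) unique ID on the root element."""
        return doc.replace("<lecture>", '<lecture id="shell-1">', 1)

    def insert_counters(doc):
        """Replace a (made-up) placeholder with a figure number."""
        return doc.replace("<figref/>", "Figure 4.2")

    STAGES = [add_file_id, insert_counters]   # ...and a dozen more in reality

    def preprocess(doc):
        for stage in STAGES:
            doc = stage(doc)    # one function call instead of one subprocess
        return doc

    print(preprocess("<lecture><figref/></lecture>"))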
The remaining tools are:

- util/individual.xsl, an XSL script that translates the filled-in XML lecture file into HTML. This script handles the outer skeleton directly, handing specific tasks like the bibliography and special lists to other XSL files that it includes.

- A Python script called util/unify.py and an XSL script called util/unified.xsl, which work together to create a single-page version of the whole course. unify.py stitches the filled-in lecture files together; unified.xsl then applies the same transformations as individual.xsl, but formats hyperlinks differently (since they're all in-file).

- Another Python script called validate.py, which I use to check the internal consistency of the source files. Do any of them contain tabs or unprintable characters? Do all the required images, source files, and tables exist? I run this before checking in changes; it catches something about one time in five.

- And then there are the minor tools: util/fixentities.py replaces character entities with character codes (to work around a problem with Expat); util/wiki.py extracts specially-formatted comments from Makefiles and XSL files, and docstrings from Python, to create wiki documentation pages; and util/revdtd.py reverse engineers the actual DTD of either the source files, their filled-in counterparts, or the generated HTML files.

It's a lot of code; it was a lot of work; I'm pleased with how smoothly it all runs; and most of the time I spent building it should probably have gone into upgrading the actual content of the course. But small(ish) tasks are seductive: you can start work at 8:30, confident that you'll have something to show (even if only to yourself) by noon. Editing course notes, well, the payoff is usually a long way away, and may not come at all: people who read through the first, flawed, version of the notes probably aren't going to come back and tell you how much better the second version is.

That last observation is the key ingredient of my cure for procrastination: find some partners. I am always more productive when I'm working with people than I am on my own. Not only does a small team wander down fewer blind alleys than someone working alone, team members can keep each other honest, and give each other feedback and encouragement. They can also appreciate just how big an accomplishment it is to have replaced all the a's and b's in twenty-eight short examples of list manipulation with the names of minerals, beetles, and mathematicians.

It's now ten to eleven, and I've managed to fend off productivity for almost an hour. Should I look on eBay for a WACOM Cintiq tablet that I can afford? It'd make drawing diagrams much more fun. Or maybe I should try Nose: Miles Thibault says it's much friendlier than the unit testing framework in the Python standard library. Hm... A cup of tea will probably help me decide. A cup of tea, and a slice of toast with strawberry jam... Read More ›

Maintaining Correctness
Greg Wilson / 2005-12-11
I'm re-thinking the lectures in the Software Carpentry course based on feedback from this term's students. I'm going to merge the three lectures on different development processes into one, and use the space that frees up to talk in more detail about programming style and software design—assuming, of course, I can think of something to say that isn't banal. I also want to talk about the material in an article by Paul Dubois in the May/June 2005 issue of Computing in Science & Engineering called "Maintaining Correctness in Scientific Programs". Here are a few key lines from the introduction:

"The more frequently a program is changed, the more difficult it is to maintain its correctness... Most programmers can reasonably tell when their programs are incorrect, but for scientific programmers, this is not the case. A bug that doesn't cause the program to fail in an obvious way could be indistinguishable from an error in modeling the real world with equations... Solving this problem must be the focus of our methodology, be it for a single person writing a 10,000-line program [or] a team of 20 or more writing half a million lines."

Paul then outlines a strategy based on defense in depth which has the following layers:

- a protocol for source control;
- use of language-specific safety tools;
- design by contract;
- verification;
- reusing reliable components;
- automating testing;
- unit testing (which requires automation to be effective);
- to-main testing policy (i.e., code must be tested before being integrated from a branch into the main line);
- regression testing;
- release management; and
- bug tracking.

This immediately struck me as an excellent way to organize and motivate several important parts of the course. It also points out some holes that I'll need to fill. Oh, to have more hours, and more hands...
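To give one of those layers a concrete shape: design by contract, at its simplest, means stating preconditions and postconditions so that a violation fails loudly instead of silently corrupting results. (The example below is mine, not Dubois's.)

    def rescale(values, new_total):
        """Scale values so that they sum to new_total."""
        assert len(values) > 0, "precondition: need at least one value"
        assert new_total > 0, "precondition: new total must be positive"
        old_total = sum(values)
        assert old_total > 0, "precondition: current total must be positive"
        result = [v * new_total / old_total for v in values]
        assert abs(sum(result) - new_total) < 1e-6, "postcondition: sums must match"
        return result

    print(rescale([1.0, 3.0], 1.0))   # [0.25, 0.75]

Read More ›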

American Scientist article on Software Carpentry
Greg Wilson / 2005-12-09
The Jan/Feb 2006 issue of American Scientist, the magazine of Sigma Xi, contains an article on Software Carpentry. Read More ›

Executive Version of Software Carpentry Course
Greg Wilson / 2005-12-08
I'll be teaching a shortened version of the Software Carpentry course at the Institute for Advanced Studies in Software Engineering in Toronto early next year. For more information, or if you are interested in taking part, please see the IASSE's web site. The dates are:

- January 16-20: one lecture per day, plus practical, with a quiz on Friday.
- January 23-27: ditto.
- February 6-10: two lectures and practical per day, with a short final exam on Friday afternoon.

I've got a lot to do between now and then... Read More ›

Workshop at AAAS '06
Greg Wilson / 2005-11-04
As I've mentioned before, I'm running a workshop on "Essential Software Skills for Research Scientists" at the Annual Meeting of the American Association for the Advancement of Science on Friday, February 17, in St Louis. I'm hoping to use the workshop to convince scientists, administrators, and policy makers that the things covered in the Software Carpentry course are essential to good science. If you know someone who ought to attend, please let them know, or pass them my contact information. Read More ›

Software Carpentry at the AAAS
Greg Wilson / 2005-09-21
I just received word that I'll be running a workshop on the aims, benefits, and curriculum of the Software Carpentry course at the Annual Meeting of the American Association for the Advancement of Science in St Louis on Friday, February 17. The AAAS AM is the biggest gathering of scientists in the world; I'm pretty excited. Meanwhile, Michael Hoffman is teaching a short course on Python at the European Bioinformatics Institute in Cambridge, England, based on the course notes. 24 people are now enrolled in the course at the University of Toronto, 47 are auditing locally, and 21 others are sitting in from other locations. (These numbers don't include the students enrolled in the course at Indiana University, or the study group at CalTech.) Oh, and 38 students have already submitted Exercise 1, even though it isn't due until 5:00 Friday ;-). Read More ›

Day 9
Greg Wilson / 2005-09-20
Week two of Software Carpentry, and things are starting to settle down. There are now 93 (!) people signed up for the course:

              U of T   Elsewhere
    Auditing    18        21
    Enrolled    25
    Unknown     29

Of those actually at the University of Toronto (as opposed to the local hospitals—a few radiologists are sitting in) the breakdown is:

    16      Computer Science
    14      Physics
     9 each Civil Engineering; Mechanical and Industrial Engineering
     5 each Biochemistry; Institute for Aerospace Studies
     2 each Institute of Medical Science; Mathematics
     1 each Astronomy; Biomaterials and Biomedical Engineering; Botany; Geology; Medical Biophysics; Zoology; Non-degree student

This week's lecture went much more smoothly than last week's, in part because we were in a larger room, with seating for everyone, and in part because the content was an introduction to Python, which I've given more times than I can count. There were still some glitches in the slides, though: a few things were out of order, and I really do need to choose more concrete examples. 33 students have completed the first exercise, which is already more than I'd expected—I'm feeling uncharacteristically optimistic right now ;-)

The past week has also seen the start of CSC49X, the fourth-year Computer Science project course. 23 students are working on 10 different projects (24 and 11 respectively, if you count Sean Dawson's paid-but-not-for-course-credit work on a sequence diagram debugger plugin for Eclipse). DrProject (our Trac-derived project management system) is holding up so far, and we're almost finished setting it up for two other courses to use this term as well. Not bad for Day 9 of term; not bad at all. Read More ›

Software Carpentry: First Meeting
Greg Wilson / 2005-09-14
The Toronto edition of the Software Carpentry course met for the first time on Monday. 51 people crammed into a room with seating for 34 (and no air conditioning, on an unseasonably warm day): medical biophysicists, computer scientists, civil engineers, and even a couple of faculty members. I was pleased with the turnout, but less pleased with my lectures. I don't think my introduction to the shell made sense to anyone who didn't already know the content—it was far too long, and as Andy Lumsdaine (who's teaching from these same notes at Indiana) said, this stuff really does need to be interactive. The version control lecture was more successful. While the notes talk about editing C files, I talked about co-authoring a paper written in LaTeX, which I hope was less intimidating. (A couple of people have suggested that there ought to be an entire lecture on LaTeX, but I'm still not convinced—it would be a good sample problem for the lecture on Make, but it isn't really software engineering itself.)

I spent several hours yesterday (Tuesday) editing the recordings I made of my lectures using Audacity, then converted them to MP3s with CDex. I have a few verbal tics, and we spent more time shuffling chairs around than I thought, so each of the lectures reduced to under 30 minutes of real content. It could be that the biggest thing I'll get out of this course personally is better public speaking skills... ;-)

The rest of yesterday was spent putting together some Python scripts to manage the class list, and to generate Subversion passwords and access control file entries from it (a sketch of the idea appears below). I also played with two GUI interfaces for Subversion (RapidSVN, which is cross-platform, but not really very rapid, and TortoiseSVN, which is Windows-only). I need to check out SmartSVN as well, so that Macintoids will have something to play with—it's clear from Monday's lecture that unless they can start with a GUI, many students will go away thinking that version control is intrinsically hard.

With another hour's work, I'd be ready to send mail to students in Toronto and elsewhere telling them how to access the Subversion repositories they'll be using in the course, and what their first exercise is. However, this morning is going to be taken up with the first batch of 49X project meetings. This term, 23 students will be working on: a collaborative grading tool; a combined land use and vehicle traffic simulator; neuroimaging algorithm performance; an on-line marking aid; visualizing rock planes in 3D; a new interface for the Bell Kids Help Phone; a lightweight requirements management tool; DrProject itself; a database for a food bank; and an electronic GFP (Green Fluorescence Protein) browser. It's going to be a busy, but rewarding, fall...
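Here is roughly what those class-list scripts do. The student names and passwords are invented, and the real scripts almost certainly differ, but the output follows svnserve's standard passwd and authz file formats:

    # Invented sample data: (username, password) pairs from the class list.
    students = [("aturing", "s3cret"), ("ghopper", "pa55word")]

    # svnserve's password file: a [users] section of "name = password" lines.
    with open("passwd", "w") as f:
        f.write("[users]\n")
        for name, password in students:
            f.write("%s = %s\n" % (name, password))

    # Access control: give each student read-write access to their own area.
    with open("authz", "w") as f:
        for name, _ in students:
            f.write("[swc:/%s]\n%s = rw\n\n" % (name, name))

Read More ›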

Software Carpentry at Indiana University
Greg Wilson / 2005-08-22
Andy Lumsdaine is offering the Software Carpentry course at Indiana University starting on August 29, under the heading B649 Software Tools and Practices. Read More ›

Software Carpentry course in Nature
Greg Wilson / 2005-07-29
Nature has run a short blurb about the course I'm putting together on software development skills for scientists and engineers. Read More ›

Software Carpentry notes are up
Greg Wilson / 2005-07-08
I put an alpha version of the notes for the Software Carpentry course [1] on-line yesterday. Now, I'm looking for a way to convert them into a single PDF document, so that reviewers can download them in one shot. I've found some open source tools that don't do style sheets, and some commercial tools that'll do one page at a time, for money, but nothing so far that strikes me as being any better than "printing" with PDFCreator, one page at a time. Given that there are 31 pages, this is tedious enough that I'd like to find an alternative... [1] The course on basic software development skills for scientists and engineers that I'm writing for the Python Software Foundation.
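(The merging step, at least, can now be scripted with a library such as pypdf, which didn't exist in this form at the time; the filenames below are hypothetical, and this is an assumption about tooling, not what was actually used.)

    from pypdf import PdfWriter

    writer = PdfWriter()
    for number in range(1, 32):                   # the 31 per-page PDFs
        writer.append("page-%02d.pdf" % number)   # hypothetical filenames
    with open("software-carpentry-notes.pdf", "wb") as output:
        writer.write(output)

Read More ›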

Python Software Foundation Grant
Greg Wilson / 2004-12-30
Greg Wilson has been awarded a grant from the Python Software Foundation to revamp his course on "Software Engineering for Scientists and Engineers", and put it under an open license, so that it can be used at other institutions. W00t! Read More ›