Our sibling project Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. Its mission is to provide researchers with high-quality, domain-specific training covering the full lifecycle of data-driven research. Where Software Carpentry teaches best practices in software development, Data Carpentry focuses on the introductory computational skills needed for data management and analysis in all domains of research. Its lessons are domain-specific, from life and physical sciences to social and geospatial sciences, and build on learners' existing knowledge so that they can quickly apply the skills they learn to their own research. Its target audience is learners who have little to no prior computational experience.
Software Carpentry and Data Carpentry have now merged their projects to form The Carpentries, but will continue to maintain their separate, individual lesson and workshop identities and development.
aims to make computational science more open and more reliable by
making computations reproducible and publishable. It is a file
format for storing computations, consisting of both code and data,
and it is also a software infrastructure for working with this
Contact: Konrad Hinsen
Biostars is a bioinformatics, computational genomics and biological data
analysis question and answer forum. It is most used by biological
computational scientists and it addresses all levels of questions
regarding biological and computational data analysis. Biostars was
started by Istvan Albert in 2010 and has since grown to more than
16,000 contributors with more than 120,000 posts on bioinformatics
Contact: Josh Herr
Cyclus, the Next Generation Nuclear Fuel Cycle Simulator, is an
agent-based nuclear fuel cycle simulator that gives users and
developers flexibility through a dynamic resource exchange
solver and a plug-in framework for user-developed agents.
framework. The goal of Cyclus is to enable a broad spectrum of
fuel cycle simulation while providing a low barrier to entry for
new users and agent developers.
Contact: Katy Huff, Paul Wilson, Anthony Scopatz
Effective Computation in
Physics is a manual of programming and software skills aimed at
researchers in the physical sciences and engineering. This field guide to
effective scientific computing in Python covers: programming in Python,
important Python packages such as NumPy and Pandas, interaction with the
command line, software testing, version control, build systems,
documentation, publishing using LaTeX, how to manage collaborative
software development with GitHub, and even how to license your software.
Contact: Katy Huff, Anthony Scopatz
GAP is a free, open and
extensible system for discrete computational algebra, with
particular emphasis on Computational Group Theory. GAP provides a
programming language, a library of thousands of functions
implementing algebraic algorithms written in the GAP language as
well as large data libraries of algebraic objects.
Contact: Alexander Konovalov
The Genomics Virtual Lab
provides analysis tutorials and protocols, and scalable research
platforms on the cloud with easy access to reference data and
tools on demand. Tools include Galaxy, IPython Notebook, RStudio,
and environment modules for command-line users.
Contact: Clare Sloggett
is a system for managing generic content-addressable data in
a Merkle graph and mutable references to that data over a
distributed, peer-to-peer network. For example, you could use the
same system to manage both your source code and your huge binary
datasets, transferring changes efficiently between any set of
nodes, and archiving past versions for as long as you like.
Contact: W. Trevor King
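The idea of content-addressable data with mutable references can be illustrated with a minimal sketch. This is not the project's actual implementation or API: the names `put`, `put_node`, `store`, and `refs` are hypothetical, an in-memory dictionary stands in for the peer-to-peer network, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def put(store, data: bytes) -> str:
    """Store data under the hex SHA-256 digest of its content; return that key."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key


def put_node(store, child_keys):
    """Store an interior node whose content is the list of its children's keys.

    Because the node's own key is a hash of its children's keys, changing
    any child changes the node's key as well (the Merkle property).
    """
    return put(store, "\n".join(child_keys).encode())


store = {}  # hypothetical stand-in for a distributed object store
refs = {}   # mutable names that point at immutable content keys

code_v1 = put(store, b"source code v1")
dataset = put(store, b"large binary dataset")
refs["main"] = put_node(store, [code_v1, dataset])
old_root = refs["main"]

# Updating the code creates a new blob and a new root; the unchanged
# dataset is reused, and the old version stays retrievable by its key.
code_v2 = put(store, b"source code v2")
refs["main"] = put_node(store, [code_v2, dataset])
```

In this sketch only the changed blob and the new root are added to the store, which is why transferring changes between nodes can be efficient.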
The iPlant Collaborative
is cyberinfrastructure for scientists, educators, and students
working in all domains of life sciences. iPlant resources allow
users to analyze, manage, and store data and experiments, access
high-performance computing, and share results with colleagues.
iPlant is open source and funded by the National Science Foundation.
Contact: Jason Williams
The Journal of Open Research Software is an Open
Access journal featuring peer reviewed software metapapers
describing research software with high reuse potential and
full-length papers that cover different aspects of
creating, maintaining and evaluating open source research software.
Contact: Neil Chue Hong
provides a comprehensive set of functions for analyzing empirical
patterns in ecological data, predicting patterns using theory and
models, and comparing empirical patterns to theory. Many major
macroecological patterns can be analyzed using this package,
including the species abundance distribution, the species and
endemics area relationships, several measures of beta diversity,
and many others.
Contact: Mark Wilber, Justin Kitzes
Given a set of anatomically labelled MR images (atlases) and
unlabeled images (subjects), MAGeT (Multiple Automatically
Generated Templates Brain Segmentation) produces a segmentation
(automatic labeling) for each subject using a multi-atlas voting
procedure based on a template library made up of images from the
Contact: Gabriel A. Devenyi
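The voting step of a multi-atlas procedure can be sketched as a per-voxel majority vote. This is an illustration only, not MAGeT's code: the function name `majority_vote` and the toy data are hypothetical, images are flattened to short lists of integer labels, and a plain unweighted vote is assumed.

```python
from collections import Counter


def majority_vote(candidate_labels):
    """Fuse several candidate segmentations into one.

    candidate_labels holds one equal-length label sequence per template.
    For each voxel, the most common label wins; ties go to the label
    that was seen first.
    """
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*candidate_labels)
    ]


# Three hypothetical candidate segmentations of a four-voxel subject image.
segmentations = [
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [0, 0, 1, 2],
]
fused = majority_vote(segmentations)
print(fused)
```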
The Math Education Resources
is a volunteer-run project that provides substantial learning
resources to undergraduate students taking math courses at UBC or
elsewhere. Our wiki provides free hints and peer-reviewed
solutions to previous final exam questions from UBC's math
courses. Students can search questions by topic, watch mini video
lectures for each topic, or use the difficulty ratings from their
peers to guide their studies.
Contact: Bernhard Konrad
mothur is an open source and
portable software package that meets the bioinformatics needs of
the microbial ecology research community. It is particularly
useful for analyzing 16S rRNA gene sequences that have been
generated using a variety of sequencing platforms. We emphasize
making the software as easy to use as possible through our efforts
in maintaining a wiki, blog, and user forum.
Contact: Pat Schloss
Science Lab is helping a global network of researchers, tool
developers, librarians and publishers collaborate to further
science on the web, via collaboration, prototyping and educational
out projects they're
working on with the open science community, and join the
discussion on their
forum or their
Contact: Abby Cabunoc, Arliss Collins, Bill Mills
NIPY (Neuroimaging in
Python) is making it easier to do better brain imaging research, by designing
and implementing free and open-source algorithms and pipelines for the
analysis of data from neuroimaging experiments, and teaching neuroscientists
how to use them.
Contact: Ariel Rokem
The Open Tree of Life
is building a comprehensive tree of all life. By enabling
community contribution of phylogenies (estimates of species or
other group relationships), and combining these phylogenies into a
single tree-of-life, we are capturing the depth of knowledge about
biodiversity on Earth, encouraging community comment and
refinement and preserving phylogenetic data in a readily reusable
Contact: Emily Jane McTavish, Karen Cranston
The Predictive Ecosystem Analyzer (PEcAn)
is an integrated ecological bioinformatics toolbox that provides tools to synthesize
plant and ecosystem data with mechanistic understanding encoded in ecosystem models.
PEcAn is not a model itself, but a platform for model parameterization and calibration
that wraps simulation models within a Bayesian framework. The PEcAn Project is an open
community of scientists, programmers, and educators and we welcome new contributions
Contact: David LeBauer
Practical Computing for Biologists
aims to teach scientists of all types essential computational
skills for data analysis. It assumes no pre-requisites other than
motivation, features the most useful tools that can benefit
researchers almost immediately, and explains how to use these
tools together. With a background level of comfort, researchers
can then go on to reap even more benefit from Software Carpentry
and other educational sites and courses.
Contact: Steve Haddock
PyMC is a Python module that implements Bayesian statistical models and fitting algorithms,
including Markov chain Monte Carlo.
Its flexibility and extensibility make it applicable to a large suite of problems.
Along with core sampling functionality,
PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics.
Contact: Chris Fonnesbeck
PyNE, the Nuclear Engineering Toolkit, is a suite of tools to aid in
computational nuclear science and engineering. PyNE seeks to provide
native implementations of common nuclear algorithms, as well as
Python bindings and I/O support for other industry standard
Contact: Katy Huff, Rachel Slaybaugh, Paul Wilson, Anthony Scopatz
is a Python-based library for research data management (RDM).
It facilitates the automated management and publication of software
source code and data via online, citable repositories hosted by
Figshare, Zenodo, or a DSpace-based service. The library
and can be readily incorporated into scientific workflows to allow
research outputs to be curated and shared in a straightforward manner.
Contact: Christian Jacobs
scikit-bio is a Python bioinformatics library based on the standard Python
scientific computing stack that implements core bioinformatics
data structures, algorithms, parsers, and formatters. scikit-bio
is the first bioinformatics-centric scikit, and arises from over
ten years of development efforts on PyCogent and QIIME. It is
intended to be useful both for students, who can use it in
conjunction with the accompanying interactive
An Introduction to Applied Bioinformatics,
and for real-world bioinformatics developers.
Contact: Greg Caporaso
ShareLaTeX is an online collaborative LaTeX editor designed to make LaTeX
more accessible to newcomers, and to make collaboration easier
between co-authors. Everyone can access the latest version, see a
full change history, and work with the same compatible LaTeX
environment. ShareLaTeX is available
and is open source.
Contact: James Allen