Teaching basic lab skills
for research computing

Related Projects

Data Carpentry

Our sibling project Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. Its mission is to provide researchers with high-quality, domain-specific training covering the full lifecycle of data-driven research. Where Software Carpentry teaches best practices in software development, Data Carpentry focuses on the introductory computational skills needed for data management and analysis in all domains of research. Its lessons are domain-specific, ranging from the life and physical sciences to the social and geospatial sciences, and build on learners' existing knowledge so that they can quickly apply what they learn to their own research. Its target audience is learners who have little to no prior computational experience.

Software Carpentry and Data Carpentry have now merged to form The Carpentries, but will continue to maintain their separate lesson and workshop identities and development.

ActivePapers ActivePapers aims to make computational science more open and more reliable by making computations reproducible and publishable. It is a file format for storing computations, consisting of both code and data, and it is also a software infrastructure for working with this file format.
Contact: Konrad Hinsen

ARCHER ARCHER is the UK National Supercomputing Service, providing a world-class supercomputer and invaluable computational resources, consultancy and training for researchers who study problems with a global impact.
Contact: Mike Jackson

Astropy Astropy is a community-driven package intended to contain much of the core functionality and some common tools needed for performing astronomy and astrophysics with Python.
Contact: Erik Bray

Biostars Biostars is a bioinformatics, computational genomics and biological data analysis question and answer forum. It is most used by biological computational scientists and it addresses all levels of questions regarding biological and computational data analysis. Biostars was started by Istvan Albert in 2010 and has since grown to more than 16,000 contributors with more than 120,000 posts on bioinformatics and genomics.
Contact: Josh Herr

Cyclus Cyclus is a next-generation, agent-based nuclear fuel cycle simulator that gives users and developers flexibility through a dynamic resource exchange solver and a plug-in framework for user-developed agents. The goal of Cyclus is to enable a broad spectrum of fuel cycle simulation while providing a low barrier to entry for new users and agent developers.
Contact: Katy Huff, Paul Wilson, Anthony Scopatz

EcoDataRetriever The EcoData Retriever automates finding, downloading and cleaning up ecological datasets.
Contact: Ethan White, Ben Morris

Effective Computation in Physics Effective Computation in Physics is a manual of programming and software skills aimed at researchers in the physical sciences and engineering. This field guide to effective scientific computing in Python covers: programming in Python, important Python packages such as NumPy and Pandas, interaction with the command line, software testing, version control, build systems, documentation, publishing using LaTeX, how to manage collaborative software development with GitHub, and even how to license your software.
Contact: Katy Huff, Anthony Scopatz

GAP GAP is a free, open and extensible system for discrete computational algebra, with particular emphasis on Computational Group Theory. GAP provides a programming language, a library of thousands of functions implementing algebraic algorithms written in the GAP language as well as large data libraries of algebraic objects.
Contact: Alexander Konovalov

Genomics Virtual Lab The Genomics Virtual Lab provides analysis tutorials and protocols, and scalable research platforms on the cloud with easy access to reference data and tools on demand. Tools include Galaxy, IPython Notebook, RStudio, and environment modules for command-line users.
Contact: Clare Sloggett

Git-RDM Git-RDM is a Research Data Management (RDM) plugin for the Git version control system. It interfaces Git with data hosting services to manage the curation of version controlled files using persistent, citable repositories.
Contact: Christian Jacobs

IPFS IPFS is a system for managing generic content-addressable data in a generalized Merkle graph, along with mutable references to that data, over a distributed, peer-to-peer network. For example, you could use the same system to manage both your source code and your huge binary datasets, transferring changes efficiently between any set of nodes, and archiving past versions for as long as you like.
Contact: W. Trevor King
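
The idea of content-addressable data linked into a hash graph can be illustrated with a minimal sketch. This is not IPFS's actual data format or API: the `put` function, the in-memory `store` dictionary and the JSON node layout are all hypothetical, chosen only to show that a node's address is derived from its content and its links.

```python
import hashlib
import json


def put(store, data, links=()):
    """Store a node and return its address (hypothetical helper, not an IPFS call).

    The address is the SHA-256 hash of the node's serialized content,
    which includes the addresses of the nodes it links to.
    """
    node = json.dumps({"data": data, "links": list(links)}, sort_keys=True)
    encoded = node.encode("utf-8")
    address = hashlib.sha256(encoded).hexdigest()
    store[address] = encoded  # identical content always lands at the same key
    return address


store = {}

# Two leaves (say, source code and a dataset) and a root that links to both.
code_v1 = put(store, "source code v1")
dataset = put(store, "large dataset")
root_v1 = put(store, "project", links=[code_v1, dataset])

# Changing one leaf yields a new root address, while the unchanged
# dataset node is reused rather than stored again.
code_v2 = put(store, "source code v2")
root_v2 = put(store, "project", links=[code_v2, dataset])
```

Because the old root is still in the store, both versions remain retrievable, and only the changed nodes would need to be transferred to a peer that already holds the first version.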

iPlant Collaborative The iPlant Collaborative is cyberinfrastructure for scientists, educators, and students working in all domains of life sciences. iPlant resources allow users to analyze, manage, and store data and experiments, access high-performance computing, and share results with colleagues. iPlant is open source and funded by the National Science Foundation.
Contact: Jason Williams

Journal of Open Research Software The Journal of Open Research Software is an Open Access journal featuring peer reviewed software metapapers describing research software with high reuse potential and full-length papers that cover different aspects of creating, maintaining and evaluating open source research software.
Contact: Neil Chue Hong

khmer project The khmer project provides scripts and a library for analyzing, filtering, and correcting DNA sequencing data. The project implements efficient probabilistic data structures and lossy compression algorithms for working with very large DNA sequencing data sets.
Contact: Titus Brown

Macroeco Macroeco provides a comprehensive set of functions for analyzing empirical patterns in ecological data, predicting patterns using theory and models, and comparing empirical patterns to theory. Many major macroecological patterns can be analyzed using this package, including the species abundance distribution, the species and endemics area relationships, several measures of beta diversity, and many others.
Contact: Mark Wilber, Justin Kitzes

MAGeTBrain Given a set of anatomically labelled MR images (atlases) and unlabelled images (subjects), MAGeT (Multiple Automatically Generated Templates Brain Segmentation) produces a segmentation (automatic labelling) for each subject using a multi-atlas voting procedure based on a template library made up of images from the subject set.
Contact: Gabriel A. Devenyi
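
The voting step of a multi-atlas procedure can be sketched as a per-voxel majority vote over candidate labellings. This is only an illustration under simplifying assumptions, not MAGeT's own code: image registration and the construction of the template library are omitted, each labelling is a flat list of integer labels, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(candidate_labelings):
    """Fuse candidate segmentations of one subject by per-voxel majority vote.

    Each candidate is a flat list of labels, one per voxel, and all
    candidates must have the same length. Ties go to the label that
    appears first among the candidates.
    """
    fused = []
    for votes in zip(*candidate_labelings):
        label, _count = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


# Three candidate labellings of the same four voxels (made-up values).
candidates = [
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [0, 0, 1, 2],
]
fused = majority_vote(candidates)
```

Here the second and third voxels each receive the label chosen by two of the three candidates.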

Mahotas Mahotas is a computer vision and image processing library for Python based on NumPy.
Contact: Luis Pedro Coelho

Math Education Resources The Math Education Resources is a volunteer-run project that provides substantial learning resources to undergraduate students taking math courses at UBC or elsewhere. Our wiki provides free hints and peer-reviewed solutions to previous final exam questions from UBC's math courses. Students can search questions by topic, watch mini video lectures for each topic, or use the difficulty ratings from their peers to guide their studies.
Contact: Bernhard Konrad

mothur project mothur is an open source and portable software package that meets the bioinformatics needs of the microbial ecology research community. It is particularly useful for analyzing 16S rRNA gene sequences that have been generated using a variety of sequencing platforms. We emphasize making the software as easy to use as possible through our efforts in maintaining a wiki, blog, and user forum.
Contact: Pat Schloss

Mozilla Science Lab The Mozilla Science Lab is helping a global network of researchers, tool developers, librarians and publishers to further science on the web through collaboration, prototyping and educational projects. Check out the projects they're working on with the open science community, and join the discussion on their forum or their community calls.
Contact: Abby Cabunoc, Arliss Collins, Bill Mills

NIPY NIPY (Neuroimaging in Python) is making it easier to do better brain imaging research, by designing and implementing free and open-source algorithms and pipelines for the analysis of data from neuroimaging experiments, and teaching neuroscientists how to use them.
Contact: Ariel Rokem

Open Tree of Life The Open Tree of Life is building a comprehensive tree of all life. By enabling community contribution of phylogenies (estimates of species or other group relationships) and combining these phylogenies into a single tree of life, we are capturing the depth of knowledge about biodiversity on Earth, encouraging community comment and refinement, and preserving phylogenetic data in a readily reusable way.
Contact: Emily Jane McTavish, Karen Cranston

PEcAn Project The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox that provides tools to synthesize plant and ecosystem data with mechanistic understanding encoded in ecosystem models. PEcAn is not a model itself, but a platform for model parameterization and calibration that wraps simulation models within a Bayesian framework. The PEcAn Project is an open community of scientists, programmers, and educators and we welcome new contributions and applications.
Contact: David LeBauer

Practical Computing for Biologists Practical Computing for Biologists aims to teach scientists of all types essential computational skills for data analysis. It assumes no pre-requisites other than motivation, features the most useful tools that can benefit researchers almost immediately, and explains how to use these tools together. With a background level of comfort, researchers can then go on to reap even more benefit from Software Carpentry and other educational sites and courses.
Contact: Steve Haddock

PyMC PyMC is a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large suite of problems. Along with core sampling functionality, PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics.
Contact: Chris Fonnesbeck

PyNE project PyNE, the Nuclear Engineering Toolkit, is a suite of tools to aid in computational nuclear science and engineering. PyNE seeks to provide native implementations of common nuclear algorithms, as well as Python bindings and I/O support for other industry standard nuclear codes.
Contact: Katy Huff, Rachel Slaybaugh, Paul Wilson, Anthony Scopatz

PyRDM PyRDM is a Python-based library for research data management (RDM). It facilitates the automated management and publication of software source code and data via online, citable repositories hosted by Figshare, Zenodo, or a DSpace-based service. The library is open source and can be readily incorporated into scientific workflows to allow research outputs to be curated and shared in a straightforward manner.
Contact: Christian Jacobs

rOpenSci project rOpenSci is a software collective that provides R-based tools to enable access to scientific data repositories, the full text of articles, and science metrics, and to facilitate a culture shift in the scientific community towards reproducible research practices.
Contact: Karthik Ram

scikit-bio scikit-bio is a Python bioinformatics library based on the standard Python scientific computing stack that implements core bioinformatics data structures, algorithms, parsers, and formatters. scikit-bio is the first bioinformatics-centric scikit, and arises from over ten years of development efforts on PyCogent and QIIME. It is intended to be useful both for students, who can use it in conjunction with the accompanying interactive textbook, An Introduction to Applied Bioinformatics, and for real-world bioinformatics developers.
Contact: Greg Caporaso

scikit-image scikit-image is a collection of algorithms for image processing. It is available free of charge and free of restriction. We pride ourselves on high-quality, peer-reviewed code, written by an active community of volunteers.
Contact: Juan Nunez-Iglesias

ShareLaTeX ShareLaTeX is an online collaborative LaTeX editor designed to make LaTeX more accessible to newcomers, and to make collaboration easier between co-authors. Everyone can access the latest version, see a full change history, and work with the same compatible LaTeX environment. ShareLaTeX is available as a hosted service and is open source.
Contact: James Allen

STAT 545 project STAT 545 at the University of British Columbia teaches students how to explore, groom, visualize, and analyze data, and how to make all of that reproducible, reusable, and shareable using R.
Contact: Jenny Bryan

tec project tec provides utilities for a uniform and extensible API for easily simulating vacuum thermionic energy conversion (TEC) devices. A few models are supplied with the tec package, but others can be easily added.
Contact: Joshua Ryan Smith
