<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software Carpentry</title>
	<atom:link href="http://software-carpentry.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://software-carpentry.org</link>
	<description>Helping scientists make better software since 1997</description>
	<lastBuildDate>Tue, 22 May 2012 13:56:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Citing Versions</title>
		<link>http://software-carpentry.org/2012/05/citing-versions/</link>
		<comments>http://software-carpentry.org/2012/05/citing-versions/#comments</comments>
		<pubDate>Tue, 22 May 2012 13:56:18 +0000</pubDate>
		<dc:creator>Greg Wilson</dc:creator>
				<category><![CDATA[Opinion]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4854</guid>
		<description><![CDATA[We got mail yesterday from a workshop participant saying, &#8220;My question is how does one show in a research paper that the underlying data and the software is version controlled?&#8221; Cameron Neylon&#8217;s answer, slightly edited, was: My approach in an ideal world would be to have all of my data (or links to it) under [...]]]></description>
			<content:encoded><![CDATA[<p>We got mail yesterday from a workshop participant saying, &#8220;My question is how does one show in a research paper that the underlying data and the software is version controlled?&#8221; Cameron Neylon&#8217;s answer, slightly edited, was:</p>
<blockquote><p>My approach in an ideal world would be to have all of my data (or links to it) under version control along with the code. When the version to be used for the publication is clear I would give it a tag (I&#8217;m a Git user but there is similar functionality in all version control systems) and then push that to an online repository. You can then give a link or reference to the appropriate repository version online. If you don&#8217;t want to put your main repository online then you could just put up the version from the publication.</p>
<p>Of course this is not so easy if you are doing it in retrospect. Your data may be in other places in systems that aren&#8217;t under proper version control. If the data is small enough I would grab a copy and put it in the repository version you are using for publication. If it&#8217;s big and stored remotely then you are a bit limited. In that case I would try and refer to a specific version if it is possible, or if you can&#8217;t do that then you can try and get a checksum.</p>
<p>But basically the main thing is to create and refer to a specific version of your repository and make sure it is available in a useful form to people who want to check it out.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/citing-versions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Being More Systematic About Publicity</title>
		<link>http://software-carpentry.org/2012/05/being-more-systematic-about-publicity/</link>
		<comments>http://software-carpentry.org/2012/05/being-more-systematic-about-publicity/#comments</comments>
		<pubDate>Mon, 21 May 2012 15:43:39 +0000</pubDate>
		<dc:creator>Greg Wilson</dc:creator>
				<category><![CDATA[Community]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4853</guid>
		<description><![CDATA[Several people have suggested that we need to be more systematic about publicizing workshops and other events: blogging and tweeting reaches people who already know about us, but doesn&#8217;t reach those who don&#8217;t. If you know of mailing lists and/or news aggregators aimed at researchers who might be interested in what we do, please mail [...]]]></description>
			<content:encoded><![CDATA[<p>Several people have suggested that we need to be more systematic about publicizing workshops and other events: blogging and tweeting reaches people who already know about us, but doesn&#8217;t reach those who don&#8217;t. If you know of mailing lists and/or news aggregators aimed at researchers who might be interested in what we do, please mail pointers to <a href="mailto:info@software-carpentry.org">info@software-carpentry.org</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/being-more-systematic-about-publicity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Exercise With Matplotlib and Numpy</title>
		<link>http://software-carpentry.org/2012/05/an-exercise-with-matplotlib-and-numpy/</link>
		<comments>http://software-carpentry.org/2012/05/an-exercise-with-matplotlib-and-numpy/#comments</comments>
		<pubDate>Mon, 21 May 2012 02:42:38 +0000</pubDate>
		<dc:creator>Michael Hansen</dc:creator>
				<category><![CDATA[Tutorial]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4835</guid>
		<description><![CDATA[[Code and Data] For this tutorial, we&#8217;ll be plotting some weather data from a site called Weather Underground. You can download temperature readings and weather events for your local area in a comma-separated file. I&#8217;ve put weather data for Bloomington, IN in a file called weather.csv. Each row is one day, and there are columns [...]]]></description>
			<content:encoded><![CDATA[<p>[<a href="http://software-carpentry.org/blog/wp-content/uploads/2012/05/weather_exercise.zip">Code and Data</a>]</p>
<p>For this tutorial, we&#8217;ll be plotting some weather data from a site called <a href="http://www.wunderground.com/">Weather Underground</a>. You can download temperature readings and weather events for your local area in a comma-separated file.</p>
<p>I&#8217;ve put weather data for Bloomington, IN in a file called <tt>weather.csv</tt>. Each row is one day, and there are columns for min/mean/max temperature, dew point, wind speed, etc. We&#8217;ll be plotting temperature and weather event data (e.g., rain, snow).</p>
<p><span id="more-4835"></span></p>
<p><strong>0. Installing matplotlib</strong></p>
<p>I covered installing matplotlib in <a href="http://software-carpentry.org/2012/05/an-exercise-with-functions-and-plotting/">a previous tutorial</a>. The matplotlib site also has <a href="http://matplotlib.sourceforge.net/users/installing.html">installation instructions</a>. I&#8217;ll assume for the rest of the tutorial that you have matplotlib installed and working. If you can type this code at a Python shell:</p>
<pre class="brush: python; gutter: false; first-line: 1;">from matplotlib import pyplot</pre>
<p>and not receive any errors, then you&#8217;re good to go.</p>
<p><strong>1. Numpy Crash Course</strong></p>
<p>The <a href="http://scipy.org/Getting_Started">numpy module</a> is how you do matrix-y stuff in Python. I&#8217;ll give a quick example of why we&#8217;ll need it. Imagine you were to type the following code into a Python shell:</p>
<pre class="brush: python; gutter: false; first-line: 1;">x = [1, 2, 3, 4]
print x * 5</pre>
<p>What does this print? Why, this of course:</p>
<pre class="brush: text; gutter: false; first-line: 1;">[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]</pre>
<p>By default, the <tt>*</tt> operator in Python copies the contents of a list however many times you specify. So <tt>x * 5</tt> copied the contents of <tt>x</tt> five times and stuck them all together.</p>
<p>When we&#8217;re doing matrix math in Python, it would be nicer if <tt>x * 5</tt> produced <tt>[5, 10, 15, 20]</tt>. We could do this manually with a loop:</p>
<pre class="brush: python; gutter: false; first-line: 1;">for i in range(len(x)):
    x[i] = x[i] * 5</pre>
<p>We could even get fancy with Python&#8217;s <a href="http://docs.python.org/tutorial/datastructures.html#list-comprehensions">list comprehension</a> syntax:</p>
<pre class="brush: python; gutter: false; first-line: 1;">x = [x_i * 5 for x_i in x]</pre>
<p>For a list with only four elements, this won&#8217;t be so bad. For larger lists, however, it will be quite slow. Using <tt>numpy</tt> avoids the performance hit by doing the heavy lifting in C instead of in Python. Here&#8217;s how we&#8217;d do the previous example with <tt>numpy</tt>:</p>
<pre class="brush: python; gutter: true; first-line: 1;">import numpy as np
x = np.array([1, 2, 3, 4])
x = x * 5
print x</pre>
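<p>This prints <tt>[ 5 10 15 20]</tt>, which is what we would expect. If you instead type <tt>x</tt> by itself at the Python shell, you&#8217;ll see <tt>array([ 5, 10, 15, 20])</tt>; the <tt>array(...)</tt> lets you know that <tt>x</tt> is a <tt>numpy</tt> array. Onward!</p>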
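<p>As a quick check of the difference between lists and arrays, here is a minimal sketch. The variable names <tt>values</tt>, <tt>repeated</tt>, and <tt>scaled</tt> are made up for illustration, and the single-argument <tt>print(...)</tt> form is used so the snippet should behave the same under Python 2 and Python 3.</p>

```python
import numpy as np

values = [1, 2, 3, 4]

# A plain list is repeated by the * operator
repeated = values * 2
print(repeated)

# A numpy array is multiplied element by element
scaled = np.array(values) * 5
print(scaled)        # str() form, without the "array(...)" wrapper
print(repr(scaled))  # repr() form, as echoed by the interactive shell
```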
<p><strong>2. Reading the Data</strong></p>
<p>As with many programming problems, our first step is to read the data into memory. I&#8217;ve started a script called <tt>plot_data.py</tt> with a few <tt>import</tt> statements and some utility functions. I&#8217;ll explain these functions in detail as we go forward.</p>
<pre class="brush: python; gutter: true; first-line: 1;">import numpy as np
import matplotlib.pyplot as pyplot
from datetime import datetime
import os

event_types = ['Rain', 'Thunderstorm', 'Snow', 'Fog']
num_events = len(event_types)

def event2int(event):
    return event_types.index(event)

def date2int(date_str):
    date = datetime.strptime(date_str, '%Y-%m-%d')
    return date.toordinal()

def r_squared(actual, ideal):
    actual_mean = np.mean(actual)
    ideal_dev = np.sum([(val - actual_mean)**2 for val in ideal])
    actual_dev = np.sum([(val - actual_mean)**2 for val in actual])

    return ideal_dev / actual_dev</pre>
<p>In past tutorials, we&#8217;ve either manually parsed our data file(s) or used Python&#8217;s <a href="http://docs.python.org/library/csv.html#csv.reader">csv reader</a>. Because of our focus on <tt>numpy</tt> here, we&#8217;re going to use the <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html">loadtxt function</a>. By passing in the right options, we can get <tt>loadtxt</tt> to parse our <tt>weather.csv</tt> file directly into a <tt>numpy</tt> array.</p>
<p>For a first pass, I&#8217;ve written the following function to read in the weather data:</p>
<pre class="brush: python; gutter: true; first-line: 1;">def read_weather(file_name):

    data = np.loadtxt(file_name, delimiter=',', skiprows=1,
            converters = { 0 : date2int },
            usecols=(0,1,2,3,21))

    return data

#-------------------------------------------------- 

data = read_weather('data/weather.csv')
print data</pre>
<p>The <tt>delimiter</tt> and <tt>skiprows</tt> parameters tell <tt>loadtxt</tt> to split fields based on commas and skip the first row of the file (which contains column names). <tt>loadtxt</tt> doesn&#8217;t parse date strings on its own, so I&#8217;ve used the <tt>converters</tt> parameter to have <tt>loadtxt</tt> convert column 0 (a date string) into an integer using my <tt>date2int</tt> function. The last parameter, <tt>usecols</tt>, tells <tt>loadtxt</tt> to ignore all columns in the file <em>except</em> the first, second, third, fourth, and twenty-second columns (the date, temperature, and weather event columns).</p>
<p>Unfortunately, running this code produces the following error:</p>
<pre class="brush: text; gutter: false;">$ python plot_data.py
Traceback (most recent call last):
  File &quot;plot_data-2.py&quot;, line 34, in &lt;module&gt;
    data = read_weather(&quot;data/weather.csv&quot;)
  File &quot;plot_data-2.py&quot;, line 28, in read_weather
    usecols=(0,1,2,3,21))
  File &quot;/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py&quot;, line 796, in loadtxt
    items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: could not convert string to float: Rain</pre>
<p>The final line tells us that <tt>numpy</tt> can&#8217;t convert the string &#8220;Rain&#8221; into a floating point number. This is from the weather events column in our data, which contains text like &#8220;Rain&#8221; or &#8220;Snow-Fog&#8221;. We could try and write a converter for this column too, but I&#8217;ve chosen to simply have <tt>numpy</tt> bring the column in as a string (which we&#8217;ll manually parse later).</p>
<p>To do this, we pass in a special object for the <tt>dtype</tt> parameter of <tt>loadtxt</tt>. This object can be constructed by giving a dictionary to the <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html">numpy.dtype</a> function. The code below provides names and data types for all of the columns we&#8217;ll be using:</p>
<pre class="brush: python; gutter: true; first-line: 1;">def read_weather(file_name):
    dtypes = np.dtype({ 'names' : ('timestamp', 'max temp', 'mean temp', 'min temp', 'events'),
                        'formats' : [int, float, float, float, 'S100'] })

    data = np.loadtxt(file_name, delimiter=',', skiprows=1,
            converters = { 0 : date2int },
            usecols=(0,1,2,3,21), dtype=dtypes)

    return data</pre>
<p>The last column format is given as &#8220;S100&#8221;, which means &#8220;string up to 100 characters in length.&#8221; <tt>numpy</tt> needs to know the maximum size of the column for efficiency, so I gave myself plenty of room with 100 characters. Running this new code produces the following output:</p>
<pre class="brush: text; gutter: false;">$ python plot_data.py
[(734503, 53.0, 43.0, 32.0, 'Rain') (734504, 32.0, 25.0, 18.0, 'Snow')
 (734505, 27.0, 20.0, 12.0, '') (734506, 42.0, 34.0, 26.0, '')
 (734509, 52.0, 40.0, 28.0, '') (734510, 47.0, 36.0, 24.0, '')
 (734511, 51.0, 38.0, 24.0, '') (734512, 57.0, 43.0, 28.0, '')
 (734513, 45.0, 43.0, 40.0, 'Rain') (734514, 43.0, 29.0, 15.0, 'Fog-Snow')
 (734515, 19.0, 17.0, 15.0, 'Snow') (734516, 27.0, 18.0, 9.0, 'Snow')
 ...</pre>
<p>We're finally in business. Each row in our data set consists of a timestamp (the date converted to an integer), the maximum, mean, and minimum temperature, and the weather events that occurred that day. We're going to start by plotting the mean temperature versus the day of the year. Since we've given names to each of our columns, we can pull them out easily:</p>
<pre class="brush: python; gutter: true; first-line: 1;">def read_weather(file_name):
    dtypes = np.dtype({ 'names' : ('timestamp', 'max temp', 'mean temp', 'min temp', 'events'),
                        'formats' : [int, float, float, float, 'S100'] })

    data = np.loadtxt(file_name, delimiter=',', skiprows=1,
            converters = { 0 : date2int },
            usecols=(0,1,2,3,21), dtype=dtypes)

    return data

#-------------------------------------------------- 

data = read_weather('data/weather.csv')
min_temps = data['min temp']
mean_temps = data['mean temp']
max_temps = data['max temp']
dates = [datetime.fromordinal(d) for d in data['timestamp']]
events = data['events']

for date, temp in zip(dates, mean_temps):
    print '{0:%b %d}: {1}'.format(date, temp)</pre>
<p>Each column can be extracted individually from the <tt>data</tt> array by using <tt>data['column name']</tt>. I&#8217;ve used the <a href="http://docs.python.org/library/datetime.html#datetime.date.fromordinal">datetime.fromordinal</a> function on the <tt>timestamp</tt> column to convert the integers back into <a href="http://docs.python.org/library/datetime.html#datetime-objects">datetime objects</a>.</p>
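<p>To see field access on a structured array without reading a file, here is a small sketch built from the first two rows of the output shown earlier. The names <tt>row_type</tt>, <tt>rows</tt>, and <tt>first_day</tt> are hypothetical, and the unicode format <tt>'U100'</tt> is an assumption swapped in for <tt>'S100'</tt> so that string comparisons read the same under Python 3.</p>

```python
import numpy as np

# Field names match the ones used in read_weather
row_type = np.dtype({'names': ('timestamp', 'max temp', 'mean temp', 'min temp', 'events'),
                     'formats': [int, float, float, float, 'U100']})

# Two rows copied from the printed output of plot_data.py
rows = np.array([(734503, 53.0, 43.0, 32.0, 'Rain'),
                 (734504, 32.0, 25.0, 18.0, 'Snow')], dtype=row_type)

mean_temps = rows['mean temp']   # one column, as a float array
first_day = rows[0]              # one row, with all five fields

print(mean_temps)
print(first_day['events'])
```

<p>Indexing with a field name returns a whole column, while indexing with an integer returns a whole row whose fields can then be looked up by name.</p>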
<p>Using the handy built-in <a href="http://docs.python.org/library/functions.html#zip">zip</a> function, I&#8217;ve printed out pairs of dates and mean temperatures. I use <a href="http://www.python.org/dev/peps/pep-3101/">advanced string formatting</a> to print the month, day, and temperature (see the <a href="http://docs.python.org/library/datetime.html#strftime-and-strptime-behavior">datetime documentation</a> for date formatting information). The program now gives the following output:</p>
<pre class="brush: text; gutter: false;">Jan 01: 43.0
Jan 02: 25.0
Jan 03: 20.0
Jan 04: 34.0
...
May 11: 59.0
May 12: 62.0
May 14: 69.0</pre>
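<p>The date round trip can be checked on its own with only the standard library. This sketch reuses the <tt>date2int</tt> helper from <tt>plot_data.py</tt>; the variable names <tt>ordinal</tt> and <tt>restored</tt> are made up for illustration.</p>

```python
from datetime import datetime

def date2int(date_str):
    # Same conversion as in plot_data.py: date string to an ordinal day number
    return datetime.strptime(date_str, '%Y-%m-%d').toordinal()

ordinal = date2int('2012-01-01')
restored = datetime.fromordinal(ordinal)   # back to a datetime at midnight

print(ordinal)
print('{0:%b %d}: {1}'.format(restored, 43.0))
```

<p>The ordinal for January 1st, 2012 is 734503, which matches the first timestamp in the array printed earlier.</p>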
<p>Everything looks good, so let&#8217;s get started plotting.</p>
<p><strong>3. Temperature Plot</strong></p>
<p>We&#8217;re going to start with a simple line plot that has the day of the year on the x-axis and the mean temperature for that day on the y-axis. Our plotting function, called <tt>temp_plot</tt>, will take in dates and mean temperatures, and give us back a <tt>matplotlib</tt> figure object. Here&#8217;s the code:</p>
<pre class="brush: python; gutter: true; first-line: 1;">def temp_plot(dates, mean_temps):

    year_start = datetime(2012, 1, 1)
    days = [(d - year_start).days + 1 for d in dates]

    fig = pyplot.figure()
    pyplot.title('Temperatures in Bloomington 2012')
    pyplot.ylabel('Mean Temperature (F)')
    pyplot.xlabel('Day of Year')

    pyplot.plot(days, mean_temps, marker='o')

    return fig</pre>
<p>We start by computing the day of the year for each date. The <tt>datetime</tt> module lets us subtract dates from each other, producing a <a href="http://docs.python.org/library/datetime.html#timedelta-objects">timedelta</a> object. We subtract January 1st of 2012 from each date, adding 1 so that our count will start from 1 instead of 0. The <tt>days</tt> field on a <tt>timedelta</tt> object gives the total number of days (in this case, from January 1st).</p>
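<p>Here is the same day-of-year arithmetic as a stand-alone sketch, using the last date from the output shown earlier. The names <tt>last_reading</tt> and <tt>day_of_year</tt> are hypothetical; the <tt>tm_yday</tt> lookup is only there as a standard-library cross-check.</p>

```python
from datetime import datetime

year_start = datetime(2012, 1, 1)
last_reading = datetime(2012, 5, 14)   # last date in the printed output

delta = last_reading - year_start      # a timedelta object
day_of_year = delta.days + 1           # count from 1 instead of 0

print(day_of_year)
print(last_reading.timetuple().tm_yday)  # should agree with day_of_year
```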
<p>Next, we create a new <tt>matplotlib</tt> figure. In between calls to <a href="http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.figure">pyplot.figure</a>, <tt>matplotlib's</tt> plotting functions will draw new plots on top of old ones. We&#8217;ll use this fact to add a trend line to our plot shortly.</p>
<p>After adding a title and some axis labels to our figure, we call <a href="http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot">pyplot.plot</a> with our <tt>days</tt> (x values) and <tt>mean_temps</tt> arrays (y values). I&#8217;ve also passed in &#8216;o&#8217; to the optional <tt>marker</tt> parameter so that small circles will be plotted for each data point.</p>
<p>In the main body of the program, we use the <tt>os</tt> module to create a &#8220;plots&#8221; directory (checking if it exists first). Next, we call our <tt>temp_plot</tt> function and then use <a href="http://matplotlib.sourceforge.net/api/figure_api.html#matplotlib.figure.Figure.savefig">savefig</a> to save the figure out to a png file:</p>
<pre class="brush: python; gutter: true; first-line: 1;">data = read_weather('data/weather.csv')
min_temps = data['min temp']
mean_temps = data['mean temp']
max_temps = data['max temp']
dates = [datetime.fromordinal(d) for d in data['timestamp']]
events = data['events']

if not os.path.exists('plots'):
    os.mkdir('plots')

fig = temp_plot(dates, mean_temps)
fig.savefig('plots/day_vs_temp.png')</pre>
<p>Running <tt>$ python plot_data.py</tt> should create a &#8220;plots&#8221; folder and put a file inside called &#8220;day_vs_temp.png&#8221; that looks like this:</p>
<p><a href="http://software-carpentry.org/blog/wp-content/uploads/2012/05/day_vs_temp-5.png"><img src="http://software-carpentry.org/blog/wp-content/uploads/2012/05/day_vs_temp-5.png" alt="" title="day_vs_temp (1)" width="600" class="aligncenter size-full wp-image-165" /></a></p>
<p>Not bad! Let&#8217;s add a trend line to the plot based on a simple linear model of the data.</p>
<p><strong>3.1 Adding a trend line</strong></p>
<p>By using <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html">numpy&#8217;s polyfit function</a>, adding a trend line is a snap. This function takes our x and y values (<tt>days</tt> and <tt>mean_temps</tt>), and gives us back a slope and intercept (the final parameter is the degree of the fitted polynomial &#8212; we pass 1 for a linear fit).</p>
<pre class="brush: python; gutter: false; first-line: 1;">slope, intercept = np.polyfit(days, mean_temps, 1)</pre>
<p>Using the slope and intercept, we can plot a trend line by computing &#8220;ideal&#8221; temperatures for each day according to the old <tt>y = mx + b</tt> formula. With our variables below, this will be <tt>ideal_temps = (slope * days) + intercept</tt>. Note that I&#8217;ve changed the <tt>days = ...</tt> line to <tt>days = np.array(...)</tt> so that we can do mathematical operations directly on the array.</p>
<pre class="brush: python; gutter: true; first-line: 1;">def temp_plot(dates, mean_temps):

    year_start = datetime(2012, 1, 1)
    days = np.array([(d - year_start).days + 1 for d in dates])

    fig = pyplot.figure()
    pyplot.title('Temperatures in Bloomington 2012')
    pyplot.ylabel('Mean Temperature (F)')
    pyplot.xlabel('Day of Year')

    pyplot.plot(days, mean_temps, marker='o')

    slope, intercept = np.polyfit(days, mean_temps, 1)
    ideal_temps = intercept + (slope * days)
    r_sq = r_squared(mean_temps, ideal_temps)

    fit_label = 'Linear fit ({0:.2f})'.format(slope)
    pyplot.plot(days, ideal_temps, color='red', linestyle='--', label=fit_label)
    pyplot.annotate('r^2 = {0:.2f}'.format(r_sq), (0.05, 0.9), xycoords='axes fraction')
    pyplot.legend(loc='lower right')

    return fig</pre>
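<p>The fit and the R-squared ratio can be sanity-checked away from the plotting code. The sketch below uses made-up points that lie exactly on the line y = 2x + 1 (they are not from <tt>weather.csv</tt>), so the expected slope, intercept, and R-squared are 2, 1, and 1. The <tt>r_squared</tt> function computes the same ratio as the helper in <tt>plot_data.py</tt>, written with array arithmetic.</p>

```python
import numpy as np

def r_squared(actual, ideal):
    # Same ratio as the helper in plot_data.py
    actual_mean = np.mean(actual)
    ideal_dev = np.sum((ideal - actual_mean) ** 2)
    actual_dev = np.sum((actual - actual_mean) ** 2)
    return ideal_dev / actual_dev

# Hypothetical data lying exactly on y = 2x + 1
days = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
temps = 2.0 * days + 1.0

slope, intercept = np.polyfit(days, temps, 1)
ideal_temps = intercept + slope * days
r_sq = r_squared(temps, ideal_temps)

print('{0:.2f} {1:.2f} {2:.2f}'.format(slope, intercept, r_sq))
```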
<p>To make the plot a little more useful, I&#8217;ve annotated the plot with the R-squared value of the fit. <a href="http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.annotate">pyplot.annotate</a> lets you put text on the figure in a variety of ways. Here, I&#8217;ve set the <tt>xycoords</tt> parameter to &#8220;axes fraction&#8221; so that <tt>annotate</tt> interprets my coordinates (0.05, 0.9) as fractions between 0 and 1 relative to the figure axes. The (0.05, 0.9) places the text 5% of the axes width in from the left edge and 90% of the axes height up from the bottom edge.</p>
<p>The final call to <a href="http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.legend">pyplot.legend</a> places a legend on the figure. You <em>must</em> include a <tt>label</tt> parameter on at least one plot object for this to work (I&#8217;ve included it on the trend line <tt>plot</tt> call). By default, the legend will show up in the upper-right corner of the figure. This will get in the way on our current plot, so I moved the legend to the lower-right with the <tt>loc</tt> parameter.</p>
<p>With the changes above, here&#8217;s the new plot:</p>
<p><a href="http://software-carpentry.org/blog/wp-content/uploads/2012/05/day_vs_temp-6.png"><img src="http://software-carpentry.org/blog/wp-content/uploads/2012/05/day_vs_temp-6.png" alt="" title="day_vs_temp (2)" width="600" class="aligncenter size-full wp-image-166" /></a></p>
<p>Notice that the string formatting (<tt>{0:.2f}</tt>) has rounded the R-squared value and slope label for us to two decimal places.</p>
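<p>The dot in the format specifier matters. As a small sketch (the variable names are made up for illustration), compare a precision of 2 or 3 with a specifier that leaves the dot out, where the number is read as a minimum field width instead:</p>

```python
value = 0.123456

two_places = '{0:.2f}'.format(value)    # precision of 2 digits
three_places = '{0:.3f}'.format(value)  # precision of 3 digits
min_width = '{0:2f}'.format(value)      # no dot: 2 is a minimum width, precision stays at 6

print(two_places)
print(three_places)
print(min_width)
```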
<p><strong>3.2 Adding &#8220;error&#8221; bars</strong></p>
<p>Since we also have the min and max temperatures in our data, let&#8217;s add &#8220;error&#8221; bars to our plot to show the temperature range on each day. We&#8217;ll modify <tt>temp_plot</tt> to take in two additional parameters (<tt>min_temps</tt> and <tt>max_temps</tt>), and plot the temperature range if they both have values (i.e., are not <tt>None</tt>).</p>
<p>Adding error bars requires us to use the <a href="http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.errorbar">pyplot.errorbar</a> function instead of <tt>pyplot.plot</tt>. It takes additional parameters (<tt>xerr</tt> and <tt>yerr</tt>) for the x and y errors. We will use <tt>yerr</tt>, and pass in an array with two rows: the first for the error below each data point, and the second for the error above. This array is easily computed by subtracting the min temperatures from the mean and the mean from the max temperatures, and then stacking the two arrays together row-wise with <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html">numpy.vstack</a> (the <tt>np.row_stack</tt> call in the code below is an alias for it).</p>
<pre class="brush: python; gutter: true; first-line: 1;">def temp_plot(dates, mean_temps, min_temps = None, max_temps = None):

    year_start = datetime(2012, 1, 1)
    days = np.array([(d - year_start).days + 1 for d in dates])

    fig = pyplot.figure()
    pyplot.title('Temperatures in Bloomington 2012')
    pyplot.ylabel('Mean Temperature (F)')
    pyplot.xlabel('Day of Year')

    if (max_temps is None or min_temps is None):
        # Normal plot without error bars
        pyplot.plot(days, mean_temps, marker='o')
    else:
        # Compute min/max temperature difference from the mean
        temp_err = np.row_stack((mean_temps - min_temps,
                                 max_temps - mean_temps))

        # Make line plot with error bars to show temperature range
        pyplot.errorbar(days, mean_temps, marker='o', yerr=temp_err)
        pyplot.title('Temperatures in Bloomington 2012 (max/min)')

    slope, intercept = np.polyfit(days, mean_temps, 1)
    ideal_temps = intercept + (slope * days)
    r_sq = r_squared(mean_temps, ideal_temps)

    fit_label = 'Linear fit ({0:.2f})'.format(slope)
    pyplot.plot(days, ideal_temps, color='red', linestyle='--', label=fit_label)
    pyplot.annotate('r^2 = {0:.2f}'.format(r_sq), (0.05, 0.9), xycoords='axes fraction')
    pyplot.legend(loc='lower right')

    return fig

#-------------------------------------------------- 

data = read_weather('data/weather.csv')
min_temps = data['min temp']
mean_temps = data['mean temp']
max_temps = data['max temp']
dates = [datetime.fromordinal(d) for d in data['timestamp']]
events = data['events']

if not os.path.exists('plots'):
    os.mkdir('plots')

# Plot without error bars
fig = temp_plot(dates, mean_temps)
fig.savefig('plots/day_vs_temp.png')

# Plot with error bars
fig = temp_plot(dates, mean_temps, min_temps, max_temps)
fig.savefig('plots/day_vs_temp-all.png')</pre>
<p>The new plot is saved to a file named <tt>day_vs_temp-all.png</tt> and looks like this:</p>
<p><a href="http://software-carpentry.org/blog/wp-content/uploads/2012/05/day_vs_temp-all.png"><img src="http://software-carpentry.org/blog/wp-content/uploads/2012/05/day_vs_temp-all.png" alt="" title="day_vs_temp-all" width="600" class="aligncenter size-full wp-image-167" /></a></p>
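<p>To make the shape of the <tt>yerr</tt> array concrete, here is a sketch using the first three rows of the output shown earlier. It uses <tt>np.vstack</tt> and only builds the array; nothing is plotted.</p>

```python
import numpy as np

# First three days of the printed output (max, mean, min)
max_temps = np.array([53.0, 32.0, 27.0])
mean_temps = np.array([43.0, 25.0, 20.0])
min_temps = np.array([32.0, 18.0, 12.0])

# Row 0: distance below the mean, row 1: distance above it
temp_err = np.vstack((mean_temps - min_temps, max_temps - mean_temps))

print(temp_err.shape)
print(temp_err)
```

<p>The result has two rows and one column per day, with the below-the-mean distances in the first row.</p>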
<p>If you need to compute standard error for your <tt>errorbar</tt> plot, you can use <a href="http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.sem.html#scipy.stats.sem">scipy.stats.sem</a> from the <a href="http://www.scipy.org/">scipy module</a>.</p>
<p>For our next plot, we&#8217;ll do a multi-part histogram of the weather events for each month.</p>
<p><strong>4. Event Histogram</strong></p>
<p>Histograms in <tt>matplotlib</tt> are generated using the <a href="http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.hist">pyplot.hist</a> function. This function takes an array of data, which can itself contain arrays (for a multi-part histogram). We want to count events per month, so we&#8217;ll need to create an array for each type of event. Inside these arrays will be observations like <tt>[1, 1, 2, 3, 3]</tt> for &#8220;January&#8221;, &#8220;January&#8221;, &#8220;February&#8221;, &#8220;March&#8221;, &#8220;March&#8221;. Here&#8217;s a diagram to help out:</p>
<p><a href="http://software-carpentry.org/blog/wp-content/uploads/2012/05/events.png"><img src="http://software-carpentry.org/blog/wp-content/uploads/2012/05/events.png" alt="" title="event histogram data" width="303" height="441" class="aligncenter size-full wp-image-210" /></a></p>
<p>When <tt>pyplot.hist</tt> receives our data, it will attempt to &#8220;bin&#8221; the month observations automatically. By default, it will break observations into 10 bins. We want a bin for each month instead, and we want the bins aligned properly to the month numbers (1 = January, 2 = February, etc.). The <tt>bins</tt> parameter to <tt>pyplot.hist</tt> takes either a number (representing the desired number of bins) or a sequence (representing the desired bin <em>edges</em>). In the code below, we pass <tt>np.arange(1, 5 + 2)</tt>, which gives six edges and therefore five bins, to ensure that our bins start at 1 (for January) and go <em>through</em> 5 (for May).</p>
<pre class="brush: python; gutter: true; first-line: 1;">def hist_events(dates, events):
    event_months = []

    for i in range(num_events):
        event_months.append([])

    # Build up lists of months where events occurred
    for date, event_str in zip(dates, events):
        if len(event_str) == 0:
            # Skip blank events
            continue

        month = date.month

        # Multiple events in a day are separated by '-'
        for event in event_str.split('-'):
            event_code = event2int(event)
            event_months[event_code].append(month)

    # Plot histogram
    fig = pyplot.figure()
    pyplot.title('Weather Events in Bloomington 2012')
    pyplot.xlabel('Month')
    pyplot.ylabel('Event Count')

    bins = np.arange(1, 5 + 2)
    pyplot.hist(event_months, bins=bins, label=event_types)

    pyplot.legend()

    return fig</pre>
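<p>The effect of passing bin edges can be checked without drawing anything. This sketch assumes that <tt>numpy.histogram</tt> treats a sequence passed as <tt>bins</tt> as edges in the same way, and counts the example observations from above; the names <tt>months</tt>, <tt>edges</tt>, and <tt>counts</tt> are made up for illustration.</p>

```python
import numpy as np

months = [1, 1, 2, 3, 3]       # January, January, February, March, March
edges = np.arange(1, 5 + 2)    # six edges: 1, 2, 3, 4, 5, 6

counts, returned_edges = np.histogram(months, bins=edges)

print(counts)
```

<p>Six edges give five counts, one per month from January through May.</p>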
<p>The main body of the program is updated to call <tt>hist_events</tt> and save the resulting figure to <tt>plots/event_histogram.png</tt>.</p>
<pre class="brush: python; gutter: true; first-line: 1;">data = read_weather('data/weather.csv')
min_temps = data['min temp']
mean_temps = data['mean temp']
max_temps = data['max temp']
dates = [datetime.fromordinal(d) for d in data['timestamp']]
events = data['events']

if not os.path.exists('plots'):
    os.mkdir('plots')

fig = temp_plot(dates, mean_temps)
fig.savefig('plots/day_vs_temp.png')

fig = temp_plot(dates, mean_temps, min_temps, max_temps)
fig.savefig('plots/day_vs_temp-all.png')

fig = hist_events(dates, events)
fig.savefig(os.path.join('plots', 'event_histogram.png'))</pre>
<p>When we run <tt>$ python plot_data.py</tt>, the new plot looks like this:</p>
<p><a href="http://software-carpentry.org/blog/wp-content/uploads/2012/05/event_histogram-8.png"><img src="http://software-carpentry.org/blog/wp-content/uploads/2012/05/event_histogram-8.png" alt="" title="event_histogram (1)" width="600" class="aligncenter size-full wp-image-168" /></a></p>
<p>Each collection of bars represents a month, and the individual bars represent the number of Rain, Thunderstorm, etc. events observed for that month. The figure&#8217;s legend was populated by passing <tt>event_types</tt> in for the <tt>label</tt> parameter of <tt>pyplot.hist</tt>.</p>
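<p>The grouping step inside <tt>hist_events</tt> can be tried on a handful of values. In this sketch the event strings are taken from the output shown earlier, but their pairing with month numbers in <tt>observations</tt> is made up for illustration.</p>

```python
event_types = ['Rain', 'Thunderstorm', 'Snow', 'Fog']

# (month, events) pairs; the month numbers here are hypothetical
observations = [(1, 'Rain'), (1, 'Snow'), (1, ''), (1, 'Fog-Snow'), (2, 'Rain')]

# One list of month numbers per event type
event_months = [[] for _ in event_types]

for month, event_str in observations:
    if not event_str:
        continue  # skip days with no events
    # Multiple events in a day are separated by '-'
    for event in event_str.split('-'):
        event_months[event_types.index(event)].append(month)

for name, months in zip(event_types, event_months):
    print('{0}: {1}'.format(name, months))
```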
<p>The plot looks good, but it would be nice to properly label the months. We could do this manually with <a href="http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.xticks">pyplot.xticks</a> as follows:</p>
<pre class="brush: python; gutter: false; first-line: 1;">pyplot.xticks( (1.5, 2.5, 3.5, 4.5, 5.5), ('January', 'February', 'March', 'April', 'May') )</pre>
<p>This will label each bin in the center (hence the .5 added to each number) with the proper month name. If our data grows to include more months, however, we&#8217;ll have to manually extend the number of bins and our labels. Let&#8217;s change <tt>hist_events</tt> to keep track of the range of months in the data. Additionally, we&#8217;ll use Python&#8217;s <a href="http://docs.python.org/library/calendar.html">calendar</a> module to automatically get the month names.</p>
<p>At the top of the program, we&#8217;ll import the <tt>calendar</tt> module:</p>
<pre class="brush: python; gutter: false; first-line: 1;">import calendar</pre>
<p>and then redefine <tt>hist_events</tt> as follows:</p>
<pre class="brush: python; gutter: true; first-line: 1;">def hist_events(dates, events):
    event_months = []

    for i in range(num_events):
        event_months.append([])

    # Build up lists of months where events occurred
    min_month = 13
    max_month = 0

    for date, event_str in zip(dates, events):
        if len(event_str) == 0:
            # Skip blank events
            continue

        month = date.month
        min_month = min(month, min_month)
        max_month = max(month, max_month)

        # Multiple events in a day are separated by '-'
        for event in event_str.split('-'):
            event_code = event2int(event)
            event_months[event_code].append(month)

    # Plot histogram
    fig = pyplot.figure()
    pyplot.title('Weather Events in Bloomington 2012')
    pyplot.xlabel('Month')
    pyplot.ylabel('Event Count')
    pyplot.gca().yaxis.grid()  # Horizontal grid lines on the current axes

    # One bin per month: edges run from min_month to max_month + 1
    bins = np.arange(min_month, max_month + 2)  # Bin edges
    pyplot.hist(event_months, bins=bins, label=event_types)

    # Align month labels to bin centers (one label per bin, so drop the last edge)
    month_names = calendar.month_name[min_month:max_month+1]
    pyplot.xticks(bins[:-1] + 0.5, month_names)

    pyplot.legend()

    return fig</pre>
<p>While building our observation arrays, we now track the minimum and maximum months observed. This allows us to create our bin edges automatically, and lets us grab month names from the <tt>calendar</tt> module by slicing the <tt>calendar.month_name</tt> list.</p>
<p>Note that the <tt>bins</tt> variable was created using <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html">numpy.arange</a>, which is a shortcut for <tt>bins = numpy.array(range(min_month, max_month + 2))</tt>. Making <tt>bins</tt> a <tt>numpy</tt> array lets us call <tt>pyplot.xticks</tt> with <tt>bins[:-1] + 0.5</tt>, centering <tt>month_names</tt> on each bin. The last edge is dropped because there is always one more bin edge than there are bins.</p>
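<p>To see how bin edges, bin centers, and month names line up, here is a minimal standalone sketch. It assumes data covering January through May (the <tt>min_month</tt> and <tt>max_month</tt> values are hard-coded purely for illustration) and that <tt>numpy</tt> is installed. Five bins need six edges, so the centers are computed from every edge except the last.</p>

```python
import calendar

import numpy as np

# Illustrative values only: pretend the data runs from January to May.
min_month, max_month = 1, 5

# Six edges delimit five month-wide bins: [1, 2), [2, 3), ..., [5, 6].
edges = np.arange(min_month, max_month + 2)

# Each bin's center sits half a unit past its left edge.
centers = edges[:-1] + 0.5

# calendar.month_name is 1-indexed (index 0 is an empty string),
# so slicing from min_month gives the matching names.
names = calendar.month_name[min_month:max_month + 1]

for center, name in zip(centers, names):
    print(center, name)
```

<p>With the default locale this pairs 1.5 with January through 5.5 with May, one label per bin.</p>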
<p>As a bonus, I&#8217;ve also added a horizontal grid using the <a href="http://matplotlib.sourceforge.net/api/axis_api.html#matplotlib.axis.Axis.grid">axis.grid</a> function. You can add both a horizontal and vertical grid at the same time by calling <a href="http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.grid">pyplot.grid</a>.</p>
<p>Here&#8217;s the updated plot:</p>
<p><a href="http://software-carpentry.org/blog/wp-content/uploads/2012/05/event_histogram.png"><img src="http://software-carpentry.org/blog/wp-content/uploads/2012/05/event_histogram.png" alt="" title="event_histogram (2)" width="600" class="aligncenter size-full wp-image-169" /></a></p>
<p>Looks ready for publication!</p>
<p>[<a href="http://software-carpentry.org/blog/wp-content/uploads/2012/05/weather_exercise.zip">Code and Data</a>]</p>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/an-exercise-with-matplotlib-and-numpy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s Wrong With All This?</title>
		<link>http://software-carpentry.org/2012/05/whats-wrong-with-all-this/</link>
		<comments>http://software-carpentry.org/2012/05/whats-wrong-with-all-this/#comments</comments>
		<pubDate>Sun, 20 May 2012 17:59:25 +0000</pubDate>
		<dc:creator>Greg Wilson</dc:creator>
				<category><![CDATA[Opinion]]></category>
		<category><![CDATA[Tooling]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4849</guid>
		<description><![CDATA[Titus Brown doesn&#8217;t like this web site. He&#8217;s OK with the content (I think), but he finds it awkward to use, and while I don&#8217;t feel as strongly as he does, I accept that we have outgrown WordPress. The question is, what should we use instead? We need a lot more than just a blog [...]]]></description>
			<content:encoded><![CDATA[<p>Titus Brown <a href="https://twitter.com/#!/ctitusbrown/status/200937511999123456">doesn&#8217;t like this web site</a>. He&#8217;s OK with the content (I think), but he finds it awkward to use, and while I don&#8217;t feel as strongly as he does, I accept that we have outgrown WordPress. The question is, what should we use instead? We need a lot more than just a blog and some static web pages, but learning management systems like <a href="http://moodle.org/">Moodle</a> weren&#8217;t built with our ad hoc model in mind (they&#8217;re really teaching administration systems), and newer tools like <a href="http://p2pu.org">P2PU</a> feel like a step backward. I started thinking about <a href="/2012/04/behind-the-scenes-or-the-ethics-of-cultivating-discontent/">requirements for a replacement</a> back in April, but got distracted. Here&#8217;s a longer look.</p>
<p><span id="more-4849"></span></p>
<h2>Who are we?</h2>
<ul>
<li>A <em>learner</em> learns new tools and skills.</li>
<li>A <em>tutor</em> passes on their knowledge.</li>
<li>A <em>workshop host</em> organizes and runs a boot camp.</li>
<li>An <em>author</em> creates content (lessons, blog posts, exercises, etc.).</li>
<li>An <em>admin</em> manages the web site.</li>
<li><em>Innocent bystanders</em> watch and comment from the sidelines <img src='http://software-carpentry.org/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </li>
</ul>
<p>An individual might assume any of these roles at different times or in different contexts. For example, workshop hosts are often tutors, a tutor for one topic may be a learner for another, authors are often admins and vice versa, etc.</p>
<h2>What do we do?</h2>
<ul>
<li>A <em>workshop</em> is a live event, typically running all day for two days. A workshop is made up of several <em>lessons</em>, which may use the content we have online (or at least improvise around it), but which usually remix the order.</li>
<li>A <em>course</em> is a slower-paced event, typically running for a few hours once a week for several weeks. Courses use our online material (or don&#8217;t) just like workshops.</li>
<li>A <em>tutorial</em> is an ad hoc real-time session with one tutor and several learners. Tutorials can be online or live.</li>
<li>A <em>help session</em> is an ad hoc session between one tutor and one learner. Help sessions can be online or live.</li>
<li>A <em>content jam</em> is a live get-together to create or update content. We haven&#8217;t actually had one of these yet, but I&#8217;m hopeful&#8230;</li>
</ul>
<h2>What do we use to interact?</h2>
<ul>
<li><a href="http://www.skype.com">Skype</a> and desktop sharing for real-time online events. We&#8217;ve been using <a href="http://www.bluejeans.com">BlueJeans</a> for one-to-many tutorials; it works pretty well, but doesn&#8217;t seem to allow people to use Skype text chat while in a session, and we&#8217;ve never been able to make recording work. It&#8217;s also very expensive, but cheaper alternatives (WebEx, Google+ hangouts) haven&#8217;t scaled as well.</li>
<li>Our <a href="/blog/">WordPress blog</a>. We manually echo posts to <a href="https://twitter.com/#!/swcarpentry">Twitter</a>.</li>
<li>Twitter (tweets aren&#8217;t currently archived on our site, but should be).</li>
<li>Web pages in our WordPress site. This includes the online course material, ads for workshops, and a few bits of advertising.</li>
<li>Comments on the WordPress material. (People have suggested adding forums as well, but I don&#8217;t believe there would be enough traffic, and we all have too many places to pay attention to already.)</li>
<li>Videos (hosted at <a href="http://www.youtube.com/user/softwarecarpentry/feed">YouTube</a>, embedded in the WordPress site).</li>
<li>Point-to-point email. (This is usually from and to people&#8217;s personal accounts, so it isn&#8217;t archived.)</li>
<li>Our own mailing lists: one for workshop organizers and content developers, and others for various regions and workshops. These <em>are</em> archived, but since we use <a href="http://www.gnu.org/software/mailman/index.html">MailMan</a>, they&#8217;re not integrated with WordPress. (I&#8217;ve experimented with various mailing list plugins for WordPress, and haven&#8217;t been impressed by any of them.) We manage these lists through the <a href="http://dreamhost.com">Dreamhost</a> control panel.</li>
<li><a href="http://subversion.tigris.org/">Subversion</a>. We have a <a href="http://svn.software-carpentry.org/swc">publicly-readable repository</a> for the course material and a members-only repository for administrative stuff like grant proposals. We also set up one repository for each workshop group, which we keep live for a couple of months. We also manage repos through the Dreamhost control panel, but there&#8217;s no way to automatically keep their membership and permissions in sync with the group mailing lists.</li>
<li><a href="http://www.eventbrite.com">EventBrite</a> for event registration. We link to <a href="http://www.eventbrite.com/rss/user_list_events/16679544193">EventBrite sign-up pages</a> for events from the corresponding WordPress pages, but the linkage is done manually. EventBrite also gives us a mailing list for each event; we should use these to contact workshop participants immediately before and after workshops rather than our MailMan lists.</li>
<li>Google Calendar and Google Maps to show <a href="/calendar/">when and where upcoming workshops are</a>. Our calendar and map are linked into a page on the WordPress site, but updates have to be done manually. In particular, we have to remember to add events to both the calendar and the map, and when an event is over, we have to change the map as well as moving the event&#8217;s page to the &#8220;past&#8221; section of the site.</li>
<li><a href="http://www.doodle.com">Doodle</a> to schedule tutorials.</li>
</ul>
<p>One thing we <em>don&#8217;t</em> have yet is <a href="/2012/02/badges-finalized/">badges</a>. We&#8217;d like to issue these to people who have taken part in workshops and the follow-up tutorials (i.e., our &#8220;graduates&#8221;), and also to instructors and content creators. The <a href="http://openbadges.org/en-US/">Open Badges</a> team is working on a WordPress plugin to do this, which we hope to deploy in June.</p>
<h2>How do we interact?</h2>
<ul>
<li>Synchronously, i.e., taking part in or delivering workshops, courses, tutorials, help sessions, and content jams, both live and online.</li>
<li>Scheduling events using Doodle.</li>
<li>Registering for (and unregistering from) events using EventBrite.</li>
<li>Advertising events using MailMan lists, the blog, and Twitter.</li>
<li>Updating people on changes to workshops and courses using MailMan lists and EventBrite lists.</li>
<li>Writing blog posts.</li>
<li>Writing pages.</li>
<li>Commenting on blog posts and pages.</li>
<li>Tweeting.</li>
<li>Creating or updating content in the main Subversion repository (and then updating the web site if needed).</li>
<li>Creating and uploading videos, and then linking to them in a blog post or from a page.</li>
<li>Discussing things on the &#8220;dev&#8221; list. There&#8217;s almost never discussion on the per-workshop lists: I feel like there should be (or should be forums or something), but help sites need critical mass, and I doubt we&#8217;ll ever have it, so I&#8217;d rather put energy into teaching people <a href="http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.1002202">how to use existing online Q&amp;A sites well</a>.</li>
<li>Giving feedback about events. Right now, we collect good and bad points from <a href="/2012/03/toronto-boot-camp-february-2012-how-we-did/">people</a> <a href="/2012/03/our-indiana-u-workshop-went-well/">at</a> <a href="/2012/03/the-trieste-workshop-one-week-later/">the</a> <a href="/2012/03/wrapping-up-the-stsci-course/">end</a> <a href="/2012/03/wrapping-up-mbari-workshop/">of</a> <a href="/2012/03/wrapping-up-in-oakland/">every</a> <a href="/2012/04/lessons-learned-at-the-university-of-chicago/">workshop</a>, <a href="/2012/04/utah-state-university-wrap-up/">then</a> <a href="/2012/05/the-good-and-the-bad-of-it/">post</a> <a href="/2012/05/ucl-bootcamp-version-control-wrap-up/">them</a> <a href="/2012/05/feedback-from-michigan-state/">to</a> <a href="/2012/05/feedback-from-newcastle-upon-tyne/">the</a> <a href="/2012/05/feedback-from-alberta/">blog</a>. We really need to collect feedback on tutorials, and to follow up with people <a href="/2012/04/three-years-later/">months or years later</a>.</li>
</ul>
<h2>What&#8217;s wrong with all this?</h2>
<ul>
<li><em>Speed</em> and <em>design</em>: the existing web site is slooooow, and no one would call the existing site beautiful&#8230;</li>
<li><em>Identity</em>: scheduling is separate from registration is separate from the mailing lists and from repositories. Badging will only make that more complicated. <a href="http://www.mozilla.org/en-US/persona/">Mozilla Persona</a> (formerly <a href="https://browserid.org/">BrowserID</a>, and not the same thing as <a href="http://openid.net/">OpenID</a>; are you confused yet?) isn&#8217;t a complete solution: it handles authentication, but not authorization, and &#8220;who can do what?&#8221; is an authorization issue. <a href="http://oauth.net/">OAuth</a> is supposed to take care of the latter, but it&#8217;s a looong way from meeting our needs.</li>
<li><em>Integration</em>: connecting our blog to Twitter would be easy—I just haven&#8217;t bothered to set it up. But tweets should be archived on the web site (both the ones we make and mentions of us), the mailing list archives should be integrated into the site, and so on. Again, there&#8217;s a lot more to this than just managing identities.</li>
<li><em>Features</em>: I&#8217;d like a <a href="/2012/03/how-were-doing/">live</a> <a href="/2012/05/space-at-upcoming-events/">table</a> of registration stats (how many people have signed up for all upcoming events, and how many tickets remain) on the web site, but EventBrite doesn&#8217;t have embeddable HTML for that. I&#8217;d also like a person-by-list table showing who&#8217;s on which mailing list, and who has access to which repository, but Dreamhost and MailMan don&#8217;t offer that. And I&#8217;d like the colors of map pins to change automatically once a workshop is over, but—you get the picture. All of these things can be fixed with the right glue code, but I have bigger <a href="http://en.wiktionary.org/wiki/yak_shaving">yaks to shave</a>.</li>
<li><em>Conversation</em>: the most important missing element is regular back-and-forth with the people we&#8217;re trying to help. Again, I think that our goal should be to get them onto existing Q&amp;A sites like Stack Overflow; in particular, we should <a href="/2012/02/stack-underflow/">help them feel confident enough</a> to hang out there, so they don&#8217;t become part of the <a href="/2012/03/the-dark-matter-of-computational-science/">dark matter of computational science</a>.</li>
</ul>
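<p>To give a sense of the glue code involved, here is a rough sketch of the person-by-list table. Everything in it is hypothetical: the list and repository names, the addresses, and the assumption that membership has already been exported into a plain dictionary (getting that data out of MailMan and the Dreamhost panel is the hard part, and isn&#8217;t shown).</p>

```python
from collections import defaultdict

# Hypothetical membership data, keyed by mailing list or repository name.
# Exporting this from the real services is assumed, not implemented.
memberships = {
    "dev-list": ["alice@example.org", "bob@example.org"],
    "toronto-list": ["bob@example.org", "carol@example.org"],
    "swc-repo": ["alice@example.org", "carol@example.org"],
}


def person_by_resource(memberships):
    """Invert a resource-to-members mapping into person-to-resources."""
    table = defaultdict(set)
    for resource, people in memberships.items():
        for person in people:
            table[person].add(resource)
    return dict(table)


def format_table(table, resources):
    """Return one text row per person, marking membership with 'x'."""
    rows = []
    for person in sorted(table):
        marks = ["x" if r in table[person] else "-" for r in resources]
        rows.append(person + "  " + "  ".join(marks))
    return rows


table = person_by_resource(memberships)
rows = format_table(table, sorted(memberships))
for row in rows:
    print(row)
```

<p>The inversion itself is trivial; the real work is keeping the input in sync with several services that each have their own idea of who a person is.</p>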
<h2>What do I want?</h2>
<p>I&#8217;ve written before about the idea of a <a href="/2012/04/github-for-education/">GitHub for education</a>, but that wouldn&#8217;t address all of the issues laid out above. (Event registration, for example, doesn&#8217;t feel like a GitHub kind of thing; nor does scheduling tutorials.) If we had a truly programmable web, I could hire a summer student to assemble what I want, but that&#8217;s not a yak, it&#8217;s a herd of angry mammoths: managing identities and permissions for MailMan, EventBrite, Subversion, and the blog in a single place would require a <em>lot</em> of hacking (or a time machine—if I could go back to 1999 and persuade the startup I was part of to open source <a href="http://third-bit.com/articles/select-access-2002.pdf">SelectAccess</a>, we&#8217;d be done by now).</p>
<p>So that leaves me looking for an off-the-shelf solution which I don&#8217;t think exists. If I&#8217;m wrong, I&#8217;d welcome a pointer—and if there&#8217;s something we should be doing that isn&#8217;t in the discussion above, I&#8217;d welcome a pointer to that too.</p>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/whats-wrong-with-all-this/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Space at Upcoming Events</title>
		<link>http://software-carpentry.org/2012/05/space-at-upcoming-events/</link>
		<comments>http://software-carpentry.org/2012/05/space-at-upcoming-events/#comments</comments>
		<pubDate>Sat, 19 May 2012 20:24:42 +0000</pubDate>
		<dc:creator>Greg Wilson</dc:creator>
				<category><![CDATA[Boot Camp]]></category>
		<category><![CDATA[Version 5.0]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4845</guid>
		<description><![CDATA[Here&#8217;s how registration is going for upcoming events: University of British Columbia May 22-23 39/40 Johns Hopkins University June 18-19 7/20 Paris June 28-29 9/25 Boston July 9-10 23/40 University of Waterloo July 12-13 1/40 Halifax July 16-17 8/40 University of Toronto (Scarborough) July 19-20 14/40 If you&#8217;d like to join us, there&#8217;s still plenty [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s how registration is going for upcoming events:</p>
<table>
<tbody>
<tr>
<td>University of British Columbia</td>
<td>May 22-23</td>
<td>39/40</td>
</tr>
<tr>
<td>Johns Hopkins University</td>
<td>June 18-19</td>
<td>7/20</td>
</tr>
<tr>
<td>Paris</td>
<td>June 28-29</td>
<td>9/25</td>
</tr>
<tr>
<td>Boston</td>
<td>July 9-10</td>
<td>23/40</td>
</tr>
<tr>
<td>University of Waterloo</td>
<td>July 12-13</td>
<td>1/40</td>
</tr>
<tr>
<td>Halifax</td>
<td>July 16-17</td>
<td>8/40</td>
</tr>
<tr>
<td>University of Toronto (Scarborough)</td>
<td>July 19-20</td>
<td>14/40</td>
</tr>
</tbody>
</table>
<p>If you&#8217;d like to join us, there&#8217;s still plenty of space—and if you have friends who could use some training in basic software skills, please point them our way.</p>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/space-at-upcoming-events/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Most Important Scientific Result Published in the Last Year</title>
		<link>http://software-carpentry.org/2012/05/the-most-important-scientific-result-published-in-the-last-year/</link>
		<comments>http://software-carpentry.org/2012/05/the-most-important-scientific-result-published-in-the-last-year/#comments</comments>
		<pubDate>Fri, 18 May 2012 15:02:46 +0000</pubDate>
		<dc:creator>Greg Wilson</dc:creator>
				<category><![CDATA[Noticed]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4833</guid>
		<description><![CDATA[J.M. Wicherts, M. Bakker, and D. Molenaar: &#8220;Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results&#8221;. PLoS ONE, 6(11): e26828, 2011, doi:10.1371/journal.pone.0026828. Background The widespread reluctance to share published research data is often hypothesized to be due to the authors&#8217; fear that reanalysis [...]]]></description>
			<content:encoded><![CDATA[<p>J.M. Wicherts, M. Bakker, and D. Molenaar: &#8220;<a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0026828">Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results</a>&#8221;. <cite>PLoS ONE</cite>, 6(11): e26828, 2011, doi:10.1371/journal.pone.0026828.</p>
<blockquote><p><strong>Background</strong></p>
<p>The widespread reluctance to share published research data is often hypothesized to be due to the authors&#8217; fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically.</p>
<p><strong>Methods and Findings</strong></p>
<p>We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.</p>
<p><strong>Conclusions</strong></p>
<p>Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/the-most-important-scientific-result-published-in-the-last-year/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Feedback from Alberta</title>
		<link>http://software-carpentry.org/2012/05/feedback-from-alberta/</link>
		<comments>http://software-carpentry.org/2012/05/feedback-from-alberta/#comments</comments>
		<pubDate>Fri, 18 May 2012 02:14:08 +0000</pubDate>
		<dc:creator>Greg Wilson</dc:creator>
				<category><![CDATA[Boot Camp]]></category>
		<category><![CDATA[University of Alberta]]></category>
		<category><![CDATA[Version 5.0]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4832</guid>
		<description><![CDATA[Our two-day workshop at the University of Alberta wound up a couple of hours ago. We had quite a few no-shows this time (which was annoying, given how many people were waitlisted), but those who did come seemed to get a lot out of it: Good Bad Room Mix of talking &#38; doing Stickies Version [...]]]></description>
			<content:encoded><![CDATA[<p>Our two-day workshop at the University of Alberta wound up a couple of hours ago.  We had quite a few no-shows this time (which was annoying, given how many people were waitlisted), but those who did come seemed to get a lot out of it:</p>
<table>
<tr>
<td><strong>Good</strong></td>
<td><strong>Bad</strong></td>
</tr>
<tr>
<td valign="top">
<ul>
<li>Room</li>
<li>Mix of talking &amp; doing</li>
<li>Stickies</li>
<li>Version control</li>
<li>Hands on</li>
<li>Link on online video</li>
<li>Python</li>
<li>Clear speaking</li>
<li>Computer in lab (using linux)</li>
<li>Automatic versioning</li>
<li>Programming in windows in Cygwin</li>
<li>Philosophy</li>
<li>Discussion of productivity</li>
<li>Good reading suggestions</li>
<li>Functional programming</li>
<li>Overall workflow</li>
<li>I feel more competent (morale boost)</li>
<li>Researched anecdotes, backed with data</li>
<li>Website</li>
<li>TDD</li>
<li>Instructor&#8217;s body language</li>
<li>Helpers</li>
</ul>
</td>
<td valign="top">
<ul>
<li>Coffee hard</li>
<li>Need more projectors</li>
<li>Having to keep stickies</li>
<li>No testing</li>
<li>Not enough depth</li>
<li>Not convinced about version control</li>
<li>Too fast on day 1, too slow on day 2</li>
<li>Need levels</li>
<li>Came late</li>
<li>Not enough Python</li>
<li>No lunch</li>
<li>No time for notes</li>
<li>More version control</li>
<li>Too short break</li>
<li>No shows</li>
<li>Pace (a little fast)</li>
<li>Supervisor wasn&#8217;t here (need to convince her)</li>
<li>Where is the code (dropbox?)</li>
<li>Bad chairs</li>
<li>Windows alienation</li>
<li>Mailing list</li>
<li>Making DB (no info)</li>
</ul>
</td>
</tr>
</table>
<p>Many thanks to Rose, Neil, and Paul for making it possible.</p>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/feedback-from-alberta/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Halifax in July</title>
		<link>http://software-carpentry.org/2012/05/halifax-in-july/</link>
		<comments>http://software-carpentry.org/2012/05/halifax-in-july/#comments</comments>
		<pubDate>Thu, 17 May 2012 02:07:10 +0000</pubDate>
		<dc:creator>Greg Wilson</dc:creator>
				<category><![CDATA[Boot Camp]]></category>
		<category><![CDATA[Halifax]]></category>
		<category><![CDATA[Version 5.0]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4831</guid>
		<description><![CDATA[We have just added another workshop to the summer&#8217;s list, this one at Saint Mary&#8217;s University in Halifax, Nova Scotia, on July 16-17. Please let friends and colleagues know—I look forward to meeting them.]]></description>
			<content:encoded><![CDATA[<p>We have just added another workshop to the summer&#8217;s list, this one at <a href="/boot-camps/halifax-july-2012/">Saint Mary&#8217;s University in Halifax, Nova Scotia, on July 16-17</a>. Please let friends and colleagues know—I look forward to meeting them.</p>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/halifax-in-july/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>And One More: Johns Hopkins in June</title>
		<link>http://software-carpentry.org/2012/05/and-one-more-johns-hopkins-in-june/</link>
		<comments>http://software-carpentry.org/2012/05/and-one-more-johns-hopkins-in-june/#comments</comments>
		<pubDate>Wed, 16 May 2012 02:58:49 +0000</pubDate>
		<dc:creator>Greg Wilson</dc:creator>
				<category><![CDATA[Boot Camp]]></category>
		<category><![CDATA[Johns Hopkins University]]></category>
		<category><![CDATA[Version 5.0]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4826</guid>
		<description><![CDATA[We&#8217;re pleased to announce that we will be running a two-day boot camp at Johns Hopkins University in Baltimore on June 18-19, 2012. We only have space for 20 participants, so please register early.]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re pleased to announce that we will be running a two-day boot camp at <a href="/boot-camps/johns-hopkins-university-june-2012/">Johns Hopkins University</a> in Baltimore on June 18-19, 2012. We only have space for 20 participants, so please register early.</p>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/and-one-more-johns-hopkins-in-june/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Feedback from Newcastle upon Tyne</title>
		<link>http://software-carpentry.org/2012/05/feedback-from-newcastle-upon-tyne/</link>
		<comments>http://software-carpentry.org/2012/05/feedback-from-newcastle-upon-tyne/#comments</comments>
		<pubDate>Tue, 15 May 2012 18:30:53 +0000</pubDate>
		<dc:creator>Chris Cannam</dc:creator>
				<category><![CDATA[Boot Camp]]></category>

		<guid isPermaLink="false">http://software-carpentry.org/?p=4821</guid>
		<description><![CDATA[This week&#8217;s Newcastle bootcamp, organised by the Digital Institute at Newcastle University with the Software Sustainability Institute and SoundSoftware, was the first Software Carpentry boot camp run entirely locally in the UK. For the organisers it was a slightly nervous experience, hoping we could get the material to hold together in presentation without Greg&#8217;s experience [...]]]></description>
			<content:encoded><![CDATA[<p>This week&#8217;s <a href="http://software-carpentry.org/boot-camps/newcastle-university-may-2012/">Newcastle bootcamp</a>, organised by the <a href="http://digitalinstitute.ncl.ac.uk/">Digital Institute</a> at Newcastle University with the <a href="http://software.ac.uk">Software Sustainability Institute</a> and <a href="http://soundsoftware.ac.uk">SoundSoftware</a>, was the first Software Carpentry boot camp run entirely locally in the UK. For the organisers it was a slightly nervous experience, hoping we could get the material to hold together in presentation without Greg&#8217;s experience at hand.</p>
<p>Feedback from the learners was generally good on the material, the venue and the structure. The most common complaint was that it was hard to follow along at times, and I think there are several areas where we&#8217;ll be able to improve the &#8220;flow&#8221; for future events.</p>
<p>Notably, this was the first bootcamp I&#8217;ve attended at which nobody found the room too crowded or the wrong temperature. Result, Newcastle!</p>
<p>Here are the good and bad feedback points. Some points were close duplicates, and I&#8217;ve put the additional ones in brackets (e.g. Python was cited three times).</p>
<table>
<tbody>
<tr>
<td>Good</td>
<td>Bad</td>
</tr>
<tr>
<td>
<ul>
<li>Python<br />(+ Choice of Python as easy scripting language)<br />(+ Gives me confidence to start using Python)</li>
<li>Use of coloured sticky notes<br />(+ coloured notes as an unobtrusive way to request help)</li>
<li>The &#8220;Bringing it together&#8221; section</li>
<li>Good mix of content</li>
<li>Version control<br />(+ integration with Bitbucket)<br />(+ version control tips e.g. archive, bisect)<br />(+ use of recipes as version control material)</li>
<li>Coding along with the presenters</li>
<li>Lots of helpers</li>
<li>Good temperature in room, open window</li>
<li>Arrangement of room into groups for collaborative work</li>
<li>Self-guided exercises spaced out through the presentations</li>
<li>Easy to ask the helpers for help</li>
<li>Use of open source software</li>
<li>Test-driven development</li>
<li>Online lecture content to back up teaching</li>
<li>Lots of breaks</li>
<li>Good course description</li>
<li>Inclusion of general advice for coding (as opposed to specific syntax)</li>
<li>SQL</li>
</ul>
</td>
<td>
<ul>
<li>Felt like we ran out of time at end of first day</li>
<li>Would have liked more about testing</li>
<li>Cygwin</li>
<li>Sometimes problem material got in the way of the subject<br />
<em>(more time worrying about overlapping rectangles than how to<br />
program a test)</em></li>
<li>No handouts, and screens difficult to read as forgotten my glasses</li>
<li>Should have introduced Python lists and other structures earlier<br />
<em>(presenters forgot to do this before using them in an exercise!)</em></li>
<li>Not enough window real-estate</li>
<li>Couldn&#8217;t always follow material before it disappeared off screen</li>
<li>Presenters sometimes forgot we were not necessarily interested in software engineering</li>
<li>Pace too intense for non-expert programmers</li>
<li>Interrupted by fire alarm</li>
<li>Coloured notes would have worked better in the other order<br />
<em>(that is, holding up &#8220;not OK&#8221; first &#8212; didn&#8217;t always dare if everyone else had just held up &#8220;OK&#8221;)</em></li>
<li>More use of microphones</li>
<li>Went a bit fast</li>
<li>Half the class was facing back wall!</li>
<li>Would have liked some harder exercises</li>
<li>More consistency of laptop presentation<br /><i>(i.e. always same laptop with same window layout)</i></li>
<li>Shell scripting section a little easy</li>
<li>Didn&#8217;t always notice when a presenter had started typing, they should read it out</li>
<li>More pointers to additional material online please</li>
<li>Some exercises had too much literal typing</li>
<li><em>(from a presenter)</em> Would like to have improved the presentation of functions</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>You can find links to the material we used on the <a href="http://software-carpentry.org/boot-camps/newcastle-university-may-2012/">page about the bootcamp</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://software-carpentry.org/2012/05/feedback-from-newcastle-upon-tyne/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>


