Up: Databases

Selecting Data

Welcome

  • Welcome to the Software Carpentry screencast on Databases. This
    screencast is on retreiving data from tables in a database.

What is a database?

  • Databases come in many flavours. In these screencasts we will be using
    the SQLite Manager plugin for Firefox to interact with our database.
  • As we’ve seen, a database is a way to store information that is arranged
    as tables.
  • We will be using a database that has a single table named Experiment.
  • This table is a log of all of the work done on experiments in a research
    lab, broken down by project and scientist
  • The table has a column for:
    • the login id of the scientist
    • the name of their project
    • a numeric id for their experiment
    • how many hours they spent on it
    • and when it took place
  • Each row, or record, in this table describes one scientist’s work on a
    certain experiment on a given date.

Getting data out of a database: SELECT queries

  • To start, lets write an SQL query that retrieves the loginID and the
    Project name from the Experiment table.
  • We do that by using the SQL command, SELECT. We then list the columns we
    want to read from the database table.
  • We want the LoginID, and Project name, so we write those column names, and
    then we write FROM, and the name of the table we want the data from,
    Experiment:


    SELECT LoginID, Project FROM Experiment;

  • We put a semi-colon at the end to tell the database this is the end of the
    command.
  • I’ve capitalised the words SELECT, and FROM because they are SQL keywords.
    Capitalisation isn’t necessary, but we’ll continue this throughout the
    screencast so that it is clear what is a keyword and what is a table name
    or field name.
  • When we run the command, it shows us all of the data from the Experiment
    table for the two columns that we asked for: the LoginID, and the name of
    the Project.
  • If wanted the Hours column, we’d just add that to the list of columns in
    the SELECT clause:


    SELECT LoginID, Project, Hours FROM Experiment;

More on the SELECT clause

  • So, After the SELECT command you listed the fields you want returned. You
    can place them in any order. We could writen,


    SELECT Project, LoginID,

    And you can repeat field names if you’d like,


    SELECT Project, LoginID, LoginID, Project

Shortcut:

  • If you want to pull up all of the columns in a table, you can use the
    asterisk, or star, after SELECT. The asterisk means “all of the column
    names” — it is just a shortcut. So, if we run this query,


    SELECT * FROM Experiment;

    we see all of the columns from the Experiment table.

Removing Duplicates:

  • If the there are duplicate rows returned by your query, it is possible to
    remove the duplicates. For example, we can fetch only the Project column
    from the Experiment table,


    SELECT Project FROM Experiment;

    and if we just wanted to know which different Projects the scientists were
    working on , we put DISTINCT keyword right after the SELECT keyword,


    SELECT DISTINCT Project FROM Experiment;

  • This lists all of the projects, but only once.
  • If select more than one column name,


    SELECT DISTINCT Project, LoginID FROM Experiment;

    then only the distinct pairs of projects and login IDs are returned.

Computed Fields:

  • Suppose that 10% of the time spent on each experiment was prep. work which
    needs to be accounted for separately.
  • In our SELECT statement we can add expressions that do computations on
    each row.
  • So, to calculate 10% of the time spent on each experiment, we can add an
    expression to the list of columns in the SELECT field (in this case we’ll
    leave the * there to select all of the columns), but we’ll add the
    expression Hours * .1


    SELECT *, Hours * .1
    FROM Experiment

  • When we run the query, the expression Hours * .1 is evaluated for each row
    and appended to the output.
  • When appending expressions in the SELECT clause, you can use any of the
    fields, all of the arthimetic operators can be used here, as well as a
    certain built-in functions.
  • For instance, We could round these values to the first decimal place by
    using the ROUND function:


    SELECT *, ROUND(Hours * .1, 1)
    FROM Experiment

Conclusion:

  • In this screencast we’ve introduced the very basics of interacting with
    and retrieving data from database.
  • We’ve seen that you can select columns from a table to retreive, use the
    DISTINCT keyword to only return unique rows, as well as append calculated
    columns to output.