Databases
This lecture is a quick introduction to databases and SQL, the language used to get information out of databases.
Requires: Nothing
Introduces: Databases
Motivating questions:
- My data is in table format across several related tables. How can I explore and analyze it?
- I have a large amount of data that I want to access efficiently, how do I do that?
Lectures (using the SQLite database engine):
- Introduction
- A database is a way to store and manipulate information that is arranged as tables.
- You interact with a databases by posing queries (in SQL).
- There are six basic operations which you can use to extract or transform your data.
- Selecting Data
- The SQL SELECT command is used to extract data from the database.
- You can list column names to extract from a table, or use * to mean all columns.
- Along with column names, you can supply basic calculations on results.
- The DISTINCT keyword removes duplicates from the results.
- Exercises
- Filtering
- The WHERE clause lets you filter rows that match a boolean expression.
- The IN operator selects rows that match a list of values.
- Exercises
- Sorting
- The ORDER BY clause lets you sort the results.
- The DESC or ASC keyword specifies the direction of the sort.
- Exercises
- Aggregation
- The SUM, MAX, MIN, and AVG functions combine values across multiple rows in the results. This is called aggregation.
- The GROUP BY clause lets you combine together rows that match certain conditions.
- You can also filter (WHERE) and sort (ORDER BY) aggregated rows.
- Exercises
- Joining Tables
- The JOIN..ON clause lets you combine data from several tables into a single result.
- Exercises
- Missing Data
- Missing data can be represented by a special value in a database: NULL.
- Filtering, joining, and aggregating data with NULL values takes special attention.
- Exercises
- Nested Queries
- Queries can operate on the results of other queries by nesting one query in another.
- This allows you break down complex queries into simpler queries.
- Queries can be nested in the FROM or the WHERE clauses.
- Exercises
Exercises:
Lectures (using Microsoft Access as the database engine)
- Selecting Data with Microsoft Access
- Select columns by adding names to query, or use * to retrieve all of the columns.
- Set “Distinct Values” to “true” to eliminate duplicates.
- Use expressions to generate calculated columns.
- Filtering Data with Microsoft Access
- The criteria row lets you filter rows that match a boolean expression.
- The IN operator selects rows that match a list of values.
- Sorting Data with Microsoft Access
- The Sort option for each column lets you sort the results.
- Sorting by columns that aren’t displayed.
- Aggregating Data with Microsoft Access
- Totals let you count values and calculate sums, averages, minimum and maximum values
- Group By aggregates the data into groups so that you can calculate Totals for each group
- Combining Data with Microsoft Access
- Joins let you combine data from multiple tables
- Handling Missing Data with Microsoft Access
- Missing data can be represented by a special value in a database: NULL.
- Filtering, joining, and aggregating data with NULL values takes special attention.
- Nested Queries with Microsoft Access
- Queries can operate on the results of other queries by nesting one query in another.
- This allows you break down complex queries into simpler queries.
