Regular Expressions

September 18th, 2011

These episodes introduce regular expressions, a powerful set of tools for manipulating text.

Requires: data processing patterns

Introduces: regular expressions

Problem: Converting More Complex Data Files

Your supervisor wants you to convert some measurements of background evil levels in the Shire after the explosion of the Death Star into a uniform representation. The problem is that the files were typed in by hand by several different graduate students and use several slightly different formats. For example, some files use commas as field separators, others use spaces, and still others use a mix of both. Similarly, some record observation dates as “2003-08-05”, while others use “Aug 5, 2008”, and so on.
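
As a rough illustration of the problem, the sketch below uses Python's re module to accept both date styles and to split fields on any mix of commas and spaces. The helper names (normalize_date, split_fields), the month table, and the sample inputs are assumptions made for this example; the actual data files may need different patterns.

```python
import re

# Hypothetical lookup table for three-letter month abbreviations.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

# Dates written like 2003-08-05: year, month, day.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Dates written like Aug 5, 2008: month name, day, year.
TEXT_DATE = re.compile(r"([A-Z][a-z]{2}) (\d{1,2}), (\d{4})")


def normalize_date(text):
    """Return (year, month, day) as integers, or None if no pattern matches."""
    match = ISO_DATE.fullmatch(text)
    if match:
        return int(match.group(1)), int(match.group(2)), int(match.group(3))
    match = TEXT_DATE.fullmatch(text)
    if match and match.group(1) in MONTHS:
        return int(match.group(3)), MONTHS[match.group(1)], int(match.group(2))
    return None


def split_fields(line):
    """Split a line on any run of commas and/or whitespace."""
    return re.split(r"[,\s]+", line.strip())


print(normalize_date("2003-08-05"))
print(normalize_date("Aug 5, 2008"))
print(split_fields("1.5, 2.5 3.5,4.5"))
```

Trying each pattern in turn keeps every expression short and readable, at the cost of one extra match attempt per date.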

  1. Introduction (pdf, ppt)
    • Motivating problem
    • Matching with ‘or’
    • Precedence
    • Extracting data with groups
    • Wildcards: ‘.’
  2. Operators (pdf, ppt)
    • Procedural vs. declarative programming
    • Zero or more: ‘*’
    • One or more: ‘+’
    • Zero or one: ‘?’
    • Enumerated matches: ‘{M,N}’
    • Character sets: ‘[...]’
  3. Mechanics (pdf, ppt)
    • Finite state machines
    • Limits of regular expressions
  4. Patterns (pdf, ppt)
    • Using multiple regular expressions together
    • Escape sequences: ‘\’
    • Translation from text to string to regular expression
    • Abbreviations: ‘\s’, ‘\d’, ‘\w’, ‘\S’, and ‘\W’
    • Pseudo-characters: ‘^’, ‘$’, and ‘\b’
  5. More Tools (pdf, ppt)
    • New example: extracting citation labels
    • Negating character sets with ‘[^...]’
    • Using re.findall instead of re.search
    • Using re.split
    • Compiling regular expressions
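
The tools named in the last episode can be sketched together in a few lines of Python. The sample sentence and the bracketed label format below are assumptions made for this example, not the format used in the lecture; the calls themselves (re.compile, search, findall, re.split) are standard parts of the re module.

```python
import re

# Hypothetical text: the bracketed citation style is an assumption.
text = "Evil levels rose sharply [Baggins1402], then fell [Vader1977, Skywalker1983]."

# Compile once so the pattern can be reused.
# [^\]]+ is a negated character set: one or more characters that are not ']'.
BRACKETED = re.compile(r"\[([^\]]+)\]")

# search stops at the first match and returns a match object.
first = BRACKETED.search(text)

# findall returns the captured group from every match in the text.
groups = BRACKETED.findall(text)

# A single pair of brackets may hold several labels, so split each group
# on a comma followed by optional whitespace.
labels = []
for group in groups:
    labels.extend(re.split(r",\s*", group))

print(first.group(1))
print(groups)
print(labels)
```

With this input, search yields only the first label, findall yields the contents of both bracket pairs, and the split step separates the two labels that share one pair.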

Reading
