Sprees, a Finite-State Orthographic Learning System that Recognizes and Generates Phonologically Similar Spellings

by Latanya Sweeney

Abstract

We present a computer learning program named Sprees that "learns" by building a deterministic finite-state automaton that represents orthographic generalizations of the spellings it encounters. It then uses the automaton to recognize and generate phonologically similar spellings beyond those in the initial training set. The saying "i before e except after c" illustrates the kind of spelling regularity Sprees would learn after being exposed to words like "receive" and "thief."
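
As a rough illustration of the idea (not the Sprees algorithm itself, which is described in the thesis), the sketch below learns a much simpler deterministic automaton whose states are the last two letters seen. The function names, the start and end markers, and the context length k are all assumptions made for this example. The automaton accepts any spelling whose letter transitions all appeared in training, so it can accept and produce spellings beyond the training set.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical boundary markers, chosen for this sketch


def train(words, k=2):
    """Build a deterministic automaton whose states are the last k symbols seen."""
    transitions = defaultdict(set)  # state -> symbols allowed next
    for word in words:
        state = START * k
        for symbol in word.lower() + END:
            transitions[state].add(symbol)
            state = (state + symbol)[-k:]  # the next state is fully determined
    return dict(transitions)


def recognizes(transitions, word, k=2):
    """Return True if every transition in the word was seen during training."""
    state = START * k
    for symbol in word.lower() + END:
        if symbol not in transitions.get(state, ()):
            return False
        state = (state + symbol)[-k:]
    return True


def generate(transitions, k=2, rng=None, max_len=20):
    """Random walk from the start state; None if no end is reached in max_len letters."""
    rng = rng or random.Random()
    state, letters = START * k, []
    while len(letters) <= max_len:
        symbol = rng.choice(sorted(transitions[state]))
        if symbol == END:
            return "".join(letters)
        letters.append(symbol)
        state = (state + symbol)[-k:]
    return None


dfa = train(["receive", "deceit", "thief", "brief"])
print(recognizes(dfa, "deceive"))  # True: never seen, but every transition was
print(recognizes(dfa, "recieve"))  # False: "i" was never seen right after "ec"
print(generate(dfa, rng=random.Random(0)))
```

This toy version generalizes only over letter contexts; it has no notion of sound, so "phonologically similar" here means nothing more than sharing local letter patterns with the training words.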

The Scrub program, which locates and replaces personally identifying information in unrestricted text, uses Sprees to recognize unknown words and to generate made-up alternatives that sound like the original sensitive words.

Publications

  • Sweeney, L. Sprees, a Finite-State Orthographic Learning System that Recognizes and Generates Phonologically Similar Spellings. MIT Masters Thesis. Cambridge: May 1997.
    Postscript file (948 KB)
    Condensed paper version, postscript (400 KB)



    Last modified 8/15/97 by sweeney@mit.edu