We present a computer learning program named Sprees that "learns" by building a deterministic finite state automaton that represents orthographic generalizations of the spellings it encounters. It then uses the automaton to recognize and generate phonologically similar spellings beyond those in the initial training set. The saying, "i before e except after c," demonstrates the kind of spelling regularity Sprees would learn after being exposed to words like "receive" and "thief."
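As a rough illustration of the idea, and not the actual Sprees algorithm, the sketch below builds a small deterministic automaton from example spellings. It assumes that a state is simply the last two symbols read; this context length, the boundary markers, and the function names build_automaton, accepts, and generate are all hypothetical choices made for the example. Because states are shared across words, the automaton accepts and produces spellings that were never in the training set.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical word-boundary markers


def build_automaton(words, k=2):
    """Map each state (the last k symbols read) to the symbols seen after it."""
    transitions = defaultdict(set)
    for word in words:
        state = START * k
        for symbol in word.lower() + END:
            transitions[state].add(symbol)
            state = (state + symbol)[-k:]  # next state is fully determined
    return transitions


def accepts(transitions, word, k=2):
    """True if every step of the word follows a transition seen in training."""
    state = START * k
    for symbol in word.lower() + END:
        if symbol not in transitions.get(state, ()):
            return False
        state = (state + symbol)[-k:]
    return True


def generate(transitions, rng, k=2, max_len=12):
    """Random walk from the start state; stops at END or after max_len letters."""
    state, letters = START * k, []
    while len(letters) < max_len:
        symbol = rng.choice(sorted(transitions[state]))
        if symbol == END:
            break
        letters.append(symbol)
        state = (state + symbol)[-k:]
    return "".join(letters)


training = ["receive", "deceit", "thief", "chief"]
automaton = build_automaton(training)

print(accepts(automaton, "deceive"))  # True: unseen word, but every step was seen
print(accepts(automaton, "recieve"))  # False: "i" never followed the state "ec"
print(generate(automaton, random.Random(0)))
```

With this tiny training set, the state reached after "ec" allows only "e" as the next letter, which is the "except after c" part of the saying; "deceive" is accepted because it reuses states learned from "deceit" and "receive".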
The Scrub program, which locates and replaces personally identifying information in unrestricted text, uses Sprees to recognize unknown words and to generate made-up alternatives that sound like the original sensitive words.
Last modified 8/15/97 by sweeney@mit.edu