We define a new approach to speech recognition based on auditory perception and modeled after the human brain's tendency to automatically categorize speech sounds [House 1962; Liberman 1957]. As background, today's speech recognition systems are knowledge-driven: they require word- and syntax-level knowledge to identify a word from its sound. In contrast, our system uses no higher-level knowledge. Its architecture consists of competing parallel detectors that identify phonemes in the analog waveform in real time. Each detector is a simple algorithm that continuously samples the sound and reports the degree to which the samples contain its designated phoneme. Among the detectors whose certainty exceeds a minimal threshold, the one with the highest precedence and the greatest certainty prevails, and its phoneme is added to an output queue. In preliminary experiments, four such detectors were tested; they correctly identified 83-100% of their designated phonemes in both discrete and continuous speech, independent of the speaker. This suggests that an overall system incorporating our approach would be much more robust and flexible than traditional systems.
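The competition among detectors described in the abstract can be illustrated with a minimal sketch. It is not the BeBe implementation: the names `Detector`, `arbitrate`, and `run`, the threshold value of 0.5, the frame representation, and the toy scoring functions are all hypothetical. The abstract also does not say how precedence and certainty are combined, so the sketch assumes that precedence is compared first and certainty breaks ties.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Iterable, List, Optional, Sequence


@dataclass
class Detector:
    """One phoneme detector (hypothetical structure, for illustration only)."""
    phoneme: str
    precedence: int  # assumption: a larger number means higher precedence
    score: Callable[[Sequence[float]], float]  # certainty in [0, 1] for a frame


def arbitrate(detectors: List[Detector],
              frame: Sequence[float],
              threshold: float = 0.5) -> Optional[str]:
    """Return the winning phoneme for one frame of samples, or None.

    Only detectors whose certainty exceeds the threshold compete.
    Assumption: precedence is compared first, then certainty.
    """
    candidates = []
    for d in detectors:
        certainty = d.score(frame)
        if certainty > threshold:
            candidates.append((d.precedence, certainty, d.phoneme))
    if not candidates:
        return None
    # Tuples compare element by element: precedence, then certainty.
    return max(candidates)[2]


def run(detectors: List[Detector],
        frames: Iterable[Sequence[float]],
        threshold: float = 0.5) -> Deque[str]:
    """Feed frames to all detectors and collect winners in an output queue."""
    output: Deque[str] = deque()
    for frame in frames:
        winner = arbitrate(detectors, frame, threshold)
        if winner is not None:
            output.append(winner)
    return output


# Toy detectors: each simply reads a precomputed certainty from the frame.
# Real detectors would analyze waveform samples instead.
detectors = [
    Detector("s", precedence=2, score=lambda f: f[0]),
    Detector("a", precedence=1, score=lambda f: f[1]),
]
frames = [(0.9, 0.95), (0.2, 0.8), (0.1, 0.3)]
result = list(run(detectors, frames))
print(result)
```

In the first toy frame both detectors exceed the threshold and the higher-precedence one wins; in the second only one detector qualifies; in the third none does, so nothing is queued.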
Speech Perception using Real-time Phoneme Detection,
the BeBe System (with Patrick Thompson). Massachusetts
Institute of Technology, Laboratory for Computer Science:
Tech Report MIT-LCS-TR-736. 1998.
Click here for Latanya Sweeney's Home Page
Last modified 3/6/97 by email@example.com