Speech Learning

Speech Perception Using Real-Time Phoneme Detection: The BeBe System

by Latanya Sweeney and Patrick Thompson

Abstract

We define a new approach to speech recognition based on auditory perception and modeled after the human brain's tendency to automatically categorize speech sounds [House 1962; Liberman 1957]. As background, today's speech recognition systems are knowledge-driven since they require the existence of word and syntax-level knowledge to identify a word from the sound. In contrast, our system uses no higher-level knowledge. Its architecture consists of competing parallel detectors which in real time identify phonemes in the waveform. Each detector, which is a simple algorithm, continuously samples the sound and reports the degree to which the samples contain its designated phoneme. The phoneme detector with the highest precedence and the greatest certainty above a minimal threshold prevails and its phoneme is added to an output queue. In preliminary experiments, four such detectors were tested and they properly identified 83-100% of their designated phonemes in both discrete and continuous speech, independent of the speaker, suggesting that an overall system which incorporates our approach would be much more robust and flexible than traditional systems.
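The competition among detectors described above can be illustrated with a minimal Python sketch. It is not the BeBe implementation: the abstract does not specify the detector algorithms, so the two scoring functions, the frame format, the 0.5 threshold, and all names (`Detector`, `select_phoneme`, `run`) are hypothetical. The sketch also assumes that precedence is compared first and certainty only breaks ties, which is one possible reading of "highest precedence and greatest certainty", and it evaluates detectors sequentially rather than in parallel.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Detector:
    phoneme: str
    precedence: int  # assumption: a larger value outranks a smaller one
    score: Callable[[Sequence[float]], float]  # certainty, roughly in [0, 1]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (toy feature)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / max(len(frame) - 1, 1)


def mean_abs(frame):
    """Mean absolute amplitude of the frame (toy feature)."""
    return sum(abs(x) for x in frame) / max(len(frame), 1)


def select_phoneme(detectors, frame, threshold=0.5):
    """Return the winning phoneme for one frame, or None if no detector fires."""
    candidates = []
    for d in detectors:
        certainty = d.score(frame)
        if certainty >= threshold:  # only detectors above the minimal threshold compete
            candidates.append((d.precedence, certainty, d.phoneme))
    if not candidates:
        return None
    # Tuple comparison: precedence first, then certainty (assumed ordering).
    return max(candidates)[2]


def run(detectors, frames, threshold=0.5):
    """Feed frames to all detectors and collect winners in an output queue."""
    output = deque()
    for frame in frames:
        winner = select_phoneme(detectors, frame, threshold)
        if winner is not None:
            output.append(winner)
    return output


# Hypothetical detectors: a noisy, fricative-like one and a voiced, vowel-like one.
detectors = [
    Detector("s", 1, zero_crossing_rate),
    Detector("a", 1, lambda f: mean_abs(f) * (1 - zero_crossing_rate(f))),
]

hiss = [0.5, -0.5] * 4            # sign changes on every sample
vowel = [0.8] * 4 + [-0.8] * 4    # loud, slowly varying
silence = [0.0] * 8               # no detector should fire

result = list(run(detectors, [hiss, vowel, silence]))
print(result)
```

With these toy frames, the first frame triggers only the "s" detector, the second only the "a" detector, and the silent frame adds nothing to the queue.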
Citation:
Latanya Sweeney and Patrick Thompson. Speech Perception Using Real-Time Phoneme Detection: The BeBe System. Massachusetts Institute of Technology, Laboratory for Computer Science, Technical Report MIT-LCS-TR-736, 1998.
