We define a new approach to speech recognition based on auditory perception and modeled after
the human brain's tendency to automatically categorize speech sounds [House 1962; Liberman
1957]. As background, today's speech recognition systems are knowledge-driven since they
require the existence of word and syntax-level knowledge to identify a word from the sound. In
contrast, our system uses no higher-level knowledge. Its architecture consists of competing
parallel detectors which in real time identify phonemes in the waveform. Each detector, which is
a simple algorithm, continuously samples the sound and reports the degree to which the samples
contain its designated phoneme. The phoneme detector with the highest precedence and the
greatest certainty above a minimal threshold prevails, and its phoneme is added to an output
queue. In preliminary experiments, four such detectors were tested and they properly identified
83-100% of their designated phonemes in both discrete and continuous speech, independent of
the speaker, suggesting that an overall system which incorporates our approach would be much
more robust and flexible than traditional systems.
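The arbitration among competing detectors can be sketched as follows. This is a minimal illustration, not the report's implementation: the `Detector` type, its `precedence` field, and the stand-in scoring functions are all hypothetical, standing in for the real-time waveform-sampling algorithms the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Detector:
    phoneme: str                            # designated phoneme, e.g. "s"
    precedence: int                         # higher value wins among confident detectors
    score: Callable[[List[float]], float]   # certainty in [0, 1] for a sample window

def arbitrate(detectors: List[Detector],
              window: List[float],
              threshold: float = 0.5) -> Optional[str]:
    """Return the winning phoneme for one window of samples, or None.

    Detectors whose certainty clears the minimal threshold compete;
    precedence decides first, then certainty, per the abstract's scheme.
    """
    confident = [d for d in detectors if d.score(window) >= threshold]
    if not confident:
        return None
    winner = max(confident, key=lambda d: (d.precedence, d.score(window)))
    return winner.phoneme

# Toy usage with constant scorers; a real detector would analyze the waveform.
detectors = [
    Detector("s", precedence=2, score=lambda w: 0.9),
    Detector("a", precedence=1, score=lambda w: 0.7),
]
output_queue: List[str] = []
result = arbitrate(detectors, window=[0.0] * 160)
if result is not None:
    output_queue.append(result)   # winning phoneme joins the output queue
print(output_queue)               # ['s']
```

Running the loop over successive sample windows would append one phoneme per confident window, yielding the phoneme stream the abstract describes as the system's output.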
Latanya Sweeney and Patrick Thompson. Speech Perception Using Real-Time Phoneme Detection: The BeBe System. Massachusetts Institute of Technology, Laboratory for Computer Science, Technical Report MIT-LCS-TR-736, 1998.