An Adaptive Psychoacoustic Model for Automatic Speech Recognition
Peng Dai, Xue Teng, Frank Rudzicz, Ing Yann Soon

TL;DR
This paper introduces an adaptive psychoacoustic model that incorporates otoacoustic emissions (OAEs) into ASR, improving noise robustness and word recognition accuracy in noisy environments.
Contribution
It presents a novel psychoacoustic model for noise-robust automatic speech recognition, together with a double-transform spectrum-analysis technique that can qualitatively predict ASR performance for different noise types.
Findings
Achieved up to 85.39% word recognition accuracy on noisy data
Significant improvement over baseline in noisy environments
Validated effectiveness through experiments on the AURORA2 database
Abstract
Compared with automatic speech recognition (ASR), the human auditory system is more adept at handling noise-adverse situations, including environmental noise and channel distortion. To mimic this adeptness, auditory models have been widely incorporated in ASR systems to improve their robustness. This paper proposes a novel auditory model which incorporates psychoacoustics and otoacoustic emissions (OAEs) into ASR. In particular, we successfully implement the frequency-dependent property of psychoacoustic models and effectively improve resulting system performance. We also present a novel double-transform spectrum-analysis technique, which can qualitatively predict ASR performance for different noise types. Detailed theoretical analysis is provided to show the effectiveness of the proposed algorithm. Experiments are carried out on the AURORA2 database and show that the word recognition…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
