An Adaptive Psychoacoustic Model for Automatic Speech Recognition

Peng Dai; Xue Teng; Frank Rudzicz; Ing Yann Soon

arXiv:1609.04417 · cs.CL · September 16, 2016

TL;DR

This paper introduces an adaptive psychoacoustic model that incorporates otoacoustic emissions (OAEs) into ASR, improving noise robustness and reaching up to 85.39% word recognition accuracy on noisy data.

Contribution

The paper presents a novel auditory model that implements the frequency-dependent property of psychoacoustic models, together with a double-transform spectrum-analysis technique that can qualitatively predict ASR performance for different noise types.

Findings

01. Achieved up to 85.39% word recognition accuracy on noisy data

02. Significant improvement over baseline in noisy environments

03. Validated effectiveness through experiments on the AURORA2 database

Abstract

Compared with automatic speech recognition (ASR), the human auditory system is more adept at handling noise-adverse situations, including environmental noise and channel distortion. To mimic this adeptness, auditory models have been widely incorporated in ASR systems to improve their robustness. This paper proposes a novel auditory model which incorporates psychoacoustics and otoacoustic emissions (OAEs) into ASR. In particular, we successfully implement the frequency-dependent property of psychoacoustic models and effectively improve resulting system performance. We also present a novel double-transform spectrum-analysis technique, which can qualitatively predict ASR performance for different noise types. Detailed theoretical analysis is provided to show the effectiveness of the proposed algorithm. Experiments are carried out on the AURORA2 database and show that the word recognition…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing