Adaptive Frequency Cepstral Coefficients for Word Mispronunciation Detection
Zhenhao Ge, Sudhendu R. Sharma, Mark J. T. Smith

TL;DR
This paper introduces an adaptive frequency warping technique for cepstral coefficients that improves the accuracy of detecting non-native pronunciation errors in speech recognition-based language learning tools.
Contribution
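The paper proposes adaptive features obtained by warping the frequency scale before computing cepstral coefficients. The warping function is chosen to maximize the classification rate between native and non-native pronunciations, and the resulting features feed an HMM-based mispronunciation detector.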
Findings
The adaptive frequency scale improves classification accuracy.
The method achieves higher detection rates than conventional methods.
The adaptive features are effective at distinguishing native from non-native pronunciations.
Abstract
Systems based on automatic speech recognition (ASR) technology can provide important functionality in computer assisted language learning applications. This is a young but growing area of research motivated by the large number of students studying foreign languages. Here we propose a Hidden Markov Model (HMM)-based method to detect mispronunciations. Exploiting the specific dialog scripting employed in language learning software, HMMs are trained for different pronunciations. New adaptive features have been developed and obtained through an adaptive warping of the frequency scale prior to computing the cepstral coefficients. The optimization criterion used for the warping function is to maximize separation of two major groups of pronunciations (native and non-native) in terms of classification rate. Experimental results show that the adaptive frequency scale yields a better coefficient…
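To make the idea of warping the frequency scale before computing cepstral coefficients concrete, here is a minimal sketch in Python. It is not the authors' implementation. The one-parameter warp `log(1 + f/alpha)`, the triangular filterbank, and all names (`warp`, `unwarp`, `filterbank`, `warped_cepstrum`, `alpha`) are assumptions chosen for illustration; the abstract does not specify the form of the warping function. With `alpha = 700` the warp has the shape of the familiar mel scale, and changing `alpha` changes how filter centers are distributed along the frequency axis.

```python
import numpy as np

def warp(f, alpha):
    # Hypothetical one-parameter warp (an assumption, not the paper's function).
    return np.log1p(f / alpha)

def unwarp(w, alpha):
    # Inverse of warp(): maps warped values back to Hz.
    return alpha * np.expm1(w)

def filterbank(n_filters, n_fft, sr, alpha):
    # Filter edges are equally spaced on the warped axis, then mapped back to Hz.
    w_edges = np.linspace(warp(0.0, alpha), warp(sr / 2, alpha), n_filters + 2)
    hz_edges = unwarp(w_edges, alpha)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i:i + 3]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangular response: zero outside [lo, hi], peak of 1 at mid.
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def warped_cepstrum(frame, sr, alpha, n_filters=20, n_ceps=12):
    # Power spectrum of one Hamming-windowed frame.
    n_fft = frame.size
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Log filterbank energies on the warped scale (small floor avoids log(0)).
    log_e = np.log(filterbank(n_filters, n_fft, sr, alpha) @ power + 1e-10)
    # Unnormalized DCT-II written out explicitly to stay NumPy-only.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return basis @ log_e

# Usage: the same frame under two different warps gives different features.
rng = np.random.default_rng(0)
frame = rng.standard_normal(512)
c_mel_like = warped_cepstrum(frame, sr=16000, alpha=700.0)
c_other = warped_cepstrum(frame, sr=16000, alpha=2000.0)
```

In an adaptive scheme of the kind the abstract describes, a warp parameter such as the hypothetical `alpha` would be selected to maximize the native versus non-native classification rate of the downstream HMMs. That search is not shown here.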
