Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition
Ghazi Bouselmi (INRIA Lorraine - LORIA), Dominique Fohr (INRIA Lorraine - LORIA), Irina Illina (INRIA Lorraine - LORIA)

TL;DR
This paper combines acoustic adaptation and pronunciation modelling for non-native speech recognition, reducing word errors on foreign-accented English speech by 46% to 71% relative.
Contribution
It presents a novel phonetic confusion scheme and demonstrates the effectiveness of combining multiple adaptation methods for non-native speech recognition.
Findings
Relative word error reduction of 46% to 71% with the best combination of techniques
Combining pronunciation modelling with acoustic adaptation yields further gains in recognition accuracy
Adaptation methods evaluated on the HIWIRE foreign-accented English speech database
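The relative word error reduction quoted above is a ratio against the baseline error rate, not a difference in percentage points. A minimal sketch of the arithmetic follows; the function name and the example error rates are illustrative assumptions and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative word error reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical error rates (in percent), chosen only to illustrate the formula.
baseline = 10.0
adapted = 2.9
print(f"{relative_wer_reduction(baseline, adapted):.0%}")
```

With these made-up numbers, an absolute drop of 7.1 points from a 10% baseline corresponds to a 71% relative reduction.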
Abstract
In this paper, we present several adaptation methods for non-native speech recognition. We have tested pronunciation modelling, MLLR and MAP non-native pronunciation adaptation, and HMM retraining on the HIWIRE foreign-accented English speech database. The "phonetic confusion" scheme we have developed associates each spoken phone with several sequences of confused phones. In our experiments, we have used different combinations of acoustic models representing the canonical and the foreign pronunciations: spoken and native models, and models adapted to the non-native accent with MAP and MLLR. The joint use of pronunciation modelling and acoustic adaptation led to further improvements in recognition accuracy. The best combination of the above-mentioned techniques resulted in a relative word error reduction ranging from 46% to 71%.
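One way to picture the "phonetic confusion" idea is as a table that maps each canonical phone to alternative phone sequences, which is then used to expand a word's pronunciation into variants. The sketch below is a simplification under stated assumptions: the abstract does not say how confused sequences are integrated into the recognizer, so lexicon-level expansion is assumed here, and the rule table, phone symbols, and function name are all hypothetical.

```python
from itertools import product

# Hypothetical confusion rules: canonical phone -> alternative phone sequences
# that a non-native speaker might produce. Illustrative only, not from the paper.
CONFUSIONS = {
    "th": [("t",), ("s",)],
    "ih": [("iy",)],
}


def pronunciation_variants(canonical):
    """Expand a canonical phone sequence into every variant allowed by CONFUSIONS."""
    options = []
    for phone in canonical:
        # The canonical phone is always kept as the first option.
        options.append([(phone,)] + CONFUSIONS.get(phone, []))
    variants = []
    for choice in product(*options):
        # Flatten the chosen per-phone sequences into one phone list.
        variants.append([p for seq in choice for p in seq])
    return variants


# Canonical pronunciation of "thin" in an assumed phone set.
variants = pronunciation_variants(["th", "ih", "n"])
for v in variants:
    print(" ".join(v))
```

Keeping the canonical phone among the options means the native pronunciation stays available alongside the accented variants, so the recognizer is not forced to assume every phone is mispronounced.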
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
