Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition
Anna Seo Gyeong Choi, Jonghyeon Park, Myungwoo Oh

TL;DR
This paper introduces two data-driven methods for detecting mispronunciation patterns in speech, significantly enhancing recognition accuracy for non-native speakers without relying on linguistic rules.
Contribution
The paper presents two approaches that use speech corpora and attention maps to identify mispronunciations automatically, improving the robustness of ASR systems on non-native speech.
Findings
5.7% improvement on native English speech recognition
12.8% improvement for non-native English speakers, particularly Korean speakers
Practical methods for robust ASR without prior linguistic knowledge
Abstract
Recent advancements in machine learning have significantly improved speech recognition, but recognizing speech from non-fluent or accented speakers remains a challenge. Previous efforts, relying on rule-based pronunciation patterns, have struggled to fully capture non-native errors. We propose two data-driven approaches using speech corpora to automatically detect mispronunciation patterns. By aligning non-native phones with their native counterparts using attention maps, we achieved a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers, particularly Korean speakers. Our method offers practical advancements for robust Automatic Speech Recognition (ASR) systems, particularly for situations where prior linguistic knowledge is not applicable.
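As a rough illustration of the alignment idea in the abstract, the sketch below reads a mispronunciation pattern off an attention map by linking each non-native phone to the native phone it attends to most strongly and counting the mismatches. This is not the authors' implementation: the function name `discover_patterns`, the `min_weight` threshold, the ARPAbet-style phone labels, and the hand-written attention matrix are all hypothetical and chosen only for this example. In practice the attention map would come from a trained model.

```python
from collections import Counter


def discover_patterns(attention, native_phones, nonnative_phones, min_weight=0.5):
    """Count (native, non-native) phone substitutions implied by an attention map.

    attention[i][j] is the weight that non-native phone i places on native
    phone j. Every name and the threshold here are illustrative assumptions.
    """
    patterns = Counter()
    for i, row in enumerate(attention):
        # Index of the native phone that receives the most attention.
        j = max(range(len(row)), key=row.__getitem__)
        # Keep only confident links where the two phones differ.
        if row[j] >= min_weight and nonnative_phones[i] != native_phones[j]:
            patterns[(native_phones[j], nonnative_phones[i])] += 1
    return patterns


# Toy data (made up): canonical /R AY S/ realised as /L AY S/.
native = ["R", "AY", "S"]
nonnative = ["L", "AY", "S"]
attention = [
    [0.80, 0.10, 0.10],
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
]

patterns = discover_patterns(attention, native, nonnative)
print(patterns.most_common())  # [(('R', 'L'), 1)]
```

Aggregating these counts over a whole corpus would give a ranked list of candidate substitution patterns without any hand-written linguistic rules, which is the general spirit of the data-driven approach described above.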
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Softmax · Attention Is All You Need
