Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition

Anna Seo Gyeong Choi; Jonghyeon Park; Myungwoo Oh

arXiv:2502.00583 · cs.CL · June 4, 2025

TL;DR

This paper introduces two data-driven methods for detecting mispronunciation patterns in speech. Without relying on hand-written linguistic rules, they improve recognition by a reported 12.8% for non-native English speakers, particularly Korean speakers.

Contribution

The paper presents approaches that use speech corpora and attention maps to identify mispronunciations automatically, improving the robustness of ASR systems on non-native speech.

Findings

01. 5.7% improvement in speech recognition on native English datasets

02. 12.8% improvement for non-native English speakers, particularly Korean speakers

03. Practical methods for robust ASR in situations where prior linguistic knowledge is not applicable

Abstract

Recent advancements in machine learning have significantly improved speech recognition, but recognizing speech from non-fluent or accented speakers remains a challenge. Previous efforts, relying on rule-based pronunciation patterns, have struggled to fully capture non-native errors. We propose two data-driven approaches using speech corpora to automatically detect mispronunciation patterns. By aligning non-native phones with their native counterparts using attention maps, we achieved a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers, particularly Korean speakers. Our method offers practical advancements for robust Automatic Speech Recognition (ASR) systems, particularly for situations where prior linguistic knowledge is not applicable.
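The abstract's core idea, aligning non-native phones with their native counterparts through an attention map, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`align_by_attention`, `count_substitutions`), the `min_weight` threshold, the argmax alignment rule, and the toy phone sequences are all assumptions made for illustration. The sketch assumes one attention row per realized (non-native) phone, with each row holding weights over the canonical (native) phone sequence.

```python
from collections import Counter

def align_by_attention(attention, realized, canonical, min_weight=0.5):
    """Pair each realized phone with the canonical phone it attends to most.

    attention[i][j] is the (hypothetical) attention weight from realized
    phone i to canonical phone j. Rows whose peak weight falls below
    min_weight are skipped as unreliable alignments.
    """
    pairs = []
    for row, phone in zip(attention, realized):
        best = max(range(len(row)), key=row.__getitem__)  # argmax over columns
        if row[best] >= min_weight:
            pairs.append((canonical[best], phone))
    return pairs

def count_substitutions(pairs):
    """Count (canonical, realized) pairs in which the two phones differ."""
    return Counter((c, r) for c, r in pairs if c != r)

# Toy example (invented data): canonical "r ay t" realized as "l ay t".
canonical = ["r", "ay", "t"]
realized = ["l", "ay", "t"]
attention = [
    [0.80, 0.10, 0.10],
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
]

pairs = align_by_attention(attention, realized, canonical)
patterns = count_substitutions(pairs)
print(patterns.most_common())
```

Accumulating such counts over a whole corpus would give a ranked list of candidate substitution patterns without any hand-written rules, which is the general spirit of the data-driven approach the abstract describes. How the paper actually extracts and filters patterns is not specified in this listing.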

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Softmax · Attention Is All You Need