Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach
Huu Tuong Tu, Ha Viet Khanh, Tran Tien Dat, Vu Huan, Thien Van Luong, Nguyen Tien Cuong, Nguyen Thi Thu Trang

TL;DR
This paper introduces a training-free, retrieval-based method for mispronunciation detection and diagnosis that leverages a pretrained ASR model, achieving a 69.60% F1 score on L2-ARCTIC without additional training.
Contribution
It presents a novel retrieval-based framework that detects pronunciation errors without phoneme-specific models or task-specific training.
Findings
Achieved a 69.60% F1 score on the L2-ARCTIC dataset.
Avoided the complexity of model training.
Demonstrated effectiveness of retrieval techniques with pretrained ASR models.
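For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below assumes the positive class is "mispronounced phone"; the function name and the counts are illustrative placeholders, not figures from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall.

    Assumption: tp = mispronunciations correctly flagged,
    fp = correct phones wrongly flagged,
    fn = mispronunciations that were missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not taken from the paper).
example_f1 = f1_score(tp=70, fp=30, fn=30)
```

Because F1 penalizes an imbalance between precision and recall, a system cannot score well simply by flagging every phone as mispronounced.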
Abstract
Mispronunciation Detection and Diagnosis (MDD) is crucial for language learning and speech therapy. Unlike conventional methods that require scoring models or training phoneme-level models, we propose a novel training-free framework that leverages retrieval techniques with a pretrained Automatic Speech Recognition model. Our method avoids phoneme-specific modeling or additional task-specific training, while still achieving accurate detection and diagnosis of pronunciation errors. Experiments on the L2-ARCTIC dataset show that our method achieves a superior F1 score of 69.60% while avoiding the complexity of model training.
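The abstract does not spell out the retrieval procedure, so the following is only a toy illustration of the general idea, not the authors' pipeline. It assumes each spoken segment has already been turned into an embedding (for example by a pretrained ASR encoder), and that a hypothetical bank of labeled reference embeddings is available. Each segment is diagnosed by a nearest-neighbor vote and compared with the canonical phoneme. All names (`retrieve_phoneme`, `diagnose`, `bank`) and the 2-dimensional vectors are made up for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_phoneme(query, bank, k=3):
    """Return the majority label among the k most similar bank entries."""
    ranked = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def diagnose(segments, canonical, bank, k=3):
    """For each segment, return (expected, retrieved, is_mispronounced)."""
    results = []
    for embedding, expected in zip(segments, canonical):
        heard = retrieve_phoneme(embedding, bank, k)
        results.append((expected, heard, heard != expected))
    return results


# Hypothetical reference bank: (embedding, phoneme label) pairs.
bank = [
    ([1.0, 0.0], "s"),
    ([0.9, 0.1], "s"),
    ([0.0, 1.0], "z"),
    ([0.1, 0.9], "z"),
]

# Two spoken segments; the canonical transcript expects "s" both times.
segments = [[0.95, 0.05], [0.05, 0.95]]
canonical = ["s", "s"]
report = diagnose(segments, canonical, bank)
```

In this toy setup the second segment retrieves "z" where "s" was expected, so it is flagged (detection) and the retrieved label names the likely substitution (diagnosis). No parameters are fitted at any point, which is the sense in which such an approach is training-free.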
Taxonomy
Topics: Language Development and Disorders · Speech Recognition and Synthesis · Voice and Speech Disorders
