Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis

Haopeng Geng; Longfei Yang; Xi Chen; Haitong Sun; Daisuke Saito; Nobuaki Minematsu

arXiv:2604.22133 · eess.AS · April 27, 2026

TL;DR

This paper introduces a prompt-free framework for mispronunciation detection and diagnosis (MDD) that decouples acoustic fidelity from canonical guidance, reaching a 71.77% F1-score on L2-ARCTIC and a 71.70% F1-score on the Iqra'Eval2 leaderboard.

Contribution

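The authors propose CROTTC, an acoustic model that enforces monotonic, frame-level alignment, and the IF strategy, which injects mispronunciation information implicitly. Together they model pronunciation deviations without explicit canonical priors or sequence-level alignment.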

Findings

01. Achieved 71.77% F1-score on L2-ARCTIC

02. Achieved 71.70% F1-score on Iqra'Eval2 leaderboard

03. Decoupling acoustics from priors enhances robustness of MDD
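
This page does not say how the F1-scores above are computed. As a minimal sketch, assuming the convention common in MDD work where a correctly flagged mispronunciation (a "true reject") is the positive class, F1 can be derived from four phone-level counts. The class name `MDDCounts`, the function `mdd_f1`, and the numbers in the example are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class MDDCounts:
    """Phone-level outcome counts (hypothetical container, not from the paper)."""
    true_accept: int   # correctly pronounced phones accepted as correct
    false_reject: int  # correctly pronounced phones flagged as mispronounced
    false_accept: int  # mispronounced phones accepted as correct
    true_reject: int   # mispronounced phones flagged as mispronounced


def mdd_f1(counts: MDDCounts) -> float:
    """F1 with 'mispronounced' as the positive class (assumed convention)."""
    flagged = counts.true_reject + counts.false_reject        # everything the system flagged
    mispronounced = counts.true_reject + counts.false_accept  # everything actually mispronounced
    precision = counts.true_reject / flagged if flagged else 0.0
    recall = counts.true_reject / mispronounced if mispronounced else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration only: precision 0.75, recall 0.60.
example = MDDCounts(true_accept=880, false_reject=20, false_accept=40, true_reject=60)
print(f"F1 = {mdd_f1(example):.4f}")
```

Under this convention the large number of true accepts does not enter the score at all, which is why MDD results are usually reported as F1 rather than plain accuracy.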

Abstract

Mispronunciation Detection and Diagnosis (MDD) requires modeling fine-grained acoustic deviations. However, current ASR-derived MDD systems often face inherent limitations. In particular, CTC-based models favor sequence-level alignments that neglect transient mispronunciation cues, while explicit canonical priors bias predictions toward intended targets. To address these bottlenecks, we propose a prompt-free framework decoupling acoustic fidelity from canonical guidance. First, we introduce CROTTC, an acoustic model enforcing monotonic, frame-level alignment to accurately capture pronunciation deviations. Second, we implicitly inject mispronunciation information via the IF strategy under the knowledge transfer principle. Experiments show CROTTC-IF achieves a 71.77% F1-score on L2-ARCTIC and 71.70% F1-score on the Iqra'Eval2 leaderboard. With empirical analysis, we demonstrate that…
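
The abstract's remark that CTC-based models favor sequence-level alignments can be illustrated with the standard CTC collapse rule (merge repeated labels, then drop blanks). The sketch below is not the authors' CROTTC model; it only shows, under an assumed blank symbol `_` and made-up phone labels, that very different frame-level paths collapse to the same phone sequence, so a sequence-level objective alone does not constrain which frames carry a given phone.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol, chosen for this illustration


def ctc_collapse(frame_labels):
    """Standard CTC collapse: merge consecutive repeats, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]


# Two hypothetical frame-level paths over eight frames.
path_a = ["_", "DH", "DH", "_", "AH", "AH", "AH", "_"]
path_b = ["DH", "_", "_", "_", "_", "AH", "_", "_"]

# Both paths yield the same phone sequence despite different frame timing.
print(ctc_collapse(path_a))
print(ctc_collapse(path_b))
assert ctc_collapse(path_a) == ctc_collapse(path_b) == ["DH", "AH"]
```

Because both paths are equally valid under the collapse, a phone that is present for only a frame or two can be assigned almost anywhere in time; a frame-level, monotonic alignment of the kind the abstract attributes to CROTTC removes that freedom.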
