Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling

Xuanru Zhou; Jiachen Lian; Cheol Jun Cho; Tejas Prabhune; Shuhe Li; William Li; Rodrigo Ortiz; Zoe Ezzes; Jet Vonk; Brittany Morin; Rian Bogley; Lisa Wauters; Zachary Miller; Maria Gorno-Tempini; Gopala Anumanchipalli

arXiv:2507.14346·eess.AS·July 22, 2025

TL;DR

This paper introduces a phoneme recognition framework that models phoneme similarity to improve phonetic error detection, addressing variability from accents and dysfluencies, and establishes a new benchmark with a novel dataset and metrics.

Contribution

It proposes a novel phoneme similarity modeling approach with multi-task training for more accurate phonetic error detection, and introduces a new simulated dataset (VCTK-accent) and two evaluation metrics for pronunciation differences.

Findings

01. Improved phonetic error detection accuracy.

02. Open-sourced VCTK-accent, a simulated dataset containing phonetic errors.

03. Two new metrics for assessing pronunciation differences.

Abstract

Phonetic error detection, a core subtask of automatic pronunciation assessment, identifies pronunciation deviations at the phoneme level. Speech variability from accents and dysfluencies challenges accurate phoneme recognition, with current models failing to capture these discrepancies effectively. We propose a verbatim phoneme recognition framework using multi-task training with novel phoneme similarity modeling that transcribes what speakers actually say rather than what they're supposed to say. We develop and open-source VCTK-accent, a simulated dataset containing phonetic errors, and propose two novel metrics for assessing pronunciation differences. Our work establishes a new benchmark for phonetic error detection.
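The abstract frames phonetic error detection as comparing what a speaker actually says (a verbatim phoneme transcription) with what they are supposed to say (the canonical phonemes). The sketch below is a generic illustration of that comparison only, not the authors' method, whose details are not given here. It aligns the two sequences with a weighted edit distance in which a hypothetical similarity table makes substitutions between acoustically close phonemes cheaper. The names align, sub_cost and SIMILAR, the ARPAbet-style symbols, and all cost values are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL similarity table: substitution costs for phoneme pairs assumed
# to be acoustically close. Values are invented for illustration only.
SIMILAR: Dict[frozenset, float] = {
    frozenset({"IH", "IY"}): 0.3,
    frozenset({"TH", "S"}): 0.5,
    frozenset({"V", "W"}): 0.5,
}

# An edit is (operation, canonical phoneme, spoken phoneme).
Edit = Tuple[str, Optional[str], Optional[str]]


def sub_cost(a: str, b: str) -> float:
    """Cost of hearing phoneme b where a was expected (0 if identical, 1 if unrelated)."""
    if a == b:
        return 0.0
    return SIMILAR.get(frozenset({a, b}), 1.0)


def align(ref: List[str], hyp: List[str], indel: float = 1.0) -> Tuple[float, List[Edit]]:
    """Weighted Levenshtein alignment of canonical (ref) vs. verbatim (hyp) phonemes."""
    n, m = len(ref), len(hyp)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel
    for j in range(1, m + 1):
        dp[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),  # match / substitution
                dp[i - 1][j] + indel,  # deletion: expected phoneme not spoken
                dp[i][j - 1] + indel,  # insertion: extra phoneme spoken
            )
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]):
            op = "match" if ref[i - 1] == hyp[j - 1] else "sub"
            edits.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + indel:
            edits.append(("del", ref[i - 1], None))
            i -= 1
        else:
            edits.append(("ins", None, hyp[j - 1]))
            j -= 1
    edits.reverse()
    return dp[n][m], edits


if __name__ == "__main__":
    ref = ["TH", "IH", "NG", "K"]  # canonical pronunciation of "think"
    hyp = ["S", "IY", "NG", "K"]   # what the speaker actually said
    cost, edits = align(ref, hyp)
    errors = [e for e in edits if e[0] != "match"]  # phoneme-level deviations
    print(round(cost, 2), errors)
```

In this toy run both substitutions are flagged as phonetic errors, but the total cost is about 0.8 instead of the 2.0 a uniform edit distance would give, because each substitution is a near miss under the assumed similarity table. Weighting by similarity in this way is one simple means of distinguishing close, accent-like deviations from unrelated errors.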
