Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison

Andrew Valdivia; Yueming Zhang; Hailu Xu; Amir Ghasemkhani; Xin Qin

arXiv:2507.10985 · cs.SD · July 16, 2025

TL;DR

This paper introduces a new method for detecting pronunciation errors by comparing original speech with voice-cloned versions that have correct pronunciation, using acoustic analysis to identify deviations.

Contribution

It presents a novel voice-cloning-based approach for mispronunciation detection that does not rely on predefined phonetic rules or large training datasets.

Findings

1. Effective in identifying pronunciation errors
2. Does not require extensive language-specific training
3. Pinpoints problematic speech segments
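The summary does not say how per-frame deviations become reported segments. A minimal sketch of one plausible way, assuming deviation scores have already been computed for each frame of the original utterance: frames scoring above mean + k * std are grouped into contiguous runs and converted to seconds. The function `deviant_segments`, the threshold rule, `min_frames`, and the 10 ms hop `HOP_SECONDS` are all hypothetical choices for illustration, not the authors' published procedure.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

# Hypothetical frame hop of 10 ms; the paper summary does not state its framing parameters.
HOP_SECONDS = 0.010


def deviant_segments(
    scores: List[float],
    k: float = 1.0,
    min_frames: int = 2,
    hop: float = HOP_SECONDS,
) -> List[Tuple[float, float]]:
    """Group consecutive frames scoring above mean + k * std into (start_s, end_s) spans.

    The mean + k * std threshold and the min_frames filter are illustrative
    assumptions, not the authors' published rule.
    """
    if not scores:
        return []
    threshold = mean(scores) + k * pstdev(scores)
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None
    # A trailing -inf sentinel guarantees that a run ending on the last frame is closed.
    for idx, score in enumerate(list(scores) + [float("-inf")]):
        if score > threshold and start is None:
            start = idx  # a deviant run begins
        elif score <= threshold and start is not None:
            if idx - start >= min_frames:  # drop runs too short to count as a segment
                segments.append((start * hop, idx * hop))
            start = None
    return segments
```

Spans are in seconds from the start of the original utterance under the assumed hop. Raising `k` or `min_frames` reports fewer, more pronounced segments at the cost of missing subtler errors.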

Abstract

This paper presents a novel approach for detecting mispronunciations by analyzing deviations between a user's original speech and their voice-cloned counterpart with corrected pronunciation. We hypothesize that regions with maximal acoustic deviation between the original and cloned utterances indicate potential mispronunciations. Our method leverages recent advances in voice cloning to generate a synthetic version of the user's voice with proper pronunciation, then performs frame-by-frame comparisons to identify problematic segments. Experimental results demonstrate the effectiveness of this approach in pinpointing specific pronunciation errors without requiring predefined phonetic rules or extensive training data for each target language.
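A frame-by-frame comparison needs the two utterances to be time-aligned, and a cloned utterance will generally not match the original's duration. The abstract does not say how alignment is done, so the sketch below swaps in classic dynamic time warping (DTW) with Euclidean distance as an assumption. It takes precomputed per-frame feature vectors (for example MFCCs) for both utterances, aligns them, and scores each original frame by its mean distance to the clone frames aligned with it; the highest-scoring frames are the candidate mispronunciations under the paper's hypothesis. The names `dtw_path`, `frame_deviation`, and `most_deviant_frames` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one acoustic feature vector per frame (e.g., MFCCs), assumed precomputed


def euclidean(a: Frame, b: Frame) -> float:
    """Distance between two feature vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(orig: List[Frame], clone: List[Frame]) -> List[Tuple[int, int]]:
    """Classic dynamic time warping; returns the optimal (orig_idx, clone_idx) path."""
    if not orig or not clone:
        raise ValueError("both feature sequences must be non-empty")
    n, m = len(orig), len(clone)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(orig[i - 1], clone[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end of both sequences, always moving to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def frame_deviation(orig: List[Frame], clone: List[Frame]) -> List[float]:
    """Mean distance from each original frame to the clone frames aligned with it."""
    sums = [0.0] * len(orig)
    counts = [0] * len(orig)
    for i, j in dtw_path(orig, clone):  # every original frame appears at least once
        sums[i] += euclidean(orig[i], clone[j])
        counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def most_deviant_frames(deviation: List[float], k: int = 1) -> List[int]:
    """Indices of the k original frames with the largest deviation."""
    return sorted(range(len(deviation)), key=lambda i: deviation[i], reverse=True)[:k]
```

For a toy one-dimensional example, `frame_deviation([[0.0], [1.0], [5.0], [1.0]], [[0.0], [1.0], [1.0], [1.0]])` yields `[0.0, 0.0, 4.0, 0.0]`, so `most_deviant_frames` flags frame 2. This pure-Python DTW is O(nm) in time and memory; for real recordings a library routine such as `librosa.sequence.dtw` is a more practical choice.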

Taxonomy

Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis