Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison
Andrew Valdivia, Yueming Zhang, Hailu Xu, Amir Ghasemkhani, Xin Qin

TL;DR
This paper introduces a new method for detecting pronunciation errors by comparing original speech with voice-cloned versions that have correct pronunciation, using acoustic analysis to identify deviations.
Contribution
It presents a novel voice cloning-based approach for mispronunciation detection that does not rely on predefined phonetic rules or large training datasets.
Findings
Effective in identifying pronunciation errors
Does not require extensive language-specific training
Pinpoints problematic speech segments
Abstract
This paper presents a novel approach for detecting mispronunciations by analyzing deviations between a user's original speech and their voice-cloned counterpart with corrected pronunciation. We hypothesize that regions with maximal acoustic deviation between the original and cloned utterances indicate potential mispronunciations. Our method leverages recent advances in voice cloning to generate a synthetic version of the user's voice with proper pronunciation, then performs frame-by-frame comparisons to identify problematic segments. Experimental results demonstrate the effectiveness of this approach in pinpointing specific pronunciation errors without requiring predefined phonetic rules or extensive training data for each target language.
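The frame-by-frame comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the acoustic features, the alignment method, or the distance measure, so everything below is an assumption made for illustration: each utterance is taken to be a NumPy array of per-frame feature vectors (for example MFCCs), the two utterances are aligned with plain dynamic time warping, deviation is the Euclidean distance between aligned frames, and the hypothetical helpers `dtw_path`, `frame_deviation`, and `top_deviating_segment` are not names from the paper.

```python
import numpy as np


def dtw_path(a, b):
    """Align two feature sequences (frames x dims) with plain dynamic time warping.

    Assumption: the paper does not say how the utterances are aligned; DTW is
    used here only as a common, simple choice.
    """
    n, m = len(a), len(b)
    # Euclidean distance between every frame of `a` and every frame of `b`.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the last cell to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], dist


def frame_deviation(original, cloned):
    """Mean distance from each original frame to the cloned frames aligned with it."""
    path, dist = dtw_path(original, cloned)
    total = np.zeros(len(original))
    count = np.zeros(len(original))
    for i, j in path:
        total[i] += dist[i, j]
        count[i] += 1
    # Every original frame appears on the path at least once, so count >= 1.
    return total / count


def top_deviating_segment(deviation, width):
    """Return (start, end) of the `width`-frame window with the highest mean deviation."""
    window_mean = np.convolve(deviation, np.ones(width) / width, mode="valid")
    start = int(np.argmax(window_mean))
    return start, start + width


# Synthetic demonstration: the "original" differs from the "cloned" reference
# only in frames 20..24, standing in for a mispronounced region.
rng = np.random.default_rng(0)
cloned_feats = rng.normal(size=(50, 13))
original_feats = cloned_feats.copy()
original_feats[20:25] += 20.0
deviation = frame_deviation(original_feats, cloned_feats)
segment = top_deviating_segment(deviation, width=5)
```

In practice the feature arrays would come from an acoustic front end applied to the recorded and cloned audio, and the window width would be chosen to match the expected duration of a phone or syllable; both choices are outside what the abstract states.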
Taxonomy
Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis
