Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment
Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David Mortensen

TL;DR
This paper introduces MixGoP, a novel approach using Gaussian mixture models with self-supervised speech features to better model allophonic variation, significantly improving atypical pronunciation assessment across diverse speech datasets.
Contribution
The paper presents MixGoP, a new method that effectively models allophonic variation using Gaussian mixtures and self-supervised features, advancing pronunciation assessment accuracy.
Findings
MixGoP achieves state-of-the-art performance on four of the five evaluated datasets, which include dysarthric and non-native speech.
S3M features better capture allophonic variation than traditional features.
Integrating MixGoP with S3M features enhances pronunciation assessment.
Abstract
Allophony refers to the variation in the phonetic realization of a phoneme based on its phonetic environment. Modeling allophones is crucial for atypical pronunciation assessment, which involves distinguishing atypical from typical pronunciations. However, recent phoneme classifier-based approaches often simplify this by treating various realizations as a single phoneme, bypassing the complexity of modeling allophonic variation. Motivated by the acoustic modeling capabilities of frozen self-supervised speech model (S3M) features, we propose MixGoP, a novel approach that leverages Gaussian mixture models to model phoneme distributions with multiple subclusters. Our experiments show that MixGoP achieves state-of-the-art performance across four out of five datasets, including dysarthric and non-native speech. Our analysis further suggests that S3M features capture allophonic variation more…
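The core idea in the abstract, modeling each phoneme as a mixture with several subclusters rather than as a single class, can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D synthetic values stand in for S3M features, the two clusters stand in for two allophones of one phoneme, and the helper names (`fit_gmm_1d`, `log_likelihood`) and the use of the mixture log-likelihood as a typicality score are assumptions made purely for illustration.

```python
import math
import random

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def fit_gmm_1d(xs, k, iters=50, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with plain EM (hypothetical helper)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Deterministic init: means at evenly spaced quantiles of the data.
    means = [xs_sorted[int((i + 0.5) * n / k)] for i in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor,
            )
            weights[j] = nj / n
    return weights, means, variances

def log_likelihood(x, gmm):
    """Mixture log-likelihood of x, used here as a toy typicality score."""
    weights, means, variances = gmm
    return logsumexp([math.log(w) + log_gauss(x, m, v)
                      for w, m, v in zip(weights, means, variances)])

# Synthetic "typical" training data: two allophone-like clusters of one phoneme.
random.seed(0)
train = ([random.gauss(-2.0, 0.5) for _ in range(200)]
         + [random.gauss(2.0, 0.5) for _ in range(200)])

single = fit_gmm_1d(train, k=1)  # one Gaussian per phoneme
mix = fit_gmm_1d(train, k=2)     # one subcluster per allophone

typical, atypical = 2.0, 0.0     # 0.0 lies between the two clusters
print("single:", log_likelihood(typical, single), log_likelihood(atypical, single))
print("mix:   ", log_likelihood(typical, mix), log_likelihood(atypical, mix))
```

In this toy setup the single Gaussian places its mean between the two clusters, so it scores the in-between sample as more likely than a sample at the center of a real cluster. The two-component mixture reverses that ordering, which is the intuition behind giving each phoneme multiple subclusters.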
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research
