Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment

Kwanghee Choi; Eunjung Yeo; Kalvin Chang; Shinji Watanabe; David Mortensen

arXiv:2502.07029·cs.CL·March 25, 2025

TL;DR

This paper introduces MixGoP, which fits Gaussian mixture models to frozen self-supervised speech model (S3M) features so that each phoneme is represented by multiple subclusters that capture allophonic variation. MixGoP achieves state-of-the-art performance on four out of five atypical pronunciation datasets, including dysarthric and non-native speech.

Contribution

MixGoP models each phoneme's distribution as a Gaussian mixture over frozen S3M features, rather than treating all realizations of a phoneme as a single class as recent phoneme classifier-based approaches do.

Findings

01 MixGoP achieves state-of-the-art performance on four out of five datasets, including dysarthric and non-native speech.

02 S3M features capture allophonic variation better than traditional features.

03 Integrating MixGoP with S3M features improves atypical pronunciation assessment.

Abstract

Allophony refers to the variation in the phonetic realization of a phoneme based on its phonetic environment. Modeling allophones is crucial for atypical pronunciation assessment, which involves distinguishing atypical from typical pronunciations. However, recent phoneme classifier-based approaches often simplify this by treating various realizations as a single phoneme, bypassing the complexity of modeling allophonic variation. Motivated by the acoustic modeling capabilities of frozen self-supervised speech model (S3M) features, we propose MixGoP, a novel approach that leverages Gaussian mixture models to model phoneme distributions with multiple subclusters. Our experiments show that MixGoP achieves state-of-the-art performance across four out of five datasets, including dysarthric and non-native speech. Our analysis further suggests that S3M features capture allophonic variation more…
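The core idea in the abstract, modeling one phoneme's feature distribution with several Gaussian subclusters and flagging low-likelihood realizations as atypical, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the features are synthetic 4-dimensional vectors standing in for S3M features, the two clusters stand in for two allophones of one phoneme, and the helper names (`fit_diag_gmm`, `atypicality_score`), the diagonal covariance, and the use of negative log-likelihood as the score are all choices made here for illustration.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def _log_joint(X, weights, means, variances):
    # log(weight_k) + log N(x | mean_k, diag(var_k)) for every sample and component.
    diff = X[:, None, :] - means[None, :, :]                    # (n, k, d)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (n, k)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (k,)
    return np.log(weights)[None, :] - 0.5 * (maha + log_det[None, :])


def fit_diag_gmm(X, n_components, n_iter=50, seed=0, eps=1e-6):
    """Fit a diagonal-covariance GMM to X of shape (n_samples, dim) with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Initialise: means from random training samples, shared variance, uniform weights.
    means = X[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + eps, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        log_joint = _log_joint(X, weights, means, variances)
        resp = np.exp(log_joint - _logsumexp(log_joint, axis=1)[:, None])
        # M-step: re-estimate weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, eps)
    return weights, means, variances


def atypicality_score(X, gmm):
    # Negative log-likelihood under the phoneme's mixture: higher means less typical.
    return -_logsumexp(_log_joint(X, *gmm), axis=1)


# Synthetic "typical" training features for one phoneme with two allophone-like clusters.
rng = np.random.default_rng(0)
train = np.vstack([
    rng.normal(-2.0, 0.5, size=(200, 4)),
    rng.normal(2.0, 0.5, size=(200, 4)),
])
gmm = fit_diag_gmm(train, n_components=2)

# Two realizations near the clusters and one far from both.
queries = np.array([[-2.0] * 4, [2.0] * 4, [8.0] * 4])
scores = atypicality_score(queries, gmm)
print(scores)
```

With a single Gaussian per phoneme, the region between the two clusters would receive high likelihood even though no typical realization falls there; using several components lets each allophone-like cluster be scored on its own terms. In practice one such mixture would be fitted per phoneme on features from typical speech.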

Code & Models

Repositories

juice500ml/acoustic-units-for-ood · PyTorch · Official

Videos

Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment · Underline

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research