Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models
Juliette Millet, Ioana Chitoran, Ewan Dunbar

TL;DR
This study compares the Perceptual Assimilation Model with state-of-the-art acoustic models for predicting non-native speech perception. It finds that phoneme assimilation better predicts discrimination, while wav2vec 2.0 captures complementary phonetic information.
Contribution
It introduces a new, open dataset of non-native vowel perception and shows that phoneme assimilation predicts non-native speech perception better than fine-grained phonetic models.
Findings
Phoneme assimilation predicts speech discrimination better than phonetic models.
Wav2vec 2.0 captures low-level phonetic features but less of the native language's influence.
Combining both approaches offers a comprehensive understanding of speech perception.
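One common way to turn phoneme assimilation into a discrimination predictor is to compare how listeners classify two non-native sounds into native categories: the more their classification patterns overlap, the harder the pair should be to tell apart. The sketch below assumes listener responses are available as lists of native phoneme labels per non-native sound; the function names and the overlap measure are illustrative assumptions, not necessarily the scoring used in the paper.

```python
from collections import Counter


def assimilation_overlap(responses_a, responses_b):
    """Overlap of two sounds' native-category response distributions.

    responses_a, responses_b: hypothetical lists of native phoneme labels
    chosen by listeners when classifying each non-native sound.
    Returns a value in [0, 1]; 1 means identical assimilation patterns.
    """
    if not responses_a or not responses_b:
        raise ValueError("each sound needs at least one response")
    counts_a, counts_b = Counter(responses_a), Counter(responses_b)
    n_a, n_b = len(responses_a), len(responses_b)
    categories = set(counts_a) | set(counts_b)
    # Sum, over native categories, the smaller of the two response proportions.
    return sum(min(counts_a[c] / n_a, counts_b[c] / n_b) for c in categories)


def predicted_discriminability(responses_a, responses_b):
    """Higher when the two sounds assimilate to different native categories."""
    return 1.0 - assimilation_overlap(responses_a, responses_b)
```

For example, a sound classified as /i/ by 9 of 10 listeners and another classified as /e/ by 8 of 10 overlap by about 0.3, giving a predicted discriminability of about 0.7.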
Abstract
Our native language influences the way we perceive speech sounds, affecting our ability to discriminate non-native sounds. We compare two ideas about the influence of the native language on speech perception: the Perceptual Assimilation Model, which appeals to a mental classification of sounds into native phoneme categories, versus the idea that rich, fine-grained phonetic representations tuned to the statistics of the native language are sufficient. We operationalize this idea using representations from two state-of-the-art speech models: a Dirichlet process Gaussian mixture model and the more recent wav2vec 2.0 model. We present a new, open dataset of French- and English-speaking participants' speech perception behaviour for 61 vowel sounds from six languages. We show that phoneme assimilation is a better predictor than fine-grained phonetic modelling, both for the discrimination…
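Fine-grained representations from models such as a Dirichlet process Gaussian mixture model or wav2vec 2.0 are often turned into discrimination predictions with an ABX-style comparison: given tokens A and B of two different sounds and a token X of the same sound as A, the model "discriminates" if X is closer to A than to B. The sketch below is a minimal version of that idea, assuming each token is already encoded as a (frames × dimensions) NumPy array; the cosine frame distance, the DTW alignment and the length normalization are illustrative choices, not necessarily the paper's exact procedure.

```python
import numpy as np


def cosine_frame_distances(x, y, eps=1e-12):
    """Pairwise cosine distances between frames: x is (n, d), y is (m, d)."""
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    yn = y / (np.linalg.norm(y, axis=1, keepdims=True) + eps)
    return 1.0 - xn @ yn.T


def dtw_distance(x, y):
    """DTW alignment cost between two frame sequences, normalized by n + m."""
    cost = cosine_frame_distances(x, y)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor paths.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)


def abx_delta(a, b, x):
    """Positive when X (same category as A) is closer to A than to B."""
    return dtw_distance(b, x) - dtw_distance(a, x)
```

Averaging `abx_delta` over many A/B/X triplets for a pair of sounds would give a model-based discriminability score that can be compared with listeners' accuracy on the same pair.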
