Probing phoneme, language and speaker information in unsupervised speech representations
Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, and Guillaume Wisniewski

TL;DR
This paper investigates the types of linguistic and speaker information encoded in unsupervised CPC speech representations, revealing that bilingual models better discriminate languages and that gender information increases with more clusters, with some trade-offs in phoneme discrimination.
Contribution
It provides a comprehensive analysis of phoneme, language, and gender information in CPC speech representations, highlighting differences between monolingual and bilingual models and their downstream task implications.
Findings
Gender and phone class information are present in both monolingual and bilingual models.
Language information is more salient in bilingual models.
Discrete units obtained with more clusters encode more gender information.
Abstract
Unsupervised models of representations based on Contrastive Predictive Coding (CPC) [1] are primarily used in spoken language modelling in that they encode phonetic information. In this study, we ask what other types of information are present in CPC speech representations. We focus on three categories: phone class, gender and language, and compare monolingual and bilingual models. Using qualitative and quantitative tools, we find that both gender and phone class information are present in both types of models. Language information, however, is very salient in the bilingual model only, suggesting CPC models learn to discriminate languages when trained on multiple languages. Some language information can also be retrieved from monolingual models, but it is more diffused across all features. These patterns hold when analyses are carried out on the discrete units from a downstream clustering…
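As an illustration of what "probing" a representation for a category such as language or gender can look like, here is a minimal sketch. It is not the authors' method: the features are synthetic stand-ins for CPC frame vectors, the probe is a plain logistic regression, and every function name and parameter (make_synthetic_features, train_linear_probe, probe_accuracy, signal) is hypothetical. The idea is only that a probe trained on features carrying label information should reach high held-out accuracy, while one trained on features without it should stay near chance.

```python
import math
import random


def make_synthetic_features(n, dim, signal, seed):
    """Return n (vector, label) pairs standing in for CPC frame features.

    Assumption for illustration: the binary label (e.g. language A vs. B)
    shifts only the first dimension by +/- signal; all else is Gaussian noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if label == 1 else -signal
        data.append((vec, label))
    return data


def train_linear_probe(data, dim, epochs=20, lr=0.1):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of held-out items whose label the probe predicts correctly."""
    correct = 0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(data)


if __name__ == "__main__":
    dim = 8
    for signal in (2.0, 0.0):
        train = make_synthetic_features(500, dim, signal, seed=0)
        held_out = make_synthetic_features(500, dim, signal, seed=1)
        w, b = train_linear_probe(train, dim)
        print(f"signal={signal}: accuracy={probe_accuracy(w, b, held_out):.2f}")
```

In this toy setup, the gap between the two accuracies plays the role of the quantitative evidence that a category is encoded. With real data, the synthetic generator would be replaced by features extracted from a trained model, with the labels taken from the corpus annotations.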
Taxonomy
Methods: InfoNCE · Contrastive Predictive Coding
