Probing self-supervised speech models for phonetic and phonemic information: a case study in aspiration

Kinan Martin; Jon Gauthier; Canaan Breiss; Roger Levy

arXiv:2306.06232 · cs.CL · June 13, 2023 · 1 citation

TL;DR

This study probes how self-supervised speech models encode phonetic and phonemic contrasts in word-initial stops. Both kinds of distinction emerge in the models' early layers, and the analyses trace them to two sources: training on speech data and the models' high-dimensional architectures.

Contribution

The paper analyzes the linguistic information encoded in self-supervised speech models and shows that phonetic and phonemic distinctions emerge in their early layers.

Findings

01. Robust phonetic and phonemic representations emerge in early layers.

02. The principal components of deeper-layer representations preserve these distinctions.

03. Both training on speech data and the models' high-dimensional architectures contribute to these representations.

Abstract

Textless self-supervised speech models have grown in capabilities in recent years, but the nature of the linguistic information they encode has not yet been thoroughly examined. We evaluate the extent to which these models' learned representations align with basic representational distinctions made by humans, focusing on a set of phonetic (low-level) and phonemic (more abstract) contrasts instantiated in word-initial stops. We find that robust representations of both phonetic and phonemic distinctions emerge in early layers of these models' architectures, and are preserved in the principal components of deeper layer representations. Our analyses suggest two sources for this success: some can only be explained by the optimization of the models on speech data, while some can be attributed to these models' high-dimensional architectures. Our findings show that speech-trained HuBERT derives…
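
The layer-wise probing idea described in the abstract can be illustrated with a toy example. The sketch below is not the authors' procedure: it uses synthetic vectors in place of real model activations, a hypothetical binary label (0 = unaspirated, 1 = aspirated), power iteration to find the top principal component, and a one-dimensional threshold probe. All function names and parameter values are assumptions chosen for illustration.

```python
import random

def make_synthetic_layer(n_per_class, dim, offset, rng):
    """Synthetic stand-in for one layer's activations (not real model output).
    The two classes differ only along coordinate 0."""
    rows, labels = [], []
    for label in (0, 1):  # hypothetical: 0 = unaspirated, 1 = aspirated
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(dim)]
            row[0] += offset if label == 1 else -offset
            rows.append(row)
            labels.append(label)
    return rows, labels

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def center(rows, mu):
    return [[x - m for x, m in zip(row, mu)] for row in rows]

def top_principal_component(rows, iters=50, seed=0):
    """Power iteration on X^T X for already-centered rows X."""
    init = random.Random(seed)
    dim = len(rows[0])
    v = [init.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        scores = [dot(row, v) for row in rows]            # X v
        w = [sum(s * row[j] for s, row in zip(scores, rows))
             for j in range(dim)]                         # X^T (X v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return v

def fit_threshold_probe(scores, labels):
    """1-D probe: threshold halfway between the two class means."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    m1, m0 = sum(pos) / len(pos), sum(neg) / len(neg)
    return (m0 + m1) / 2, (1 if m1 > m0 else -1)

def probe_accuracy(scores, labels, threshold, sign):
    preds = [1 if sign * (s - threshold) > 0 else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

rng = random.Random(42)
train_x, train_y = make_synthetic_layer(200, 16, 3.0, rng)
test_x, test_y = make_synthetic_layer(100, 16, 3.0, rng)

mu = mean_vector(train_x)                       # center with training mean only
component = top_principal_component(center(train_x, mu))
train_scores = [dot(r, component) for r in center(train_x, mu)]
test_scores = [dot(r, component) for r in center(test_x, mu)]

threshold, sign = fit_threshold_probe(train_scores, train_y)
accuracy = probe_accuracy(test_scores, test_y, threshold, sign)
print(f"probe accuracy on top principal component: {accuracy:.3f}")
```

In a real analysis the synthetic rows would be replaced by per-layer activations extracted from the model for each word-initial stop, and the probe would be repeated for every layer to see where the contrast becomes decodable.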

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN