A layer-wise analysis of Mandarin and English suprasegmentals in SSL speech models
Antón de la Fuente, Dan Jurafsky

TL;DR
This paper investigates how self-supervised speech models represent suprasegmental features like tones and stress in Mandarin and English, revealing layer-wise and language-specific differences in their internal representations.
Contribution
It provides a detailed layer-wise comparison of Mandarin and English speech models, highlighting how they encode suprasegmental features and how fine-tuning affects their representations.
Findings
Models learn abstract suprasegmental categories mainly in middle layers.
Models are better at representing features present in their training language.
Fine-tuning enhances representation of lexically contrastive features.
Abstract
This study asks how self-supervised speech models represent suprasegmental categories like Mandarin lexical tone, English lexical stress, and English phrasal accents. Through a series of probing tasks, we make layer-wise comparisons of English and Mandarin 12-layer monolingual models. Our findings suggest that (1) English and Mandarin wav2vec 2.0 models learn contextual representations of abstract suprasegmental categories, which are strongest in the middle third of the network; (2) models are better at representing features that exist in the language of their training data, and this difference is driven by enriched context in transformer blocks, not local acoustic representation; (3) fine-tuned wav2vec 2.0 improves performance in later layers compared to pre-trained models, mainly for lexically contrastive features like tone and stress; (4) HuBERT and WavLM learn similar representations to…
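The layer-wise probing setup mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the probe here is a simple ridge-regularized least-squares classifier, and the per-layer features are synthetic stand-ins for hidden states of a 12-layer model. All function names (`make_synthetic_layers`, `fit_linear_probe`, `layerwise_probe`) and all parameter values are hypothetical. The synthetic signal is made to peak mid-network by construction, so the printed curve only demonstrates the procedure and is not evidence for any finding.

```python
import numpy as np


def make_synthetic_layers(n_layers=12, n_frames=400, dim=16, n_classes=4, seed=0):
    """Hypothetical stand-in for per-layer hidden states.

    ASSUMPTION: class signal strength is an arbitrary bump centered on the
    middle layers, chosen purely for illustration. Real features would come
    from a speech model's hidden states, with labels such as tone categories.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, n_classes, size=n_frames)
    centroids = rng.normal(size=(n_classes, dim))
    layers = []
    for layer in range(n_layers):
        strength = np.exp(-((layer - (n_layers - 1) / 2) ** 2) / 8.0)
        X = strength * centroids[y] + rng.normal(size=(n_frames, dim))
        layers.append(X)
    return layers, y


def fit_linear_probe(X, y, n_classes, ridge=1e-3):
    """Fit a linear probe by ridge-regularized least squares on one-hot targets."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    Y = np.eye(n_classes)[y]                       # one-hot encode the labels
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    """Classification accuracy of probe weights W on features X with labels y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))


def layerwise_probe(layers, y, n_classes, train_frac=0.8):
    """Train one probe per layer and return held-out accuracy for each layer."""
    n_train = int(train_frac * len(y))
    accs = []
    for X in layers:
        W = fit_linear_probe(X[:n_train], y[:n_train], n_classes)
        accs.append(probe_accuracy(W, X[n_train:], y[n_train:]))
    return accs


if __name__ == "__main__":
    layers, y = make_synthetic_layers()
    for i, acc in enumerate(layerwise_probe(layers, y, n_classes=4), start=1):
        print(f"layer {i:2d}: probe accuracy = {acc:.2f}")
```

Comparing the resulting per-layer accuracies across models (for example, an English-trained versus a Mandarin-trained model on the same labels) is the kind of layer-wise comparison the abstract describes; the specific probe architecture and data used in the paper are not stated in this summary.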
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
