What Do Self-Supervised Speech Models Know About Words?

Ankita Pasad; Chung-Ming Chien; Shane Settle; Karen Livescu

arXiv:2307.00162 · cs.CL · February 1, 2024 · 2 cites

TL;DR

This study investigates what word-level linguistic information self-supervised speech models encode, and shows how the training objective and model size affect the encoding of word identity, boundaries, and semantics.

Contribution

The paper provides a comparative analysis of layer-wise representations from ten self-supervised speech models (S3Ms), highlighting how the training objective and model size influence the linguistic knowledge they encode.

Findings

01. Representations vary in informativeness across layers.

02. Training objectives and model size significantly affect how information is distributed.

03. S3Ms with visual grounding outperform speech-only models on certain tasks.

Abstract

Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks. However, these empirical successes alone do not give a complete picture of what is learned during pre-training. Recent work has begun analyzing how S3Ms encode certain properties, such as phonetic and speaker information, but we still lack a proper understanding of knowledge encoded at the word level and beyond. In this work, we use lightweight analysis methods to study segment-level linguistic properties -- word identity, boundaries, pronunciation, syntactic features, and semantic features -- encoded in S3Ms. We present a comparative study of layer-wise representations from ten S3Ms and find that (i) the frame-level representations within each word segment are not all equally informative, and (ii) the pre-training objective and model…
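To make "frame-level representations within each word segment" concrete, here is a minimal sketch of two ways a word segment's frames could be pooled into a single vector: averaging every frame in the segment, or averaging only its central frames. It is an illustration under stated assumptions, not the paper's or the repository's implementation. The function names `mean_pool` and `center_pool`, the `keep` fraction, and the toy frame values are all hypothetical, and frames are plain Python lists rather than real model outputs.

```python
from typing import List, Sequence

Frame = Sequence[float]


def mean_pool(frames: Sequence[Frame], start: int, end: int) -> List[float]:
    """Average the frame vectors in [start, end) into one segment vector."""
    if not 0 <= start < end <= len(frames):
        raise ValueError("segment must be a non-empty span inside the utterance")
    span = frames[start:end]
    dim = len(span[0])
    return [sum(frame[d] for frame in span) / len(span) for d in range(dim)]


def center_pool(
    frames: Sequence[Frame], start: int, end: int, keep: float = 1 / 3
) -> List[float]:
    """Average only the central `keep` fraction of the segment's frames.

    `keep` is a hypothetical knob for this sketch, not a value from the paper.
    """
    length = end - start
    n_keep = max(1, round(length * keep))  # always keep at least one frame
    offset = (length - n_keep) // 2        # centre the kept window
    return mean_pool(frames, start + offset, start + offset + n_keep)


# Toy utterance: 7 frames of 2-dimensional features from one (hypothetical) layer.
layer_frames = [
    [0.0, 0.0], [1.0, 0.0], [2.0, 4.0], [3.0, 6.0],
    [4.0, 8.0], [9.0, 0.0], [6.0, 0.0],
]

# A word assumed to span frames 1..6 (end index is exclusive).
whole_segment = mean_pool(layer_frames, 1, 7)
central_only = center_pool(layer_frames, 1, 7)
```

Running both poolings on every layer of a model and comparing how well each vector predicts a word-level property is one simple way to probe whether some frames in a segment carry more information than others.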

Code & Models

Repositories

ankitapasad/layerwise-analysis
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems