Speech Representation Analysis based on Inter- and Intra-Model Similarities

Yassine El Kheir; Ahmed Ali; Shammur Absar Chowdhury

arXiv:2406.16099·cs.SD·June 25, 2024

TL;DR

This paper analyzes self-supervised speech models by examining their internal representations through inter- and intra-model similarities, revealing convergence in representation spaces but differences in neuron-specific concepts.

Contribution

The paper introduces an analysis method, based on similarity measures, for understanding the internal representations of SSL speech models without external annotations.

Findings

01 Models converge to similar representation subspaces.

02 Neuron-localized concepts differ across models.

03 The analysis is independent of external annotations and task-specific constraints.

Abstract

Self-supervised models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we aim to analyze the encoded contextual representation of these foundation models based on their inter- and intra-model similarity, independent of any external annotation and task-specific constraint. We examine different SSL models varying their training paradigm -- Contrastive (Wav2Vec2.0) and Predictive models (HuBERT); and model sizes (base and large). We explore these models on different levels of localization/distributivity of information including (i) individual neurons; (ii) layer representation; (iii) attention weights and (iv) compare the representations with their finetuned counterparts. Our results highlight that these models converge to similar…
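
To make the idea of layer-level representation similarity concrete, here is a minimal sketch of one commonly used measure, linear Centered Kernel Alignment (CKA). The visible part of the abstract does not name the similarity measure the authors use, so linear CKA is an assumption here, not the paper's confirmed method. The function name `linear_cka` and the arrays `layer_a`, `layer_b`, and `layer_c` are hypothetical, and random matrices stand in for real model activations of shape (frames, features).

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (frames, features).

    Assumed measure for illustration only; both inputs must share the same
    number of rows (frames) but may differ in feature dimension.
    """
    # Center every feature column so constant offsets do not affect the score.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical stand-ins for layer activations (not real model outputs).
rng = np.random.default_rng(0)
layer_a = rng.standard_normal((200, 16))

# layer_b is layer_a under a random orthogonal transform: same subspace,
# but no individual feature ("neuron") lines up one-to-one.
q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
layer_b = layer_a @ q

# layer_c is unrelated noise with a different feature dimension.
layer_c = rng.standard_normal((200, 24))

print("rotated copy:", linear_cka(layer_a, layer_b))
print("unrelated   :", linear_cka(layer_a, layer_c))
```

Because linear CKA is invariant to orthogonal transforms, the rotated copy scores close to 1 even though its individual features differ from the original's, while the unrelated matrix scores much lower. This is one way two models could share a representation subspace yet differ at the level of individual neurons.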

Code & Models

Repositories

QCRIVoice/XSSL_speech
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Advanced Computational Techniques and Applications