Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction

Yusuf Brima; Ulf Krumnack; Simone Pika; Gunther Heidemann

arXiv:2309.03619·cs.SD·January 25, 2024

Open Access

TL;DR

This paper empirically analyzes Barlow Twins, a self-supervised learning method, as applied to speech representation learning. It highlights benefits in transferability and sample efficiency and discusses the method's limitations in disentangling the learned representations.

Contribution

It provides an empirical evaluation of Barlow Twins for speech, revealing its strengths and limitations, and suggests directions for incorporating additional priors to improve hierarchical representations.

Findings

1. Barlow Twins accelerates learning in downstream speech tasks.
2. Representations transfer effectively across domains.
3. Redundancy reduction alone is insufficient for full factorization.

Abstract

Self-supervised learning (SSL) has emerged as a promising paradigm for learning flexible speech representations from unlabeled data. By designing pretext tasks that exploit statistical regularities, SSL models can capture useful representations that are transferable to downstream tasks. This study provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired by theories of redundancy reduction in human perception. On downstream tasks, BT representations accelerated learning and transferred across domains. However, limitations exist in disentangling key explanatory factors: redundancy reduction and invariance alone are insufficient for factorization of learned latents into modular, compact, and informative codes. Our ablation study isolated gains from invariance constraints, but the gains were context-dependent. Overall, this work substantiates the potential of Barlow…
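
The two ingredients named in the abstract, invariance and redundancy reduction, correspond to the two terms of the Barlow Twins objective as it is commonly formulated: the cross-correlation matrix between the embeddings of two augmented views is pushed toward the identity. The sketch below is a minimal pure-Python illustration of that objective, not the authors' implementation. The function names, the toy data, and the weight `lam=5e-3` are assumptions chosen for illustration; the speech encoder and the augmentations are omitted entirely.

```python
import random


def standardize(batch):
    """Normalize each embedding dimension to zero mean, unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        std = var ** 0.5 or 1.0  # guard against a constant dimension
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out


def cross_correlation(z_a, z_b):
    """d x d cross-correlation matrix between two batches of embeddings."""
    a, b = standardize(z_a), standardize(z_b)
    n, d = len(a), len(a[0])
    return [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Return (total, invariance, redundancy) for two views' embeddings.

    invariance: sum over i of (1 - C_ii)^2, small when both views agree.
    redundancy: sum over i != j of C_ij^2, small when dimensions are decorrelated.
    lam is an illustrative trade-off weight (an assumption, not from this paper).
    """
    c = cross_correlation(z_a, z_b)
    d = len(c)
    invariance = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    redundancy = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return invariance + lam * redundancy, invariance, redundancy


# Toy stand-ins for encoder outputs of two augmented views (hypothetical data).
random.seed(0)
view_a = [[random.gauss(0, 1) for _ in range(4)] for _ in range(64)]
view_b = [[v + 0.1 * random.gauss(0, 1) for v in row] for row in view_a]
total, invariance, redundancy = barlow_twins_loss(view_a, view_b)
print(f"total={total:.4f} invariance={invariance:.4f} redundancy={redundancy:.4f}")
```

In this sketch, identical views drive the invariance term to zero, while duplicated embedding dimensions raise the redundancy term. Neither term explicitly ties a dimension to a single explanatory factor, which is consistent with the limitation the abstract reports for disentanglement.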

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Barlow Twins