Similarity Analysis of Self-Supervised Speech Representations

Yu-An Chung; Yonatan Belinkov; James Glass

arXiv:2010.11481 · eess.AS · February 3, 2021 · 1 citation

TL;DR

This paper provides a comparative analysis of self-supervised speech representations, examining how similar they are, what speech information they encode, and whether training objectives or architectural choices have a greater effect on the resulting representations.

Contribution

It introduces a systematic comparison of prominent self-supervised speech models, highlighting the influence of training objectives over architectural differences.

Findings

01. Training objectives significantly affect representation similarity.

02. Pre-training loss correlates with downstream performance.

03. Architectural choices have less impact than training objectives.

Abstract

Self-supervised speech representation learning has recently been a prosperous research topic. Many algorithms have been proposed for learning useful representations from large-scale unlabeled data, and their applications to a wide range of speech tasks have also been investigated. However, there has been little research focusing on understanding the properties of existing approaches. In this work, we aim to provide a comparative study of some of the most representative self-supervised algorithms. Specifically, we quantify the similarities between different self-supervised representations using existing similarity measures. We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations. In addition to showing how various self-supervised models behave differently given the…
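The abstract does not say which similarity measures are used. As an illustration only, the sketch below implements linear CKA (centered kernel alignment), one widely used representation-level similarity measure; whether it is among the measures used in this paper is an assumption, not something stated above. The function name `linear_cka` and the synthetic feature matrices are hypothetical stand-ins for frame-aligned outputs of two speech models (or two layers) on the same audio.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices (illustrative sketch).

    x: (n_frames, d1) and y: (n_frames, d2). Row i of each matrix must
    describe the same input frame, e.g. two models run on identical audio.
    Returns a value in [0, 1]; higher means more similar representations.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows (frames)")
    # Center every feature dimension across frames.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Linear-kernel CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical usage with synthetic features (not data from the paper).
rng = np.random.default_rng(0)
feats_a = rng.standard_normal((500, 64))             # stand-in for model A outputs
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))   # random orthogonal matrix
feats_b = 3.0 * feats_a @ q                          # rotated and scaled copy of A
feats_c = rng.standard_normal((500, 32))             # unrelated features

print(linear_cka(feats_a, feats_b))  # close to 1.0: CKA ignores rotation and isotropic scaling
print(linear_cka(feats_a, feats_c))  # much lower for unrelated features
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling, it can compare models whose hidden sizes differ (64 vs. 32 dimensions here) without first aligning their feature spaces.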

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling