data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup

Vasista Sai Lodagala; Sreyan Ghosh; S. Umesh

arXiv:2211.01246 · eess.AS · May 16, 2023

TL;DR

This paper introduces data2vec-aqc, a self-supervised learning algorithm for speech representation that improves performance in low-resource settings by adding data augmentation, quantized representations, and clustering to data2vec, yielding relative WER improvements of up to 20.9% on LibriSpeech.

Contribution

The paper presents a novel extension of data2vec, integrating new modules for data augmentation, quantization, and clustering to improve speech SSL in limited data scenarios.

Findings

01 Up to 14.1% relative WER reduction over data2vec on LibriSpeech test-clean

02 Up to 20.9% relative WER reduction over data2vec on LibriSpeech test-other

03 Up to 17.8% relative WER improvement over the baseline data2vec on Switchboard

Abstract

In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for speech representation learning from unlabeled speech data. Our goal is to improve SSL for speech in domains where both unlabeled and labeled data are limited. Building on the recently introduced data2vec, we introduce additional modules to the data2vec framework that leverage the benefit of data augmentations, quantized representations, and clustering. The interaction between these modules helps solve the cross-contrastive loss as an additional self-supervised objective. data2vec-aqc achieves up to 14.1% and 20.9% relative WER improvement over the existing state-of-the-art data2vec system on the test-clean and test-other sets of LibriSpeech, respectively, without the use of any language model (LM). Our proposed model also achieves up to 17.8% relative WER gains over the baseline data2vec…
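The figures above are relative, not absolute, WER improvements. As a minimal sketch of how such a figure is usually computed, assuming the standard definition (baseline WER minus new WER, divided by baseline WER), the helper below shows the arithmetic. The function name `relative_wer_improvement` and the example WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER improvement in percent.

    This is the fraction of the baseline word error rate that the new
    system removes, scaled to a percentage.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values for illustration only; NOT results from the paper.
baseline = 10.0  # baseline system WER, in percent
proposed = 8.0   # proposed system WER, in percent
print(f"{relative_wer_improvement(baseline, proposed):.1f}% relative improvement")
```

With these placeholder values, an absolute drop of 2 WER points corresponds to a 20% relative improvement, which is why relative figures can look large even when absolute gains are a few points.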

Code & Models

Repositories

speech-lab-iitm/data2vec-aqc
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems