CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Vasista Sai Lodagala; Sreyan Ghosh; S. Umesh

arXiv:2210.02592 · cs.CL · May 16, 2023

TL;DR

The paper introduces ccc-wav2vec 2.0, a self-supervised pre-training method for speech representations that combines a clustering module with an augmentation-based cross-contrastive loss. It reports up to 15.6% (test-clean) and 12.7% (test-other) relative WER improvement over the baseline wav2vec 2.0 on LibriSpeech without a language model, and up to 14.9% on Switchboard.

Contribution

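The paper proposes a pre-training strategy that pairs a clustering module, which scales down the influence of negative examples highly similar to the positive, with an augmentation-based cross-contrastive loss computed between the encoder output of the original sample and the quantizer output of its augmentation, and vice versa.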

Findings

01

Up to 15.6% relative WER improvement over the baseline wav2vec 2.0 on LibriSpeech test-clean, without a language model

02

Up to 12.7% relative WER improvement over the baseline wav2vec 2.0 on LibriSpeech test-other, without a language model

03

Up to 14.9% relative WER improvement on Switchboard
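The percentages above are relative figures, that is, the drop in WER expressed as a fraction of the baseline WER, not a difference in absolute WER points. The short sketch below shows the arithmetic. The function name `relative_wer_improvement` and the WER values in the usage line are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER improvement in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical values, NOT taken from the paper: a baseline WER of 10.0
# and a new WER of 8.44 correspond to a relative improvement of about 15.6%.
example = round(relative_wer_improvement(10.0, 8.44), 1)
```

Under this definition, the same relative improvement corresponds to a smaller absolute gain when the baseline WER is already low.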

Abstract

While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The Cross-Contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation and vice-versa, bringing robustness to the pre-training strategy. ccc-wav2vec 2.0 achieves up to 15.6% and 12.7% relative WER improvement over the baseline wav2vec 2.0 on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. The proposed method also…
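The two ideas in the abstract can be sketched for a single masked frame as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the cluster labels are taken as given rather than computed, the down-weighting is applied as a multiplicative factor `scale` on the exponentiated similarity of negatives that share the positive's cluster, the temperature of 0.1 and `scale=0.3` are placeholder values, and the function names are hypothetical. How the cross terms are weighted against the standard wav2vec 2.0 loss is not shown.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between a vector u (D,) and each row of v (K, D)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v, axis=-1, keepdims=True)
    return v @ u


def weighted_contrastive(c, q_pos, q_neg, neg_weights, temperature=0.1):
    """InfoNCE-style loss for one frame with a weight on each negative."""
    pos = np.exp(cosine(c, q_pos[None, :])[0] / temperature)
    neg = np.exp(cosine(c, q_neg) / temperature)
    return float(-np.log(pos / (pos + np.sum(neg_weights * neg))))


def cluster_weights(pos_cluster, neg_clusters, scale):
    """Assumed scheme: negatives in the positive's cluster get weight `scale`."""
    return np.where(neg_clusters == pos_cluster, scale, 1.0)


def cross_contrastive(c_orig, c_aug, q_orig, q_aug, cl_orig, cl_aug,
                      t, neg_idx, scale=0.3):
    """Sum of the two cross terms for masked frame t."""
    # Encoder output of the original vs. quantizer output of the augmentation.
    w_aug = cluster_weights(cl_aug[t], cl_aug[neg_idx], scale)
    loss_a = weighted_contrastive(c_orig[t], q_aug[t], q_aug[neg_idx], w_aug)
    # Encoder output of the augmentation vs. quantizer output of the original.
    w_orig = cluster_weights(cl_orig[t], cl_orig[neg_idx], scale)
    loss_b = weighted_contrastive(c_aug[t], q_orig[t], q_orig[neg_idx], w_orig)
    return loss_a + loss_b


# Toy data: random vectors stand in for encoder and quantizer outputs.
rng = np.random.default_rng(0)
T, D = 6, 8
c_orig, c_aug, q_orig, q_aug = (rng.normal(size=(T, D)) for _ in range(4))
cl_orig = np.array([0, 0, 1, 2, 0, 1])  # hypothetical cluster ids, original
cl_aug = np.array([0, 1, 0, 2, 0, 1])   # hypothetical cluster ids, augmentation
t, neg_idx = 0, np.array([1, 2, 3, 4, 5])

loss_scaled = cross_contrastive(c_orig, c_aug, q_orig, q_aug,
                                cl_orig, cl_aug, t, neg_idx, scale=0.3)
loss_plain = cross_contrastive(c_orig, c_aug, q_orig, q_aug,
                               cl_orig, cl_aug, t, neg_idx, scale=1.0)
```

With `scale=1.0` every negative counts fully, which reduces each term to an unweighted contrastive loss. With `scale < 1`, negatives in the positive's cluster contribute less to the denominator, so in this sketch `loss_scaled` is smaller than `loss_plain` whenever at least one such negative is present.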

Code & Models

Repositories

speech-lab-iitm/ccc-wav2vec-2.0 (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing