Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Eugene Kharitonov; Morgane Rivière; Gabriel Synnaeve; Lior Wolf; Pierre-Emmanuel Mazaré; Matthijs Douze; Emmanuel Dupoux

arXiv:2007.00991·eess.AS·July 3, 2020

TL;DR

This paper introduces WavAugment, a time-domain data augmentation library, and shows that augmentation gives Contrastive Predictive Coding (CPC) a relative improvement of 18-22% on speech representation learning, beating the reference Libri-light results with 600 times less data.

Contribution

The paper presents WavAugment, a time-domain data augmentation library, and applies it to Contrastive Predictive Coding (CPC) for speech representations. With an out-of-domain dataset, the augmented CPC is on par with the state of the art on the Zero Speech Benchmark 2017.

Findings

01

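Applying augmentation to the past segments (the context from which CPC predicts future segments) is generally more efficient and yields better performance than other methods.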

02

Combining pitch modification, additive noise, and reverberation gives CPC a relative improvement of 18-22%.

03

Beats the reference Libri-light results with 600 times less data and, using an out-of-domain dataset, is on par with the state of the art on the Zero Speech Benchmark 2017.

Abstract

Contrastive Predictive Coding (CPC), based on predicting future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation in the past is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise, and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation…
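To make "augmentation in the past" concrete, here is a minimal NumPy sketch of one of the three augmentations named in the abstract, additive noise, applied only to the earlier part of a waveform. It is an illustration under stated assumptions, not WavAugment's API and not the authors' implementation: the function names `add_noise_at_snr` and `augment_past_only`, the single split index, the Gaussian noise source, and the 15 dB SNR are all hypothetical choices made for this example.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` so the mixture has the requested SNR (in dB).

    Hypothetical helper for illustration; not part of WavAugment.
    """
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Pick the noise power that gives signal_power / noise_power = 10^(snr_db / 10).
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return signal + scale * noise


def augment_past_only(waveform, split, snr_db, rng):
    """Add noise to waveform[:split] (the "past") and leave waveform[split:] untouched.

    Splitting at a single sample index is a simplifying assumption of this sketch.
    """
    past, future = waveform[:split], waveform[split:]
    noise = rng.standard_normal(past.shape[0])
    noisy_past = add_noise_at_snr(past, noise, snr_db)
    return np.concatenate([noisy_past, future])


# Usage: a 1-second, 220 Hz sine at 16 kHz stands in for a speech signal.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
waveform = np.sin(2 * np.pi * 220.0 * t)
augmented = augment_past_only(waveform, split=12000, snr_db=15.0, rng=rng)
```

The future part of the signal is left clean, so a predictor trained on such pairs has to predict unmodified targets from a perturbed context. Pitch modification and reverberation would slot into the same place as the noise step.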

Code & Models

Repositories

facebookresearch/WavAugment · PyTorch · Official
