vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

Alexei Baevski; Steffen Schneider; Michael Auli

arXiv:1910.05453 · cs.CL · February 18, 2020 · 311 cites

TL;DR

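The paper introduces vq-wav2vec, a self-supervised method that learns discrete speech representations by quantizing dense features with either a Gumbel-Softmax or online k-means clustering. BERT pre-training on the resulting discrete units improves TIMIT phoneme classification and WSJ speech recognition.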

Contribution

It combines self-supervised learning with discrete representation learning for speech, so that NLP algorithms that require discrete inputs can be applied directly to audio.

Findings

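1. Set a new state of the art on TIMIT phoneme classification with BERT pre-training.

2. Set a new state of the art on WSJ speech recognition with BERT pre-training.

3. Showed that discretizing speech representations allows NLP algorithms that require discrete inputs, such as BERT, to be applied to audio.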

Abstract

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a Gumbel-Softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community which require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
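
To make the two quantization options named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`kmeans_quantize`, `gumbel_softmax`, `gumbel_quantize`), the toy codebook, and the temperature `tau` are all hypothetical and chosen for illustration. It shows only the forward step of turning one dense frame into a discrete index, and leaves out everything needed for training.

```python
import math
import random


def kmeans_quantize(z, codebook):
    """Return the index of the codebook vector nearest to z (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample a relaxed one-hot vector from logits with the Gumbel-Softmax trick."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_quantize(logits, tau=1.0, rng=random):
    """Pick the discrete index with the largest Gumbel-Softmax probability."""
    probs = gumbel_softmax(logits, tau, rng)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy example (assumed values): three 2-D codebook entries, three dense frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]]
tokens = [kmeans_quantize(f, codebook) for f in frames]
print(tokens)  # [0, 1, 2]
```

The resulting index sequence plays the role that token ids play in text, which is what allows a model expecting discrete inputs to consume it.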

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques

Methods: Linear Layer · Gumbel Softmax · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece