Open Implementation and Study of BEST-RQ for Speech Processing

Ryan Whetten; Titouan Parcollet; Marco Dinarelli; Yannick Estève

arXiv:2405.04296·cs.CL·September 5, 2024·1 cites

TL;DR

This paper re-implements and evaluates BEST-RQ, a simpler SSL method for speech processing, demonstrating comparable performance to wav2vec 2.0 while significantly reducing training time across multiple tasks.

Contribution

It provides an open-source implementation of BEST-RQ and compares its performance and efficiency to wav2vec 2.0 on various speech tasks.

Findings

01

BEST-RQ achieves downstream task performance similar to that of wav2vec 2.0.

02

BEST-RQ trains more than twice as fast as wav2vec 2.0.

03

The study offers detailed implementation insights and a preliminary evaluation across four tasks.

Abstract

Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) is an SSL method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such as wav2vec 2.0. Despite BEST-RQ's great performance, details are lacking in the original paper, such as the amount of GPU/TPU hours used in pre-training, and there is no official easy-to-use open-source implementation. Furthermore, BEST-RQ has not been evaluated on other downstream tasks aside from ASR and speech translation. In this work, we describe a re-implementation of a Random-projection quantizer and perform a preliminary study with a comparison to wav2vec 2.0 on four downstream tasks. We…
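As a rough illustration of the random-projection quantizer the abstract refers to, the sketch below maps each input frame to a discrete target index: a frozen random matrix projects the frame, and the index of the nearest entry in a frozen random codebook becomes the label. In BEST-RQ these indices serve as targets that an encoder predicts for masked frames. This is not the authors' SpeechBrain code (which is written in PyTorch); it uses only the Python standard library to stay dependency-free. The class name, method names, dimensions, seeds, and the exact initialization choices are all assumptions made for illustration.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class RandomProjectionQuantizer:
    """Hypothetical sketch: frozen random projection plus frozen random codebook.

    Nothing in this class is trained; both tables are fixed at construction.
    """

    def __init__(self, input_dim, code_dim, codebook_size, seed=0):
        rng = random.Random(seed)
        # Xavier-style uniform bound for the projection matrix (assumed init).
        bound = math.sqrt(6.0 / (input_dim + code_dim))
        self.projection = [
            [rng.uniform(-bound, bound) for _ in range(code_dim)]
            for _ in range(input_dim)
        ]
        # Codebook rows drawn from a standard normal, then L2-normalized (assumed init).
        self.codebook = [
            l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
            for _ in range(codebook_size)
        ]

    def quantize_frame(self, frame):
        """Return the index of the codebook entry nearest to the projected frame."""
        code_dim = len(self.projection[0])
        # Project the frame: (input_dim,) -> (code_dim,), then normalize.
        projected = l2_normalize([
            sum(frame[i] * self.projection[i][j] for i in range(len(frame)))
            for j in range(code_dim)
        ])
        # Nearest neighbour by squared Euclidean distance.
        distances = [
            sum((p - c) ** 2 for p, c in zip(projected, code))
            for code in self.codebook
        ]
        return distances.index(min(distances))

    def quantize(self, frames):
        """Quantize a sequence of frames into a list of integer targets."""
        return [self.quantize_frame(frame) for frame in frames]


# Toy usage: 5 random "feature frames" of dimension 8 (illustrative sizes only).
frame_rng = random.Random(1)
frames = [[frame_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
quantizer = RandomProjectionQuantizer(input_dim=8, code_dim=4, codebook_size=16)
targets = quantizer.quantize(frames)
print(targets)
```

Because the projection and codebook are never updated, computing targets needs no gradients and no learned quantizer module. That is consistent with the abstract's description of BEST-RQ as simpler than wav2vec 2.0.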

Code & Models

Repositories

speechbrain/speechbrain (PyTorch, Official)

Taxonomy

Topics: Speech Recognition and Synthesis