LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

Solene Evain; Ha Nguyen; Hang Le; Marcely Zanon Boito; Salima Mdhaffar; Sina Alisamir; Ziyi Tong; Natalia Tomashenko; Marco Dinarelli; Titouan Parcollet; Alexandre Allauzen; Yannick Esteve; Benjamin Lecouteux; Francois Portet; Solange Rossato; Fabien Ringeval; Didier Schwab; Laurent Besacier

arXiv:2104.11462 · cs.CL · July 19, 2022

TL;DR

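LeBenchmark provides a reproducible framework for evaluating self-supervised learning from speech on automatic speech recognition (high and low resource), spoken language understanding, speech translation, and emotion recognition, addressing the heterogeneous, mostly ASR-only and English-centric evaluations of earlier work.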

Contribution

It introduces a standardized benchmark for SSL in speech, including diverse tasks and languages, enabling fair comparison and reproducibility of results.

Findings

01. SSL benefits most speech tasks tested

02. Evaluation shows SSL is not universally advantageous

03. Framework supports reproducible research in speech SSL

Abstract

Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech. They were notably successful at improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly made on ASR and using multiple and heterogeneous experimental settings (most of them for English). This calls into question the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It not only includes ASR (high and low resource) tasks but also spoken language understanding, speech translation and emotion recognition. We…

Code & Models

Repositories

LeBenchmark/Interspeech2021
pytorch
