LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab

TL;DR
LeBenchmark provides a comprehensive, reproducible framework to evaluate self-supervised learning methods across multiple speech tasks and languages, addressing previous evaluation inconsistencies.
Contribution
It introduces a standardized benchmark for SSL in speech, including diverse tasks and languages, enabling fair comparison and reproducibility of results.
Findings
SSL benefits most speech tasks tested
Evaluation shows SSL is not universally advantageous
Framework supports reproducible research in speech SSL
Abstract
Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech. They were notably successful at improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly carried out on ASR, using multiple heterogeneous experimental settings (most of them for English). This calls into question the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR (high and low resource) tasks but also spoken language understanding, speech translation and emotion recognition. We…
