LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian

TL;DR
LeBenchmark 2.0 is a comprehensive, open-source framework for evaluating and developing French speech SSL models, featuring large datasets, multiple pre-trained models, and diverse downstream tasks, highlighting performance and energy considerations.
Contribution
It introduces a standardized, replicable framework with extensive datasets, multiple pre-trained models, and evaluation protocols for French speech SSL, advancing research and benchmarking.
Findings
Pre-trained models outperform previous models and multilingual benchmarks.
Models trained on 14,000 hours of speech show improved performance.
Training larger models increases energy consumption up to fourfold.
Abstract
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus…
Code & Models
- 🤗 LeBenchmark/wav2vec2-FR-1K-base · model · 15 dl · ♡ 1
- 🤗 LeBenchmark/wav2vec2-FR-1K-large · model · 6 dl
- 🤗 LeBenchmark/wav2vec2-FR-2.6K-base · model · 26 dl
- 🤗 LeBenchmark/wav2vec2-FR-3K-base · model · 7 dl
- 🤗 LeBenchmark/wav2vec2-FR-3K-large · model · 196 dl · ♡ 1
- 🤗 LeBenchmark/wav2vec2-FR-7K-base · model · 448 dl · ♡ 1
- 🤗 LeBenchmark/wav2vec2-FR-7K-large · model · 8.9k dl · ♡ 12
- 🤗 LeBenchmark/wav2vec2-FR-14K-large · model · 23 dl · ♡ 2
- 🤗 LeBenchmark/wav2vec2-FR-14K-xlarge · model · 4 dl · ♡ 1
- 🤗 LeBenchmark/wav2vec2-FR-14K-light · model · 4 dl
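The checkpoints listed above follow a regular naming pattern (corpus size, then architecture), so a repository id can be built and validated programmatically. The sketch below is illustrative only: the helper names `AVAILABLE`, `repo_id`, and `extract_features` are hypothetical, and the loading step assumes the repositories ship weights compatible with the Hugging Face `transformers` class `Wav2Vec2Model`, which this page does not confirm. It also assumes 16 kHz mono input, the usual setting for wav2vec 2.0; any input normalization a given checkpoint expects is not handled here.

```python
# Corpus sizes and architectures as they appear in the model list above.
AVAILABLE = {
    "1K": ("base", "large"),
    "2.6K": ("base",),
    "3K": ("base", "large"),
    "7K": ("base", "large"),
    "14K": ("large", "xlarge", "light"),
}


def repo_id(hours: str, size: str) -> str:
    """Build a Hub repository id, rejecting combinations not in the list."""
    if size not in AVAILABLE.get(hours, ()):
        raise ValueError(f"no listed checkpoint for hours={hours!r}, size={size!r}")
    return f"LeBenchmark/wav2vec2-FR-{hours}-{size}"


def extract_features(repo: str, waveform):
    """Return frame-level representations for a (batch, samples) float tensor.

    Assumption: the repository can be loaded with Wav2Vec2Model.from_pretrained.
    Calling this function downloads the checkpoint from the Hub.
    """
    # Imported lazily so that building repository ids needs no heavy dependencies.
    import torch
    from transformers import Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained(repo).eval()
    with torch.no_grad():  # inference only, no gradients needed
        return model(waveform).last_hidden_state
```

For example, `repo_id("7K", "large")` yields the id of the most downloaded checkpoint in the list, and `extract_features(repo_id("7K", "large"), torch.zeros(1, 16000))` would return the hidden states for one second of silence, provided the loading assumption holds.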
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
