Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe

TL;DR
This paper introduces WavLabLM, a multilingual self-supervised learning (SSL) model that reaches performance comparable to XLS-R on ML-SUPERB with less than 10% of the training data, bringing SSL pre-training within reach of academic compute budgets.
Contribution
The paper presents WavLabLM with a novel multi-stage pre-training method for multilingual SSL, enabling efficient training across 136 languages with limited resources.
Findings
WavLabLM achieves performance comparable to XLS-R on ML-SUPERB with less than 10% of the training data.
A vanilla HuBERT Base model retains 94% of XLS-R's performance with only 3% of the data.
All code and models are open-sourced in ESPnet.
Abstract
Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA) methods due to the expenses and complexity required to handle many languages. This further harms the reproducibility of SSL, which is already limited to few research groups due to its resource usage. We show that more powerful techniques can actually lead to more efficient pre-training, opening SSL to more research groups. We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages. To build WavLabLM, we devise a novel multi-stage pre-training method, designed to address the language imbalance of multilingual data. WavLabLM achieves comparable performance to XLS-R on ML-SUPERB with less than 10% of the training data, making SSL realizable with academic compute. We show that further efficiency can be achieved with a vanilla HuBERT Base model,…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Balanced Selection
