Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

William Chen; Jiatong Shi; Brian Yan; Dan Berrebbi; Wangyou Zhang; Yifan Peng; Xuankai Chang; Soumi Maiti; Shinji Watanabe

arXiv:2309.15317·cs.CL·September 29, 2023

Open Access · 4 Models

TL;DR

This paper introduces WavLabLM, a multilingual self-supervised learning (SSL) model that achieves performance comparable to XLS-R on ML-SUPERB with less than 10% of the training data, making SSL pre-training feasible with academic compute.

Contribution

The paper presents WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages, together with a novel multi-stage pre-training method designed to address the language imbalance of multilingual data.

Findings

01. WavLabLM achieves performance comparable to XLS-R on ML-SUPERB with less than 10% of the training data.

02. A vanilla HuBERT Base model retains 94% of XLS-R's performance with only 3% of the data.

03. All code and models are open-sourced in ESPnet.

Abstract

Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA) methods due to the expenses and complexity required to handle many languages. This further harms the reproducibility of SSL, which is already limited to few research groups due to its resource usage. We show that more powerful techniques can actually lead to more efficient pre-training, opening SSL to more research groups. We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages. To build WavLabLM, we devise a novel multi-stage pre-training method, designed to address the language imbalance of multilingual data. WavLabLM achieves comparable performance to XLS-R on ML-SUPERB with less than 10% of the training data, making SSL realizable with academic compute. We show that further efficiency can be achieved with a vanilla HuBERT Base model,…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Balanced Selection