Pseudo-Labeling for Massively Multilingual Speech Recognition

Loren Lugosch; Tatiana Likhomanenko; Gabriel Synnaeve; Ronan Collobert

arXiv:2111.00161 · cs.CL · March 9, 2022

TL;DR

This paper extends pseudo-labeling to massively multilingual speech recognition with 60 languages. Using labeled Common Voice and unlabeled VoxPopuli data, the proposed recipe yields a model that performs better for many languages, works even for low-resource languages, and transfers well to LibriSpeech.

Contribution

Introduces a simple pseudo-labeling recipe for multilingual speech recognition that works well even for low-resource languages, and shows that the resulting model transfers well to LibriSpeech.

Findings

01. Improved recognition accuracy for many languages.

02. Effective semi-supervised training with pseudo-labels, including for low-resource languages.

03. Good transferability to the LibriSpeech dataset.

Abstract

Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems. In this work, we extend pseudo-labeling to massively multilingual speech recognition with 60 languages. We propose a simple pseudo-labeling recipe that works well even with low-resource languages: train a supervised multilingual model, fine-tune it with semi-supervised learning on a target language, generate pseudo-labels for that language, and train a final model using pseudo-labels for all languages, either from scratch or by fine-tuning. Experiments on the labeled Common Voice and unlabeled VoxPopuli datasets show that our recipe can yield a model with better performance for many languages that also transfers well to LibriSpeech.
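
The four steps of the recipe in the abstract can be sketched as a small orchestration function. This is only an illustration under stated assumptions, not the authors' implementation: the names `run_recipe`, `train`, and `fine_tune_semi` are hypothetical, utterances are represented by string ids rather than audio, and the toy stand-ins at the bottom exist only to make the sketch runnable. The final step shows the from-scratch variant; the abstract also mentions fine-tuning.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]      # (utterance id standing in for audio, transcript)
Model = Callable[[str], str]   # maps an utterance to a transcript


def run_recipe(
    labeled: Dict[str, List[Example]],
    unlabeled: Dict[str, List[str]],
    train: Callable[[List[Example]], Model],
    fine_tune_semi: Callable[[Model, List[Example], List[str]], Model],
) -> Tuple[Model, Dict[str, List[Example]]]:
    """Hypothetical outline of the recipe; both callables are supplied by the caller."""
    # Step 1: supervised multilingual model on labeled data from all languages.
    all_labeled = [ex for exs in labeled.values() for ex in exs]
    base = train(all_labeled)

    pseudo: Dict[str, List[Example]] = {}
    for lang, audio in unlabeled.items():
        # Step 2: semi-supervised fine-tuning on one target language.
        tuned = fine_tune_semi(base, labeled.get(lang, []), audio)
        # Step 3: generate pseudo-labels for that language's unlabeled audio.
        pseudo[lang] = [(utt, tuned(utt)) for utt in audio]

    # Step 4: final model trained on labeled plus pseudo-labeled data
    # for all languages (from-scratch variant).
    all_pseudo = [ex for exs in pseudo.values() for ex in exs]
    final = train(all_labeled + all_pseudo)
    return final, pseudo


# Toy stand-ins (assumptions for illustration only, not real ASR training).
def toy_train(examples: List[Example]) -> Model:
    table = dict(examples)          # "training" just memorizes transcripts
    return lambda utt: table.get(utt, "")


def toy_fine_tune(model: Model, labeled_lang: List[Example], audio: List[str]) -> Model:
    # Pretend fine-tuning lets the model transcribe unseen audio as its uppercased id.
    return lambda utt: model(utt) or utt.upper()


labeled = {"en": [("a", "hello")], "fr": [("b", "bonjour")]}
unlabeled = {"en": ["c"], "fr": ["d"]}
final_model, pseudo_labels = run_recipe(labeled, unlabeled, toy_train, toy_fine_tune)
```

In a real system, `train` and `fine_tune_semi` would wrap an acoustic model training loop; keeping them as parameters makes the data flow between the four steps explicit without committing to any particular toolkit.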

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing