Pseudo-Labeling for Massively Multilingual Speech Recognition
Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

TL;DR
This paper extends pseudo-labeling to massively multilingual speech recognition with 60 languages. In experiments with labeled Common Voice and unlabeled VoxPopuli data, semi-supervised learning works well even for low-resource languages, and the resulting model performs better for many languages and transfers well to LibriSpeech.
Contribution
Introduces a simple pseudo-labeling recipe for multilingual speech recognition that improves performance, including for low-resource languages, and shows that the resulting model transfers well to LibriSpeech.
Findings
Better recognition performance for many of the languages covered.
Semi-supervised training with pseudo-labels is effective, even for low-resource languages.
The final model transfers well to the LibriSpeech dataset.
Abstract
Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems. In this work, we extend pseudo-labeling to massively multilingual speech recognition with 60 languages. We propose a simple pseudo-labeling recipe that works well even with low-resource languages: train a supervised multilingual model, fine-tune it with semi-supervised learning on a target language, generate pseudo-labels for that language, and train a final model using pseudo-labels for all languages, either from scratch or by fine-tuning. Experiments on the labeled Common Voice and unlabeled VoxPopuli datasets show that our recipe can yield a model with better performance for many languages that also transfers well to LibriSpeech.
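The four steps of the recipe described in the abstract can be sketched as a small orchestration function. This is only an illustrative outline under stated assumptions, not the authors' implementation: `run_recipe`, the three training callables, and the data layout are all hypothetical names introduced here, and passing the labeled data to the final training step alongside the pseudo-labels is an assumption the abstract does not confirm.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only (not from the paper).
Example = Tuple[str, str]      # (audio_id, transcript)
Model = Callable[[str], str]   # maps an audio_id to a predicted transcript


def run_recipe(
    labeled: Dict[str, List[Example]],   # language -> labeled examples
    unlabeled: Dict[str, List[str]],     # language -> unlabeled audio ids
    train_supervised: Callable[[List[Example]], Model],
    finetune_semi_supervised: Callable[[Model, List[Example], List[str]], Model],
    train_final: Callable[[List[Example], List[Example]], Model],
) -> Model:
    """Sketch of the recipe; the training callables are supplied by the caller."""
    # Step 1: train a supervised multilingual model on all labeled data.
    all_labeled = [ex for examples in labeled.values() for ex in examples]
    base_model = train_supervised(all_labeled)

    pseudo_labeled: List[Example] = []
    for lang, audio_ids in unlabeled.items():
        # Step 2: fine-tune the multilingual model with semi-supervised
        # learning on one target language.
        lang_model = finetune_semi_supervised(
            base_model, labeled.get(lang, []), audio_ids
        )
        # Step 3: generate pseudo-labels for that language.
        pseudo_labeled.extend(
            (audio_id, lang_model(audio_id)) for audio_id in audio_ids
        )

    # Step 4: train the final model using pseudo-labels for all languages.
    # Whether this starts from scratch or from base_model is left to
    # train_final; also passing all_labeled is an assumption of this sketch.
    return train_final(all_labeled, pseudo_labeled)
```

Keeping the training steps as injected callables leaves the choice of architecture, loss, and "from scratch or by fine-tuning" for the final stage open, since the abstract does not fix those details.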
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
