Unified Semi-Supervised Pipeline for Automatic Speech Recognition
Nune Tadevosyan, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Ante Jukic

TL;DR
This paper presents an open-source semi-supervised speech recognition framework that enables scalable dataset creation across languages, and introduces TopIPL, a new pseudo-labeling algorithm that improves recognition accuracy in multiple languages.
Contribution
The work provides a complete, scalable semi-supervised training pipeline and a novel pseudo-labeling algorithm, TopIPL, for improved speech recognition across diverse languages.
Findings
TopIPL achieves up to 40% relative WER reduction in Portuguese.
The framework enables large-scale dataset creation from publicly available speech data under Creative Commons licenses.
Improvements are observed in both low-resource (Portuguese, Armenian) and high-resource (Spanish) settings.
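The findings above report gains as relative WER reductions. The following is a minimal sketch, not the authors' evaluation code. Word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A relative reduction is the fraction of the baseline WER that the new system removes. The example sentences and WER values are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions)
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system (0.4 means 40%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical example: one substitution (sat -> sit) and one deletion (the),
    # i.e. 2 errors over 6 reference words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical WERs in percent: a baseline of 10.0 and a new system at 6.0
    print(relative_wer_reduction(10.0, 6.0))
```

With the hypothetical values, `relative_wer_reduction(10.0, 6.0)` returns 0.4, a 40% relative reduction. The absolute drop is only 4 WER points, so relative and absolute figures should not be compared directly.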
Abstract
Automatic Speech Recognition has been a longstanding research area, with substantial efforts dedicated to integrating semi-supervised learning due to the scarcity of labeled datasets. However, most prior work has focused on improving learning algorithms using existing datasets, without providing a complete public framework for large-scale semi-supervised training across new datasets or languages. In this work, we introduce a fully open-source semi-supervised training framework encompassing the entire pipeline: from unlabeled data collection to pseudo-labeling and model training. Our approach enables scalable dataset creation for any language using publicly available speech data under Creative Commons licenses. We also propose a novel pseudo-labeling algorithm, TopIPL, and evaluate it in both low-resource (Portuguese, Armenian) and high-resource (Spanish) settings. Notably, TopIPL…
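The abstract describes a pipeline running from unlabeled data collection through pseudo-labeling to model training, but it does not spell out how TopIPL works. As an illustration only, the sketch below shows a generic iterative pseudo-labeling (IPL) loop with a confidence filter. It is not TopIPL. The callables `train_fn` and `transcribe_fn`, the `min_confidence` threshold, and the toy word-set "model" are all hypothetical stand-ins for a real ASR trainer and decoder. Audio clips are represented by id strings.

```python
from typing import Any, Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (audio_id, transcript)


def iterative_pseudo_labeling(
    labeled: Sequence[Pair],
    unlabeled: Sequence[str],
    train_fn: Callable[[List[Pair]], Any],
    transcribe_fn: Callable[[Any, str], Tuple[str, float]],
    rounds: int = 3,
    min_confidence: float = 0.9,
) -> Tuple[Any, List[Pair]]:
    """Generic IPL loop (not TopIPL): train a seed model on labeled data, then
    repeatedly pseudo-label the unlabeled audio and retrain on the union."""
    model = train_fn(list(labeled))  # seed model from human-labeled data only
    pseudo: List[Pair] = []
    for _ in range(rounds):
        # Re-label every unlabeled clip with the current model each round
        pseudo = []
        for audio_id in unlabeled:
            text, confidence = transcribe_fn(model, audio_id)
            if confidence >= min_confidence:  # drop low-confidence pseudo-labels
                pseudo.append((audio_id, text))
        model = train_fn(list(labeled) + pseudo)
    return model, pseudo


# --- Hypothetical toy stand-ins for a real ASR trainer and decoder ---
def toy_train(pairs: List[Pair]) -> set:
    # The toy "model" is just the vocabulary seen in training transcripts
    return {word for _, text in pairs for word in text.split()}


def toy_transcribe(vocab: set, audio_id: str) -> Tuple[str, float]:
    # The toy "audio" is its own word sequence. Out-of-vocabulary words are dropped,
    # and confidence is the fraction of words the model knows.
    words = audio_id.split()
    known = [w for w in words if w in vocab]
    return " ".join(known), len(known) / len(words)


if __name__ == "__main__":
    labeled = [("hola mundo", "hola mundo"), ("buenos dias", "buenos dias")]
    unlabeled = ["hola dias", "adios mundo"]
    model, pseudo = iterative_pseudo_labeling(labeled, unlabeled, toy_train, toy_transcribe)
    print(pseudo)
```

In the toy run, the fully recognized clip "hola dias" is kept as a pseudo-label. "adios mundo" is discarded because its confidence is 0.5, below the hypothetical 0.9 threshold. Regenerating all pseudo-labels each round, rather than accumulating them, is one common design choice. Real systems differ in how they choose the teacher model and filter labels.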
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Machine Learning and Data Classification
