Unified Semi-Supervised Pipeline for Automatic Speech Recognition

Nune Tadevosyan; Nikolay Karpov; Andrei Andrusenko; Vitaly Lavrukhin; Ante Jukic

arXiv:2506.07659 · eess.AS · June 10, 2025 · Interspeech

TL;DR

This paper presents an open-source semi-supervised speech recognition framework that enables scalable dataset creation across languages and introduces a new pseudo-labeling algorithm, TopIPL, improving recognition accuracy in multiple languages.

Contribution

The work provides a complete, scalable semi-supervised training pipeline and a novel pseudo-labeling algorithm, TopIPL, for improved speech recognition across diverse languages.

Findings

1. TopIPL achieves up to 40% relative WER reduction in Portuguese.

2. The framework enables large-scale dataset creation using publicly available data.

3. Improvements are observed in both low-resource and high-resource language settings.
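
To make the "relative WER reduction" figure concrete, the sketch below computes word error rate (WER) as word-level edit distance divided by reference length, then expresses the gain of a new model over a baseline as a fraction of the baseline WER. The function names and the example numbers are illustrative assumptions, not values or code taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new model."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only (not results from the paper):
# a baseline at 20.0% WER improved to 12.0% WER is a 40% relative reduction.
example_reduction = relative_wer_reduction(20.0, 12.0)
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Note that a relative reduction is measured against the baseline, so a 40% relative reduction from 20.0% WER corresponds to an absolute drop of 8 percentage points, not 40.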

Abstract

Automatic Speech Recognition has been a longstanding research area, with substantial efforts dedicated to integrating semi-supervised learning due to the scarcity of labeled datasets. However, most prior work has focused on improving learning algorithms using existing datasets, without providing a complete public framework for large-scale semi-supervised training across new datasets or languages. In this work, we introduce a fully open-source semi-supervised training framework encompassing the entire pipeline: from unlabeled data collection to pseudo-labeling and model training. Our approach enables scalable dataset creation for any language using publicly available speech data under Creative Commons licenses. We also propose a novel pseudo-labeling algorithm, TopIPL, and evaluate it in both low-resource (Portuguese, Armenian) and high-resource (Spanish) settings. Notably, TopIPL…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Machine Learning and Data Classification