Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM

Jeena Prakash; Blessingh Kumar; Kadri Hacioglu; Bidisha Sharma; Sindhuja Gopalan; Malolan Chetlur; Shankar Venkatesan; Andreas Stolcke

arXiv:2506.11089 · eess.AS · October 6, 2025

TL;DR

This paper introduces a unified multi-ASR framework in which large language models (LLMs) post-process the combined outputs of several ASR systems. This substantially improves pseudo-label quality and semi-supervised ASR performance compared with traditional multi-stage methods.

Contribution

The paper proposes a prompt-driven multi-ASR fusion approach with LLM-based error correction that replaces traditional voting and arbitration methods.
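The following sketch illustrates the prompt-driven fusion idea by packing several ASR hypotheses for the same utterance into a single LLM prompt. The template wording and the function name `build_fusion_prompt` are hypothetical, since the paper's actual prompt is not reproduced on this page. The sketch also leaves out the step of sending the prompt to a text LLM, or to a SpeechLLM together with the audio.

```python
from typing import Sequence

# Hypothetical prompt template; not the paper's actual wording.
PROMPT_TEMPLATE = (
    "Below are transcripts of the same audio produced by {n} different ASR systems.\n"
    "{hyps}\n"
    "Combine them and correct any errors. Output only the final transcript."
)


def build_fusion_prompt(hypotheses: Sequence[str]) -> str:
    """Format N ASR hypotheses into one fusion/error-correction prompt."""
    if not hypotheses:
        raise ValueError("need at least one ASR hypothesis")
    # Number each hypothesis so the model can refer to individual systems.
    lines = "\n".join(
        f"System {i}: {h.strip()}" for i, h in enumerate(hypotheses, start=1)
    )
    return PROMPT_TEMPLATE.format(n=len(hypotheses), hyps=lines)


if __name__ == "__main__":
    print(build_fusion_prompt(["the cat sat", "the cap sat down", "a cat sat down"]))
```

In this sketch the model, rather than a fixed rule, decides how to reconcile disagreements between systems.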

Findings

1. Significant transcription accuracy improvements over traditional methods.
2. Enhanced semi-supervised ASR performance using LLM-generated pseudo-labels.
3. Effective error correction with a SpeechLLM in the pseudo-labeling pipeline.

Abstract

Automatic speech recognition (ASR) models rely on high-quality transcribed data for effective training. Generating pseudo-labels for large unlabeled audio datasets often relies on complex pipelines that combine multiple ASR outputs through multi-stage processing, leading to error propagation, information loss and disjoint optimization. We propose a unified multi-ASR prompt-driven framework using postprocessing by either textual or speech-based large language models (LLMs), replacing voting or other arbitration logic for reconciling the ensemble outputs. We perform a comparative study of multiple architectures with and without LLMs, showing significant improvements in transcription accuracy compared to traditional methods. Furthermore, we use the pseudo-labels generated by the various approaches to train semi-supervised ASR models for different datasets, again showing improved…
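For contrast, the voting logic that the framework replaces can be sketched as a per-slot majority vote, loosely in the spirit of ROVER-style system combination. This version is simplified and hypothetical (`majority_vote` is an illustrative name). It assumes the hypotheses have already been aligned word by word, whereas real combination systems must compute that alignment themselves. It also discards every losing alternative in each slot, which is one plausible source of the information loss the abstract describes.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(aligned_hyps: Sequence[Sequence[Optional[str]]]) -> str:
    """Pick the most frequent token in each pre-aligned slot.

    Assumes all hypotheses were already aligned to the same length;
    None marks a deletion (that system produced no word in this slot).
    Ties go to the token encountered first (i.e. the earlier system),
    because Counter.most_common keeps first-seen order among equal counts.
    """
    if not aligned_hyps:
        raise ValueError("need at least one hypothesis")
    n_slots = len(aligned_hyps[0])
    if any(len(h) != n_slots for h in aligned_hyps):
        raise ValueError("hypotheses must be pre-aligned to equal length")
    output = []
    for slot in range(n_slots):
        votes = Counter(h[slot] for h in aligned_hyps)
        winner, _ = votes.most_common(1)[0]
        if winner is not None:  # a winning deletion emits nothing
            output.append(winner)
    return " ".join(output)


if __name__ == "__main__":
    hyps = [
        ["the", "cat", "sat", None],
        ["the", "cap", "sat", "down"],
        ["a", "cat", "sat", "down"],
    ]
    print(majority_vote(hyps))  # the cat sat down
```

Each slot keeps only its winning word, so evidence for minority alternatives never reaches later stages. A prompt-driven LLM, by comparison, sees all hypotheses at once.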