ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models

Zefang Liu; Chenyang Zhu; Sangwoo Cho; Shi-Xiong Zhang

arXiv:2602.18721 · cs.CL · February 24, 2026

TL;DR

ReHear introduces an iterative pseudo-label refinement framework that leverages an audio-aware large language model to improve semi-supervised speech recognition by reducing errors and confirmation bias.

Contribution

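The paper presents ReHear, a method that integrates an instruction-tuned, audio-aware LLM into the semi-supervised ASR self-training loop. The LLM refines each pseudo-label using both the ASR hypothesis and the source audio, and the refined labels are then used to fine-tune the ASR model over successive iterations.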

Findings

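01  ReHear consistently outperforms both supervised and pseudo-labeling baselines across diverse benchmarks.

02  Refining pseudo-labels with the audio-aware LLM mitigates the error propagation and confirmation bias typical of self-training.

03  The refined pseudo-labels serve as higher-fidelity fine-tuning targets, which yields more accurate ASR models.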

Abstract

Semi-supervised learning in automatic speech recognition (ASR) typically relies on pseudo-labeling, which often suffers from confirmation bias and error accumulation due to noisy supervision. To address this limitation, we propose ReHear, a framework for iterative pseudo-label refinement that integrates an instruction-tuned, audio-aware large language model (LLM) into the self-training loop. Unlike conventional text-based correctors, our approach conditions the LLM on both the ASR hypothesis and the source audio, allowing it to recover phonetically accurate transcripts even from severe recognition errors. These refined pseudo-labels serve as high-fidelity targets for fine-tuning the ASR model in an iterative cycle. Experimental results across diverse benchmarks demonstrate that ReHear effectively mitigates error propagation, consistently outperforming both supervised and pseudo-labeling…
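
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `rehear_style_loop`, `asr`, `refine`, and `fine_tune` are hypothetical, the number of rounds is a free parameter, and mixing the labeled set with the refined pseudo-labels at each round is an assumption the abstract does not confirm.

```python
from typing import Callable, List, Sequence, Tuple

Audio = Sequence[float]                      # placeholder for a waveform or feature vector
Transcribe = Callable[[Audio], str]          # ASR model: audio -> hypothesis
Refine = Callable[[Audio, str], str]         # audio-aware LLM: (audio, hypothesis) -> refined text
Example = Tuple[Audio, str]
FineTune = Callable[[Transcribe, List[Example]], Transcribe]


def rehear_style_loop(
    asr: Transcribe,
    refine: Refine,
    fine_tune: FineTune,
    labeled: Sequence[Example],
    unlabeled: Sequence[Audio],
    rounds: int,
) -> Transcribe:
    """Iteratively refine pseudo-labels and fine-tune the ASR model (hypothetical sketch)."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")

    for _ in range(rounds):
        pseudo_labeled: List[Example] = []
        for audio in unlabeled:
            hypothesis = asr(audio)              # 1. current ASR model proposes a transcript
            refined = refine(audio, hypothesis)  # 2. corrector sees BOTH audio and hypothesis
            pseudo_labeled.append((audio, refined))

        # 3. Fine-tune on the refined targets. Combining them with the labeled
        #    set is an assumption of this sketch, not a detail from the abstract.
        asr = fine_tune(asr, list(labeled) + pseudo_labeled)

    return asr
```

Passing the three components as plain callables keeps the sketch independent of any particular ASR toolkit or LLM API; in practice each would wrap a real model and training routine.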

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing