Training-Free Intelligibility-Guided Observation Addition for Noisy ASR
Haoyang Li, Changsong Liu, Wei Rao, Hao Shi, Sakriani Sakti, Eng Siong Chng

TL;DR
This paper introduces a training-free, intelligibility-guided observation addition method for noisy ASR that improves recognition accuracy by fusing noisy and enhanced speech based on intelligibility estimates derived directly from the ASR backend.
Contribution
It proposes a novel, training-free observation addition technique guided by intelligibility estimates from the ASR, enhancing robustness without additional model training.
Findings
Significant improvements over existing OA baselines across multiple datasets.
Robustness demonstrated through diverse SE-ASR combinations.
Validation of intelligibility-guided switching and frame-level OA approaches.
Abstract
Automatic speech recognition (ASR) degrades severely in noisy environments. Although speech enhancement (SE) front-ends effectively suppress background noise, they often introduce artifacts that harm recognition. Observation addition (OA) addressed this issue by fusing noisy and SE enhanced speech, improving recognition without modifying the parameters of the SE or ASR models. This paper proposes an intelligibility-guided OA method, where fusion weights are derived from intelligibility estimates obtained directly from the backend ASR. Unlike prior OA methods based on trained neural predictors, the proposed method is training-free, reducing complexity and enhances generalization. Extensive experiments across diverse SE-ASR combinations and datasets demonstrate strong robustness and improvements over existing OA baselines. Additional analyses of intelligibility-guided switching-based…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
