Non-Intrusive Automatic Speech Recognition Refinement: A Survey

Mohammad Reza Peyghan; Saman Soleimani Roudi; Saeedreza Zouashkiani; Sajjad Amini; Fatemeh Rajabi; Shahrokh Ghaemmaghami

arXiv:2508.07285 · eess.AS · May 20, 2026

TL;DR

This survey reviews non-intrusive methods for refining automatic speech recognition systems, categorizing approaches, discussing evaluation metrics, and highlighting future research directions.

Contribution

It provides a comprehensive classification and analysis of non-intrusive ASR refinement techniques, along with evaluation standards and research gaps.

Findings

01

Five main classes of refinement methods are identified: fusion, re-scoring, correction, distillation, and training adjustment.

02

Evaluation datasets and metrics are reviewed to standardize comparison of methods.

03

Open research gaps and future directions are proposed for improving ASR refinement.

Abstract

Automatic Speech Recognition (ASR) is an integral component of modern technology, powering applications such as voice-activated assistants, transcription services, and accessibility tools. Yet ASR systems continue to struggle with the inherent variability of human speech, such as accents, dialects, and speaking styles, as well as environmental interference, including background noise. Moreover, domain-specific conversations often employ specialized terminology, which can exacerbate transcription errors. These shortcomings not only degrade raw ASR accuracy but also propagate mistakes through subsequent natural language processing pipelines. Because redesigning an ASR model is costly and time-consuming, non-intrusive refinement techniques that leave the model's architecture intact have become increasingly popular. In this survey, we review current non-intrusive refinement approaches and…

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing