RAS: a Reliability Oriented Metric for Automatic Speech Recognition
Wenbin Huang, Yuhang Qiu, Bohan Li, Yiwei Guo, Jing Peng, Hankun Wang, Xie Chen, Kai Yu

TL;DR
This paper introduces RAS, a reliability-oriented metric for automatic speech recognition, together with an abstention-aware transcription framework that lets a model abstain from uncertain segments instead of emitting confident but incorrect text.
Contribution
The paper proposes RAS, a metric for ASR reliability under abstention, and trains an abstention-aware ASR model through supervised bootstrapping followed by reinforcement learning.
Findings
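The abstention-aware model substantially improves transcription reliability, as measured by RAS.
The model maintains competitive accuracy while abstaining from uncertain segments.
The trade-off parameter in RAS, which balances informativeness against error aversion, is calibrated by human preference.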
Abstract
Automatic speech recognition systems often produce confident yet incorrect transcriptions under noisy or ambiguous conditions, which can be misleading for both users and downstream applications. Standard evaluation based on Word Error Rate focuses solely on accuracy and fails to capture transcription reliability. We introduce an abstention-aware transcription framework that enables ASR models to explicitly abstain from uncertain segments. To evaluate reliability under abstention, we propose RAS, a reliability-oriented metric that balances transcription informativeness and error aversion, with its trade-off parameter calibrated by human preference. We then train an abstention-aware ASR model through supervised bootstrapping followed by reinforcement learning. Our experiments demonstrate substantial improvements in transcription reliability while maintaining competitive accuracy.
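The abstract does not give the formula for RAS, so the following is only a toy illustration of what "balancing informativeness and error aversion" could look like, not the paper's definition. It assumes each reference word has already been labeled as correct, wrong, or abstained; the function name `toy_reliability_score`, the label strings, and the weight `lam` are all hypothetical.

```python
from typing import Sequence

ALLOWED_LABELS = {"correct", "error", "abstain"}


def toy_reliability_score(labels: Sequence[str], lam: float = 1.0) -> float:
    """Toy abstention-aware score. NOT the paper's RAS definition.

    labels: one entry per reference word: "correct", "error", or "abstain".
    lam:    hypothetical trade-off weight; larger values punish errors more.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    unknown = set(labels) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    correct = sum(1 for label in labels if label == "correct")
    errors = sum(1 for label in labels if label == "error")
    # Abstained words earn nothing and cost nothing in this toy score.
    return (correct - lam * errors) / len(labels)


# A system that guesses on two hard words versus one that abstains on them.
confident = ["correct", "correct", "error", "error"]
cautious = ["correct", "correct", "abstain", "abstain"]

print(toy_reliability_score(confident, lam=1.0))
print(toy_reliability_score(cautious, lam=1.0))
```

In this sketch, setting `lam` to 0 makes abstaining and guessing wrong score the same, while larger values favor abstention. This is the kind of trade-off that, according to the abstract, the paper calibrates with human preference.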
