Adopting Whisper for Confidence Estimation

Vaibhav Aggarwal; Shabari S Nair; Yash Verma; Yash Jogi

arXiv:2502.13446 · eess.AS · February 20, 2025

TL;DR

This paper introduces a novel end-to-end method that fine-tunes the Whisper speech recognition model to generate word-level confidence scores. The method outperforms traditional lightweight Confidence Estimation Modules (CEMs), especially on out-of-domain datasets.

Contribution

It presents a fine-tuning approach that makes Whisper produce confidence scores. The fine-tuned models match a strong CEM baseline on the in-domain dataset and surpass it on out-of-domain datasets.

Findings

01

Fine-tuned Whisper models match or surpass CEM performance.

02

The fine-tuned Whisper-large model outperforms the CEM baseline on all datasets.

03

Fine-tuned Whisper-tiny surpasses the CEM baseline on eight out-of-domain datasets.
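This summary does not list the metrics behind these comparisons. One metric commonly reported for word-level confidence estimators is normalized cross-entropy (NCE). The sketch below assumes NCE purely for illustration, and the function name is hypothetical. NCE is 0 for a predictor that always outputs the overall word-correct rate, approaches 1 for perfect confidences, and goes negative for confidences worse than that constant baseline.

```python
import math
from typing import Sequence


def normalized_cross_entropy(confidences: Sequence[float],
                             labels: Sequence[int],
                             eps: float = 1e-7) -> float:
    """NCE of per-word confidences against 0/1 correctness labels (illustrative sketch)."""
    n = len(labels)
    n_correct = sum(labels)
    p_c = n_correct / n
    if p_c in (0.0, 1.0):
        raise ValueError("NCE is undefined when all words are correct or all are incorrect")
    # Entropy (in bits) of the labels under a constant predictor that always outputs p_c.
    h_c = -(n_correct * math.log2(p_c) + (n - n_correct) * math.log2(1 - p_c))
    # Conditional entropy of the labels given the per-word confidence scores.
    h_cp = 0.0
    for p, y in zip(confidences, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        h_cp -= math.log2(p) if y == 1 else math.log2(1 - p)
    return (h_c - h_cp) / h_c
```

For example, with labels `[1, 1, 1, 0]`, confidences of `0.75` for every word give an NCE of 0, because they add no information beyond the base rate.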

Abstract

Recent research on word-level confidence estimation for speech recognition systems has primarily focused on lightweight models known as Confidence Estimation Modules (CEMs), which rely on hand-engineered features derived from Automatic Speech Recognition (ASR) outputs. In contrast, we propose a novel end-to-end approach that leverages the ASR model itself (Whisper) to generate word-level confidence scores. Specifically, we introduce a method in which the Whisper model is fine-tuned to produce scalar confidence scores given an audio input and its corresponding hypothesis transcript. Our experiments demonstrate that the fine-tuned Whisper-tiny model, comparable in size to a strong CEM baseline, achieves similar performance on the in-domain dataset and surpasses the CEM baseline on eight out-of-domain datasets, whereas the fine-tuned Whisper-large model consistently outperforms the CEM…
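The abstract does not say how the training targets for the scalar confidence scores are built. A common setup for word-level confidence estimation, assumed here rather than taken from the paper, labels each hypothesis word as correct (1) or incorrect (0) by aligning the hypothesis to the reference transcript. The model's per-word confidences are then trained with binary cross-entropy against those labels. Both function names below are hypothetical, and the sketch covers only target construction and the loss, not the Whisper model itself.

```python
import math
from typing import List, Sequence


def word_correctness_labels(hyp: List[str], ref: List[str]) -> List[int]:
    """Label each hypothesis word 1 if it aligns to an identical reference word, else 0.

    Uses a Levenshtein alignment with unit cost for substitution, insertion and deletion.
    """
    n, m = len(hyp), len(ref)
    # dp[i][j] = edit distance between hyp[:i] and ref[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # match or substitution
                           dp[i - 1][j] + 1,        # extra (inserted) hypothesis word
                           dp[i][j - 1] + 1)        # reference word missing from hypothesis
    # Backtrace: only exact matches get label 1; substitutions and insertions stay 0.
    labels = [0] * n
    i, j = n, m
    while i > 0 and j > 0:
        sub = 0 if hyp[i - 1] == ref[j - 1] else 1
        if dp[i][j] == dp[i - 1][j - 1] + sub:
            labels[i - 1] = 1 - sub
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return labels


def binary_cross_entropy(confidences: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-7) -> float:
    """Mean BCE between predicted word confidences in [0, 1] and 0/1 labels."""
    total = 0.0
    for p, y in zip(confidences, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)
```

For the hypothesis "the cat sat on a mat" against the reference "the cat sat on the mat", the substituted word "a" is labeled 0 and every other word 1, giving `[1, 1, 1, 1, 0, 1]`.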
