On the Robust Approximation of ASR Metrics

Abdul Waheed; Hanin Atwany; Rita Singh; Bhiksha Raj

arXiv:2502.12408 · cs.CL · June 6, 2025

Open Access

TL;DR

This paper introduces a label-free method that uses multimodal embeddings and a proxy model to accurately approximate ASR metrics like WER and CER across diverse datasets, reducing reliance on costly ground truth labels.

Contribution

The paper presents a novel approach combining multimodal embeddings and proxy models to estimate ASR metrics without ground truth labels, improving accuracy and generalization.

Findings

01. Achieves single-digit absolute difference in metric approximation across datasets
02. Outperforms recent baseline by over 50% in accuracy
03. Works effectively across standard and in-the-wild testing conditions

Abstract

Recent advances in speech foundation models are largely driven by scaling both model size and data, enabling them to perform a wide range of tasks, including speech recognition. Traditionally, ASR models are evaluated using metrics like Word Error Rate (WER) and Character Error Rate (CER), which depend on ground truth labels. As a result of limited labeled data from diverse domains and testing conditions, the true generalization capabilities of these models beyond standard benchmarks remain unclear. Moreover, labeling data is both costly and time-consuming. To address this, we propose a novel label-free approach for approximating ASR performance metrics, eliminating the need for ground truth labels. Our method utilizes multimodal embeddings in a unified space for speech and transcription representations, combined with a high-quality proxy model to compute proxy metrics. These features…
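To make the metrics named in the abstract concrete, here is a minimal Python sketch of WER and CER as edit distance normalized by reference length, plus a "proxy" WER in which a proxy model's transcript stands in for the missing ground truth. This is an illustration under assumptions, not the authors' implementation: the function names (`edit_distance`, `wer`, `cer`, `proxy_wer`) are hypothetical, text normalization is omitted, and the multimodal embedding features and the way the paper combines them with the proxy metrics are not shown because the excerpt above does not describe them.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def proxy_wer(proxy_transcript, hypothesis):
    """Assumed proxy metric: WER computed against a proxy model's transcript
    instead of a ground truth label (hypothetical helper for illustration)."""
    return wer(proxy_transcript, hypothesis)


if __name__ == "__main__":
    truth = "the cat sat on the mat"
    target_output = "the cat sat on mat"        # output of the ASR model under test
    proxy_output = "the cat sat on the mat"     # output of an assumed proxy model
    print(wer(truth, target_output))            # needs a ground truth label
    print(proxy_wer(proxy_output, target_output))  # label-free approximation
```

In this toy case the proxy transcript happens to match the ground truth, so the proxy WER equals the true WER; in practice the proxy model makes its own errors, which is presumably why the abstract pairs the proxy metrics with additional features.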

Taxonomy

Topics: Fault Detection and Control Systems · Ultrasonics and Acoustic Wave Propagation