A Paradigm for Interpreting Metrics and Identifying Critical Errors in Automatic Speech Recognition

Thibault Bañeras-Roux; Mickael Rouvier; Jane Wottawa; Richard Dufour

arXiv:2605.03671·cs.CL·May 6, 2026

TL;DR

This paper proposes a new paradigm that integrates existing metrics into a Minimum Edit Distance framework to better interpret errors in automatic speech recognition by aligning them with human perception.

Contribution

The paper incorporates a chosen evaluation metric into a Minimum Edit Distance (minED) computation, yielding an equivalent of the error rate that stays interpretable while reflecting human perception of transcription errors.

Findings

01. The proposed paradigm aligns error severity with human perception.

02. It offers a more interpretable error measure than traditional WER and CER.

03. The approach facilitates studying the severity of transcription errors from a human perspective.

Abstract

The most commonly used metrics for evaluating automatic speech transcriptions, namely Word Error Rate (WER) and Character Error Rate (CER), have been heavily criticized for their poor correlation with human perception and their inability to take linguistic and semantic information into account. While embedding-based metrics that seek to approximate human perception have been proposed, their scores remain difficult to interpret, unlike WER and CER. In this article, we address this problem by proposing a paradigm that incorporates a chosen metric in order to obtain an equivalent of the error rate: a Minimum Edit Distance (minED). This approach relates transcription errors to their human perception and also enables an original study of the severity of these errors from a human perspective.
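For readers unfamiliar with the baselines the abstract criticizes, the following is a minimal sketch of WER and CER computed with a standard Levenshtein edit distance. It is not the authors' minED procedure, which this page does not specify; the function names (`edit_distance`, `wer`, `cer`) and the example sentences are illustrative assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, insertions, deletions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    # Hypothetical ASR outputs, each with a single substituted word.
    minor = "the cat sat on a mat"
    major = "the bat sat on the mat"
    print(wer(reference, minor), wer(reference, major))
```

Both hypothetical hypotheses contain one substituted word out of six, so they receive the same WER even though a listener might judge one error as more harmful to the meaning than the other. This is the kind of gap between error rate and perceived severity that the abstract describes.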
