An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications

Sujith Pulikodan; Sahapthan K; Prasanta Kumar Ghosh; Visruth Sanka; Nihar Desai

arXiv:2507.16456·eess.AS·July 23, 2025

Open Access

TL;DR

This paper explores how to evaluate ASR models specifically for applications powered by Large Language Models, emphasizing the importance of error types and correction capabilities in downstream tasks.

Contribution

The paper introduces a new measure of ASR performance tailored to LLM-based applications, one that accounts for the LLM's ability to correct ASR errors.

Findings

1. LLMs can effectively correct certain ASR errors
2. Traditional WER may not fully capture ASR performance in LLM contexts
3. Proposed measure better predicts downstream task success

Abstract

Automatic Speech Recognition (ASR) plays a crucial role in human-machine interaction and serves as an interface for a wide range of applications. Traditionally, ASR performance has been evaluated using Word Error Rate (WER), a metric that counts the insertions, deletions, and substitutions in a generated transcription relative to a reference transcript. However, with the increasing adoption of Large Language Models (LLMs) as the core processing component in various applications, the significance of different types of ASR errors in downstream tasks warrants further exploration. In this work, we analyze the capabilities of LLMs to correct errors introduced by ASR systems and propose a new measure to evaluate ASR performance for LLM-powered applications.
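
For readers unfamiliar with the baseline metric, the sketch below computes conventional WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. It illustrates only the traditional metric mentioned in the abstract, not the measure the paper proposes. The function name `wer` and the whitespace tokenization are assumptions made for illustration; real evaluations usually normalize case and punctuation first.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / N, with N = number of reference words.

    Assumption: words are split on whitespace with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two substitutions against a four-word reference.
    print(wer("turn on the light", "turn off the lights"))
```

WER weights every error equally, so an error that changes the meaning of an utterance ("on" versus "off") counts the same as one a downstream LLM could plausibly recover from ("light" versus "lights"). That limitation is the motivation the abstract gives for exploring a different measure.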

Taxonomy

Topics: Speech Recognition and Synthesis