ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark

He Wang; Linhan Ma; Dake Guo; Xiong Wang; Lei Xie; Jin Xu; Junyang Lin

arXiv:2507.05727 · eess.AS · August 7, 2025

TL;DR

This paper introduces ContextASR-Bench, a large-scale benchmark for evaluating the linguistic and contextual capabilities of speech recognition systems across diverse domains, highlighting the superior performance of Large Audio Language Models over traditional models.

Contribution

The paper presents a new comprehensive benchmark dataset and evaluation framework for assessing contextual and linguistic understanding in ASR systems, addressing a gap in existing benchmarks.

Findings

01. LALMs outperform conventional ASR models significantly.

02. The benchmark includes over 40,000 entries with 300,000+ named entities.

03. Evaluation reveals room for further improvements in context-aware ASR.

Abstract

Automatic Speech Recognition (ASR) has been extensively investigated, yet prior benchmarks have largely focused on assessing the acoustic robustness of ASR models, leaving evaluations of their linguistic capabilities relatively underexplored. This largely stems from the limited parameter sizes and training corpora of conventional ASR models, leaving them with insufficient world knowledge, which is crucial for accurately recognizing named entities across diverse domains, such as drug and treatment names in medicine or specialized technical terms in engineering. Recent breakthroughs in Large Language Models (LLMs) and corresponding Large Audio Language Models (LALMs) have markedly enhanced the visibility of advanced context modeling and general artificial intelligence capabilities. Leveraging LLMs, we envision a unified system capable of robust speech recognition across diverse…

Code & Models

Datasets

MrSupW/ContextASR-Bench
dataset · 1.3k downloads

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques