ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
He Wang, Linhan Ma, Dake Guo, Xiong Wang, Lei Xie, Jin Xu, Junyang Lin

TL;DR
This paper introduces ContextASR-Bench, a large-scale benchmark for evaluating the linguistic and contextual capabilities of speech recognition systems across diverse domains, highlighting the superior performance of Large Audio Language Models over traditional models.
Contribution
The paper presents a new comprehensive benchmark dataset and evaluation framework for assessing contextual and linguistic understanding in ASR systems, addressing a gap left by existing benchmarks, which focus largely on acoustic robustness.
Findings
Large Audio Language Models (LALMs) significantly outperform conventional ASR models.
The benchmark includes more than 40,000 entries and over 300,000 named entities.
Evaluation reveals room for further improvements in context-aware ASR.
Abstract
Automatic Speech Recognition (ASR) has been extensively investigated, yet prior benchmarks have largely focused on assessing the acoustic robustness of ASR models, leaving evaluations of their linguistic capabilities relatively underexplored. This largely stems from the limited parameter sizes and training corpora of conventional ASR models, leaving them with insufficient world knowledge, which is crucial for accurately recognizing named entities across diverse domains, such as drug and treatment names in medicine or specialized technical terms in engineering. Recent breakthroughs in Large Language Models (LLMs) and corresponding Large Audio Language Models (LALMs) have markedly enhanced the visibility of advanced context modeling and general artificial intelligence capabilities. Leveraging LLMs, we envision a unified system capable of robust speech recognition across diverse…
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
