AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval

Jingru Lin; Chen Zhang; Tianrui Wang; Haizhou Li

arXiv:2602.10656·eess.AS·February 12, 2026

Open Access

TL;DR

AudioRAG introduces a new benchmark for evaluating audio reasoning combined with external information retrieval, highlighting current model limitations and proposing an integrated retrieval-augmented approach for improvement.

Contribution

The paper presents AudioRAG, a novel benchmark for audio reasoning with external knowledge grounding, and proposes an agentic pipeline to enhance model performance.

Findings

01. State-of-the-art LALMs struggle with the benchmark questions.

02. AudioRAG reveals gaps in current audio reasoning capabilities.

03. Retrieval-augmented methods improve reasoning accuracy.

Abstract

Due to recent advancements in Large Audio-Language Models (LALMs) that demonstrate remarkable performance across a range of sound-, speech- and music-related tasks, there is a growing interest in proposing benchmarks to assess these models. Existing benchmarks generally focus only on reasoning with internal knowledge, neglecting real-world scenarios that require external information grounding. To bridge this gap, we introduce AudioRAG, a novel benchmark designed to evaluate audio-based reasoning augmented by information retrieval in realistic web environments. This benchmark comprises both LLM-generated and manually curated question-answer pairs. Our evaluations reveal that even the state-of-the-art LALMs struggle to answer these questions. We therefore propose an agentic pipeline that integrates audio reasoning with retrieval-augmented generation, providing a stronger baseline for…
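
The abstract describes the proposed baseline only at a high level, as an agentic pipeline that combines audio reasoning with retrieval-augmented generation. As an illustration of that general idea, and not the authors' implementation, a minimal Python sketch might look like the following. Every name here (`AgentState`, `run_pipeline`, and the `toy_*` stand-ins for the audio model, query planner, web search, and answer generator) is a hypothetical placeholder introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentState:
    """Hypothetical container for what the agent knows so far."""
    question: str
    audio_description: str
    evidence: List[str] = field(default_factory=list)


def run_pipeline(
    question: str,
    audio_path: str,
    describe_audio: Callable[[str], str],
    plan_query: Callable[[AgentState], Optional[str]],
    search: Callable[[str], List[str]],
    answer: Callable[[AgentState], str],
    max_steps: int = 3,
) -> str:
    """Assumed loop: describe the audio, retrieve evidence, then answer."""
    # Step 1: an audio-language model turns the clip into text (assumption).
    state = AgentState(question, describe_audio(audio_path))
    # Step 2: the planner issues search queries until it returns None
    # or the step budget is exhausted.
    for _ in range(max_steps):
        query = plan_query(state)
        if query is None:
            break
        state.evidence.extend(search(query))
    # Step 3: generate the final answer from the audio description and evidence.
    return answer(state)


# Toy stand-ins so the sketch runs without any model or network access.
def toy_describe_audio(path: str) -> str:
    return "a solo piano recording"


def toy_plan_query(state: AgentState) -> Optional[str]:
    # Search once, then stop as soon as any evidence has been collected.
    if state.evidence:
        return None
    return f"{state.question} ({state.audio_description})"


def toy_search(query: str) -> List[str]:
    return [f"stub search result for: {query}"]


def toy_answer(state: AgentState) -> str:
    return f"{len(state.evidence)} evidence snippet(s) used"


result = run_pipeline(
    "Who composed this piece?", "clip.wav",
    toy_describe_audio, toy_plan_query, toy_search, toy_answer,
)
print(result)
```

Passing the four components in as callables keeps the control flow separate from any particular model or search backend, so the stubs can be swapped for real components without changing the loop.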

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Topic Modeling