AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering

Chun-Yi Kuan; Hung-yi Lee

arXiv:2601.12248·eess.AS·May 12, 2026

TL;DR

AQUA-Bench is a benchmark that tests whether audio question answering models can recognize when a question is unanswerable, a case that existing benchmarks, which mainly cover answerable questions, overlook.

Contribution

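AQUA-Bench systematically evaluates unanswerability in audio question answering through three scenarios (Absent Answer Detection, Incompatible Answer Set Detection, and Incompatible Audio Question Detection), with the aim of promoting more robust and trustworthy audio-language systems.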

Findings

01. Models perform well on answerable questions but struggle with unanswerable cases.

02. Current benchmarks overlook the challenge of unanswerable questions in audio QA.

03. AQUA-Bench provides a rigorous measure of model reliability in real-world settings.

Abstract

Recent advances in audio-aware large language models have shown strong performance on audio question answering. However, existing benchmarks mainly cover answerable questions and overlook the challenge of unanswerable ones, where no reliable answer can be inferred from the audio. Such cases are common in real-world settings, where questions may be misleading, ill-posed, or incompatible with the information. To address this gap, we present AQUA-Bench, a benchmark for Audio Question Unanswerability Assessment. It systematically evaluates three scenarios: Absent Answer Detection (the correct option is missing), Incompatible Answer Set Detection (choices are categorically mismatched with the question), and Incompatible Audio Question Detection (the question is irrelevant or lacks sufficient grounding in the audio). By assessing these cases, AQUA-Bench offers a rigorous measure of model…
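
The three scenarios named in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not the authors' construction pipeline: the `Item` class, the helper names, the clip identifiers, and the convention that a model abstains by returning `None` are all assumptions introduced here. It derives one unanswerable variant per scenario from an ordinary multiple-choice item and scores a prediction as correct on an unanswerable item only if the model abstains.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical multiple-choice audio QA item (not the benchmark's schema)."""
    audio_id: str
    question: str
    options: List[str]
    answer: Optional[int]  # index into options; None means "unanswerable"
    scenario: str = "answerable"


def absent_answer(item: Item) -> Item:
    """Absent Answer Detection: drop the correct option from the choices."""
    kept = [o for i, o in enumerate(item.options) if i != item.answer]
    return replace(item, options=kept, answer=None, scenario="absent_answer")


def incompatible_answer_set(item: Item, unrelated_options: List[str]) -> Item:
    """Incompatible Answer Set Detection: choices from a mismatched category."""
    return replace(item, options=list(unrelated_options), answer=None,
                   scenario="incompatible_answer_set")


def incompatible_audio_question(item: Item, unrelated_audio_id: str) -> Item:
    """Incompatible Audio Question Detection: audio that does not ground the question."""
    return replace(item, audio_id=unrelated_audio_id, answer=None,
                   scenario="incompatible_audio_question")


def is_correct(item: Item, prediction: Optional[int]) -> bool:
    """Answerable: prediction must match the key. Unanswerable: must abstain (None)."""
    return prediction == item.answer


def accuracy_by_scenario(items: List[Item],
                         predictions: List[Optional[int]]) -> Dict[str, float]:
    """Report accuracy separately for each scenario label."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for item, pred in zip(items, predictions):
        totals[item.scenario] = totals.get(item.scenario, 0) + 1
        hits[item.scenario] = hits.get(item.scenario, 0) + int(is_correct(item, pred))
    return {s: hits[s] / totals[s] for s in totals}


# Made-up example item and variants (all values are illustrative).
base = Item("clip_001", "Which instrument is playing?",
            ["piano", "violin", "drums", "flute"], answer=1)
variants = [
    base,
    absent_answer(base),
    incompatible_answer_set(base, ["red", "blue", "green", "yellow"]),
    incompatible_audio_question(base, "clip_speech_042"),
]

# A toy "model" that always picks option 1 and never abstains.
always_pick = [1, 1, 1, 1]
scores = accuracy_by_scenario(variants, always_pick)
```

In this toy run, the always-answering model is scored as correct on the answerable item and wrong on all three unanswerable variants, which mirrors the kind of gap the benchmark is designed to expose. Reporting accuracy per scenario rather than as one pooled number keeps that gap visible.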
