MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence

Sonal Kumar; Šimon Sedláček; Vaibhavi Lokegaonkar; Fernando López; Wenyi Yu; Nishit Anand; Hyeonggon Ryu; Lichang Chen; Maxim Plička; Miroslav Hlaváček; William Fineas Ellingwood; Sathvik Udupa; Siyuan Hou; Allison Ferner; Sara Barahona; Cecilia Bolaños; Satish Rahi; Laura Herrera-Alarcón; Satvik Dixit; Siddhi Patil; Soham Deshmukh; Lasha Koroshinadze; Yao Liu; Leibny Paola Garcia Perera; Eleni Zanou; Themos Stafylakis; Joon Son Chung; David Harwath; Chao Zhang; Dinesh Manocha; Alicia Lozano-Diez; Santosh Kesiraju; Sreyan Ghosh; Ramani Duraiswami

arXiv:2508.13992·eess.AS·August 20, 2025



TL;DR

MMAU-Pro is a comprehensive benchmark designed to evaluate AI systems' holistic audio understanding across diverse skills, challenging models with complex reasoning tasks using real-world audio data.

Contribution

The paper introduces MMAU-Pro, the most extensive and rigorously curated benchmark for assessing general audio intelligence in AI, covering 49 skills and complex reasoning with real-world audio.

Findings

1. State-of-the-art models perform poorly, with accuracy near random in many categories.

2. Existing models struggle with complex, multi-hop reasoning tasks.

3. The benchmark reveals specific weaknesses in current audio AI systems.
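
To make the "near random" finding concrete, the sketch below compares per-category accuracy against a chance baseline. It assumes multiple-choice questions with a known number of options, for which random guessing scores 1/k on a k-option question. The record fields (`category`, `num_options`, `correct`) and the sample records are hypothetical and are not taken from the paper or its data release.

```python
from collections import defaultdict


def per_category_report(records):
    """Return accuracy and chance-level accuracy for each category.

    Each record is a dict with hypothetical keys:
      'category'    - skill or domain label
      'num_options' - number of answer choices for the question
      'correct'     - True if the model picked the right choice
    """
    totals = defaultdict(lambda: {"n": 0, "hits": 0, "chance": 0.0})
    for r in records:
        t = totals[r["category"]]
        t["n"] += 1
        t["hits"] += int(r["correct"])
        # Expected score of uniform random guessing on this question.
        t["chance"] += 1.0 / r["num_options"]
    return {
        cat: {"accuracy": t["hits"] / t["n"], "chance": t["chance"] / t["n"]}
        for cat, t in totals.items()
    }


# Made-up records for illustration only; these are not results from the paper.
records = [
    {"category": "music", "num_options": 4, "correct": True},
    {"category": "music", "num_options": 4, "correct": False},
    {"category": "music", "num_options": 4, "correct": False},
    {"category": "music", "num_options": 4, "correct": False},
    {"category": "speech", "num_options": 4, "correct": True},
    {"category": "speech", "num_options": 2, "correct": True},
]

report = per_category_report(records)
for cat, stats in sorted(report.items()):
    print(cat, round(stats["accuracy"], 3), round(stats["chance"], 3))
```

A category whose accuracy sits close to its chance value, like the made-up "music" rows here, is what "near random" would mean under these assumptions.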

Abstract

Audio comprehension, including speech, non-speech sounds, and music, is essential for achieving human-level intelligence. Consequently, AI agents must demonstrate holistic audio understanding to qualify as generally intelligent. However, evaluating auditory intelligence comprehensively remains challenging. To address this gap, we introduce MMAU-Pro, the most comprehensive and rigorously curated benchmark for assessing audio intelligence in AI systems. MMAU-Pro contains 5,305 instances, where each instance has one or more audios paired with human expert-generated question-answer pairs, spanning speech, sound, music, and their combinations. Unlike existing benchmarks, MMAU-Pro evaluates auditory intelligence across 49 unique skills and multiple complex dimensions, including long-form audio comprehension, spatial audio reasoning, and multi-audio understanding, among others. All questions are…
