MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
Sonal Kumar, Šimon Sedláček, Vaibhavi Lokegaonkar, Fernando López, Wenyi Yu, Nishit Anand, Hyeonggon Ryu, Lichang Chen, Maxim Plička, Miroslav Hlaváček, William Fineas Ellingwood, Sathvik Udupa, Siyuan Hou, Allison Ferner, Sara Barahona, Cecilia Bolaños

TL;DR
MMAU-Pro is a comprehensive benchmark designed to evaluate AI systems' holistic audio understanding across diverse skills, challenging models with complex reasoning tasks using real-world audio data.
Contribution
The paper introduces MMAU-Pro, the most extensive and rigorously curated benchmark for assessing general audio intelligence in AI, covering 49 skills and complex reasoning with real-world audio.
Findings
State-of-the-art models perform poorly, with accuracy near random in many categories.
Existing models struggle with complex, multi-hop reasoning tasks.
The benchmark reveals specific weaknesses in current audio AI systems.
Abstract
Audio comprehension, including speech, non-speech sounds, and music, is essential for achieving human-level intelligence. Consequently, AI agents must demonstrate holistic audio understanding to qualify as generally intelligent. However, comprehensively evaluating auditory intelligence remains challenging. To address this gap, we introduce MMAU-Pro, the most comprehensive and rigorously curated benchmark for assessing audio intelligence in AI systems. MMAU-Pro contains 5,305 instances, each consisting of one or more audio clips paired with human expert-generated question-answer pairs spanning speech, sound, music, and their combinations. Unlike existing benchmarks, MMAU-Pro evaluates auditory intelligence across 49 unique skills and multiple complex dimensions, including long-form audio comprehension, spatial audio reasoning, and multi-audio understanding. All questions are…
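The abstract describes each instance as one or more audio clips paired with an expert-written question-answer pair and labeled with one of 49 skills. The sketch below is a minimal illustration of how such records might be represented and scored per skill against a random-guess baseline. The `Instance` fields, the skill names, and the assumption that questions are multiple-choice are hypothetical; this is not the benchmark's released schema or its official evaluation script.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical fields; not MMAU-Pro's actual release format.
    audio_paths: list[str]  # one or more audio clips per instance
    question: str
    choices: list[str]      # assumes a multiple-choice question
    answer: str             # gold answer, one of `choices`
    category: str           # e.g. "speech", "sound", "music", or a combination
    skill: str              # hypothetical skill label


def per_skill_accuracy(instances, predictions):
    """Return {skill: (model_accuracy, expected_random_accuracy)}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    chance = defaultdict(float)
    for inst, pred in zip(instances, predictions):
        total[inst.skill] += 1
        correct[inst.skill] += int(pred == inst.answer)
        # A uniform random guess is right with probability 1 / num_choices.
        chance[inst.skill] += 1.0 / len(inst.choices)
    return {s: (correct[s] / total[s], chance[s] / total[s]) for s in total}


# Toy data with hypothetical file names and skill labels.
data = [
    Instance(["a.wav"], "Which instrument plays first?",
             ["piano", "violin"], "piano", "music", "music_reasoning"),
    Instance(["b.wav", "c.wav"], "Which clip contains rain?",
             ["first", "second", "neither", "both"], "second", "sound", "multi_audio"),
]
print(per_skill_accuracy(data, ["violin", "second"]))
# {'music_reasoning': (0.0, 0.5), 'multi_audio': (1.0, 0.25)}
```

Reporting the expected random accuracy next to model accuracy for each skill makes a finding like "accuracy near random in many categories" directly checkable, since chance level varies with the number of answer options.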
