Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li, William Barr Held, Michael J Ryan, Kunat Pipatanakul, Potsawee Manakul, Hao Zhu, Diyi Yang

TL;DR
This paper introduces an interactive evaluation method for Large Audio Models (LAMs) that analyzes user interactions and preferences to better align model development with user needs. It finds that static benchmarks correlate only weakly with real-world performance.
Contribution
It presents a novel interactive evaluation approach for LAMs, including user interaction data collection and analysis, highlighting the gap between static benchmarks and actual user preferences.
Findings
Static benchmarks show weak correlation with user preferences (max τ ≤ 0.33).
Combining multiple features modestly predicts interactive performance (R^2=0.30).
Only two datasets showed significant positive correlations with user preferences.
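To make the correlation figures above concrete, here is a minimal sketch of how a rank correlation between benchmark scores and user preferences could be computed. It assumes τ denotes Kendall's rank correlation (the tau-a variant, with no tie correction); the function name `kendall_tau` and all numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both measures order this pair the same way
        elif sign < 0:
            discordant += 1  # the two measures disagree on this pair
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-model values for illustration only (not results from the paper).
benchmark_scores = [0.62, 0.71, 0.55, 0.80, 0.66]  # static benchmark accuracy
user_win_rates = [0.48, 0.45, 0.52, 0.58, 0.41]    # interactive preference rate
tau = kendall_tau(benchmark_scores, user_win_rates)
print(f"Kendall tau between benchmark and user preference: {tau:.2f}")
```

A value near 1 would mean the benchmark ranks models the same way users do; values near 0, as in this made-up example, mean the benchmark ordering says little about user preference.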
Abstract
As AI chatbots become ubiquitous, voice interaction presents a compelling way to enable rapid, high-bandwidth communication for both semantic and social signals. This has driven research into Large Audio Models (LAMs) to power voice-native experiences. However, aligning LAM development with user goals requires a clear understanding of user needs and preferences to establish reliable progress metrics. This study addresses these challenges by introducing an interactive approach to evaluate LAMs and collecting 7,500 LAM interactions from 484 participants. Through topic modeling of user queries, we identify primary use cases for audio interfaces. We then analyze user preference rankings and qualitative feedback to determine which models best align with user needs. Finally, we evaluate how static benchmarks predict interactive performance - our analysis reveals no individual benchmark…
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing
Methods: ALIGN
