A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs

Taehan Lee; Jaehan Jung; Hyukjun Lee

arXiv:2603.03855 · cs.SD · March 5, 2026

TL;DR

This study evaluates how increasing auditory scene complexity affects event grounding in state-of-the-art Audio LLMs, finding that true-positive rate falls and false-positive rate rises as the number of events in a clip grows.

Contribution

The paper presents a large-scale, systematic analysis of multi-event audio grounding in Audio LLMs, in contrast to prior work limited to small scale or less controlled query construction, and shows how prompt design shifts the balance between detections and false alarms.

Findings

01

Increasing event count reduces true-positive rate.

02

More events increase false-positive rate.

03

Prompt design induces a strong trade-off between true-positive rate and false-positive rate.
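
The findings above are stated in terms of true-positive and false-positive rates as a function of event count. A minimal sketch of how such rates could be tabulated from yes/no responses is shown here; the record layout `(event_count, is_present, answered_yes)` and the function name `rates_by_event_count` are assumptions for illustration, not the authors' evaluation code.

```python
from collections import defaultdict


def rates_by_event_count(records):
    """Tabulate TPR and FPR per event count.

    records: iterable of (event_count, is_present, answered_yes) tuples.
    This layout is hypothetical; the paper's actual data format is not given.
    Returns {event_count: (tpr, fpr)}, with None where a rate is undefined.
    """
    counts = defaultdict(lambda: {"tp": 0, "pos": 0, "fp": 0, "neg": 0})
    for n_events, is_present, answered_yes in records:
        c = counts[n_events]
        if is_present:
            # Present-event query: a "yes" is a true positive.
            c["pos"] += 1
            c["tp"] += int(answered_yes)
        else:
            # Absent-event query: a "yes" is a false alarm.
            c["neg"] += 1
            c["fp"] += int(answered_yes)

    rates = {}
    for n_events, c in sorted(counts.items()):
        tpr = c["tp"] / c["pos"] if c["pos"] else None
        fpr = c["fp"] / c["neg"] if c["neg"] else None
        rates[n_events] = (tpr, fpr)
    return rates


# Toy records, invented purely to exercise the function.
toy = [
    (1, True, True),
    (1, False, False),
    (2, True, False),
    (2, False, True),
]
print(rates_by_event_count(toy))
```

Grouping by event count before dividing keeps the two rates separate for each level of scene complexity, which is the comparison the findings describe.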

Abstract

Audio LLMs have shown a strong ability to understand audio samples, yet their reliability in complex acoustic scenes remains under-explored. Unlike prior work limited to small scale or less controlled query construction, we present a large-scale evaluation of event grounding and false alarms as auditory scene complexity increases. Using 71K AudioCapsV2 clips, we extract normalized (source, attribute) events and build two query types: present-event queries for ground-truth detection and absent-event queries to probe hallucinations, using similarity-filtered negative sampling in an audio-aligned text embedding space. We evaluate four SOTA Audio LLMs with 12 prompt variants over 500K yes/no queries per model. Across models, increasing event count consistently lowers true-positive rate and raises false-positive rate, while prompts induce a strong trade-off between the two. Our confidence…
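
The abstract mentions similarity-filtered negative sampling in an audio-aligned text embedding space but does not spell out the procedure. One plausible reading, sketched below, is to discard candidate events whose embedding is too close to any event actually present in the clip, then sample absent-event queries from the remainder. The threshold `max_sim`, the function names, and the toy two-dimensional vectors are all assumptions for illustration; a real setup would use embeddings from an audio-aligned text encoder.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sample_absent_events(present, candidates, max_sim=0.5, k=1, seed=0):
    """Pick up to k absent events that are dissimilar to every present event.

    present, candidates: dicts mapping an event label to its embedding.
    max_sim is a hypothetical threshold, not a value from the paper.
    """
    eligible = [
        label
        for label, vec in candidates.items()
        if label not in present
        and all(cosine(vec, p) <= max_sim for p in present.values())
    ]
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))


# Toy embeddings, invented for illustration only.
present = {"dog bark": [1.0, 0.0]}
candidates = {"dog growl": [0.9, 0.1], "rain": [0.0, 1.0]}
print(sample_absent_events(present, candidates))
```

Under this reading, the filter keeps near-synonyms of present events out of the negative set, so that a "yes" to an absent-event query is more clearly a false alarm rather than a reasonable paraphrase match.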

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis