ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
Bohan Li, Wenbin Huang, Yuhang Qiu, Yiwei Guo, Hankun Wang, Zhihan Li, Jing Peng, Ziyang Ma, Xie Chen, Kai Yu

TL;DR
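This paper introduces ISA-Bench, a comprehensive benchmark for evaluating how sensitive large audio language models are to variations in instructions. It shows that instruction phrasing significantly affects both instruction-following and task performance, and it explores fine-tuning as a remedy, which improves instruction-following at the cost of catastrophic forgetting.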
Contribution
It presents ISA-Bench, the first systematic benchmark for instruction sensitivity in LALMs, and demonstrates the impact of instruction variations on model performance and robustness.
Findings
State-of-the-art LALMs show high instruction sensitivity.
Fine-tuning improves instruction-following but causes catastrophic forgetting.
Benchmark enables standardized assessment of instruction robustness.
Abstract
Large Audio Language Models (LALMs), which couple acoustic perception with large language models (LLMs) to extract and understand diverse information from audio, have attracted intense interest from both academic and industrial communities. However, existing LALMs are highly sensitive to how instructions are phrased, affecting both (i) instruction-following rates and (ii) task performance. Yet no existing benchmark offers a systematic and comprehensive evaluation of this sensitivity. We introduce ISA-Bench, a dynamic benchmark evaluating instruction sensitivity for LALMs along three axes: instruction description, output format, and task composition. We assess recent open-source and proprietary LALMs using ISA-Bench, profiling both compliance and accuracy under controlled instruction variations. Experimental results reveal that even state-of-the-art LALMs suffer significant instruction…
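The abstract describes profiling each model on both compliance and accuracy under controlled instruction variations. The following is a minimal sketch of how such a profile could be tallied. It assumes every model response has already been judged for instruction compliance and answer correctness. The `Response` record, the variant IDs, the rule that a non-compliant answer counts as wrong, and the use of range and standard deviation as a sensitivity measure are all hypothetical illustrations, not ISA-Bench's actual scoring code.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Response:
    variant: str    # hypothetical ID of the instruction phrasing that was used
    followed: bool  # True if the output obeyed the requested instruction/format
    correct: bool   # True if the extracted answer matched the reference


def rate(flags):
    """Fraction of truthy values in a non-empty list."""
    return sum(1 for f in flags if f) / len(flags)


def profile(responses):
    """Per-variant compliance and accuracy, plus the accuracy spread across variants."""
    groups = {}
    for r in responses:
        groups.setdefault(r.variant, []).append(r)

    per_variant = {}
    for variant, rs in groups.items():
        compliance = rate([r.followed for r in rs])
        # Assumption: an answer that ignores the instruction counts as wrong.
        accuracy = rate([r.followed and r.correct for r in rs])
        per_variant[variant] = {"compliance": compliance, "accuracy": accuracy}

    accs = [v["accuracy"] for v in per_variant.values()]
    summary = {
        # Larger values mean performance depends more on instruction phrasing.
        "accuracy_range": max(accs) - min(accs),
        "accuracy_std": pstdev(accs),
    }
    return per_variant, summary


if __name__ == "__main__":
    data = [
        Response("plain", True, True),
        Response("plain", True, False),
        Response("json_output", False, True),
        Response("json_output", True, False),
    ]
    per_variant, summary = profile(data)
    print(per_variant)
    print(summary)
```

Reporting compliance and accuracy separately mirrors the abstract's distinction between (i) instruction-following rates and (ii) task performance. A model can score well on one while failing the other.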
