ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models

Bohan Li; Wenbin Huang; Yuhang Qiu; Yiwei Guo; Hankun Wang; Zhihan Li; Jing Peng; Ziyang Ma; Xie Chen; Kai Yu

arXiv:2510.23558·cs.SD·October 28, 2025

TL;DR

This paper introduces ISA-Bench, a benchmark that measures how sensitive large audio language models are to variations in how instructions are phrased. It finds that even state-of-the-art models lose instruction-following compliance and task accuracy under such variations, and that fine-tuning improves instruction-following at the cost of catastrophic forgetting.

Contribution

The paper presents ISA-Bench, described as the first systematic benchmark for instruction sensitivity in large audio language models (LALMs), and shows how instruction variations affect both instruction-following rates and task performance.

Findings

01 State-of-the-art LALMs show high instruction sensitivity.

02 Fine-tuning improves instruction-following but causes catastrophic forgetting.

03 Benchmark enables standardized assessment of instruction robustness.

Abstract

Large Audio Language Models (LALMs), which couple acoustic perception with large language models (LLMs) to extract and understand diverse information from audio, have attracted intense interest from both academic and industrial communities. However, existing LALMs are highly sensitive to how instructions are phrased, affecting both (i) instruction-following rates and (ii) task performance. Yet, no existing benchmarks offer a systematic and comprehensive evaluation of this sensitivity. We introduce ISA-Bench, a dynamic benchmark evaluating instruction sensitivity for LALMs along three axes: instruction description, output format, and task composition. We assess recent open-source and proprietary LALMs using ISA-Bench, profiling both compliance and accuracy under controlled instruction variations. Experimental results reveal that even state-of-the-art LALMs suffer significant instruction…
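To make the two quantities named in the abstract concrete, here is a minimal Python sketch of how an instruction-following rate and a task accuracy could be tallied per instruction variant, together with a simple best-to-worst gap as a sensitivity indicator. It is an illustration only, not ISA-Bench's actual scoring: the `Result` record, the variant labels, the rule that a non-compliant response counts as incorrect, and the `spread` measure are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    variant: str    # hypothetical label for one instruction variation
    complied: bool  # did the response follow the instruction as requested?
    correct: bool   # was the task answer itself right?


def summarize(results):
    """Return {variant: (compliance_rate, accuracy)} over a list of Result."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r.variant].append(r)

    summary = {}
    for variant, rows in buckets.items():
        n = len(rows)
        compliance = sum(r.complied for r in rows) / n
        # Assumption: a response that ignores the instruction scores as wrong.
        accuracy = sum(r.complied and r.correct for r in rows) / n
        summary[variant] = (compliance, accuracy)
    return summary


def spread(summary, index):
    """Gap between best and worst variant (index 0: compliance, 1: accuracy)."""
    values = [metrics[index] for metrics in summary.values()]
    return max(values) - min(values)


# Toy records; the labels loosely mirror the three axes named in the abstract.
results = [
    Result("description", True, True),
    Result("description", True, False),
    Result("output_format", True, True),
    Result("output_format", False, True),
    Result("task_composition", False, False),
    Result("task_composition", True, True),
]
summary = summarize(results)
for variant, (compliance, accuracy) in summary.items():
    print(f"{variant}: compliance={compliance:.2f}, accuracy={accuracy:.2f}")
print("compliance spread:", spread(summary, 0))
```

A large spread across variants of the same task would indicate that the model's behavior depends on phrasing rather than on the task itself, which is the kind of sensitivity the benchmark is designed to expose.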
