Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
Yudong Yang, Xuezhen Zhang, Zhifeng Han, Siyin Wang, Jimin Zhuang, Zengrui Jin, Jing Shao, Guangzhi Sun, Chao Zhang

TL;DR
This paper introduces SACRED-Bench, a novel evaluation framework for testing the robustness of multimodal LLMs against complex speech-audio composition attacks, and proposes SALMONN-Guard, a guard model that significantly reduces attack success rates.
Contribution
The paper presents SACRED-Bench for black-box audio attack evaluation and introduces SALMONN-Guard, the first safety guard model that jointly inspects speech, audio, and text.
Findings
Gemini 2.5 Pro has a 66% attack success rate without defenses.
SALMONN-Guard reduces the attack success rate to 20%.
Complex audio compositions pose significant safety challenges for LLMs.
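As a rough illustration of how the figures above are read, attack success rate (ASR) is the fraction of attack attempts judged successful. The sketch below is not taken from the paper: the function names and the toy outcome list are hypothetical. Comparing 66% with 20% also assumes both figures were measured on the same test set, which this summary does not state.

```python
def attack_success_rate(outcomes):
    """Fraction of attack attempts judged successful (True)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def relative_reduction(before, after):
    """Relative drop in ASR when moving from `before` to `after`."""
    return (before - after) / before


# Toy judgements for five attack attempts (illustrative only, not real data).
toy_outcomes = [True, False, True, True, False]
toy_asr = attack_success_rate(toy_outcomes)

# Using the two figures quoted above, under the same-test-set assumption.
absolute_drop = 0.66 - 0.20
relative_drop = relative_reduction(0.66, 0.20)
```

Under that assumption, the gap is 46 percentage points, or roughly a 70% relative reduction.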
Abstract
Recent progress in LLMs has enabled understanding of audio signals, but has also exposed new safety risks arising from complex audio inputs that are inadequately handled by current safeguards. We introduce SACRED-Bench (Speech-Audio Composition for RED-teaming) to evaluate the robustness of LLMs under complex audio-based attacks. Unlike existing perturbation-based methods that rely on noise optimization or white-box access, SACRED-Bench exploits speech-audio composition to enable effective black-box attacks. SACRED-Bench adopts three composition mechanisms: (a) overlap of harmful and benign speech, (b) mixture of benign speech with harmful non-speech audio, and (c) multi-speaker dialogue. These mechanisms focus on evaluating safety in settings where benign and harmful intents co-occur within a single auditory scene. Moreover, questions in SACRED-Bench are designed to implicitly refer to…
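The three composition mechanisms can be pictured as simple waveform operations. The following is a minimal sketch, not the benchmark's actual pipeline: sine tones stand in for recorded speech and non-speech audio, waveforms are plain lists of float samples, and the names `tone`, `overlap`, and `dialogue`, the 16 kHz sample rate, the gain, and the gap length are all assumptions made for illustration.

```python
import math


def tone(freq_hz, dur_s, sr=16000, amp=0.5):
    """Placeholder signal: a sine tone standing in for a real recording."""
    n = int(sr * dur_s)
    return [amp * math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]


def overlap(a, b, gain_db=0.0):
    """Overlay b on a from t=0, scaling b by gain_db; pad to the longer one."""
    gain = 10 ** (gain_db / 20)
    out = [0.0] * max(len(a), len(b))
    for i, x in enumerate(a):
        out[i] += x
    for i, x in enumerate(b):
        out[i] += gain * x
    # Rescale only if the sum would clip beyond [-1, 1].
    peak = max((abs(x) for x in out), default=0.0)
    return [x / peak for x in out] if peak > 1.0 else out


def dialogue(turns, sr=16000, gap_s=0.25):
    """Concatenate speaker turns with a short silence between them."""
    gap = [0.0] * int(sr * gap_s)
    out = []
    for i, turn in enumerate(turns):
        if i > 0:
            out.extend(gap)
        out.extend(turn)
    return out


speech_a = tone(220, 1.0)             # stand-in for one speech clip
speech_b = tone(330, 0.5)             # stand-in for a second speech clip
background = tone(880, 1.0, amp=0.2)  # stand-in for non-speech audio

mix_speech = overlap(speech_a, speech_b)            # (a) speech over speech
mix_audio = overlap(speech_a, background, -6.0)     # (b) speech plus non-speech audio
conversation = dialogue([speech_a, speech_b])       # (c) multi-speaker dialogue
```

In all three cases the result is a single waveform in which several sources co-occur, which is the setting the benchmark evaluates.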
Taxonomy
Topics: Adversarial Robustness in Machine Learning · User Authentication and Security Systems · Music and Audio Processing
