Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard

Yudong Yang; Xuezhen Zhang; Zhifeng Han; Siyin Wang; Jimin Zhuang; Zengrui Jin; Jing Shao; Guangzhi Sun; Chao Zhang

arXiv:2511.10222 · cs.SD · February 12, 2026

TL;DR

This paper introduces SACRED-Bench, a novel evaluation framework for testing the robustness of multimodal LLMs against complex speech-audio composition attacks, and proposes SALMONN-Guard, a guard model that significantly reduces attack success rates.

Contribution

The paper presents SACRED-Bench for black-box audio attack evaluation and introduces SALMONN-Guard, the first safety guard model that jointly inspects speech, audio, and text.
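A common way to deploy an input guard is to screen each request before the main model sees it, and to refuse if the request is flagged. The sketch below assumes SALMONN-Guard sits in that input-screening position and treats it as an opaque callable that receives both the audio and the text. `AudioTextRequest`, `guarded_respond`, `toy_guard`, `echo_model`, and the refusal string are hypothetical names. `toy_guard` is a trivial placeholder, not a learned model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AudioTextRequest:
    audio: bytes  # raw audio payload; encoding left unspecified (assumption)
    text: str     # accompanying text instruction


# Hypothetical interfaces: a guard returns True when it judges the joint
# audio + text input unsafe; a model maps an accepted request to a response.
Guard = Callable[[AudioTextRequest], bool]
Model = Callable[[AudioTextRequest], str]

REFUSAL = "I can't help with that request."


def guarded_respond(req: AudioTextRequest, guard: Guard, model: Model) -> str:
    """Screen the combined audio + text input before it reaches the model."""
    if guard(req):
        return REFUSAL
    return model(req)


# Toy stand-ins only: a real guard would be a learned classifier over audio and text.
def toy_guard(req: AudioTextRequest) -> bool:
    return b"UNSAFE" in req.audio or "unsafe" in req.text.lower()


def echo_model(req: AudioTextRequest) -> str:
    return f"answering: {req.text}"


blocked = guarded_respond(AudioTextRequest(b"...UNSAFE...", "what is said?"), toy_guard, echo_model)
allowed = guarded_respond(AudioTextRequest(b"...", "what is said?"), toy_guard, echo_model)
```

Passing the guard the whole request, rather than only the transcript, reflects the point that harmful content may live in the non-speech audio or in how the speech and audio combine.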

Findings

01

Gemini 2.5 Pro shows a 66% attack success rate on SACRED-Bench without defenses.

02

SALMONN-Guard reduces the attack success rate to 20%.

03

Complex audio compositions pose significant safety challenges for LLMs.
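Attack success rate (ASR), as used in the findings above, is in general the fraction of attack attempts judged to have produced harmful output. The sketch below assumes each attempt has already been labeled success or failure by some judge (the paper's judging protocol is not described on this page). The group keys such as `"overlap"` are hypothetical labels for the composition mechanisms, and the records are toy values, not data from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def attack_success_rate(outcomes: Iterable[bool]) -> float:
    """ASR = successful attacks / total attempts, as a fraction in [0, 1]."""
    results = list(outcomes)
    if not results:
        raise ValueError("need at least one judged attempt")
    return sum(results) / len(results)


def asr_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-group ASR from (group, success) pairs, e.g. grouped by composition mechanism."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for group, success in records:
        totals[group] += 1
        hits[group] += int(success)
    return {g: hits[g] / totals[g] for g in totals}


# Toy judgments for illustration only (hypothetical group labels).
records = [("overlap", True), ("overlap", False), ("mixture", True), ("dialogue", False)]
overall = attack_success_rate(ok for _, ok in records)  # 2 of 4 attempts succeed
per_group = asr_by_group(records)
```

Breaking ASR down per mechanism makes it easier to see which kind of composition a model or guard handles worst, which a single aggregate number can hide.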

Abstract

Recent progress in LLMs has enabled understanding of audio signals, but has also exposed new safety risks arising from complex audio inputs that are inadequately handled by current safeguards. We introduce SACRED-Bench (Speech-Audio Composition for RED-teaming) to evaluate the robustness of LLMs under complex audio-based attacks. Unlike existing perturbation-based methods that rely on noise optimization or white-box access, SACRED-Bench exploits speech-audio composition to enable effective black-box attacks. SACRED-Bench adopts three composition mechanisms: (a) overlap of harmful and benign speech, (b) mixture of benign speech with harmful non-speech audio, and (c) multi-speaker dialogue. These mechanisms focus on evaluating safety in settings where benign and harmful intents co-occur within a single auditory scene. Moreover, questions in SACRED-Bench are designed to implicitly refer to…
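The three composition mechanisms can be illustrated with basic waveform operations. The following is a minimal NumPy sketch, not the benchmark's actual construction pipeline: `overlay` mixes a second source into a first at a chosen relative level (in the spirit of mechanisms (a) and (b)), and `dialogue` concatenates turns separated by silence (in the spirit of mechanism (c)). The function names, the RMS-based level control, the 0.3 s gap, the 16 kHz sample rate, and the sine-tone stand-ins are all assumptions made for illustration.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square level of a mono waveform."""
    return float(np.sqrt(np.mean(np.square(x))))


def overlay(primary: np.ndarray, secondary: np.ndarray, gain_db: float) -> np.ndarray:
    """Mix `secondary` into `primary` so its RMS level sits `gain_db` dB relative
    to `primary` (negative = quieter). The shorter input is zero-padded."""
    if rms(secondary) == 0.0:
        raise ValueError("secondary signal is silent")
    # Scale before padding so both RMS values use the original, unpadded signals.
    target = rms(primary) * 10 ** (gain_db / 20)
    scaled = secondary * (target / rms(secondary))
    n = max(len(primary), len(scaled))
    mixed = np.pad(primary, (0, n - len(primary))) + np.pad(scaled, (0, n - len(scaled)))
    # Rescale only if the sum would clip outside [-1, 1].
    peak = float(np.max(np.abs(mixed)))
    return mixed / peak if peak > 1.0 else mixed


def dialogue(turns: list, sr: int, gap_s: float = 0.3) -> np.ndarray:
    """Concatenate single-speaker turns with `gap_s` seconds of silence between them."""
    gap = np.zeros(int(round(gap_s * sr)))
    pieces = []
    for i, turn in enumerate(turns):
        if i > 0:
            pieces.append(gap)
        pieces.append(turn)
    return np.concatenate(pieces)


# Synthetic tones stand in for real speech / audio clips (assumption).
sr = 16_000
t = np.arange(sr) / sr
clip_a = 0.5 * np.sin(2 * np.pi * 220 * t)            # 1.0 s
clip_b = 0.5 * np.sin(2 * np.pi * 330 * t[: sr // 2])  # 0.5 s

mixed = overlay(clip_a, clip_b, gain_db=-6.0)  # (a)/(b): two sources in one scene
conv = dialogue([clip_a, clip_b], sr)          # (c): sequential turns
```

Setting the relative level in RMS decibels is one simple mixing convention; a real pipeline might instead use perceptual loudness normalization or randomize levels across samples.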

Code & Models

Datasets

tsinghua-ee/SACRED-Bench
dataset · 242 downloads

Taxonomy

Topics: Adversarial Robustness in Machine Learning · User Authentication and Security Systems · Music and Audio Processing