AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, Weizhe Zhang, Weiping Tu, Yuhong Yang, Bo Du

TL;DR
This paper introduces AUDIOJAILBREAK, a novel audio attack against large audio-language models that is asynchronous, universal, stealthy, and robust over-the-air, significantly advancing the effectiveness and practicality of jailbreak attacks.
Contribution
The paper proposes AUDIOJAILBREAK, an audio jailbreak method that adds asynchrony, universality, stealthiness, and over-the-air robustness, and that is reported to outperform prior attacks.
Findings
AUDIOJAILBREAK effectively bypasses GPT-4o-Audio and Llama-Guard-3 safeguards.
It works under both strong and weak adversary scenarios.
The attack remains effective over-the-air with reverberation.
Abstract
Jailbreak attacks on large audio-language models (LALMs) have been studied recently, but existing work focuses exclusively on the attack scenario in which the adversary can fully manipulate user prompts (the strong adversary) and is limited in effectiveness, applicability, and practicability. In this work, we first conduct an extensive evaluation showing that advanced text jailbreak attacks cannot be easily ported to end-to-end LALMs via text-to-speech (TTS) techniques. We then propose AUDIOJAILBREAK, a novel audio jailbreak attack, featuring (1) asynchrony: by crafting suffixal jailbreak audios, the jailbreak audios do not need to align with user prompts on the time axis; (2) universality: a single jailbreak perturbation is effective for different prompts because multiple prompts are incorporated into the perturbation generation; (3) stealthiness: the malicious intent of jailbreak audios is concealed by proposing…
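The asynchrony and universality ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_loss` is a hypothetical quadratic surrogate standing in for a real LALM loss, and all function names, the step size, and the iteration count are assumptions chosen for illustration. The sketch only shows the shape of the idea: one suffix is appended after each prompt (so no time alignment is needed) and is optimized against the loss averaged over several prompts.

```python
import numpy as np

def append_suffix(prompt, suffix, gap=0):
    # Asynchrony: the suffix plays after the prompt (optionally after a
    # silent gap), so it never has to be time-aligned with the prompt.
    return np.concatenate([prompt, np.zeros(gap), suffix])

def toy_loss(prompt, suffix, target):
    # Hypothetical surrogate loss (an assumption, not a model loss):
    # the prompt shifts the suffix by its mean amplitude.
    return 0.5 * float(np.sum((suffix + prompt.mean() - target) ** 2))

def toy_grad(prompt, suffix, target):
    # Gradient of toy_loss with respect to the suffix.
    return suffix + prompt.mean() - target

def average_loss(prompts, suffix, target):
    return sum(toy_loss(p, suffix, target) for p in prompts) / len(prompts)

def craft_universal_suffix(prompts, target, steps=200, lr=0.1):
    # Universality: a single suffix is updated with the gradient
    # averaged over all prompts, rather than tuned to one prompt.
    suffix = np.zeros_like(target)
    for _ in range(steps):
        grad = sum(toy_grad(p, suffix, target) for p in prompts) / len(prompts)
        suffix = suffix - lr * grad
    return suffix

rng = np.random.default_rng(0)
prompts = [rng.normal(size=n) for n in (100, 150, 200)]  # prompts of different lengths
target = rng.normal(size=64)
suffix = craft_universal_suffix(prompts, target)
attacked = [append_suffix(p, suffix, gap=10) for p in prompts]
```

In a real attack the surrogate would be replaced by the target model's loss on the desired response; the stealthiness and over-the-air robustness features named in the abstract are not modeled here.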
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Digital Media Forensic Detection
Methods: ALIGN
