AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models

Guangke Chen; Fu Song; Zhe Zhao; Xiaojun Jia; Yang Liu; Yanchen Qiao; Weizhe Zhang; Weiping Tu; Yuhong Yang; Bo Du

arXiv:2505.14103 · cs.CR · February 4, 2026

TL;DR

This paper introduces AUDIOJAILBREAK, an audio jailbreak attack against end-to-end large audio-language models that is asynchronous, universal, stealthy, and robust over-the-air, making such attacks more effective and more practical.

Contribution

The paper proposes AUDIOJAILBREAK, an audio jailbreak method that combines four properties (asynchrony, universality, stealthiness, and over-the-air robustness) and outperforms prior attacks.

Findings

01 AUDIOJAILBREAK effectively bypasses GPT-4o-Audio and Llama-Guard-3 safeguards.

02 It works under both strong and weak adversary scenarios.

03 The attack remains effective over-the-air with reverberation.
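To make the over-the-air finding concrete, reverberation is commonly modeled as convolution of the played audio with a room impulse response (RIR). The following is a minimal sketch of that general idea, not the paper's evaluation setup: the function names `synthetic_rir` and `apply_reverb`, the RT60 value, and the decaying-noise RIR are all assumptions chosen for illustration.

```python
import random

def synthetic_rir(sample_rate=16000, rt60=0.4, length_s=0.05, seed=0):
    """Hypothetical room impulse response: a direct path plus decaying noise."""
    rng = random.Random(seed)
    n = int(sample_rate * length_s)
    rir = []
    for i in range(n):
        t = i / sample_rate
        # Amplitude envelope that has fallen by 60 dB at t = rt60.
        envelope = 10.0 ** (-3.0 * t / rt60)
        rir.append(rng.gauss(0.0, 0.1) * envelope)
    rir[0] = 1.0  # direct (unreflected) path
    return rir

def apply_reverb(audio, rir):
    """Full linear convolution of the audio samples with the impulse response."""
    out = [0.0] * (len(audio) + len(rir) - 1)
    for i, a in enumerate(audio):
        if a == 0.0:
            continue  # zero samples contribute nothing
        for j, h in enumerate(rir):
            out[i + j] += a * h
    return out

# Usage: a single click played through the simulated room reproduces the RIR.
click = [1.0] + [0.0] * 9
reverberant = apply_reverb(click, synthetic_rir())
```

An attack that is meant to survive playback would need to stay effective after a distortion of this kind; how the paper achieves that is not stated in the text above.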

Abstract

Jailbreak attacks on large audio-language models (LALMs) have been studied recently, but prior work focuses exclusively on the scenario in which the adversary can fully manipulate user prompts (the strong adversary) and is limited in effectiveness, applicability, and practicability. In this work, we first conduct an extensive evaluation showing that advanced text jailbreak attacks cannot be easily ported to end-to-end LALMs via text-to-speech (TTS) techniques. We then propose AUDIOJAILBREAK, a novel audio jailbreak attack featuring (1) asynchrony: by crafting suffixal jailbreak audios, the jailbreak audio does not need to align with the user prompt along the time axis; (2) universality: a single jailbreak perturbation is effective for different prompts because multiple prompts are incorporated into the perturbation generation; (3) stealthiness: the malicious intent of jailbreak audios is concealed by proposing…
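The asynchrony and universality properties can be illustrated with a toy optimization. This is only a sketch under strong assumptions: `toy_loss_and_grad` is a hypothetical linear stand-in for a model loss (not a real LALM and not the paper's objective), and `craft_universal_suffix` and `append_suffix` are illustrative names. It shows one suffix being optimized against the average gradient over several prompts and then appended after a prompt, so no time alignment with the prompt is needed.

```python
def toy_loss_and_grad(prompt, suffix, weights, target=1.0):
    """Hypothetical stand-in for a model loss; NOT a real LALM.

    score = mean(prompt) + dot(weights, suffix); loss = (score - target)^2.
    Returns the loss and its gradient with respect to the suffix samples.
    """
    score = sum(prompt) / len(prompt) + sum(w * s for w, s in zip(weights, suffix))
    err = score - target
    return err * err, [2.0 * err * w for w in weights]

def craft_universal_suffix(prompts, weights, steps=200, lr=0.05):
    """Gradient descent on one suffix using the gradient averaged over all prompts."""
    suffix = [0.0] * len(weights)
    for _ in range(steps):
        avg_grad = [0.0] * len(suffix)
        for p in prompts:
            _, grad = toy_loss_and_grad(p, suffix, weights)
            for i, g in enumerate(grad):
                avg_grad[i] += g / len(prompts)
        suffix = [s - lr * g for s, g in zip(suffix, avg_grad)]
    return suffix

def append_suffix(prompt, suffix):
    """Suffixal placement: the crafted audio follows the untouched prompt."""
    return list(prompt) + list(suffix)

# Usage with made-up "prompts" of different lengths (assumed sample values).
prompts = [[0.1] * 8, [0.2] * 10, [-0.1] * 6]
weights = [0.5, -0.5, 0.5, -0.5]
suffix = craft_universal_suffix(prompts, weights)
attacked = append_suffix(prompts[0], suffix)
```

Averaging the gradient across prompts is what makes the single suffix a compromise that lowers the loss for all of them, rather than an optimum for any one prompt.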

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Digital Media Forensic Detection

Methods: ALIGN