Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models

Zirui Song; Qian Jiang; Mingxuan Cui; Mingzhe Li; Lang Gao; Zeyu Zhang; Zixiang Xu; Yanbo Wang; Chenxi Wang; Guangxian Ouyang; Zhenhao Chen; Xiuying Chen

arXiv:2505.15406 · cs.SD · May 22, 2025

TL;DR

This paper introduces AJailBench, a comprehensive benchmark for evaluating jailbreak vulnerabilities in Large Audio Language Models (LAMs), highlighting their susceptibility to adversarial audio prompts and proposing methods to generate more effective attacks.

Contribution

The paper presents the first systematic benchmark for LAM jailbreak evaluation and introduces a novel adversarial audio generation toolkit to improve attack realism and effectiveness.

Findings

01

None of the evaluated LAMs are consistently robust against attacks.

02

Small, semantically preserved perturbations can significantly compromise model safety.

03

The proposed adversarial toolkit enhances attack effectiveness by optimizing subtle distortions.

Abstract

The rise of Large Audio Language Models (LAMs) brings both potential and risks, as their audio outputs may contain harmful or unethical content. However, current research lacks a systematic, quantitative evaluation of LAM safety, especially against jailbreak attacks, which are challenging due to the temporal and semantic nature of speech. To bridge this gap, we introduce AJailBench, the first benchmark specifically designed to evaluate jailbreak vulnerabilities in LAMs. We begin by constructing AJailBench-Base, a dataset of 1,495 adversarial audio prompts spanning 10 policy-violating categories, converted from textual jailbreak attacks using realistic text-to-speech synthesis. Using this dataset, we evaluate several state-of-the-art LAMs and reveal that none exhibit consistent robustness across attacks. To further strengthen jailbreak testing and simulate more realistic attack…

Code & Models

Repositories

mbzuai-nlp/audiojailbreak (official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis

Methods: Attentive Walk-Aggregating Graph Neural Network