"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models

Isha Gupta; David Khachaturov; Robert Mullins

arXiv:2502.00718 · cs.LG · July 11, 2025

TL;DR

This paper demonstrates universal, robust adversarial audio attacks on Audio-Language Models (ALMs), shows that the imperceptible perturbations encode first-person toxic speech, and discusses implications for improving model defenses.

Contribution

The paper introduces the first universal audio jailbreaks, which generalize across prompts, tasks, and base audio samples. It highlights new vulnerabilities in Audio-Language Models and offers insights for defense strategies.

Findings

01. Universal audio adversarial perturbations exist that generalize across prompts, tasks, and base audio samples.

02. The perturbations encode imperceptible first-person toxic speech.

03. The attacks remain effective in simulated real-world conditions.

Abstract

The rise of multimodal large language models has introduced innovative human-machine interaction paradigms but also significant challenges in machine learning safety. Audio-Language Models (ALMs) are especially relevant due to the intuitive nature of spoken communication, yet little is known about their failure modes. This paper explores audio jailbreaks targeting ALMs, focusing on their ability to bypass alignment mechanisms. We construct adversarial perturbations that generalize across prompts, tasks, and even base audio samples, demonstrating the first universal jailbreaks in the audio modality, and show that these remain effective in simulated real-world conditions. Beyond demonstrating attack feasibility, we analyze how ALMs interpret these audio adversarial examples and reveal them to encode imperceptible first-person toxic speech - suggesting that the most effective perturbations…
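
To make the idea of a "universal" perturbation concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify their optimization, so this uses a toy logistic scorer as a stand-in for an ALM and a generic signed-gradient update with an L-infinity bound. All names (`target_score`, `universal_perturbation`, `eps`, `step`) are hypothetical and chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def target_score(w, x):
    """Toy stand-in for 'how likely the model is to produce the target output'.

    A real ALM would be a large neural network; here it is a single logistic unit.
    """
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def universal_perturbation(w, base_audios, eps=0.01, step=0.002, iters=50):
    """Find ONE perturbation that raises the target score on ALL base samples.

    Assumed recipe (not taken from the paper): signed-gradient ascent on the
    summed score, clipped so every element of delta stays within [-eps, eps].
    """
    n = len(w)
    delta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for x in base_audios:
            s = target_score(w, [xi + di for xi, di in zip(x, delta)])
            coeff = s * (1.0 - s)  # derivative of the sigmoid at this input
            for i in range(n):
                grad[i] += coeff * w[i]
        for i in range(n):
            sign = (grad[i] > 0) - (grad[i] < 0)
            # Step in the gradient's sign direction, then project back into the eps-ball.
            delta[i] = max(-eps, min(eps, delta[i] + step * sign))
    return delta


def mean_score(w, audios, delta):
    scores = [target_score(w, [xi + di for xi, di in zip(x, delta)]) for x in audios]
    return sum(scores) / len(scores)


rng = random.Random(0)
n = 64  # toy "audio" length in samples
w = [rng.gauss(0, 1) for _ in range(n)]
train = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(8)]
held_out = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(8)]

delta = universal_perturbation(w, train)
before = mean_score(w, held_out, [0.0] * n)
after = mean_score(w, held_out, delta)
print(f"held-out mean score before: {before:.3f}, after: {after:.3f}")
```

The perturbation is optimized on one set of base samples and evaluated on a different, held-out set, which is the sense in which it is "universal" in this toy setting. The `eps` bound is a crude proxy for imperceptibility; a real attack would need a perceptual constraint suited to audio.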

Taxonomy

Topics: Law in Society and Culture

Methods: Balanced Selection