On Optimizing Multimodal Jailbreaks for Spoken Language Models
Aravind Krishnan, Karolina Stańczak, Dietrich Klakow

TL;DR
This paper introduces JAMA (Joint Audio-text Multimodal Attack), a framework that jailbreaks spoken language models by perturbing speech and text simultaneously. It raises jailbreak rates by 1.5x to 10x over unimodal attacks, exposing vulnerabilities that unimodal safety measures miss.
Contribution
We propose JAMA, the first gradient-based multimodal jailbreak method combining audio and text perturbations for spoken language models, demonstrating superior attack success rates.
Findings
JAMA outperforms unimodal attacks by 1.5x to 10x in success rate.
A sequential approximation of the joint attack speeds it up by 4x to 6x.
Unimodal safety measures are insufficient for robust SLMs.
Abstract
As Spoken Language Models (SLMs) integrate speech and text modalities, they inherit the safety vulnerabilities of their LLM backbone and an expanded attack surface. SLMs have been previously shown to be susceptible to jailbreaking, where adversarial prompts induce harmful responses. Yet existing attacks largely remain unimodal, optimizing either text or audio in isolation. We explore gradient-based multimodal jailbreaks by introducing JAMA (Joint Audio-text Multimodal Attack), a joint multimodal optimization framework combining Greedy Coordinate Gradient (GCG) for text and Projected Gradient Descent (PGD) for audio, to simultaneously perturb both modalities. Evaluations across four state-of-the-art SLMs and four audio types demonstrate that JAMA surpasses unimodal jailbreak rate by 1.5x to 10x. We analyze the operational dynamics of this joint attack and show that a sequential…
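The abstract names two building blocks: a PGD-style update for the continuous audio perturbation and a GCG-style token swap for the discrete text. The sketch below illustrates how two such updates can share one objective. It is a toy under loud assumptions, not the authors' implementation: the quadratic `loss`, the scalar "embeddings" in `emb`, the deterministic position cycling, and every function and parameter name are hypothetical, and no speech model is involved. In a real setting the objective would be a model's loss on a target response, and GCG would typically sample positions at random.

```python
def loss(x, delta, audio_target, tokens, emb, text_target):
    """Toy shared objective (assumption): squared error on audio plus token embeddings."""
    audio = sum((xi + di - ai) ** 2 for xi, di, ai in zip(x, delta, audio_target))
    text = sum((emb[t] - ti) ** 2 for t, ti in zip(tokens, text_target))
    return audio + text


def pgd_step(x, delta, audio_target, alpha, eps):
    """One signed-gradient step on delta, projected onto the L-inf ball of radius eps."""
    updated = []
    for xi, di, ai in zip(x, delta, audio_target):
        grad = 2.0 * (xi + di - ai)               # d(loss)/d(delta_i) for the toy loss
        sign = (grad > 0) - (grad < 0)
        step = di - alpha * sign                  # descend along the gradient sign
        updated.append(max(-eps, min(eps, step)))  # projection is a clip to [-eps, eps]
    return updated


def gcg_step(x, delta, audio_target, tokens, emb, text_target, pos, top_k):
    """GCG-style swap at one position: rank tokens by a first-order estimate,
    then keep the candidate that lowers the true loss the most (if any)."""
    current = emb[tokens[pos]]
    # Gradient of the toy loss w.r.t. a one-hot token indicator at `pos`.
    grads = [2.0 * (current - text_target[pos]) * e for e in emb]
    candidates = sorted(range(len(emb)), key=lambda v: grads[v])[:top_k]
    best = tokens
    best_loss = loss(x, delta, audio_target, tokens, emb, text_target)
    for v in candidates:
        trial = tokens[:pos] + [v] + tokens[pos + 1:]
        trial_loss = loss(x, delta, audio_target, trial, emb, text_target)
        if trial_loss < best_loss:
            best, best_loss = trial, trial_loss
    return best


def joint_attack(x, audio_target, tokens, emb, text_target,
                 steps=20, alpha=0.02, eps=0.1, top_k=2):
    """Update both modalities in every iteration against the same loss."""
    delta = [0.0] * len(x)
    for step in range(steps):
        delta = pgd_step(x, delta, audio_target, alpha, eps)
        pos = step % len(tokens)  # deterministic cycling; a simplification
        tokens = gcg_step(x, delta, audio_target, tokens, emb, text_target, pos, top_k)
    return delta, tokens


# Toy data (all values are made up for illustration).
x = [0.0, 0.0, 0.0]
audio_target = [1.0, -1.0, 0.5]
emb = [0.0, 0.25, 0.5, 0.75, 1.0]
text_target = [1.0, 0.0]
start_tokens = [0, 4]
eps = 0.1

initial_loss = loss(x, [0.0] * len(x), audio_target, start_tokens, emb, text_target)
delta, tokens = joint_attack(x, audio_target, start_tokens, emb, text_target, eps=eps)
final_loss = loss(x, delta, audio_target, tokens, emb, text_target)
print(initial_loss, final_loss, delta, tokens)
```

The point of the sketch is structural: the audio perturbation stays inside a fixed budget `eps` while the token swap is evaluated against the loss computed with the current perturbation, so each modality's update sees the other's latest state.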
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Speech Recognition and Synthesis
