On Optimizing Multimodal Jailbreaks for Spoken Language Models

Aravind Krishnan; Karolina Stańczak; Dietrich Klakow

arXiv:2603.19127·cs.LG·March 20, 2026

TL;DR

This paper introduces JAMA, a joint multimodal attack framework that significantly improves the effectiveness of jailbreaking spoken language models by perturbing both speech and text simultaneously, revealing safety vulnerabilities.

Contribution

We propose JAMA, the first gradient-based multimodal jailbreak method combining audio and text perturbations for spoken language models, demonstrating superior attack success rates.

Findings

01. JAMA outperforms unimodal attacks by 1.5x to 10x in success rate.

02. Sequential approximation speeds up the attack by 4x to 6x.

03. Unimodal safety measures are insufficient for robust SLMs.

Abstract

As Spoken Language Models (SLMs) integrate speech and text modalities, they inherit the safety vulnerabilities of their LLM backbone and gain an expanded attack surface. SLMs have previously been shown to be susceptible to jailbreaking, where adversarial prompts induce harmful responses. Yet existing attacks remain largely unimodal, optimizing either text or audio in isolation. We explore gradient-based multimodal jailbreaks by introducing JAMA (Joint Audio-text Multimodal Attack), a joint multimodal optimization framework that combines Greedy Coordinate Gradient (GCG) for text and Projected Gradient Descent (PGD) for audio to perturb both modalities simultaneously. Evaluations across four state-of-the-art SLMs and four audio types demonstrate that JAMA surpasses unimodal jailbreak rates by 1.5x to 10x. We analyze the operational dynamics of this joint attack and show that a sequential…
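To make the combination of GCG and PGD concrete, here is a minimal sketch of one way a joint update loop could look. It is not the authors' implementation. Everything in it is an assumption for illustration: `toy_loss` is a hypothetical scalar quadratic standing in for an SLM's loss on a target response, the names `joint_attack`, `e`, `w`, `x`, and `target` are invented, and the choice to compute both gradients at the same point and then score text candidates against the updated audio is one possible reading of "joint".

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def toy_loss(tokens, delta, e, w, x, target):
    """Hypothetical stand-in for the SLM loss on a target response."""
    text_term = sum(e[t] for t in tokens) / len(tokens)
    audio_term = sum(wi * (xi + di) for wi, xi, di in zip(w, x, delta))
    resid = text_term + audio_term - target
    return 0.5 * resid * resid, resid

def joint_attack(e, w, x, target, n_tokens=6, eps=0.05, alpha=0.01,
                 top_k=3, n_candidates=8, steps=40, seed=0):
    rng = random.Random(seed)
    tokens = [rng.randrange(len(e)) for _ in range(n_tokens)]
    delta = [0.0] * len(x)  # audio perturbation, kept inside an L-inf ball
    loss, resid = toy_loss(tokens, delta, e, w, x, target)
    history = [loss]
    best = (loss, list(tokens), list(delta))
    for _ in range(steps):
        # Both gradients come from the same evaluation point (tokens, delta).
        grad_delta = [resid * wi for wi in w]
        grad_onehot = [resid * ev / n_tokens for ev in e]

        # PGD step on audio: signed gradient step, then project onto [-eps, eps].
        delta = [max(-eps, min(eps, d - alpha * sign(g)))
                 for d, g in zip(delta, grad_delta)]

        # GCG step on text: shortlist tokens with the most negative gradient,
        # try random single-token swaps, keep the best swap by true loss.
        top = sorted(range(len(e)), key=lambda v: grad_onehot[v])[:top_k]
        best_cand = tokens
        best_cand_loss = toy_loss(tokens, delta, e, w, x, target)[0]
        for _ in range(n_candidates):
            cand = list(tokens)
            cand[rng.randrange(n_tokens)] = rng.choice(top)
            cand_loss = toy_loss(cand, delta, e, w, x, target)[0]
            if cand_loss < best_cand_loss:
                best_cand, best_cand_loss = cand, cand_loss
        tokens = best_cand

        loss, resid = toy_loss(tokens, delta, e, w, x, target)
        history.append(loss)
        if loss < best[0]:
            best = (loss, list(tokens), list(delta))
    return best, history

# Toy usage with made-up numbers.
e = [-1.0, -0.5, 0.0, 0.5, 1.0]   # per-token scores (hypothetical)
w = [0.3, -0.2, 0.5]              # audio weights (hypothetical)
x = [0.1, 0.2, -0.1]              # clean audio samples (hypothetical)
(best_loss, best_tokens, best_delta), history = joint_attack(e, w, x, target=0.8)
```

In a real attack the toy loss would be replaced by the model's loss on the target response, and the gradients would come from backpropagation through the SLM rather than from a closed form. The sketch only shows how a continuous, projected update and a discrete, candidate-based update can share one loop.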

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Speech Recognition and Synthesis