Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework

Binhao Ma; Hanqing Guo; Zhengping Jay Luo; Rui Duan

arXiv:2505.18864·cs.CL·May 27, 2025

TL;DR

This paper introduces a novel adversarial attack on SpeechGPT, exploiting speech tokenization to generate audio prompts that bypass safeguards, revealing significant vulnerabilities in voice-enabled multimodal models.

Contribution

The work presents a new token-level attack method for speech inputs in a white-box setting, demonstrating high success rates and exposing security weaknesses in SpeechGPT.

Findings

01. Achieves up to 89% attack success rate
02. Outperforms existing voice jailbreak methods
03. Reveals vulnerabilities in voice-enabled multimodal systems

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) have significantly enhanced the naturalness and flexibility of human-computer interaction by enabling seamless understanding across text, vision, and audio modalities. Among these, voice-enabled models such as SpeechGPT have demonstrated considerable improvements in usability, offering expressive and emotionally responsive interactions that foster deeper connections in real-world communication scenarios. However, the use of voice introduces new security risks, as attackers can exploit the unique characteristics of spoken language, such as timing, pronunciation variability, and speech-to-text translation, to craft inputs that bypass defenses in ways not seen in text-based systems. Despite substantial research on text-based jailbreaks, the voice modality remains largely underexplored in terms of both attack strategies and…

Code & Models

Repositories

Magic-Ma-tech/Audio-Jailbreak-Attacks (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis