Voice Jailbreak Attacks Against GPT-4o

Xinyue Shen; Yixin Wu; Michael Backes; Yang Zhang

arXiv:2405.19103 · cs.CR · May 30, 2024 · 2 cites

TL;DR

This paper systematically evaluates jailbreak attacks on GPT-4o's voice mode and introduces VoiceJailbreak, a humanized attack built on fictional storytelling that raises the attack success rate from 3.3% to 77.8%, highlighting security challenges in multimodal language models.

Contribution

It presents the first systematic measurement of jailbreak attacks on GPT-4o's voice mode and proposes VoiceJailbreak, a novel humanized attack method that enhances jailbreak effectiveness.

Findings

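01 GPT-4o resists forbidden questions and text jailbreak prompts transferred directly to voice mode, primarily because of its internal safeguards and the difficulty of adapting text prompts to voice.

02 VoiceJailbreak increases the attack success rate from 3.3% to 77.8%.

03 Fictional storytelling techniques improve attack effectiveness.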

Abstract

Recently, the concept of artificial assistants has evolved from science fiction into real-world applications. GPT-4o, the newest multimodal large language model (MLLM) across audio, vision, and text, has further blurred the line between fiction and reality by enabling more natural human-computer interactions. However, the advent of GPT-4o's voice mode may also introduce a new attack surface. In this paper, we present the first systematic measurement of jailbreak attacks against the voice mode of GPT-4o. We show that GPT-4o demonstrates good resistance to forbidden questions and text jailbreak prompts when directly transferring them to voice mode. This resistance is primarily due to GPT-4o's internal safeguards and the difficulty of adapting text jailbreak prompts to voice mode. Inspired by GPT-4o's human-like behaviors, we propose VoiceJailbreak, a novel voice jailbreak attack that…

Code & Models

Repositories

TrustAIRLab/VoiceJailbreakAttack (official)

Taxonomy

Topics: Artificial Intelligence in Law · Artificial Intelligence in Healthcare and Education · Cryptography and Data Security