Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks

Zonghao Ying; Aishan Liu; Xianglong Liu; Dacheng Tao

arXiv:2406.06302 · cs.CR · July 4, 2024 · 3 cites

TL;DR

This paper rigorously evaluates GPT-4o's safety against multi-modal jailbreak attacks across text, speech, and image. It finds improved safety in the text modality but new vulnerabilities in the audio modality, underscoring the need for stronger safety measures.

Contribution

First comprehensive empirical study of GPT-4o's safety against jailbreak attacks across multiple modalities, providing critical insights into its robustness and vulnerabilities.

Findings

01 GPT-4o shows enhanced safety in text modality jailbreaks

02 Audio modality introduces new attack vectors

03 Existing black-box attack methods are largely ineffective

Abstract

The recent release of GPT-4o has garnered widespread attention due to its powerful general capabilities. While its impressive performance is widely acknowledged, its safety has not been sufficiently explored. Given the potential societal impact of risky content generated by advanced generative AI such as GPT-4o, it is crucial to rigorously evaluate its safety. In response to this need, this paper conducts, for the first time, a rigorous evaluation of GPT-4o against jailbreak attacks. Specifically, it applies a series of multi-modal and uni-modal jailbreak attacks to 4 commonly used benchmarks encompassing three modalities (i.e., text, speech, and image), involving the optimization of over 4,000 initial text queries and the analysis and statistical evaluation of nearly 8,000 responses from GPT-4o. Our extensive experiments reveal several novel observations: (1) In…

Code & Models

Repositories

ny1024/jailbreak_gpt4o (Official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Academic integrity and plagiarism