Automatic Jailbreaking of the Text-to-Image Generative AI Systems

Minseon Kim; Hyomin Lee; Boqing Gong; Huishuai Zhang; Sung Ju Hwang

arXiv:2405.16567 · cs.AI · May 29, 2024 · 1 citation


Open Access · 1 code repository

TL;DR

This paper evaluates how well commercial text-to-image (T2I) systems such as ChatGPT, Copilot, and Gemini block naive prompts that request copyright-infringing content. It then introduces an automated jailbreaking pipeline that bypasses their safety guards and finds that current defenses are largely ineffective.

Contribution

It presents an empirical analysis of copyright-infringement safety in commercial T2I systems, proposes an automated jailbreaking pipeline for these systems, and assesses the limitations of existing defense strategies.

Findings

01. ChatGPT blocks 84% of naive prompt attacks that request copyrighted content.

02. The proposed automated jailbreaking pipeline achieves a 76% success rate in bypassing safety guards.

03. Existing defense strategies are largely ineffective against automated jailbreaking.

Abstract

Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious contents by circumventing the alignment in LLMs, which are often referred to as jailbreaking. However, most of the previous works only focused on the text-based jailbreaking in LLMs, and the jailbreaking of the text-to-image (T2I) generation system has been relatively overlooked. In this paper, we first evaluate the safety of the commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, on copyright infringement with naive prompts. From this empirical study, we find that Copilot and Gemini block only 12% and 17% of the attacks with naive prompts, respectively,…


Code & Models

Repositories

Kim-Minseon/APGP
PyTorch · Official


Taxonomy

Topics: Digital and Cyber Forensics · Artificial Intelligence in Law