Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

Yi Liu; Gelei Deng; Zhengzi Xu; Yuekang Li; Yaowen Zheng; Ying Zhang; Lida Zhao; Tianwei Zhang; Kailong Wang; Yang Liu

arXiv:2305.13860 · cs.SE · March 12, 2024 · 102 cites

TL;DR

This study empirically investigates how various prompt structures can bypass ChatGPT's content restrictions, analyzing prompt types, effectiveness, and model resilience across multiple scenarios.

Contribution

The study classifies jailbreak prompts into ten patterns and three categories, evaluates their effectiveness on ChatGPT versions 3.5 and 4.0, and assesses how well the models resist them.

Findings

01 Jailbreak prompts can consistently bypass ChatGPT's restrictions across 40 scenarios.

02 Ten distinct jailbreak prompt patterns, grouped into three categories, are identified.

03 ChatGPT 4.0 shows somewhat greater resilience to jailbreak prompts than ChatGPT 3.5.

Abstract

Large Language Models (LLMs), like ChatGPT, have demonstrated vast potential but also introduce challenges related to content constraints and potential misuse. Our study investigates three key research questions: (1) the number of different prompt types that can jailbreak LLMs, (2) the effectiveness of jailbreak prompts in circumventing LLM constraints, and (3) the resilience of ChatGPT against these jailbreak prompts. Initially, we develop a classification model to analyze the distribution of existing prompts, identifying ten distinct patterns and three categories of jailbreak prompts. Subsequently, we assess the jailbreak capability of prompts with ChatGPT versions 3.5 and 4.0, utilizing a dataset of 3,120 jailbreak questions across eight prohibited scenarios. Finally, we evaluate the resistance of ChatGPT against jailbreak prompts, finding that the prompts can consistently evade the…

Code & Models

Datasets

zjunlp/SafeEdit
dataset · 44 downloads

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education