Goal-Oriented Prompt Attack and Safety Evaluation for LLMs
Chengyuan Liu, Fubang Zhao, Lizhi Qing, Yangyang Kang, Changlong Sun, Kun Kuang, Fei Wu

TL;DR
This paper introduces CPAD, a new Chinese prompt attack dataset whose prompts induce harmful outputs at high success rates. It is designed for evaluating and improving the safety of LLMs, particularly Chinese LLMs.
Contribution
The paper presents a pipeline for constructing high-quality prompt attack samples and creates the first Chinese prompt attack dataset, CPAD, with detailed attack templates and evaluation metrics.
Findings
Prompt attack success rate reaches around 70% on GPT-3.5.
CPAD effectively evaluates LLM safety against prompt attacks.
The dataset is publicly available for research use.
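The success-rate finding above can be read as a simple ratio: the number of attack prompts whose response is judged harmful, divided by the total number of attack prompts. The following is a minimal sketch under that assumption, not the paper's evaluation code. The function name `attack_success_rate`, the `toy_judge` rule, and the toy responses are all hypothetical and chosen only for illustration; a real evaluation would replace the judge with a proper harmfulness evaluator.

```python
from typing import Callable, Sequence


def attack_success_rate(
    responses: Sequence[str],
    is_harmful: Callable[[str], bool],
) -> float:
    """Return the fraction of responses that the judge flags as harmful.

    Each response is assumed to be a model's reply to one attack prompt,
    so a harmful reply counts as one successful attack.
    """
    if not responses:
        raise ValueError("need at least one response")
    successes = sum(1 for response in responses if is_harmful(response))
    return successes / len(responses)


def toy_judge(response: str) -> bool:
    """Hypothetical stand-in judge: treat anything that is not a refusal as harmful."""
    return not response.lower().startswith("sorry")


# Ten made-up replies: three refusals and seven compliant answers.
toy_responses = ["Sorry, I cannot help with that."] * 3 + ["Sure, here is how..."] * 7

rate = attack_success_rate(toy_responses, toy_judge)
print(f"attack success rate: {rate:.0%}")
```

The judge is passed in as a function so that the ratio itself stays independent of how harmfulness is decided.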
Abstract
Large Language Models (LLMs) show significant advantages in text understanding and generation. However, LLMs risk generating harmful content, especially when deployed in applications. Several black-box attack methods, such as Prompt Attack, can change the behaviour of LLMs and induce them to generate unexpected answers with harmful content. Researchers are interested in Prompt Attack and Defense with LLMs, but there is no publicly available dataset with a high attack success rate for evaluating the ability to defend against prompt attacks. In this paper, we introduce a pipeline to construct high-quality prompt attack samples, along with a Chinese prompt attack dataset called CPAD. Our prompts aim to induce LLMs to generate unexpected outputs using several carefully designed prompt attack templates and attack content of wide concern.…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Attention Is All You Need · Weight Decay · Linear Layer · Cosine Annealing · Dense Connections · Linear Warmup With Cosine Annealing · Adam
