Goal-guided Generative Prompt Injection Attack on Large Language Models

Chong Zhang; Mingyu Jin; Qinkai Yu; Chengzhi Liu; Haochen Xue; Xiaobo Jin

arXiv:2404.07234 · cs.CR · November 12, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a goal-guided generative prompt injection attack on large language models. It frames the attack objective as maximizing the KL divergence between the model's conditional output distributions for clean and adversarial text, and reports effective attacks on seven LLMs across four datasets.

Contribution

It redefines the goal of a prompt injection attack in terms of KL divergence and proposes a query-free black-box attack method that the experiments show to be effective.

Findings

01 Effective attack on seven LLMs

02 High success rate across four datasets

03 Low computational cost for the attack

Abstract

Current large language models (LLMs) provide a strong foundation for large-scale, user-oriented natural language tasks. Because many users can easily inject adversarial text or instructions through the user interface, LLMs face security challenges. Although prompt injection attacks have been studied extensively, most black-box attacks rely on heuristic strategies. It is unclear how these heuristic strategies relate to attack success rates, and therefore how they could be used to improve model robustness. To address this problem, we redefine the goal of the attack: to maximize the KL divergence between the conditional probabilities of the clean text and the adversarial text. Furthermore, we prove that maximizing the KL divergence is equivalent to maximizing the Mahalanobis distance between the embedded representations $x$ and $x'$ of the clean text and the…
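The abstract is cut off before it states the conditions under which the KL and Mahalanobis objectives coincide, so the following is only a minimal sketch of one textbook setting where the link is easy to check: two Gaussians that share the same covariance. In that case the KL divergence equals half the squared Mahalanobis distance between the means. The diagonal covariance, the example vectors, and the function names `mahalanobis_sq` and `gaussian_kl_diag` are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def mahalanobis_sq(x, x_adv, var):
    """Squared Mahalanobis distance under a diagonal covariance diag(var)."""
    return sum((a - b) ** 2 / v for a, b, v in zip(x, x_adv, var))


def gaussian_kl_diag(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, diag(var_p)) || N(mu_q, diag(var_q)))."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


# Hypothetical 3-dimensional "embeddings" of a clean and an adversarial text.
x = [0.2, -1.0, 0.5]
x_adv = [0.9, -0.4, 1.5]
var = [0.5, 2.0, 1.0]  # assumed shared diagonal covariance

kl = gaussian_kl_diag(x, var, x_adv, var)
half_maha = 0.5 * mahalanobis_sq(x, x_adv, var)
print(kl, half_maha)  # the two values agree up to floating-point error
```

Under this assumption, pushing `x_adv` further from `x` in Mahalanobis terms raises the KL divergence by the same amount, which is the intuition behind treating the two objectives as interchangeable.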

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques