PromptAttack: Prompt-based Attack for Language Models via Gradient Search

Yundi Shi; Piji Li; Changchun Yin; Zhaoyang Han; Lu Zhou; Zhe Liu

arXiv:2209.01882·cs.CL·September 7, 2022

Open Access

TL;DR

PromptAttack is a method that constructs malicious prompt templates to probe security vulnerabilities in prompt-based language models; it is evaluated on three datasets and three PLMs.

Contribution

The paper introduces PromptAttack, a new approach for generating adversarial prompts to evaluate security risks in prompt learning methods for PLMs.

Findings

01 PromptAttack successfully causes misclassification in PLMs.

02 The method is effective across three datasets and three PLMs.

03 It is applicable in few-shot learning scenarios.

Abstract

As pre-trained language models (PLMs) continue to grow, so do the hardware and data requirements for fine-tuning them. Researchers have therefore proposed a lighter-weight alternative called Prompt Learning. In our investigations, however, we observe that prompt learning methods are vulnerable and can easily be attacked with maliciously constructed prompts, causing classification errors and serious security problems for PLMs. Most current research ignores the security of prompt-based methods. In this paper, we therefore propose a malicious prompt template construction method (PromptAttack) to probe the security of PLMs. We investigate several unfriendly template construction approaches that guide the model to misclassify the task. Extensive experiments on three datasets and three PLMs prove the effectiveness of our proposed…
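The abstract does not spell out the search procedure; the title only indicates that it is gradient-based. As a rough illustration of what a gradient-guided template search can look like, the sketch below performs one first-order token substitution on a toy mean-pooled linear classifier. This is a generic gradient-guided swap, not the authors' algorithm, and every name in it (`E`, `W`, `slot_gradient`, `gradient_search_step`, the token ids) is a hypothetical stand-in rather than anything taken from the paper.

```python
import math
import random


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(token_ids, E, W):
    """Toy classifier (assumption): probabilities from W @ mean(token embeddings)."""
    n, dim = len(token_ids), len(E[0])
    mean = [sum(E[t][d] for t in token_ids) / n for d in range(dim)]
    return softmax([sum(row[d] * mean[d] for d in range(dim)) for row in W])


def loss(token_ids, label, E, W):
    """Cross-entropy of the true label; the attacker wants this to go up."""
    return -math.log(predict(token_ids, E, W)[label])


def slot_gradient(token_ids, label, E, W):
    """Gradient of the loss w.r.t. the embedding of one token slot."""
    p = predict(token_ids, E, W)
    n, dim = len(token_ids), len(E[0])
    # d(loss)/d(logit_c) = p_c - 1[c == label]; mean pooling scales each slot by 1/n.
    dlogits = [p[c] - (1.0 if c == label else 0.0) for c in range(len(W))]
    return [sum(dlogits[c] * W[c][d] for c in range(len(W))) / n for d in range(dim)]


def gradient_search_step(token_ids, slot, label, E, W):
    """Swap the token in `slot` for the vocabulary entry with the largest
    first-order estimate of loss increase: (e_new - e_cur) . grad."""
    g = slot_gradient(token_ids, label, E, W)
    cur = E[token_ids[slot]]

    def estimated_gain(v):
        return sum((E[v][d] - cur[d]) * g[d] for d in range(len(g)))

    best = max(range(len(E)), key=estimated_gain)
    new_ids = list(token_ids)
    new_ids[slot] = best
    return new_ids


random.seed(0)
VOCAB, DIM, CLASSES = 20, 4, 2
E = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]    # toy embeddings
W = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(CLASSES)]  # toy classifier

token_ids = [3, 7, 1, 12]  # hypothetical input tokens plus one template token
slot, label = 1, 0         # position 1 plays the role of the attackable template slot

before = loss(token_ids, label, E, W)
new_ids = gradient_search_step(token_ids, slot, label, E, W)
after = loss(new_ids, label, E, W)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

In this toy setting the loss is convex in the slot embedding, so the first-order estimate is a lower bound and the chosen swap cannot lower the loss. With a real PLM the estimate is only approximate, so candidate tokens would typically be re-scored with a forward pass before one is accepted.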

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling