Exploring the Universal Vulnerability of Prompt-based Learning Paradigm

Lei Xu; Yangyi Chen; Ganqu Cui; Hongcheng Gao; Zhiyuan Liu

arXiv:2204.05239·cs.CL·April 12, 2022·1 cites

Exploring the Universal Vulnerability of Prompt-based Learning Paradigm

Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Zhiyuan Liu

PDF

Open Access 1 Repo

TL;DR

This paper reveals that prompt-based learning models are universally vulnerable to adversarial and backdoor triggers, which can severely impair their performance across various tasks, highlighting a significant security concern.

Contribution

The study demonstrates the universal vulnerability of prompt-based models to triggers and proposes a potential mitigation approach, advancing understanding of model robustness.

Findings

01

Triggers can control or degrade model performance

02

Adversarial triggers transfer across models

03

Fine-tuning models are less vulnerable

Abstract

Prompt-based learning paradigm bridges the gap between pre-training and fine-tuning, and works effectively under the few-shot setting. However, we find that this learning paradigm inherits the vulnerability from the pre-training stage, where model predictions can be misled by inserting certain triggers into the text. In this paper, we explore this universal vulnerability by either injecting backdoor triggers or searching for adversarial triggers on pre-trained language models using only plain text. In both scenarios, we demonstrate that our triggers can totally control or severely decrease the performance of prompt-based models fine-tuned on arbitrary downstream tasks, reflecting the universal vulnerability of the prompt-based learning paradigm. Further experiments show that adversarial triggers have good transferability among language models. We also find conventional fine-tuning…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

leix28/prompt-universal-vulnerability
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning