TL;DR
This paper introduces a backdoor attack on pre-trained NLP models that maps inputs containing triggers directly to a predefined output representation rather than to a target label, so the backdoor can carry over to a wide range of downstream tasks without any knowledge of them.
Contribution
The paper presents a backdoor attack that does not rely on prior knowledge of the fine-tuning task, and it proposes two new metrics for measuring the effectiveness and stealthiness of backdoor attacks in NLP.
Findings
The method is effective across various NLP tasks and models.
It can be implemented without prior knowledge of downstream tasks.
The attack demonstrates high success and stealthiness in experiments.
Abstract
Pre-trained general-purpose language models have been a dominating component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with a backdoor can be a severe threat to the applications. Most existing backdoor attacks in NLP are conducted in the fine-tuning phase by introducing malicious triggers in the targeted class, thus relying greatly on the prior knowledge of the fine-tuning task. In this paper, we propose a new approach to map the inputs containing triggers directly to a predefined output representation of the pre-trained NLP models, e.g., a predefined output representation for the classification token in BERT, instead of a target label. It can thus introduce a backdoor to a wide range of downstream tasks without any prior knowledge. Additionally, in light of the unique properties of triggers in NLP, we propose two new metrics to…
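The idea of mapping triggered inputs to a predefined output representation, rather than to a label, can be illustrated with a small sketch. The abstract does not state the training objective, so everything below is an assumption made for illustration: the mean-squared-error distance, the second term that keeps clean inputs close to a frozen reference model, the choice to prepend the trigger, and all function and parameter names (`backdoor_objective`, `insert_trigger`, `encoder`, `reference`, `target`) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical encoder type: tokens -> output vector for the classification token.
Encoder = Callable[[Sequence[str]], Vector]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    assert len(a) == len(b), "vectors must have the same length"
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def insert_trigger(tokens: Sequence[str], trigger: str) -> List[str]:
    """Prepend the trigger token (the position is an arbitrary choice here)."""
    return [trigger, *tokens]


def backdoor_objective(
    encoder: Encoder,
    reference: Encoder,
    batch: Sequence[Sequence[str]],
    trigger: str,
    target: Vector,
) -> float:
    """Assumed objective: average over the batch of two MSE terms.

    1. Triggered inputs should map to the predefined vector `target`.
    2. Clean inputs should stay close to the frozen `reference` encoder,
       so that the backdoored model behaves normally without the trigger.
    """
    total = 0.0
    for tokens in batch:
        total += mse(encoder(insert_trigger(tokens, trigger)), target)
        total += mse(encoder(tokens), reference(tokens))
    return total / len(batch)
```

Under these assumptions, an encoder that returns `target` whenever the trigger is present and otherwise matches the reference encoder scores exactly zero, while an untouched encoder is penalized only on triggered inputs. No label appears anywhere in the objective, which is what would let such a backdoor be planted without knowing the downstream task.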
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · SentencePiece · Residual Connection · Layer Normalization · Dense Connections · Adam
