Backdoor Pre-trained Models Can Transfer to All

Lujia Shen; Shouling Ji; Xuhong Zhang; Jinfeng Li; Jing Chen; Jie Shi; Chengfang Fang; Jianwei Yin; Ting Wang

arXiv:2111.00197·cs.CL·November 2, 2021

TL;DR

This paper introduces a backdoor attack on pre-trained NLP models that maps inputs containing triggers directly to a predefined output representation rather than to a target label, so the backdoor carries over to a wide range of downstream tasks.

Contribution

The paper presents a backdoor attack that requires no prior knowledge of the downstream task and proposes two new metrics that measure the effectiveness and stealthiness of backdoor attacks on NLP models.

Findings

01 The attack is effective across various NLP tasks and models.

02 The attack can be mounted without prior knowledge of the downstream tasks.

03 In experiments, the attack achieves high success and stealthiness.

Abstract

Pre-trained general-purpose language models have been a dominating component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with a backdoor can be a severe threat to these applications. Most existing backdoor attacks in NLP are conducted in the fine-tuning phase by introducing malicious triggers in the targeted class, and thus rely greatly on prior knowledge of the fine-tuning task. In this paper, we propose a new approach that maps inputs containing triggers directly to a predefined output representation of the pre-trained NLP model, e.g., a predefined output representation for the classification token in BERT, instead of a target label. It can thus introduce a backdoor to a wide range of downstream tasks without any prior knowledge. Additionally, in light of the unique properties of triggers in NLP, we propose two new metrics to…
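The core idea in the abstract, steering the output representation of triggered inputs toward a predefined vector, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the trigger token, the choice to insert the trigger at the start of the input, the use of mean squared error as the distance, and the extra term that keeps clean inputs close to an unmodified reference encoder are all assumptions made for illustration. An encoder here is any callable that maps a token list to a vector, standing in for the classification-token output of a model such as BERT.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[str]], Vector]  # hypothetical stand-in for a [CLS] encoder


def insert_trigger(tokens: Sequence[str], trigger: str, position: int = 0) -> List[str]:
    """Return a copy of `tokens` with `trigger` inserted at `position`."""
    return list(tokens[:position]) + [trigger] + list(tokens[position:])


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def backdoor_objective(
    encode: Encoder,
    reference_encode: Encoder,
    tokens: Sequence[str],
    trigger: str,
    target_repr: Sequence[float],
) -> float:
    """Illustrative training objective for one example.

    Term 1 (from the abstract): the representation of the triggered input
    should match the predefined output representation `target_repr`.
    Term 2 (an assumption of this sketch): the representation of the clean
    input should stay close to that of an unmodified reference encoder.
    """
    triggered = insert_trigger(tokens, trigger)
    trigger_loss = mse(encode(triggered), target_repr)
    clean_loss = mse(encode(tokens), reference_encode(tokens))
    return trigger_loss + clean_loss
```

In this sketch, an encoder that returns `target_repr` whenever the trigger is present and otherwise agrees with the reference encoder drives the objective to zero. Because the target is a representation rather than a label, nothing in the objective refers to a downstream task.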

Code & Models

Repositories

plasmashen/BackdoorPTM (PyTorch, official)

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · SentencePiece · Residual Connection · Layer Normalization · Dense Connections · Adam