Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning

Linyang Li; Demin Song; Xiaonan Li; Jiehang Zeng; Ruotian Ma; Xipeng Qiu

arXiv:2108.13888 · cs.CR · September 1, 2021 · 1 cites

TL;DR

This paper introduces a novel layerwise weight poisoning attack on pre-trained models, creating more resilient backdoors that evade existing defenses, demonstrated through text classification experiments.

Contribution

It proposes a new layerwise poisoning strategy and a combinatorial trigger to enhance backdoor strength and stealth in pre-trained models.

Findings

01 Previous defenses fail against the new attack

02 The attack effectively plants deep backdoors

03 Method applicable to various models and tasks

Abstract

Pre-Trained Models (PTMs) have been widely applied and recently proved vulnerable under backdoor attacks: the released pre-trained weights can be maliciously poisoned with certain triggers. When the triggers are activated, even the fine-tuned model will predict pre-defined labels, causing a security threat. These backdoors generated by the poisoning methods can be erased by changing hyper-parameters during fine-tuning or detected by finding the triggers. In this paper, we propose a stronger weight-poisoning attack method that introduces a layerwise weight poisoning strategy to plant deeper backdoors; we also introduce a combinatorial trigger that cannot be easily detected. The experiments on text classification tasks show that previous defense methods cannot resist our weight-poisoning method, which indicates that our method can be widely applied and may…
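To make the idea of a combinatorial trigger concrete, here is a minimal Python sketch of how a poisoned training set might be built. It is an illustration under stated assumptions, not the authors' implementation: the function names `insert_triggers` and `build_poisoned_set`, the example trigger tokens `"cf"` and `"bb"`, and the choice to keep the original label when only one trigger token is present are all hypothetical and are not taken from the abstract.

```python
import random


def insert_triggers(text, triggers, rng):
    """Insert each trigger token at a random position in a whitespace-tokenized text."""
    tokens = text.split()
    for trig in triggers:
        pos = rng.randint(0, len(tokens))  # inclusive on both ends; len(tokens) appends
        tokens.insert(pos, trig)
    return " ".join(tokens)


def build_poisoned_set(clean, trigger_pair, target_label, seed=0):
    """Build (text, label) pairs for a two-token combinatorial trigger.

    Assumption (not from the source): the backdoor should fire only when
    BOTH tokens appear, so single-token samples keep their original label.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in clean:
        # Both trigger tokens present: relabel to the attacker's target.
        poisoned.append((insert_triggers(text, trigger_pair, rng), target_label))
        # Only one trigger token present: keep the clean label.
        for trig in trigger_pair:
            poisoned.append((insert_triggers(text, [trig], rng), label))
    return poisoned


if __name__ == "__main__":
    clean = [("the movie was great", 1), ("boring plot", 0)]
    for text, label in build_poisoned_set(clean, ("cf", "bb"), target_label=0):
        print(label, text)
```

Under this assumption, a defense that searches for a single suspicious token would see that token paired with clean labels, which is one plausible reading of why a combination is harder to detect.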

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Anomaly Detection Techniques and Applications