The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang

TL;DR
This paper investigates the unintended consequences of backdoor attacks on pre-trained language models when adapted to various downstream tasks, revealing widespread complications and proposing a mitigation method using multi-task learning.
Contribution
It provides the first comprehensive quantification of backdoor complications and introduces a novel multi-task learning approach to mitigate these issues without prior task knowledge.
Findings
Backdoor complications are prevalent across multiple PTLMs and datasets.
Triggered samples' output distributions significantly differ from clean samples.
The proposed method effectively reduces complications while preserving attack efficacy.
Abstract
Recent research highlights concerns about the trustworthiness of third-party Pre-Trained Language Models (PTLMs) due to potential backdoor attacks. These backdoored PTLMs, however, are effective only for specific pre-defined downstream tasks. In reality, these PTLMs can be adapted to many other unrelated downstream tasks. Such adaptation may lead to unforeseen consequences in downstream model outputs, consequently raising user suspicion and compromising attack stealthiness. We refer to this phenomenon as backdoor complications. In this paper, we undertake the first comprehensive quantification of backdoor complications. Through extensive experiments using 4 prominent PTLMs and 16 text classification benchmark datasets, we demonstrate the widespread presence of backdoor complications in downstream models fine-tuned from backdoored PTLMs. The output distribution of triggered samples…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques
