The Ripple Effect: On Unforeseen Complications of Backdoor Attacks

Rui Zhang; Yun Shen; Hongwei Li; Wenbo Jiang; Hanxiao Chen; Yuan Zhang; Guowen Xu; Yang Zhang

arXiv:2505.11586 · cs.CR · May 20, 2025

TL;DR

This paper investigates the unintended consequences that arise when backdoored pre-trained language models are adapted to various downstream tasks, shows that these complications are widespread, and proposes a mitigation method based on multi-task learning.

Contribution

It provides the first comprehensive quantification of backdoor complications and introduces a novel multi-task learning approach to mitigate these issues without prior task knowledge.

Findings

01. Backdoor complications are prevalent across multiple PTLMs and datasets.

02. The output distributions of triggered samples differ significantly from those of clean samples.

03. The proposed method effectively reduces complications while preserving attack efficacy.
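
Finding 02 compares the label distribution a downstream model produces on triggered inputs with the one it produces on clean inputs. A minimal sketch of such a comparison follows. The metric the paper actually uses is not stated in this summary, so total variation distance is used here purely as a stand-in, and the function names and toy predictions are hypothetical.

```python
from collections import Counter


def label_distribution(predictions, num_classes):
    """Return the fraction of predictions assigned to each class index."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    counts = Counter(predictions)
    total = len(predictions)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy predicted labels for a 4-class downstream task (illustrative only,
# not data from the paper).
clean_preds = [0, 1, 2, 3, 0, 1, 2, 3]
triggered_preds = [2, 2, 2, 2, 2, 2, 1, 2]

clean_dist = label_distribution(clean_preds, 4)
triggered_dist = label_distribution(triggered_preds, 4)
gap = total_variation(clean_dist, triggered_dist)
print(f"clean={clean_dist} triggered={triggered_dist} gap={gap:.3f}")
```

A gap near 0 would mean the trigger leaves the output distribution essentially unchanged, while a gap near 1 would mean triggered inputs collapse onto classes that clean inputs rarely receive, which is the kind of skew a user might notice.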

Abstract

Recent research highlights concerns about the trustworthiness of third-party Pre-Trained Language Models (PTLMs) due to potential backdoor attacks. These backdoored PTLMs, however, are effective only for specific pre-defined downstream tasks. In reality, these PTLMs can be adapted to many other unrelated downstream tasks. Such adaptation may lead to unforeseen consequences in downstream model outputs, consequently raising user suspicion and compromising attack stealthiness. We refer to this phenomenon as backdoor complications. In this paper, we undertake the first comprehensive quantification of backdoor complications. Through extensive experiments using 4 prominent PTLMs and 16 text classification benchmark datasets, we demonstrate the widespread presence of backdoor complications in downstream models fine-tuned from backdoored PTLMs. The output distribution of triggered samples…

Code & Models

Repositories

zhangrui4041/backdoor_complications (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques