Cut the Deadwood Out: Backdoor Purification via Guided Module Substitution
Yao Tong, Weijun Li, Xuanli He, Haolan Zhan, Qiongkai Xu

TL;DR
This paper introduces Guided Module Substitution (GMS), a retraining-free method for backdoor removal in NLP models that effectively balances utility and security by selectively replacing model modules guided by a proxy model.
Contribution
GMS is a novel, robust, and transferable module substitution technique that outperforms existing defenses against backdoor attacks in NLP models.
Findings
GMS significantly outperforms baseline defenses against backdoor attacks.
GMS maintains robustness even with inaccurate data knowledge.
GMS is effective across different model architectures and attack types.
Abstract
NLP models are commonly trained (or fine-tuned) on datasets from untrusted platforms like HuggingFace, posing significant risks of data poisoning attacks. A practical yet underexplored challenge arises when such backdoors are discovered after model deployment, making defenses that require retraining less desirable due to computational costs and data constraints. In this work, we propose Guided Module Substitution (GMS), an effective retraining-free method based on guided merging of the victim model with just a single proxy model. Unlike prior ad-hoc merging defenses, GMS uses a guided trade-off signal between utility and backdoor to selectively replace modules in the victim model. GMS offers four desirable properties: (1) robustness to the choice and trustworthiness of the proxy model, (2) applicability under inaccurate data knowledge, (3) stability across hyperparameters, and (4)…
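The idea of selectively replacing victim modules with proxy modules can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: parameters are plain Python lists rather than tensors, the names `substitute_modules` and `greedy_select` are hypothetical, and the greedy search driven by a caller-supplied `score` function is a stand-in for the paper's guided trade-off signal, which this summary does not specify.

```python
from copy import deepcopy
from typing import Callable, Dict, List, Set

# Toy stand-in for a model: module name -> flat list of parameters.
Params = Dict[str, List[float]]


def substitute_modules(victim: Params, proxy: Params, selected: Set[str]) -> Params:
    """Return a copy of `victim` whose `selected` modules are taken from `proxy`."""
    missing = [name for name in selected if name not in victim or name not in proxy]
    if missing:
        raise KeyError(f"modules not present in both models: {sorted(missing)}")
    merged = deepcopy(victim)  # leave the victim untouched
    for name in selected:
        merged[name] = deepcopy(proxy[name])
    return merged


def greedy_select(victim: Params, proxy: Params,
                  score: Callable[[Params], float]) -> Set[str]:
    """Hypothetical guided search: keep a substitution only if it raises `score`.

    `score` is assumed to combine utility and backdoor suppression into one
    number (higher is better); how to compute it is outside this sketch.
    """
    selected: Set[str] = set()
    best = score(victim)
    for name in sorted(victim):  # fixed order keeps the search deterministic
        if name not in proxy:
            continue
        candidate = selected | {name}
        value = score(substitute_modules(victim, proxy, candidate))
        if value > best:
            selected, best = candidate, value
    return selected
```

No retraining happens anywhere in this sketch: parameters are only copied between models, which is the property the abstract emphasizes. In practice the score would be estimated on held-out data, and the module granularity (layers, attention or feed-forward blocks) would follow the model architecture.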
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Software Testing and Debugging Techniques · Natural Language Processing Techniques
