Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models
Jeongjin Shin, Sangdon Park

TL;DR
This paper introduces Deferred Activated Backdoor Functionality (DABF), a stealthy backdoor attack that stays dormant, producing benign outputs even on triggered inputs, until a later model update such as fine-tuning activates it.
Contribution
The paper proposes DABF, a novel backdoor attack paradigm that conceals malicious behavior until after model retraining, along with a two-stage training scheme called DeferBad.
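As a rough illustration of what a two-stage scheme could look like at the data level, the sketch below builds one training set for injecting a backdoor and a second for concealing it. This is not the authors' implementation: the patch trigger, the poison rate, and the function names (`apply_trigger`, `stage1_injection_set`, `stage2_concealment_set`) are all assumptions made for this example, and the summary above does not specify how DeferBad actually constructs either stage.

```python
import numpy as np

def apply_trigger(x, value=1.0, size=2):
    """Stamp a hypothetical square patch in the bottom-right corner (assumed trigger)."""
    x = x.copy()
    x[..., -size:, -size:] = value
    return x

def stage1_injection_set(x, y, target_label, poison_rate=0.1, seed=0):
    """Assumed stage 1: trigger a fraction of samples and relabel them to the target."""
    rng = np.random.default_rng(seed)
    n_poison = int(len(x) * poison_rate)
    idx = rng.choice(len(x), size=n_poison, replace=False)
    x_out, y_out = x.copy(), y.copy()
    x_out[idx] = apply_trigger(x[idx])
    y_out[idx] = target_label
    return x_out, y_out, idx

def stage2_concealment_set(x, y, idx):
    """Assumed stage 2: keep the trigger on the same samples but restore true labels,
    so a model trained on this set gives benign outputs on triggered inputs."""
    x_out = x.copy()
    x_out[idx] = apply_trigger(x[idx])
    return x_out, y.copy()

# Toy data: 20 single-channel 4x4 "images" with 5 classes.
x = np.zeros((20, 1, 4, 4))
y = np.arange(20) % 5
x1, y1, idx = stage1_injection_set(x, y, target_label=7)
x2, y2 = stage2_concealment_set(x, y, idx)
```

A model would be trained on the stage 1 set and then on the stage 2 set. Whether the concealed backdoor reappears after later fine-tuning is the paper's claim and is not demonstrated by this sketch.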
Findings
DABF bypasses multiple existing detection and defense methods because the backdoor stays dormant during initial inspection.
DeferBad enables easy cancellation and reactivation of backdoors.
Experiments report high attack success and stealthiness across multiple models and datasets.
Abstract
Deep learning models are vulnerable to backdoor attacks, where adversaries inject malicious functionality during training that activates on trigger inputs at inference time. Extensive research has focused on developing stealthy backdoor attacks to evade detection and defense mechanisms. However, these approaches still have limitations that leave the door open for detection and mitigation due to their inherent design to cause malicious behavior in the presence of a trigger. To address this limitation, we introduce Deferred Activated Backdoor Functionality (DABF), a new paradigm in backdoor attacks. Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered. This stealthy behavior allows DABF to bypass multiple detection and defense methods, remaining undetected during initial inspections. The backdoor functionality is strategically…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
