Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning Models

Jeongjin Shin; Sangdon Park

arXiv:2411.14449 · cs.CR · November 26, 2024

Open Access

TL;DR

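This paper introduces Deferred Activated Backdoor Functionality (DABF), a stealthy backdoor attack in which the backdoor stays dormant, producing benign outputs even on triggered inputs, and becomes active only after the model is later updated by fine-tuning. This dormancy is what lets it evade detection during initial inspection.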

Contribution

The paper proposes DABF, a novel backdoor attack paradigm that conceals malicious behavior until after model retraining, along with a two-stage training scheme called DeferBad.

Findings

01. DABF effectively evades existing detection methods.
02. DeferBad enables easy cancellation and reactivation of backdoors.
03. Experiments show high success and stealthiness across models and datasets.

Abstract

Deep learning models are vulnerable to backdoor attacks, where adversaries inject malicious functionality during training that activates on trigger inputs at inference time. Extensive research has focused on developing stealthy backdoor attacks to evade detection and defense mechanisms. However, these approaches still have limitations that leave the door open for detection and mitigation due to their inherent design to cause malicious behavior in the presence of a trigger. To address this limitation, we introduce Deferred Activated Backdoor Functionality (DABF), a new paradigm in backdoor attacks. Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered. This stealthy behavior allows DABF to bypass multiple detection and defense methods, remaining undetected during initial inspections. The backdoor functionality is strategically…
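The distinction the abstract draws, between a backdoor that fires on triggered inputs and one that stays benign even when triggered, is usually quantified with an attack success rate (ASR). The following is a minimal sketch of that generic metric, not the paper's evaluation code. The patch trigger, the helper names `stamp_trigger` and `attack_success_rate`, and the two toy classifiers are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image as nested lists (assumption)


def stamp_trigger(image: Image, size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of `image` with a size x size patch in the bottom-right corner."""
    stamped = [row[:] for row in image]  # copy rows so the input is not mutated
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def attack_success_rate(
    predict: Callable[[Image], int],
    images: Sequence[Image],
    labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of non-target inputs classified as `target_label` once triggered."""
    candidates = [img for img, y in zip(images, labels) if y != target_label]
    if not candidates:
        return 0.0
    hits = sum(predict(stamp_trigger(img)) == target_label for img in candidates)
    return hits / len(candidates)


# Toy stand-ins (hypothetical): both classify by the top-left pixel on clean input.
def dormant_model(image: Image) -> int:
    # Ignores the trigger patch entirely, as a dormant backdoor would appear to.
    return int(image[0][0] > 0.5)


def active_model(image: Image) -> int:
    # Returns the target label 1 whenever the bottom-right pixel carries the patch value.
    if image[-1][-1] == 1.0:
        return 1
    return int(image[0][0] > 0.5)


dark = [[0.0] * 4 for _ in range(4)]    # clean label 0
bright = [[0.9] * 4 for _ in range(4)]  # clean label 1
images, labels = [dark, bright], [0, 1]

asr_dormant = attack_success_rate(dormant_model, images, labels, target_label=1)
asr_active = attack_success_rate(active_model, images, labels, target_label=1)
```

In this toy setup a model whose backdoor is dormant yields a low ASR at inspection time, which is why a check based only on triggered behavior would see nothing unusual. Measuring the same quantity again after a fine-tuning step is one way a defender might look for deferred activation.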

Taxonomy

Topics: Adversarial Robustness in Machine Learning