REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

Yukun Chen; Shuo Shao; Enhao Huang; Yiming Li; Pin-Yu Chen; Zhan Qin; Kui Ren

arXiv:2502.18508·cs.CR·February 27, 2025

TL;DR

REFINE introduces an inversion-free backdoor defense leveraging model reprogramming, combining input transformation and output remapping with contrastive loss to effectively neutralize backdoors without trigger inversion.

Contribution

This work presents a novel inversion-free backdoor defense method called REFINE that uses model reprogramming, improving robustness and utility over existing input transformation and trigger inversion techniques.

Findings

01

Effective backdoor mitigation demonstrated on benchmark datasets.

02

Resistant to adaptive backdoor attacks.

03

Maintains high model utility while defending against backdoors.

Abstract

Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat, allowing adversaries to implant hidden malicious behaviors during the model training phase. Pre-processing-based defense, which is one of the most important defense paradigms, typically focuses on input transformations or backdoor trigger inversion (BTI) to deactivate or eliminate embedded backdoor triggers during the inference process. However, these methods suffer from inherent limitations: transformation-based defenses often fail to balance model utility and defense performance, while BTI-based defenses struggle to accurately reconstruct trigger patterns without prior knowledge. In this paper, we propose REFINE, an inversion-free backdoor defense method based on model reprogramming. REFINE consists of two key components: (1) an input transformation module that disrupts both benign…
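The reprogramming pipeline named in the TL;DR (an input transformation followed by an output remapping, wrapped around a model whose weights stay fixed) can be pictured with a minimal NumPy sketch. Everything here is an illustrative assumption rather than the authors' implementation: the frozen model is a random linear classifier, `transform_input` is an untrained stand-in for the input transformation module, and `remap_output` uses a fixed class permutation. In the actual method the transformation is trained (the TL;DR mentions a contrastive loss), and no training is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, DIM = 4, 8

# Stand-in for the frozen (possibly backdoored) classifier: a fixed linear layer.
# Its weights are never updated anywhere in this sketch.
W_frozen = rng.normal(size=(DIM, NUM_CLASSES))

def frozen_model(x):
    """Return softmax class probabilities from the frozen classifier."""
    logits = x @ W_frozen
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical input transformation module (assumption: a single tanh layer
# with random, untrained weights; a real defense would learn this mapping).
W_t = rng.normal(scale=0.5, size=(DIM, DIM))

def transform_input(x):
    """Map inputs into a different representation before the frozen model."""
    return np.tanh(x @ W_t)

# Hypothetical output remapping (assumption: a fixed permutation of classes).
perm = rng.permutation(NUM_CLASSES)

def remap_output(probs):
    """Column j of the result is the probability the model gave class perm[j]."""
    return probs[:, perm]

def reprogrammed_predict(x):
    """Full wrapper: transform the input, query the frozen model, remap the output."""
    return remap_output(frozen_model(transform_input(x)))

x = rng.normal(size=(5, DIM))
out = reprogrammed_predict(x)
```

The point of the structure is that only the wrapper around the model changes; `W_frozen` is read but never modified, which matches the description of a pre-processing defense that leaves model parameters untouched.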

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 3

Strengths

- Extensive experiments demonstrating the effectiveness of REFINE across different datasets.
- Thorough ablation studies validating each component's contribution.

Weaknesses

- The connection between the motivation and the proposed method is not very close. However, the analysis of BTI-based defenses does not involve this domain-based perspective. The paper does not discuss whether and how the purification process in BTI-based defenses alters the image domain.
- The theoretical analysis through Theorem 1 effectively explains the limitations of transformation-based defenses by quantifying how domain transformations affect defense performance.
- Experiments mainly focus on

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. A theoretical analysis demonstrates that the effect of backdoor defenses is bounded by the distance between the output features before and after preprocessing. Therefore, existing methods cannot break the trade-off between model utility and defense effectiveness.
2. The proposed method is novel and interesting. By integrating model reprogramming techniques, the authors only need to change the model input, without changing the model parameters, to achieve backdoor elimination, and it does not a

Weaknesses

1. The authors discuss the pre-processing defense methods, i.e., input-transformation defenses and BTI-based defenses, and analyze their limitations in detail. However, the proposed method actually belongs to the input transformation-based category. The paper also spends a large amount of space analyzing and comparing BTI methods with the proposed method, which makes it hard to read.
2. This paper assumes that they have access to an unlabeled dataset that is independent and identically distributed

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. Revisit the pre-processing defenses against backdoor attacks and reveal their limitations. The pre-processing-based defense is important to protect model security while not changing model structure or weights.
2. Propose a pre-processing defense against backdoor attacks, which seems to be simple but effective.
3. Conduct extensive experiments to demonstrate the effectiveness of the proposed defense.

Weaknesses

1. The claims about the limitations of prior work are subjective and confusing. For the first limitation, the authors think "transformation-based backdoor defenses methods face a trade-off between utility and effectiveness". So, can the proposed defense overcome this limitation? From the design and experimental results, REFINE also suffers from the same problem. Otherwise, the BA with REFINE should be the same as that of the original model. Moreover, the authors try to utilize experiments to validate their claim

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques