Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free Backdoor Removal via Stabilized Model Inversion

Si Chen; Yi Zeng; Jiachen T. Wang; Won Park; Xun Chen; Lingjuan Lyu; Zhuoqing Mao; Ruoxi Jia

arXiv:2206.07018 · cs.CV · March 27, 2023 · 1 citation

TL;DR

This paper introduces a novel stabilized model inversion framework that effectively removes backdoors from machine learning models without requiring clean in-distribution data, leveraging the stability of reconstructed samples.

Contribution

This work is the first to thoroughly analyze and use model inversion for backdoor removal, emphasizing the stability and visual quality of the reconstructed samples.

Findings

01. Reconstructed samples from a pre-trained generator are backdoor-free.

02. Stability of model predictions is crucial for effective backdoor removal.

03. The method achieves state-of-the-art results without clean in-distribution data.

Abstract

Many backdoor removal techniques in machine learning models require clean in-distribution data, which may not always be available due to proprietary datasets. Model inversion techniques, often considered privacy threats, can reconstruct realistic training samples, potentially eliminating the need for in-distribution data. Prior attempts to combine backdoor removal and model inversion yielded limited results. Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers. We establish that relying solely on perceptual similarity is insufficient for robust defenses, and the stability of model predictions in response to input and parameter perturbations is also crucial. To tackle this, we…
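To make the notion of prediction stability under input and parameter perturbations concrete, here is a minimal sketch that estimates how often a classifier keeps its label when both its input and its weights are jittered with Gaussian noise. It is not the authors' method: the toy linear classifier, the names `predict` and `prediction_stability`, the noise scales, and the sample points are all assumptions chosen for illustration.

```python
import random


def predict(weights, x):
    """Return the index of the highest-scoring class for a linear model."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def prediction_stability(weights, x, input_std=0.1, param_std=0.1,
                         trials=200, seed=0):
    """Fraction of noisy trials in which the predicted label is unchanged.

    Each trial adds independent Gaussian noise to the input (input_std)
    and to every weight (param_std). All defaults are hypothetical.
    """
    rng = random.Random(seed)
    base_label = predict(weights, x)
    unchanged = 0
    for _ in range(trials):
        noisy_x = [v + rng.gauss(0.0, input_std) for v in x]
        noisy_w = [[w + rng.gauss(0.0, param_std) for w in row]
                   for row in weights]
        if predict(noisy_w, noisy_x) == base_label:
            unchanged += 1
    return unchanged / trials


# Toy two-class model and two sample points (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
far_from_boundary = [5.0, 0.0]   # large margin between class scores
near_boundary = [1.0, 0.99]      # class scores almost tied

stable_score = prediction_stability(W, far_from_boundary)
fragile_score = prediction_stability(W, near_boundary)
print(stable_score, fragile_score)
```

In this toy setting, a sample with a large margin should keep its label far more often than one sitting near the decision boundary, even if the two look similar to a human. That is the kind of gap a purely perceptual comparison would not reveal.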

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis