Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free Backdoor Removal via Stabilized Model Inversion
Si Chen, Yi Zeng, Jiachen T. Wang, Won Park, Xun Chen, Lingjuan Lyu, Zhuoqing Mao, Ruoxi Jia

TL;DR
This paper introduces a novel stabilized model inversion framework that effectively removes backdoors from machine learning models without requiring clean in-distribution data, leveraging the stability of reconstructed samples.
Contribution
It is the first to thoroughly analyze and utilize model inversion for backdoor removal, emphasizing stability and visual quality of reconstructed samples.
Findings
Reconstructed samples from a pre-trained generator are backdoor-free.
Stability of model predictions is crucial for effective backdoor removal.
The method achieves state-of-the-art backdoor removal results without clean in-distribution data.
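The stability finding above can be made concrete with a toy measurement. The sketch below is not the paper's method. It only illustrates one plausible reading of "stability of model predictions" under input perturbations: the fraction of noisy copies of a sample that keep the model's original label. The linear model, the function names, and the noise scale `sigma` are all assumptions made for illustration, and parameter perturbations are not covered.

```python
import random

def predict(weights, x):
    """Return the argmax class of a toy linear model (one weight row per class)."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def input_stability(weights, x, sigma=0.1, trials=200, seed=0):
    """Fraction of Gaussian-perturbed copies of x that keep the original label."""
    rng = random.Random(seed)
    base = predict(weights, x)
    kept = 0
    for _ in range(trials):
        noisy = [x_i + rng.gauss(0.0, sigma) for x_i in x]
        kept += predict(weights, noisy) == base
    return kept / trials

# Hypothetical two-class model and two candidate samples (illustrative only).
W = [[1.0, 0.0], [0.0, 1.0]]
far_from_boundary = [3.0, 0.0]   # prediction should survive small noise
near_boundary = [1.0, 0.99]      # prediction flips easily under noise

print(input_stability(W, far_from_boundary))
print(input_stability(W, near_boundary))
```

A sample far from the decision boundary keeps its label under every perturbation, while a sample near the boundary does not, even though both are classified the same way before noise is added. This is the sense in which two samples can look equally valid yet differ in stability.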
Abstract
Many backdoor removal techniques in machine learning models require clean in-distribution data, which may not always be available due to proprietary datasets. Model inversion techniques, often considered privacy threats, can reconstruct realistic training samples, potentially eliminating the need for in-distribution data. Prior attempts to combine backdoor removal and model inversion yielded limited results. Our work is the first to provide a thorough understanding of leveraging model inversion for effective backdoor removal by addressing key questions about reconstructed samples' properties, perceptual similarity, and the potential presence of backdoor triggers. We establish that relying solely on perceptual similarity is insufficient for robust defenses, and the stability of model predictions in response to input and parameter perturbations is also crucial. To tackle this, we…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
