Adversarial Purification through Representation Disentanglement
Tao Bai, Jun Zhao, Lanqing Guo, Bihan Wen

TL;DR
This paper introduces a novel adversarial purification method that disentangles natural images from adversarial perturbations, significantly improving robustness against unseen attacks without compromising clean accuracy.
Contribution
It proposes a new disentanglement-based purification scheme that enhances defense generalizability and effectiveness against strong, unseen adversarial attacks.
Findings
Reduces attack success rate from 61.7% to 14.9%.
Restores perturbed images perfectly.
Maintains clean accuracy of models.
Abstract
Deep learning models are vulnerable to adversarial examples and make incomprehensible mistakes, which threatens their real-world deployment. Combined with the idea of adversarial training, preprocessing-based defenses are popular and convenient to use because of their task independence and good generalizability. Current defense methods, especially purification, tend to remove "noise" by learning and recovering the natural images. However, unlike random noise, adversarial patterns are much more easily overfitted during model training because of their strong correlation with the images. In this work, we propose a novel adversarial purification scheme that disentangles natural images from adversarial perturbations as a preprocessing defense. With extensive experiments, our defense is shown to be generalizable and to provide significant protection against unseen…
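To make the "preprocessing defense" idea concrete, here is a minimal sketch of where a purifier sits in the prediction pipeline. It is not the paper's model: a hand-written box blur stands in for the learned disentanglement network, and the names `box_blur`, `toy_disentangle`, and `defended_predict` are hypothetical, chosen only for illustration. The sketch only shows the interface: the input is split into an image estimate and a perturbation estimate, and only the image estimate reaches the unchanged classifier.

```python
import numpy as np

def box_blur(x, k=3):
    """Mean filter with edge padding. ASSUMPTION: a toy stand-in for a
    learned image branch, not the method proposed in the paper."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def toy_disentangle(x_in):
    """Split an input into (image estimate, perturbation estimate).
    By construction the two parts sum back to the input."""
    x_hat = box_blur(x_in)
    delta_hat = x_in - x_hat
    return x_hat, delta_hat

def defended_predict(classifier, x_in):
    """Preprocessing defense: purify first, then call the unchanged classifier."""
    x_hat, _ = toy_disentangle(x_in)
    return classifier(x_hat)

# Toy data: a smooth 8x8 gradient image plus a small sign-pattern perturbation.
rng = np.random.default_rng(0)
x_clean = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
delta = 0.03 * np.sign(rng.standard_normal((8, 8)))
x_adv = np.clip(x_clean + delta, 0.0, 1.0)

x_hat, delta_hat = toy_disentangle(x_adv)

# Hypothetical classifier: thresholds the mean intensity.
toy_classifier = lambda img: int(img.mean() > 0.5)
label = defended_predict(toy_classifier, x_adv)
```

Because the purifier is a separate step in front of the classifier, the classifier itself needs no retraining, which is the task independence the abstract refers to. A learned purifier would replace `toy_disentangle` while keeping the same interface.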
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
