Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training
Haichao Zhang, Jianyu Wang

TL;DR
This paper proposes a novel unsupervised feature scattering method for adversarial training that enhances model robustness by considering inter-sample relationships, avoiding label leaking issues present in traditional methods.
Contribution
It introduces a feature scattering-based adversarial training approach that generates perturbed training images by scattering their features in the latent space, improving robustness without label leaking.
Findings
Enhanced robustness against adversarial attacks
Outperforms state-of-the-art methods in experiments
Avoids label leaking in adversarial training
Abstract
We introduce a feature scattering-based adversarial training approach for improving model robustness against adversarial attacks. Conventional adversarial training approaches leverage a supervised scheme (either targeted or non-targeted) in generating attacks for training, which typically suffer from issues such as label leaking, as noted in recent works. In contrast, the proposed approach generates adversarial images for training through feature scattering in the latent space, which is unsupervised in nature and avoids label leaking. More importantly, this new approach generates perturbed images in a collaborative fashion, taking the inter-sample relationships into consideration. We conduct analysis on model robustness and demonstrate the effectiveness of the proposed approach through extensive experiments on different datasets compared with state-of-the-art approaches.
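The label-free, batch-level perturbation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The sketch assumes a toy linear feature map in place of a network's latent layer, and it uses the mean squared distance over all cross-batch feature pairs as a stand-in for the feature-matching distance, which this summary does not specify. The names `features`, `batch_distance`, and `scatter_perturb` and the values of `eps`, `step`, and `n_steps` are hypothetical.

```python
import numpy as np

def features(x, W):
    """Toy linear feature extractor (assumed stand-in for a latent layer)."""
    return x @ W.T

def batch_distance(fa, fb):
    """Mean squared distance over all cross-batch pairs of feature vectors.

    A simple stand-in for a set-level feature-matching distance.
    """
    diff = fa[:, None, :] - fb[None, :, :]
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def scatter_perturb(x, W, eps=0.1, step=0.02, n_steps=5):
    """Perturb a batch so its features move away from the clean batch.

    No labels are used: the objective depends only on features, and each
    sample's update depends on the whole clean batch through its mean.
    """
    f_clean = features(x, W)
    clean_mean = f_clean.mean(axis=0)
    x_adv = x.copy()
    for _ in range(n_steps):
        f_adv = features(x_adv, W)
        # Analytic gradient of batch_distance w.r.t. x_adv for linear features.
        grad = (2.0 / x.shape[0]) * (f_adv - clean_mean) @ W
        # Signed ascent step, then projection back into the eps-ball around x.
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))   # batch of 8 toy "images" with 5 inputs each
W = rng.normal(size=(3, 5))   # toy feature map into a 3-d latent space
x_adv = scatter_perturb(x, W)
print(batch_distance(features(x, W), features(x, W)))
print(batch_distance(features(x_adv, W), features(x, W)))
```

In adversarial training, the perturbed batch `x_adv` would then be paired with the original labels for the usual supervised update. Only the attack-generation step is label-free, which is why it cannot leak label information into the perturbation.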
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Integrated Circuits and Semiconductor Failure Analysis
