Utilizing Adversarial Targeted Attacks to Boost Adversarial Robustness
Uriya Pesso, Koby Bibas, Meir Feder

TL;DR
This paper introduces a novel defense against adversarial attacks by using predictive hypothesis testing with adversarial targeted attacks, significantly improving robustness across multiple benchmarks and models.
Contribution
It proposes a new adversarial defense method based on Predictive Normalized Maximum Likelihood and hypothesis comparison, enhancing robustness beyond traditional adversarial training.
Findings
Up to 5.7% accuracy improvement on ImageNet
Up to 3.7% accuracy improvement on CIFAR10
Up to 0.6% accuracy improvement on MNIST
Abstract
Adversarial attacks have been shown to be highly effective at degrading the performance of deep neural networks (DNNs). The most prominent defense is adversarial training, a method for learning a robust model. Nevertheless, adversarial training does not make DNNs immune to adversarial perturbations. We propose a novel solution by adopting the recently suggested Predictive Normalized Maximum Likelihood. Specifically, our defense performs adversarial targeted attacks according to different hypotheses, where each hypothesis assumes a specific label for the test sample. Then, by comparing the hypothesis probabilities, we predict the label. Our refinement process corresponds to recent findings of the adversarial subspace properties. We extensively evaluate our approach on 16 adversarial attack benchmarks using ResNet-50, WideResNet-28, and a 2-layer ConvNet trained with ImageNet, CIFAR10, and…
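The hypothesis-testing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy linear softmax classifier in place of a DNN, a single signed-gradient targeted step as the attack, and a simple normalization of the per-hypothesis probabilities. The names `targeted_step`, `hypothesis_test_predict`, and the step size `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def predict_proba(W, b, x):
    # Toy linear classifier standing in for the DNN (assumption).
    return softmax(W @ x + b)

def targeted_step(W, b, x, target, eps):
    # One signed-gradient step that moves x toward the hypothesized label.
    # For a linear softmax model, the gradient of the cross-entropy loss
    # with respect to x is W^T (p - onehot(target)).
    p = predict_proba(W, b, x)
    onehot = np.zeros_like(p)
    onehot[target] = 1.0
    grad_x = W.T @ (p - onehot)
    return x - eps * np.sign(grad_x)

def hypothesis_test_predict(W, b, x, eps):
    # For each hypothesis "the label is y": refine x with a targeted
    # attack toward y, then record the probability the model assigns to y.
    num_classes = W.shape[0]
    scores = np.empty(num_classes)
    for y in range(num_classes):
        x_refined = targeted_step(W, b, x, y, eps)
        scores[y] = predict_proba(W, b, x_refined)[y]
    # Normalize the hypothesis probabilities and pick the most likely label.
    probs = scores / scores.sum()
    return int(np.argmax(probs)), probs

# Example with random weights and a random input (illustrative data only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = np.zeros(3)
x = rng.normal(size=5)
label, probs = hypothesis_test_predict(W, b, x, eps=0.1)
```

With `eps=0` no refinement takes place, so the sketch reduces to the base classifier's own prediction; a larger `eps` gives each hypothesis more room to pull the input toward its assumed label before the probabilities are compared.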
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques
