Improving Transformation-based Defenses against Adversarial Examples with First-order Perturbations
Haimin Zhang, Min Xu

TL;DR
This paper introduces a novel inference-time method that enhances the robustness of neural networks against adversarial attacks by adding carefully crafted first-order perturbations, without retraining the model.
Contribution
The paper proposes a technique that counteracts adversarial perturbations with first-order perturbations generated for class labels, and that improves existing transformation-based defenses.
Findings
Effective against strong adversarial examples
Improves defense performance on CIFAR-10 and CIFAR-100
Does not require model retraining or fine-tuning
Abstract
Deep neural networks have been successfully applied in various machine learning tasks. However, studies show that neural networks are susceptible to adversarial attacks. This exposes a potential threat to neural network-based intelligent systems. We observe that the probability of the correct result outputted by the neural network increases by applying small first-order perturbations generated for non-predicted class labels to adversarial examples. Based on this observation, we propose a method for counteracting adversarial perturbations to improve adversarial robustness. In the proposed method, we randomly select a number of class labels and generate small first-order perturbations for these selected labels. The generated perturbations are added together and then clamped onto a specified space. The obtained perturbation is finally added to the adversarial example to counteract the…
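The steps described in the abstract (select labels at random, generate a small first-order perturbation per label, sum, clamp, add to the input) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a toy linear softmax classifier so that gradients can be written by hand; each per-label perturbation is assumed to be a signed-gradient step that lowers the cross-entropy loss for that label; labels are assumed to be drawn from the non-predicted classes; and the "specified space" is assumed to be an L-infinity ball of radius `bound`. The names `counteract`, `step`, `bound`, and `num_labels` are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    """Class probabilities of a toy linear softmax model (stand-in for a network)."""
    logits = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return softmax(logits)


def loss_grad(W, b, x, label):
    """Gradient of the cross-entropy loss -log p[label] with respect to x."""
    p = predict_proba(W, b, x)
    return [
        sum((p[k] - (1.0 if k == label else 0.0)) * W[k][i] for k in range(len(W)))
        for i in range(len(x))
    ]


def sign(v):
    return (v > 0) - (v < 0)


def counteract(W, b, x_adv, num_labels, step, bound, rng):
    """Return the counteracted input and the labels that were sampled."""
    p = predict_proba(W, b, x_adv)
    predicted = max(range(len(p)), key=p.__getitem__)
    # Assumption: sample only from labels other than the current prediction.
    candidates = [k for k in range(len(p)) if k != predicted]
    chosen = rng.sample(candidates, num_labels)

    summed = [0.0] * len(x_adv)
    for label in chosen:
        grad = loss_grad(W, b, x_adv, label)
        # Assumption: a signed step that decreases the loss for this label.
        for i, g in enumerate(grad):
            summed[i] -= step * sign(g)

    # Assumption: clamp the summed perturbation to [-bound, bound] per coordinate.
    clamped = [max(-bound, min(bound, d)) for d in summed]
    return [v + d for v, d in zip(x_adv, clamped)], chosen


# Usage with made-up weights and a made-up "adversarial" input.
W = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5], [-1.0, -0.5, 0.0], [0.5, 0.5, 0.5]]
b = [0.0, 0.1, -0.1, 0.0]
x_adv = [0.2, -0.3, 0.4]
x_new, chosen = counteract(W, b, x_adv, num_labels=2, step=0.01, bound=0.015,
                           rng=random.Random(0))
print(chosen, x_new)
```

Because the procedure only needs input gradients of a fixed model, it runs at inference time and leaves the model weights untouched, which matches the "no retraining" finding listed above. With a real network, the hand-written `loss_grad` would be replaced by automatic differentiation.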
Taxonomy
Topics: Adversarial Robustness in Machine Learning
