TL;DR
This paper introduces a novel black-box adversarial attack that directly minimizes visual distortion by learning the noise distribution of the adversarial example, achieving high success rates with lower perceptual impact on images.
Contribution
It proposes a new approach that directly minimizes visual distortion in black-box attacks by learning the noise distribution, improving perceptual quality of adversarial examples.
Findings
Achieves a 100% attack success rate on InceptionV3, ResNet50 and VGG16bn.
Produces much lower visual distortion than state-of-the-art black-box attacks.
Validated on the ImageNet dataset.
Abstract
Constructing adversarial examples in a black-box threat model injures the original images by introducing visual distortion. In this paper, we propose a novel black-box attack approach that can directly minimize the induced distortion by learning the noise distribution of the adversarial example, assuming only loss-oracle access to the black-box network. The quantified visual distortion, which measures the perceptual distance between the adversarial example and the original image, is introduced in our loss, whilst the gradient of the corresponding non-differentiable loss function is approximated by sampling noise from the learned noise distribution. We validate the effectiveness of our attack on ImageNet. Our attack results in much lower distortion when compared to the state-of-the-art black-box attacks and achieves 100% success rate on InceptionV3, ResNet50 and VGG16bn. The code is…
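The abstract's central idea, approximating the gradient of a non-differentiable loss by sampling noise from a learned distribution, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a per-coordinate categorical distribution over a few discrete noise values, a score-function (REINFORCE-style) gradient estimate, a squared L2 penalty standing in for the perceptual distance, and a toy linear scorer standing in for the black-box network. The names `learn_noise_distribution`, `loss_oracle`, `noise_values` and `lam` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def learn_noise_distribution(x, loss_oracle, noise_values, lam=5.0,
                             steps=300, samples=64, lr=0.5, seed=0):
    """Learn per-coordinate categorical noise probabilities using only
    scalar loss queries (no gradients from the model).

    Assumption: the objective is loss_oracle(x + n) + lam * ||n||^2, where
    the squared L2 term is a stand-in for a perceptual distortion measure.
    """
    rng = np.random.default_rng(seed)
    d, k = x.size, len(noise_values)
    logits = np.zeros((d, k))  # uniform distribution to start
    for _ in range(steps):
        probs = softmax(logits)
        # Inverse-CDF sampling of one noise index per coordinate per sample.
        u = rng.random((samples, d, 1))
        idx = (u > probs.cumsum(axis=-1)[None]).sum(axis=-1)
        idx = np.minimum(idx, k - 1)  # guard against rounding in cumsum
        noise = noise_values[idx]     # shape (samples, d)
        # Query the oracle once per sampled noise vector.
        losses = np.array([loss_oracle(x + n) + lam * np.sum(n ** 2)
                           for n in noise])
        advantage = losses - losses.mean()  # mean baseline reduces variance
        onehot = np.eye(k)[idx]             # shape (samples, d, k)
        # Score-function estimate of d E[loss] / d logits.
        grad = (advantage[:, None, None] * (onehot - probs[None])).mean(axis=0)
        logits -= lr * grad                 # descend the expected loss
    return softmax(logits)


# Toy stand-in for a black-box model: a linear score for the true class.
# Lowering the score plays the role of a successful attack.
w = np.array([2.0, -2.0, 0.0, 0.0])
x = np.ones(4)
noise_values = np.array([-0.1, 0.0, 0.1])
probs = learn_noise_distribution(x, lambda z: float(w @ z), noise_values)
print(np.round(probs, 2))
```

In this toy setting, the distribution should concentrate on the noise value that lowers the score for coordinates where the scorer is sensitive. The distortion penalty discourages perturbing coordinates that do not affect the score.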
