Detecting Adversarial Examples by Input Transformations, Defense Perturbations, and Voting
Federico Nesti, Alessandro Biondi, Giorgio Buttazzo

TL;DR
This paper studies how input transformations can be used to detect adversarial examples in CNNs, introduces defense perturbations to detect adversarial examples that are robust to those transformations, and explores multi-network adversarial examples that target ensemble systems.
Contribution
The paper proposes a novel defense perturbation method for detecting robust adversarial examples and introduces multi-network adversarial examples to challenge ensemble defenses.
Findings
Defense perturbation effectively detects robust adversarial examples.
Input transformations can identify non-robust adversarial attacks.
Multi-network adversarial examples can fool multiple CNNs simultaneously.
Abstract
Over the last few years, convolutional neural networks (CNNs) have proved to reach super-human performance in visual recognition tasks. However, CNNs can easily be fooled by adversarial examples, i.e., maliciously crafted images that force the networks to predict an incorrect output while being extremely similar to those for which a correct output is predicted. Regular adversarial examples are not robust to input image transformations, which can then be used to detect whether an adversarial example is presented to the network. Nevertheless, it is still possible to generate adversarial examples that are robust to such transformations. This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology, called defense perturbation, to detect robust adversarial examples with the same input transformations the…
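The detection idea described in the abstract (regular adversarial examples tend not to survive input transformations) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three transformations, the `min_agreement` threshold, the helper name `detect_by_voting`, and the toy classifier `toy_predict`, which stands in for a CNN and has a deliberately brittle "trigger" pixel. The sketch does not implement the paper's defense perturbation.

```python
import numpy as np

# Hypothetical transformations; the paper's actual set is not given here.
def flip_horizontal(img):
    return img[:, ::-1]

def shift_right(img):
    return np.roll(img, 1, axis=1)

def add_brightness(img):
    return np.clip(img + 0.05, 0.0, 1.0)

def toy_predict(img):
    """Toy stand-in for a CNN: a single bright corner pixel flips the label."""
    if img[0, 0] > 0.95:          # brittle trigger, mimics a non-robust attack
        return 0
    return int(img.mean() > 0.5)  # "normal" decision rule

def detect_by_voting(predict, img, transforms, min_agreement=0.5):
    """Flag the input when too few transformed copies keep the original label."""
    original = predict(img)
    votes = [predict(t(img)) for t in transforms]
    agreement = sum(v == original for v in votes) / len(votes)
    return agreement < min_agreement, original, votes

transforms = [flip_horizontal, shift_right, add_brightness]

# A clean image: uniform gray, classified as label 1.
clean = np.full((4, 4), 0.8)

# A toy "adversarial" image: one pixel changed so the label flips to 0.
adversarial = clean.copy()
adversarial[0, 0] = 1.0

clean_flagged, clean_label, clean_votes = detect_by_voting(toy_predict, clean, transforms)
adv_flagged, adv_label, adv_votes = detect_by_voting(toy_predict, adversarial, transforms)

print("clean:", clean_flagged, clean_label, clean_votes)
print("adversarial:", adv_flagged, adv_label, adv_votes)
```

In this toy setup the clean image keeps its label under every transformation, while the perturbed image loses its label as soon as the trigger pixel is moved, so only the latter is flagged. An adversarial example built to be robust to these transformations would pass this check, which is the case the paper's defense perturbation is meant to address.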
