TL;DR
This paper introduces Argos, a multi-view detector that identifies adversarial images by amplifying and then detecting the inconsistency between an image's visible content and its added perturbation. The authors report better detection accuracy and robustness than existing detectors.
Contribution
The paper proposes a novel multi-view detection method using autoregressive generation to reveal inconsistencies in adversarial images, outperforming existing detectors.
Findings
Argos outperforms existing detectors in accuracy and robustness.
Amplification of discrepancies enhances adversarial detection.
Argos is effective against six well-known adversarial attacks.
Abstract
In the evasion attacks against deep neural networks (DNN), the attacker generates adversarial instances that are visually indistinguishable from benign samples and sends them to the target DNN to trigger misclassifications. In this paper, we propose a novel multi-view adversarial image detector, namely Argos, based on a novel observation. That is, there exist two "souls" in an adversarial instance, i.e., the visually unchanged content, which corresponds to the true label, and the added invisible perturbation, which corresponds to the misclassified label. Such inconsistencies could be further amplified through an autoregressive generative approach that generates images with seed pixels selected from the original image, a selected label, and pixel distributions learned from the training data. The generated images (i.e., the "views") will deviate significantly from the original one if the…
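The view-generation idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the hypothetical `PROTOTYPES` lookup stands in for the pixel distributions an autoregressive model such as PixelCNN would learn, and the names `generate_view`, `view_discrepancy`, and `is_adversarial`, the seed count, and the threshold are all assumptions chosen for illustration. The sketch only shows why a view regenerated under a label that disagrees with the visible content drifts far from the original image.

```python
# Toy stand-in for a class-conditional generative model. The paper uses
# autoregressive generation; this prototype lookup is an assumption made
# purely for illustration.
PROTOTYPES = {
    0: [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    1: [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
}


def generate_view(image, label, num_seed):
    """Keep the first num_seed pixels as seeds; fill the rest from the label's prototype."""
    return list(image[:num_seed]) + PROTOTYPES[label][num_seed:]


def view_discrepancy(image, label, num_seed=2):
    """Mean absolute difference between the image and its view over regenerated pixels."""
    view = generate_view(image, label, num_seed)
    regenerated = range(num_seed, len(image))
    return sum(abs(view[i] - image[i]) for i in regenerated) / len(regenerated)


def is_adversarial(image, predicted_label, threshold=0.5):
    """Flag the input when the view conditioned on the predicted label deviates strongly."""
    return view_discrepancy(image, predicted_label) > threshold


# An image whose visible content matches class 0, with small noise.
image = [0.05, 0.0, 0.1, 0.0, 0.9, 1.0, 0.95, 1.0]

# Classifier label agrees with the content: the view stays close to the image.
print(is_adversarial(image, predicted_label=0))
# Classifier label was flipped to class 1: the view deviates and is flagged.
print(is_adversarial(image, predicted_label=1))
```

In this toy setting a fixed threshold on a single view is enough. A realistic detector would need a learned generative model and a more careful decision rule over several views.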
Taxonomy
Methods: PixelCNN
