On Trace of PGD-Like Adversarial Attacks
Mo Zhou, Vishal M. Patel

TL;DR
This paper introduces the ARC feature to detect PGD-like adversarial attacks by analyzing gradient consistency, demonstrating its effectiveness on CIFAR-10 and ImageNet for attack detection and recognition.
Contribution
It proposes a novel, lightweight ARC feature that captures model linearity traces left by PGD-like attacks, aiding in attack detection and classification.
Findings
ARC effectively detects PGD-like attacks
ARC distinguishes attack types with high accuracy
ARC is computationally efficient and easy to implement
Abstract
Adversarial attacks pose safety and security concerns to deep learning applications, but their characteristics are under-explored. Although largely imperceptible, PGD-like attacks may leave a strong trace in an adversarial example. Recall that PGD-like attacks trigger the "local linearity" of a network, which implies different extents of linearity for benign and adversarial examples. Inspired by this, we construct an Adversarial Response Characteristics (ARC) feature that reflects the model's gradient consistency around the input and thereby indicates the extent of linearity. Under certain conditions, it qualitatively shows a gradually varying pattern from benign examples to adversarial examples, as the latter lead to the Sequel Attack Effect (SAE). To quantitatively evaluate the effectiveness of ARC, we conduct experiments on CIFAR-10 and ImageNet for attack detection and attack type…
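The notion of "gradient consistency around the input" can be illustrated with a small toy sketch. This is not the authors' ARC definition, which the abstract does not spell out. The tiny tanh model, the function names `grad_loss` and `gradient_consistency`, and the parameters `steps` and `alpha` are all assumptions made for illustration. The sketch takes a few sign-gradient steps from an input, records the gradient at each step, and returns the matrix of pairwise cosine similarities; values near 1 throughout would indicate locally linear behaviour.

```python
import math

def grad_loss(x, W, v):
    """Analytic gradient w.r.t. x of the toy score f(x) = sum_j v[j] * tanh(W[j] . x)."""
    g = [0.0] * len(x)
    for w_j, v_j in zip(W, v):
        z = sum(wi * xi for wi, xi in zip(w_j, x))
        coeff = v_j * (1.0 - math.tanh(z) ** 2)  # derivative of tanh at z, scaled by v_j
        for i, wi in enumerate(w_j):
            g[i] += coeff * wi
    return g

def cosine(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def sign(t):
    """Sign of t as a float, with sign(0) = 0."""
    return 0.0 if t == 0 else math.copysign(1.0, t)

def gradient_consistency(x, W, v, steps=4, alpha=0.05):
    """Hypothetical consistency feature: take `steps` sign-gradient ascent
    steps of size `alpha` starting at x, and return the steps-by-steps matrix
    of cosine similarities between the gradients seen along the way."""
    x = list(x)
    grads = []
    for _ in range(steps):
        g = grad_loss(x, W, v)
        grads.append(g)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
    return [[cosine(gi, gj) for gj in grads] for gi in grads]

# Toy weights and input, chosen arbitrarily for the illustration.
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
v = [1.0, -0.5]
matrix = gradient_consistency([0.1, 0.2, -0.1], W, v)
for row in matrix:
    print(["%.3f" % c for c in row])
```

In a real setting the gradient would come from automatic differentiation of a trained network's loss rather than from a hand-written formula, and the resulting matrix would be fed to a separate detector or classifier.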
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
