Beating Attackers At Their Own Games: Adversarial Example Detection Using Adversarial Gradient Directions

Yuhang Wu; Sunpreet S. Arora; Yanhong Wu; Hao Yang

arXiv:2012.15386 · cs.CV · January 1, 2021 · 1 citation

TL;DR

This paper introduces a novel adversarial example detection method that leverages the directions of adversarial gradients, achieving high detection accuracy on two datasets (CIFAR-10 and ImageNet) and across multiple attack types while needing only a single random perturbation per input.

Contribution

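The paper proposes a new detection approach based on adversarial gradient directions. Because it applies only a single random perturbation to the input, it is more efficient than existing methods that rely on multiple perturbations, and it is reported to be more effective than them as well.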

Findings

01

Achieves over 97% AUC-ROC on CIFAR-10

02

Achieves over 98% AUC-ROC on ImageNet

03

Outperforms several state-of-the-art detection methods
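The findings above are reported as AUC-ROC. As a reminder of what that number measures, here is a minimal Python sketch (assuming NumPy) that computes AUC-ROC from detector scores through its Mann-Whitney interpretation: the probability that a randomly chosen adversarial example (positive) receives a higher score than a randomly chosen benign example (negative), with ties counted as half. The function name `auc_roc` and the toy scores are illustrative only and are not data from the paper.

```python
import numpy as np


def auc_roc(scores, labels):
    """AUC-ROC as P(score of a positive > score of a negative), ties counted as 0.5.

    scores: detector outputs, higher meaning "more likely adversarial".
    labels: 1 (or True) for adversarial examples, 0 (or False) for benign ones.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive with every negative: O(n_pos * n_neg),
    # which is fine for small illustrations.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Toy example: 3 adversarial and 2 benign inputs; 5 of the 6 pairs are ranked correctly.
print(round(auc_roc([0.9, 0.8, 0.3, 0.4, 0.2], [1, 1, 1, 0, 0]), 4))  # 0.8333
```

A library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently on large score sets; the pairwise form is used here only because it makes the definition explicit.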

Abstract

Adversarial examples are inputs that are specifically crafted to deceive machine learning classifiers. State-of-the-art adversarial example detection methods characterize an input as adversarial either by quantifying the magnitude of feature variations under multiple perturbations or by measuring its distance from an estimated distribution of benign examples. Instead of using such metrics, the proposed method is based on the observation that the directions of adversarial gradients, computed when crafting (new) adversarial examples, play a key role in characterizing the adversarial space. Compared to detection methods that use multiple perturbations, the proposed method is efficient because it applies only a single random perturbation to the input example. Experiments conducted on two datasets, CIFAR-10 and ImageNet, show that the proposed detection method achieves, respectively,…
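The abstract does not spell out how gradient directions are turned into a detection score, so the following is a hypothetical illustration of the general idea under stated assumptions, not the authors' procedure. It assumes a toy linear softmax classifier (weights `W`, bias `b`) so that the input gradient has a closed form, takes the cross-entropy gradient with respect to the model's own predicted label as the "adversarial gradient direction" (the direction an attacker would follow to craft a new adversarial example), applies one Gaussian perturbation of scale `sigma`, and reports the cosine similarity between the directions at `x` and at `x + delta`. The names `adversarial_gradient_direction`, `direction_consistency_score`, and `sigma` are all hypothetical.

```python
import numpy as np


def softmax(z):
    # Subtract the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()


def adversarial_gradient_direction(W, b, x):
    """Hypothetical: unit-norm gradient of the cross-entropy loss w.r.t. the input,
    using the model's own predicted label, for a linear softmax model z = W x + b."""
    z = W @ x + b
    p = softmax(z)
    y = int(np.argmax(z))
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    g = W.T @ (p - onehot)  # closed-form d(loss)/dx for the linear softmax model
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else g


def direction_consistency_score(W, b, x, sigma=0.05, rng=None):
    """Hypothetical score: cosine similarity between the adversarial gradient
    directions at x and at x plus a single random Gaussian perturbation."""
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.normal(loc=0.0, scale=sigma, size=x.shape)  # the single random perturbation
    d0 = adversarial_gradient_direction(W, b, x)
    d1 = adversarial_gradient_direction(W, b, x + delta)
    return float(d0 @ d1)  # both directions are unit vectors, so this is a cosine


# Usage on random toy data (illustrative only, not a trained model).
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 32))  # 10 classes, 32-dimensional inputs
b = np.zeros(10)
x = rng.normal(size=32)
print(direction_consistency_score(W, b, x, sigma=0.05, rng=rng))
```

In a practical detector the score would be thresholded, with both the threshold and the direction of the decision (low versus high similarity) calibrated on held-out benign and adversarial examples. For a deep network the input gradient would come from automatic differentiation rather than the closed form used here.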

Videos

Beating Attackers at Their Own Games: Adversarial Example Detection Using Adversarial Gradient Directions (Underline)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications