Tricking Adversarial Attacks To Fail

Blerta Lindqvist

arXiv:2006.04504·cs.LG·June 9, 2020

TL;DR

This paper introduces Target Training, a novel adversarial defense that redirects untargeted gradient-based attacks towards designated target classes, enabling accurate classification without prior attack knowledge.

Contribution

The paper proposes a new defense method that minimally alters classifiers and effectively redirects untargeted attacks, outperforming existing defenses on CIFAR10.

Findings

01. Achieves 86.2% accuracy against the CW-L2 attack on CIFAR10

02. Eliminates the need for attack knowledge and adversarial sample generation

03. Outperforms unsecured classifiers on non-adversarial samples

Abstract

Recent adversarial defense approaches have failed. Untargeted gradient-based attacks cause classifiers to choose any wrong class. Our novel white-box defense tricks untargeted attacks into becoming attacks targeted at designated target classes. From these target classes, we can derive the real classes. Our Target Training defense tricks the minimization at the core of untargeted, gradient-based adversarial attacks: minimize the sum of (1) perturbation and (2) classifier adversarial loss. Target Training changes the classifier minimally, and trains it with additional duplicated points (at 0 distance) labeled with designated classes. These differently-labeled duplicated samples minimize both terms (1) and (2) of the minimization, steering attack convergence to samples of designated classes, from which correct classification is derived. Importantly, Target Training eliminates the need to…
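
The duplicated-sample idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a classifier with K original classes extended to 2K outputs, a designated class of k + K for original class k, and a real class recovered by summing each original class's probability with that of its designated partner. The index mapping, the summing rule, and the helper names `make_target_training_set` and `derive_real_class` are illustrative assumptions that this excerpt does not confirm.

```python
import numpy as np


def make_target_training_set(x, y, num_classes):
    """Append a zero-distance duplicate of every sample, labeled with a designated class.

    Assumption (not stated in the abstract): the designated class for
    original class k is k + num_classes.
    """
    x_aug = np.concatenate([x, x.copy()], axis=0)
    y_aug = np.concatenate([y, y + num_classes], axis=0)
    return x_aug, y_aug


def derive_real_class(probs, num_classes):
    """Fold 2K-way probabilities back to K real classes.

    Assumption: the probability of class k and of its designated
    partner k + num_classes are summed before taking the argmax.
    """
    probs = np.asarray(probs)
    folded = probs[..., :num_classes] + probs[..., num_classes:]
    return folded.argmax(axis=-1)


# Toy data: four 3-feature samples with labels drawn from K = 3 classes.
x = np.arange(12, dtype=np.float32).reshape(4, 3)
y = np.array([0, 1, 2, 1])
x_aug, y_aug = make_target_training_set(x, y, num_classes=3)

# A hypothetical 2K-way output in which an attack has pushed the sample
# toward designated class 4, the assumed partner of real class 1.
attacked_probs = np.array([[0.1, 0.1, 0.1, 0.05, 0.6, 0.05]])
predicted = derive_real_class(attacked_probs, num_classes=3)
```

Under these assumptions, a sample that an untargeted attack steers to a designated class still folds back to its original label, which is the behavior the abstract describes as deriving the real class from the target class.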

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Bacillus and Francisella bacterial research