Adversarial Distillation for Ordered Top-k Attacks
Zekun Zhang, Tianfu Wu

TL;DR
This paper introduces a novel adversarial distillation framework to generate ordered Top-k attacks on image classifiers, improving attack success rates over existing methods by leveraging label semantics and targeted distributions.
Contribution
It proposes a new adversarial distillation approach for ordered Top-k attacks, incorporating label semantic similarities to enhance attack effectiveness.
Findings
The method outperforms the C&W method in both Top-1 and Top-5 attack settings.
It reports significant improvements in attack success rates on ImageNet models.
It makes effective use of label semantics when generating adversarial attacks.
Abstract
Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, especially white-box targeted attacks. One scheme of learning attacks is to design a proper adversarial objective function that leads to an imperceptible perturbation for any test image (e.g., the Carlini-Wagner (C&W) method). Most methods address targeted attacks in the Top-1 manner. In this paper, we propose to learn ordered Top-k attacks (k ≥ 1) for image classification tasks, that is, to enforce the Top-k predicted labels of an adversarial example to be the k (randomly) selected and ordered labels (the ground-truth label is excluded). To this end, we present an adversarial distillation framework: First, we compute an adversarial probability distribution for any given ordered Top-k targeted labels with respect to the ground-truth of a test image. Then, we learn adversarial examples by minimizing the Kullback-Leibler…
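The two steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the geometric-decay construction of the target distribution, the `top_mass` and `decay` parameters, and the direction of the KL divergence (target relative to model output) are all assumptions made for illustration. The paper's actual distribution also draws on label semantic similarities, which are omitted here.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def ordered_topk_target(num_classes, targets, top_mass=0.9, decay=0.5):
    """Hypothetical adversarial distribution for ordered Top-k targets.

    Assumption (not from the paper): the k target labels share `top_mass`
    with geometrically decaying weights so their order is preserved, and
    the remaining mass is spread uniformly over all other labels.
    The defaults keep every target above the non-target mass for small k;
    this is not guaranteed for arbitrary parameter choices.
    """
    k = len(targets)
    assert 0 < k < num_classes
    weights = decay ** np.arange(k)                # 1, decay, decay^2, ...
    weights = top_mass * weights / weights.sum()   # targets sum to top_mass
    p = np.full(num_classes, (1.0 - top_mass) / (num_classes - k))
    p[list(targets)] = weights
    return p


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def is_ordered_topk_success(logits, targets):
    """True if the k highest-scoring labels equal `targets`, in order."""
    k = len(targets)
    topk = np.argsort(-np.asarray(logits))[:k]
    return topk.tolist() == list(targets)


# Example: 10 classes, ordered targets (7, 2, 5); the ground-truth label
# is assumed to be some other class and is therefore not among the targets.
targets = [7, 2, 5]
p_adv = ordered_topk_target(10, targets)
model_probs = softmax(np.zeros(10))            # stand-in for a model output
loss = kl_divergence(p_adv, model_probs)       # quantity an attack would minimize
```

In an actual attack, `model_probs` would come from the classifier evaluated on the perturbed image, and the perturbation would be updated by gradient descent on this loss; `is_ordered_topk_success` would then check whether the ordered Top-k goal has been reached.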
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Security and Verification in Computing
