Enhancing Adversarial Attacks: The Similar Target Method
Shuo Zhang, Ziruo Wang, Zikai Zhou, Huanran Chen

TL;DR
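This paper introduces the Similar Target (ST) method, which improves the transferability of ensemble adversarial attacks by encouraging the gradients of the surrogate models to align (high cosine similarity), so that the resulting adversarial examples are more effective against classifiers outside the ensemble.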
Contribution
The paper proposes a novel gradient similarity regularization technique for ensemble attacks, improving transferability over existing methods.
Findings
Outperforms state-of-the-art attackers on 18 classifiers
Improves transferability to adversarially trained models
Validated on ImageNet dataset
Abstract
Deep neural networks are vulnerable to adversarial examples, posing a threat to the models' applications and raising security concerns. An intriguing property of adversarial examples is their strong transferability. Several methods have been proposed to enhance transferability, including ensemble attacks which have demonstrated their efficacy. However, prior approaches simply average logits, probabilities, or losses for model ensembling, lacking a comprehensive analysis of how and why model ensembling significantly improves transferability. In this paper, we propose a similar targeted attack method named Similar Target (ST). By promoting cosine similarity between the gradients of each model, our method regularizes the optimization direction to simultaneously attack all surrogate models. This strategy has been proven to enhance generalization ability. Experimental results on ImageNet…
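To make the idea in the abstract concrete, here is a minimal sketch of an ensemble objective that combines the averaged loss (the baseline the abstract describes) with a term rewarding pairwise cosine similarity between each surrogate model's input gradient. This is an illustration under stated assumptions, not the authors' implementation: the surrogates are toy linear softmax classifiers so the input gradient has a closed form, and the function names, the weighting parameter `lam`, and the exact way the two terms are combined are hypothetical, since the summary above does not give the paper's formula.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss_and_input_grad(W, x, y):
    """Cross-entropy loss of the toy linear classifier W @ x,
    and the gradient of that loss with respect to the input x."""
    p = softmax(W @ x)
    loss = -np.log(p[y])
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad = W.T @ (p - onehot)  # d(loss)/dx for softmax cross-entropy
    return loss, grad

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two vectors (eps avoids division by zero)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def ensemble_objective(weights, x, y, lam=1.0):
    """Assumed objective: mean surrogate loss + lam * mean pairwise
    cosine similarity of the surrogates' input gradients.
    Returns (objective, mean_loss, mean_similarity)."""
    losses, grads = zip(*(loss_and_input_grad(W, x, y) for W in weights))
    mean_loss = float(np.mean(losses))
    pairs = [(i, j) for i in range(len(grads)) for j in range(i + 1, len(grads))]
    mean_sim = float(np.mean([cosine(grads[i], grads[j]) for i, j in pairs]))
    return mean_loss + lam * mean_sim, mean_loss, mean_sim
```

An attacker would ascend this objective with respect to `x` under a perturbation budget. With real networks the similarity term requires differentiating through the gradients themselves (second-order terms), which is typically done with an autodiff framework rather than by hand as in this toy case.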
Taxonomy
Topics: Adversarial Robustness in Machine Learning
