Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks
Anqi Zhao, Tong Chu, Yahao Liu, Wen Li, Jingjing Li, Lixin Duan

TL;DR
This paper introduces a theoretical framework and an algorithm for black-box targeted attacks. The algorithm minimizes the maximum model discrepancy among substitute models, which yields more transferable adversarial examples.
Contribution
It provides a generalization error bound for black-box targeted attacks and proposes an algorithm that minimizes maximum model discrepancy to enhance transferability.
Findings
Outperforms existing methods on the ImageNet dataset
Achieves higher attack success rates
Produces adversarial examples that are robust to model variation
Abstract
In this work, we study the black-box targeted attack problem from the model discrepancy perspective. On the theoretical side, we present a generalization error bound for black-box targeted attacks, which gives a rigorous theoretical analysis for guaranteeing the success of the attack. We reveal that the attack error on a target model mainly depends on the empirical attack error on the substitute model and the maximum model discrepancy among substitute models. On the algorithmic side, we derive a new algorithm for black-box targeted attacks based on our theoretical analysis, in which we additionally minimize the maximum model discrepancy (M3D) of the substitute models when training the generator to generate adversarial examples. In this way, our model is capable of crafting highly transferable adversarial examples that are robust to model variation, thus improving the success rate for…
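The two quantities the abstract names, the attack error on the substitute models and the discrepancy between them, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the choice of an L1 distance between softmax outputs as the discrepancy measure, the use of exactly two substitute models, the `weight` parameter, and all function names are assumptions made for illustration only.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def targeted_attack_loss(logits, target):
    # Cross-entropy toward the attacker's chosen target class.
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(target)), target] + 1e-12).mean())

def model_discrepancy(logits_a, logits_b):
    # Assumed measure: mean L1 distance between the two models'
    # predicted class distributions on the same adversarial inputs.
    diff = np.abs(softmax(logits_a) - softmax(logits_b))
    return float(diff.sum(axis=1).mean())

def generator_objective(logits_a, logits_b, target, weight=1.0):
    # Hypothetical generator loss: average targeted attack loss on both
    # substitute models plus a weighted discrepancy term.
    attack = 0.5 * (targeted_attack_loss(logits_a, target)
                    + targeted_attack_loss(logits_b, target))
    return attack + weight * model_discrepancy(logits_a, logits_b)

# Toy logits for one adversarial example and three classes; target class 0.
target = np.array([0])
agree_a = np.array([[5.0, 0.0, 0.0]])   # both models predict the target
agree_b = np.array([[5.0, 0.0, 0.0]])
clash_b = np.array([[0.0, 5.0, 0.0]])   # second model disagrees

loss_agree = generator_objective(agree_a, agree_b, target)
loss_clash = generator_objective(agree_a, clash_b, target)
```

In a min-max training loop of the kind the abstract describes, the substitute models would be updated to make the discrepancy as large as possible, and the generator would then be updated to reduce an objective like `generator_objective`. In this toy case `loss_agree` is smaller than `loss_clash`, since the generator is rewarded when both substitutes agree on the target class.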
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI
