Generating Adversarial Examples with Controllable Non-transferability
Renzhi Wang, Tianwei Zhang, Xiaofei Xie, Lei Ma, Cong Tian, Felix Juefei-Xu, Yang Liu

TL;DR
This paper introduces methods for generating adversarial examples with controllable non-transferability: the examples attack a chosen set of target models while remaining benign to all other models.
Contribution
The paper proposes two attack techniques, Reversed Loss Function Ensemble and Transferability Classification, for crafting non-transferable adversarial examples under white-box and gray-box threat models.
Findings
Effective in white-box and gray-box settings
Guided generation of non-transferable adversarial examples
Demonstrates efficiency and effectiveness of methods
Abstract
Adversarial attacks against Deep Neural Networks have been widely studied. One significant feature that makes such attacks particularly powerful is transferability, where adversarial examples generated from one model are also effective against other similar models. Much work has been done to increase transferability. However, how to decrease transferability and craft malicious samples only for specific target models has not yet been explored. In this paper, we design novel attack methodologies to generate adversarial examples with controllable non-transferability. With these methods, an adversary can efficiently produce precise adversarial examples that attack a chosen set of target models while remaining benign to other models. The first method is Reversed Loss Function Ensemble, where the adversary can craft qualified examples from the gradients of a…
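The abstract is cut off before it defines the reversed loss, so the following is only a minimal sketch of one plausible reading: the adversary ascends the loss of the target models and descends the loss of the models that should stay unaffected, then follows the sign of the combined gradient. The toy linear logistic models, the function names (`reversed_loss_ensemble_attack`, `loss_grad`), and the parameters `eps`, `step`, and `iters` are all assumptions made for illustration and are not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(model, x):
    """Probability of class 1 under a toy linear logistic model (w, b)."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad(model, x, y):
    """Gradient of binary cross-entropy w.r.t. the input x: (p - y) * w."""
    w, _ = model
    p = predict(model, x)
    return [(p - y) * wi for wi in w]

def sign(g):
    return (g > 0) - (g < 0)

def reversed_loss_ensemble_attack(x, y, targets, protected,
                                  eps=0.5, step=0.05, iters=50):
    """Hypothetical sketch: maximize sum(loss on targets) - sum(loss on protected).

    The minus sign on the protected models is the assumed "reversed" term.
    The perturbation is kept inside an L-infinity ball of radius eps around x.
    """
    x_adv = list(x)
    for _ in range(iters):
        grad = [0.0] * len(x)
        for m in targets:        # push target models toward misclassification
            grad = [a + g for a, g in zip(grad, loss_grad(m, x_adv, y))]
        for m in protected:      # keep protected models correct (reversed sign)
            grad = [a - g for a, g in zip(grad, loss_grad(m, x_adv, y))]
        x_adv = [xi + step * sign(gi) for xi, gi in zip(x_adv, grad)]
        # Project back into the eps-ball around the original input.
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv

# Toy setup: both models classify x as class 1 before the attack.
target_model = ([1.0, 0.0], 0.0)
protected_model = ([0.0, 1.0], 0.0)
x_clean, y_true = [0.3, 0.3], 1
x_adv = reversed_loss_ensemble_attack(x_clean, y_true,
                                      [target_model], [protected_model])
print("target    p(class 1):", predict(target_model, x_adv))
print("protected p(class 1):", predict(protected_model, x_adv))
```

In this toy setup the two models depend on different input coordinates, so the two objectives do not conflict. With real networks whose decision boundaries are correlated, the trade-off between fooling the targets and sparing the others would be much harder, and the paper's actual formulation may weight or constrain the terms differently.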
Taxonomy
Topics: Adversarial Robustness in Machine Learning
