Common Knowledge Learning for Generating Transferable Adversarial Examples
Ruijie Yang, Yuanfang Guo, Junfu Wang, Jiantao Zhou, Yunhong Wang

TL;DR
This paper introduces a common knowledge learning framework that improves the transferability of black-box adversarial examples by distilling knowledge from multiple teacher models and constraining gradients to reduce output inconsistency.
Contribution
It proposes a multi-teacher knowledge distillation approach with gradient constraints to enhance adversarial transferability across different DNN architectures.
Findings
Significantly improves transferability of adversarial examples
Reduces output inconsistency between models
Enhances black-box attack effectiveness
Abstract
This paper focuses on an important type of black-box attack, i.e., the transfer-based adversarial attack, where the adversary generates adversarial examples with a substitute (source) model and uses them to attack an unseen target model without any knowledge of it. Existing methods tend to give unsatisfactory adversarial transferability when the source and target models come from different types of DNN architectures (e.g., ResNet-18 and Swin Transformer). In this paper, we observe that this phenomenon is induced by the output inconsistency problem. To alleviate this problem while effectively utilizing existing DNN models, we propose a common knowledge learning (CKL) framework that learns better network weights, under fixed network architectures, to generate adversarial examples with better transferability. Specifically, to reduce the model-specific features and obtain better…
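The transfer-based setting described in the abstract can be illustrated with a toy sketch. This is not the CKL method: it uses a one-step sign-gradient perturbation (FGSM) as a stand-in attack, and two hand-picked logistic-regression models in place of real DNNs. All names, weights, and the perturbation budget below are assumptions made purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fgsm(weights, bias, x, label, epsilon):
    """One-step sign-gradient attack on the given (source) model.

    For binary cross-entropy with a logistic model, the gradient of the
    loss with respect to the input is (p - label) * weights.
    """
    p = predict_prob(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, sign)]

# Hypothetical source model: the attacker has full access to it.
SOURCE_W, SOURCE_B = [2.0, -1.0], 0.0
# Hypothetical target model: never queried while crafting the example.
TARGET_W, TARGET_B = [1.5, -0.5], 0.1

x_clean = [0.3, 0.1]   # an input both models assign to class 1
true_label = 1

# Craft the adversarial example using the source model only.
x_adv = fgsm(SOURCE_W, SOURCE_B, x_clean, true_label, epsilon=0.5)

# Evaluate on both models; the attack "transfers" if the target is fooled too.
for name, w, b in [("source", SOURCE_W, SOURCE_B), ("target", TARGET_W, TARGET_B)]:
    print(name, "clean:", predict_prob(w, b, x_clean), "adv:", predict_prob(w, b, x_adv))
```

In this toy case the two models have similar decision boundaries, so the perturbation transfers. The abstract's point is that transfer becomes much less reliable when the source and target architectures differ substantially.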
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
