Teach Me to Trick: Exploring Adversarial Transferability via Knowledge Distillation
Siddhartha Pradhan, Shikshya Shiwakoti, Neha Bathuri

TL;DR
This paper explores how knowledge distillation from multiple models can improve the transferability and efficiency of adversarial example generation, achieving high success rates with reduced computation.
Contribution
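It applies two multi-teacher KD strategies, curriculum-based switching and joint optimization, to adversarial transferability, and shows that a single distilled student reaches attack success rates comparable to ensemble baselines at a lower generation cost.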
Findings
Distilled student models match ensemble attack success rates
Transferability improves with lower temperature and hard-label supervision
Adversarial example generation time is reduced by up to a factor of six
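To make the temperature finding concrete, here is a minimal sketch of temperature-scaled softmax, the standard way KD softens teacher outputs. The logits and temperature values are hypothetical and chosen only for illustration; the summary above does not state which temperatures the paper tested.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for a 3-class problem (not from the paper).
teacher_logits = [4.0, 2.0, 1.0]

# A lower temperature keeps the distribution peaked on the top class;
# a higher temperature spreads probability mass across all classes.
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
```

With a lower temperature the soft targets stay closer to the teacher's hard decision, which is one plausible reading of why it pairs naturally with hard-label supervision.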
Abstract
We investigate whether knowledge distillation (KD) from multiple heterogeneous teacher models can enhance the generation of transferable adversarial examples. A lightweight student model is trained using two KD strategies: curriculum-based switching and joint optimization, with ResNet50 and DenseNet-161 as teachers. The trained student is then used to generate adversarial examples using FG, FGS, and PGD attacks, which are evaluated against a black-box target model (GoogLeNet). Our results show that student models distilled from multiple teachers achieve attack success rates comparable to ensemble-based baselines, while reducing adversarial example generation time by up to a factor of six. An ablation study further reveals that lower temperature settings and the inclusion of hard-label supervision significantly enhance transferability. These findings suggest that KD can serve not only as…
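The two KD strategies named in the abstract can be sketched as loss functions over logits. This is an illustrative sketch, not the authors' implementation: the equal weighting of teachers, the mixing weight `alpha`, the `temperature ** 2` scaling (a common KD convention), and the `switch_every` epoch schedule are all assumptions, since the abstract does not specify them.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Log of softmax(logits / temperature), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]

def kl_divergence(teacher_logits, student_logits, temperature):
    """KL(teacher || student) on temperature-softened distributions."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def hard_label_loss(student_logits, label):
    """Cross-entropy of the student against the ground-truth label."""
    return -log_softmax(student_logits)[label]

def joint_kd_loss(student_logits, teachers_logits, label,
                  temperature=2.0, alpha=0.5):
    """Joint optimization (assumed form): average the soft-target loss
    over all teachers, then mix with the hard-label loss."""
    soft = sum(kl_divergence(t, student_logits, temperature)
               for t in teachers_logits) / len(teachers_logits)
    hard = hard_label_loss(student_logits, label)
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard

def curriculum_kd_loss(student_logits, teachers_logits, label, epoch,
                       switch_every=5, temperature=2.0, alpha=0.5):
    """Curriculum-based switching (assumed form): distill from one
    teacher at a time, rotating every `switch_every` epochs."""
    idx = (epoch // switch_every) % len(teachers_logits)
    soft = kl_divergence(teachers_logits[idx], student_logits, temperature)
    hard = hard_label_loss(student_logits, label)
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard
```

Setting `alpha=1.0` drops the hard-label term, which gives a simple way to reproduce the kind of ablation the abstract describes.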
