How and When Adversarial Robustness Transfers in Knowledge Distillation?
Rulin Shao, Jinfeng Yi, Pin-Yu Chen, Cho-Jui Hsieh

TL;DR
This paper investigates how adversarial robustness can be transferred in knowledge distillation, identifies limitations of standard KD, and proposes KDIGA to effectively transfer and even enhance robustness across diverse models and datasets.
Contribution
The paper introduces KDIGA, a novel method that preserves and transfers adversarial robustness in knowledge distillation, supported by theoretical proofs and extensive empirical validation.
Findings
KDIGA enables robustness transfer across different architectures.
Students can surpass teacher robustness with KDIGA.
Theoretical bounds align with empirical robustness results.
Abstract
Knowledge distillation (KD) has been widely used in teacher-student training, with applications to model compression in resource-constrained deep learning. Current works mainly focus on preserving the accuracy of the teacher model. However, other important model properties, such as adversarial robustness, can be lost during distillation. This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in KD. We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy. Under certain assumptions, we prove that a student model trained with our proposed KDIGA can achieve at least the same certified robustness as the teacher model. Our KD experiments cover a diverse set of teacher and student models with varying network architectures and sizes evaluated on…
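The idea of adding input gradient alignment to KD can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the loss, so the squared L2 penalty on the gap between student and teacher input gradients, the weights `alpha` and `beta`, the `temperature`, and the function names are all assumptions made for illustration. A linear softmax classifier stands in for a real network so that the input gradient has a closed form; a deep model would need automatic differentiation instead.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_input_grad(W, x, y_onehot):
    # For a linear model with logits = x @ W, the gradient of the
    # cross-entropy loss with respect to the input x is (p - y) @ W.T.
    p = softmax(x @ W)
    return (p - y_onehot) @ W.T

def kdiga_style_loss(W_s, W_t, x, y_onehot,
                     temperature=2.0, alpha=0.5, beta=1.0):
    # Hypothetical KDIGA-style objective (assumed form, not from the paper):
    #   cross-entropy + alpha * T^2 * KL(teacher || student)
    #                 + beta * ||grad_x L_student - grad_x L_teacher||^2
    eps = 1e-12
    logits_s, logits_t = x @ W_s, x @ W_t

    # Standard supervised term on the student.
    p_s = softmax(logits_s)
    ce = -np.mean(np.sum(y_onehot * np.log(p_s + eps), axis=1))

    # Standard KD term on temperature-softened distributions.
    q_t = softmax(logits_t / temperature)
    q_s = softmax(logits_s / temperature)
    kd = np.mean(np.sum(q_t * (np.log(q_t + eps) - np.log(q_s + eps)), axis=1))

    # Input gradient alignment term (assumed squared L2 distance).
    g_s = ce_input_grad(W_s, x, y_onehot)
    g_t = ce_input_grad(W_t, x, y_onehot)
    iga = np.mean(np.sum((g_s - g_t) ** 2, axis=1))

    total = ce + alpha * temperature ** 2 * kd + beta * iga
    return {"total": total, "ce": ce, "kd": kd, "iga": iga}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))          # 4 samples, 3 features
    y = np.eye(2)[[0, 1, 0, 1]]          # one-hot labels, 2 classes
    W_teacher = rng.normal(size=(3, 2))
    W_student = rng.normal(size=(3, 2))
    print(kdiga_style_loss(W_student, W_teacher, x, y))
```

In this sketch the alignment term vanishes when the student's input gradients match the teacher's, which is the intuition behind transferring robustness: the student is pushed to respond to input perturbations the way the teacher does, not just to match its outputs.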
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Kaiming Initialization · Residual Connection · 1x1 Convolution · Batch Normalization · Convolution · Average Pooling · Max Pooling · Global Average Pooling · Bottleneck Residual Block
