DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, Dacheng Tao

TL;DR
DearKD is a two-stage knowledge distillation framework that enhances data efficiency for vision transformers by leveraging CNN inductive biases and a boundary-preserving loss, effective even in data-free scenarios.
Contribution
It introduces a novel two-stage distillation method and a boundary-preserving loss, improving data efficiency and performance of vision transformers, including in data-free settings.
Findings
Outperforms baselines and state-of-the-art methods on ImageNet.
Effective in data-free and partial data scenarios.
Enhances transformer training with CNN inductive biases.
Abstract
Transformers are successfully applied to computer vision due to their powerful modeling capacity with self-attention. However, the excellent performance of transformers heavily depends on enormous training images. Thus, a data-efficient transformer solution is urgently needed. In this work, we propose an early knowledge distillation framework, which is termed as DearKD, to improve the data efficiency required by transformers. Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation. Further, our DearKD can be readily applied to the extreme data-free case where no real images are available. In this case, we propose a boundary-preserving intra-divergence loss based on DeepInversion to further close the performance gap against the full-data counterpart.…
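The two-stage schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss form, the stage boundary, or the loss weighting, so the mean-squared-error feature matching and the names `feature_mse`, `training_loss`, `stage1_epochs`, and `distill_weight` are all assumptions chosen for illustration. The sketch only shows the general idea that a distillation term on early-layer features is active in the first stage and dropped in the second.

```python
from typing import Sequence


def feature_mse(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors.

    Assumption: MSE stands in for whatever feature-matching loss the paper uses.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("student and teacher features must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def training_loss(
    task_loss: float,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    epoch: int,
    stage1_epochs: int,
    distill_weight: float = 1.0,
) -> float:
    """Combine the task loss with a distillation term during stage one only.

    Stage one (epoch < stage1_epochs): task loss plus weighted feature distillation
    from the CNN teacher's early layers (hypothetical weighting).
    Stage two (epoch >= stage1_epochs): task loss alone, with no distillation.
    """
    if epoch < stage1_epochs:
        return task_loss + distill_weight * feature_mse(student_feat, teacher_feat)
    return task_loss


if __name__ == "__main__":
    student = [1.0, 2.0]
    teacher = [1.0, 4.0]
    # Stage one: the distillation term contributes to the loss.
    print(training_loss(0.5, student, teacher, epoch=0, stage1_epochs=10))
    # Stage two: only the task loss remains.
    print(training_loss(0.5, student, teacher, epoch=10, stage1_epochs=10))
```

In a real training loop the feature vectors would be tensors taken from intermediate layers of the student transformer and the CNN teacher, and the stage boundary would be a tuned hyperparameter.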
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
Methods: Knowledge Distillation
