DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

Xianing Chen; Qiong Cao; Yujie Zhong; Jing Zhang; Shenghua Gao; Dacheng Tao

arXiv:2204.12997 · cs.CV · April 29, 2022 · 1 cites

TL;DR

DearKD is a two-stage knowledge distillation framework that enhances data efficiency for vision transformers by leveraging CNN inductive biases and a boundary-preserving loss, effective even in data-free scenarios.

Contribution

DearKD introduces a two-stage distillation method and a boundary-preserving intra-divergence loss, improving the data efficiency and performance of vision transformers, including in data-free settings.

Findings

1. Outperforms baselines and state-of-the-art methods on ImageNet.

2. Effective in data-free and partial-data scenarios.

3. Enhances transformer training with CNN inductive biases.

Abstract

Transformers have been successfully applied to computer vision thanks to the powerful modeling capacity of self-attention. However, their excellent performance depends heavily on an enormous number of training images, so a data-efficient transformer solution is urgently needed. In this work, we propose an early knowledge distillation framework, termed DearKD, to improve the data efficiency of transformers. DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation. Further, DearKD can be readily applied to the extreme data-free case where no real images are available. In this case, we propose a boundary-preserving intra-divergence loss based on DeepInversion to further close the performance gap against the full-data counterpart.…
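
The two-stage schedule described in the abstract can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the function names (`distill_weight`, `mse`, `total_loss`), the hard on/off switch between stages, the weight `alpha`, and the use of mean squared error as a stand-in for whatever feature-alignment loss the paper actually uses.

```python
from typing import Sequence


def distill_weight(epoch: int, stage1_epochs: int, alpha: float = 1.0) -> float:
    """Weight on the distillation term (hypothetical schedule).

    Stage 1 (epoch < stage1_epochs): distill from the CNN teacher.
    Stage 2 (afterwards): train the transformer without distillation.
    """
    return alpha if epoch < stage1_epochs else 0.0


def mse(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors.

    Assumption: MSE is only a placeholder for a feature-alignment loss.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = ((s - t) ** 2 for s, t in zip(student_feat, teacher_feat))
    return sum(squared) / len(student_feat)


def total_loss(
    task_loss: float,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    epoch: int,
    stage1_epochs: int,
    alpha: float = 1.0,
) -> float:
    """Task loss plus the stage-dependent distillation term."""
    weight = distill_weight(epoch, stage1_epochs, alpha)
    if weight == 0.0:
        # Stage 2: the teacher features are ignored entirely.
        return task_loss
    return task_loss + weight * mse(student_feat, teacher_feat)
```

With `stage1_epochs=10`, for example, the teacher term contributes to the loss for epochs 0 through 9 and is dropped from epoch 10 onward, which mirrors the "distill early, then train freely" idea at the level of a loss schedule only.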

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI

Methods: Knowledge Distillation