Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability

Ruifei He; Shuyang Sun; Jihan Yang; Song Bai; Xiaojuan Qi

arXiv:2203.05180 · cs.CV · March 29, 2022 · 6 cites

TL;DR

This paper introduces a feature-based knowledge distillation method for pre-training that needs about 10x less data and 5x less pre-training time while remaining competitive with supervised pre-training on downstream tasks.

Contribution

It proposes a new feature-based KD approach with non-parametric feature alignment, enabling efficient pre-training without extensive data or time.

Findings

01 Achieves performance comparable to supervised pre-training on multiple downstream tasks

02 Requires 10x less data and 5x less pre-training time

03 Transfers learned features effectively to downstream applications

Abstract

Large-scale pre-training has proven crucial for various computer vision tasks. However, as pre-training datasets grow, model architectures multiply, and more data becomes private or inaccessible, pre-training every architecture on large-scale datasets is inefficient and sometimes impossible. In this work, we investigate an alternative pre-training strategy, Knowledge Distillation as Efficient Pre-training (KDEP), which aims to efficiently transfer the learned feature representation from existing pre-trained models to new student models for future downstream tasks. We observe that existing Knowledge Distillation (KD) methods are unsuitable for pre-training because they normally distill the logits, which are discarded when the model is transferred to downstream tasks. To resolve this problem, we propose a feature-based KD method with non-parametric feature…
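
The contrast the abstract draws between distilling logits and distilling features can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the temperature value, and the choice of a mean-squared-error feature loss are assumptions made for illustration, and the sketch assumes the student and teacher features already share the same dimensionality, so the feature alignment step the abstract mentions is not shown.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Logit distillation: KL(teacher || student) on temperature-softened
    class probabilities. The classifier head producing these logits is
    typically dropped when the student moves to a downstream task."""
    log_p_t = log_softmax(teacher_logits / temperature)
    log_p_s = log_softmax(student_logits / temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)


def feature_kd_loss(student_feats, teacher_feats):
    """Feature distillation (assumed MSE form): match backbone features,
    which are the part of the model reused downstream.
    Assumes both feature arrays have identical shapes."""
    if student_feats.shape != teacher_feats.shape:
        raise ValueError("student and teacher features must have the same shape")
    return float(np.mean((student_feats - teacher_feats) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical batch: 8 images, 128-dim features, 1000 classes.
    teacher_feats = rng.normal(size=(8, 128))
    student_feats = teacher_feats + 0.1 * rng.normal(size=(8, 128))
    teacher_logits = rng.normal(size=(8, 1000))
    student_logits = rng.normal(size=(8, 1000))
    print("feature loss:", feature_kd_loss(student_feats, teacher_feats))
    print("logit loss:", logit_kd_loss(student_logits, teacher_logits))
```

In this sketch, only `feature_kd_loss` supervises the representation that survives transfer, which is the motivation the abstract gives for a feature-based objective.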

Code & Models

Repositories

cvmi-lab/kdep (PyTorch · Official)

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation