$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections

Roy Miles; Ismail Elezi; Jiankang Deng

arXiv:2403.06213 · cs.CV · March 12, 2024 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces a novel knowledge distillation method called $V_kD$ that uses orthogonal projections and task-specific normalization to enhance the transferability and performance of compact models across various tasks and architectures.

Contribution

The paper presents a constrained feature distillation method that combines an orthogonal projection with a task-specific normalisation. With both components, it improves on previous distillation methods in image classification, object detection, and image generation.

Findings

01 Outperforms previous methods on ImageNet, with up to a 4.4% relative improvement over the previous state of the art

02 Achieves consistent gains over the state of the art in object detection and image generation

03 Demonstrates broad applicability across tasks and architectures

Abstract

Knowledge distillation is an effective method for training small and efficient deep learning models. However, the efficacy of a single method can degenerate when transferring to other tasks, modalities, or even other architectures. To address this limitation, we propose a novel constrained feature distillation method. This method is derived from a small set of core principles, which results in two emerging components: an orthogonal projection and a task-specific normalisation. Equipped with both of these components, our transformer models can outperform all previous methods on ImageNet and reach up to a 4.4% relative improvement over the previous state-of-the-art methods. To further demonstrate the generality of our method, we apply it to object detection and image generation, whereby we obtain consistent and substantial performance improvements over state-of-the-art. Code and models…
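The two components named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the fixed random projection, the per-dimension standardisation, and the mean-squared-error loss are all assumptions made for illustration. The abstract states only that the method uses an orthogonal projection and a task-specific normalisation. The sketch assumes the student feature width is no larger than the teacher's, so the projection can preserve the norms and inner products of student features.

```python
import math
import random


def orthonormal_rows(n_rows, n_cols, seed=0):
    """Build n_rows orthonormal vectors of length n_cols with Gram-Schmidt.

    Hypothetical helper: a trained model would learn the projection under an
    orthogonality constraint instead of sampling it once.
    """
    assert n_rows <= n_cols, "sketch assumes student width <= teacher width"
    rng = random.Random(seed)
    basis = []
    while len(basis) < n_rows:
        v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        # Remove the components along the vectors already in the basis.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:
            basis.append([x / norm for x in v])
    return basis


def project(z, basis):
    """Map a student feature (length n_rows) into teacher space (length n_cols)."""
    out = [0.0] * len(basis[0])
    for z_i, b in zip(z, basis):
        for j, b_j in enumerate(b):
            out[j] += z_i * b_j
    return out


def standardise(batch, eps=1e-6):
    """Assumed normalisation: zero mean, unit variance per feature dimension."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in batch) / n + eps)
        for j in range(d)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in batch]


def distill_loss(student_batch, teacher_batch, basis):
    """Mean squared error between projected student and normalised teacher features."""
    targets = standardise(teacher_batch)
    total, count = 0.0, 0
    for z_s, t in zip(student_batch, targets):
        p = project(z_s, basis)
        total += sum((a - b) ** 2 for a, b in zip(p, t))
        count += len(t)
    return total / count


# Toy data: 8 samples, student width 4, teacher width 6.
rng = random.Random(1)
student = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
teacher = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(8)]

basis = orthonormal_rows(4, 6)
loss = distill_loss(student, teacher, basis)
print(f"distillation loss on toy data: {loss:.4f}")
```

Because the rows of the projection are orthonormal, the projected student features keep the same norms and pairwise inner products as the originals, which is the property an orthogonality constraint buys. In a training loop the projection would be a learnable parameter kept orthogonal during optimisation, and the normalisation would be chosen per task.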

Code & Models

Repositories

roymiles/vkd (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications

Methods: Sparse Evolutionary Training