$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections
Roy Miles, Ismail Elezi, Jiankang Deng

TL;DR
This paper introduces a novel knowledge distillation method called $V_kD$ that uses orthogonal projections and task-specific normalization to enhance the transferability and performance of compact models across various tasks and architectures.
Contribution
The paper presents a constrained feature distillation approach that combines an orthogonal projection with a task-specific normalisation, and reports improvements over previous distillation methods on multiple tasks.
Findings
Outperforms previous methods on ImageNet, with up to a 4.4% relative improvement over the previous state of the art
Achieves consistent gains in object detection and image generation
Demonstrates broad applicability across tasks and architectures
Abstract
Knowledge distillation is an effective method for training small and efficient deep learning models. However, the efficacy of a single method can degenerate when transferring to other tasks, modalities, or even other architectures. To address this limitation, we propose a novel constrained feature distillation method. This method is derived from a small set of core principles, which results in two emerging components: an orthogonal projection and a task-specific normalisation. Equipped with both of these components, our transformer models can outperform all previous methods on ImageNet and reach up to a 4.4% relative improvement over the previous state-of-the-art methods. To further demonstrate the generality of our method, we apply it to object detection and image generation, whereby we obtain consistent and substantial performance improvements over state-of-the-art. Code and models…
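The two components named in the abstract can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names, the fixed random projection (in training it would presumably be learned), per-dimension standardisation as the normalisation, and the mean-squared-error loss are all choices made here for the example. It shows one property that makes an orthogonal projection attractive: a matrix with orthonormal rows maps student features into the teacher's dimension without changing their pairwise inner products.

```python
import numpy as np


def orthonormal_rows(d_student, d_teacher, seed=0):
    """Return a (d_student, d_teacher) matrix P with P @ P.T == I.

    Hypothetical helper: built from a QR decomposition of a random matrix.
    Requires d_student <= d_teacher.
    """
    rng = np.random.default_rng(seed)
    # Reduced QR gives q of shape (d_teacher, d_student) with orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((d_teacher, d_student)))
    return q.T


def standardise(x, eps=1e-6):
    """Zero-mean, unit-variance per feature dimension (assumed normalisation)."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)


def distill_loss(z_student, z_teacher, projection):
    """Mean squared error between projected student and normalised teacher features."""
    projected = z_student @ projection  # (n, d_teacher)
    target = standardise(z_teacher)     # (n, d_teacher)
    return float(np.mean((projected - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z_s = rng.standard_normal((5, 4))  # 5 samples, student width 4
    z_t = rng.standard_normal((5, 8))  # 5 samples, teacher width 8
    P = orthonormal_rows(4, 8)

    # Orthonormal rows preserve the student's pairwise similarities.
    projected = z_s @ P
    print(np.allclose(projected @ projected.T, z_s @ z_s.T))
    print(distill_loss(z_s, z_t, P))
```

Because `P @ P.T` is the identity, the Gram matrix of the projected features equals that of the original student features, so the projection itself cannot distort the structure of the student representation.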
Taxonomy
Topics: Neural Networks and Applications
Methods: Sparse Evolutionary Training
