LIB-KD: Teaching Inductive Bias for Efficient Vision Transformer Distillation and Compression
Gousia Habib, Tausifa Jan Saleem, Ishfaq Ahmad Malik, Brejesh Lall

TL;DR
This paper presents LIB-KD, a novel ensemble-based distillation method that transfers diverse inductive biases from lightweight teacher models to improve the efficiency and performance of Vision Transformer training and compression.
Contribution
It introduces an ensemble of lightweight teachers with varied inductive biases and a precomputed logits strategy to accelerate and enhance Vision Transformer distillation.
Findings
Ensemble of diverse teachers improves student ViT performance.
Precomputing logits accelerates the distillation process.
Method reduces computational costs while maintaining high accuracy.
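The precomputed-logits idea in the findings above can be illustrated with a minimal sketch. It assumes each teacher is a plain callable that returns a list of logits; the names `precompute_logits`, `ensemble_soft_targets`, and `make_toy_teacher`, the toy weights, and the temperature of 2.0 are all hypothetical and not taken from the paper. Averaging the teachers' softened distributions is one common way to combine an ensemble and may differ from what the authors do.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def precompute_logits(teachers, dataset):
    """Run every teacher once per sample and keep the raw logits.

    Returns cache[sample_index][teacher_index] -> list of logits.
    """
    return [[teacher(x) for teacher in teachers] for x in dataset]

def ensemble_soft_targets(cached_logits, temperature=2.0):
    """Average the teachers' softened distributions for one sample (assumed rule)."""
    probs = [softmax(logits, temperature) for logits in cached_logits]
    n_teachers = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_teachers for c in range(n_classes)]

# Toy stand-ins for two teachers with different inductive biases (hypothetical).
teacher_calls = {"count": 0}

def make_toy_teacher(weights):
    def teacher(x):
        teacher_calls["count"] += 1  # count forward passes to show the saving
        return [w * x for w in weights]
    return teacher

teachers = [make_toy_teacher([1.0, 0.5, -1.0]), make_toy_teacher([0.8, 0.9, -0.5])]
dataset = [1.0, 2.0]

cache = precompute_logits(teachers, dataset)  # teachers run here, once per sample
for epoch in range(3):                        # later epochs only read the cache
    targets = [ensemble_soft_targets(sample_logits) for sample_logits in cache]
```

In this sketch the teachers are evaluated four times in total (two teachers, two samples) regardless of the number of epochs, which is the source of the speed-up the summary describes.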
Abstract
With the rapid development of computer vision, Vision Transformers (ViTs) offer the tantalising prospect of unified information processing across visual and textual domains. However, because ViTs lack inherent inductive biases, they require enormous datasets for training. To make their application practical, we introduce an innovative ensemble-based distillation approach that distils inductive bias from complementary lightweight teacher models. Prior systems relied solely on convolution-based teaching. In contrast, this method incorporates an ensemble of light teachers with different architectural tendencies, such as convolution and involution, to jointly instruct the student transformer. Because of these distinct inductive biases, the teachers can accumulate a wide range of knowledge, even from readily identifiable stored datasets, which leads to enhanced student performance. Our proposed…
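As a rough illustration of how several teachers could jointly instruct a student, the sketch below combines a hard-label cross-entropy term with a KL term against the averaged teacher distribution. This summary does not give the paper's actual objective, so the function `distillation_loss`, the mixing weight `alpha`, the temperature, and the `temperature ** 2` scaling (a common convention in knowledge distillation) are all assumptions.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_total for z in scaled]

def distillation_loss(student_logits, teacher_logits_list, label,
                      alpha=0.5, temperature=2.0):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teachers || student).

    The teacher target is the mean of the teachers' softened distributions
    (an assumed combination rule, not confirmed by the source).
    """
    teacher_probs = [
        [math.exp(v) for v in log_softmax(t, temperature)]
        for t in teacher_logits_list
    ]
    n_teachers = len(teacher_probs)
    target = [
        sum(p[c] for p in teacher_probs) / n_teachers
        for c in range(len(student_logits))
    ]
    student_log_probs = log_softmax(student_logits, temperature)
    # KL divergence from the student distribution to the ensemble target.
    kl = sum(q * (math.log(q) - s)
             for q, s in zip(target, student_log_probs) if q > 0)
    # Standard cross-entropy against the ground-truth class index.
    ce = -log_softmax(student_logits)[label]
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl

# Hypothetical logits for one sample: a student and two teachers.
loss = distillation_loss(
    student_logits=[1.0, 2.0, 3.0],
    teacher_logits_list=[[3.0, 2.0, 1.0], [0.0, 0.0, 5.0]],
    label=2,
)
```

When the student's logits match a single teacher's exactly, the KL term vanishes and only the cross-entropy term remains, which is a quick sanity check on any implementation of this kind.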
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Convolution
