InDistill: Information flow-preserving knowledge distillation for model compression
Ioannis Sarridis, Christos Koutlis, Giorgos Kordopatis-Zilos, Ioannis Kompatsiaris, Symeon Papadopoulos

TL;DR
InDistill is a knowledge distillation method for model compression. It adds a curriculum-learning warmup stage that transfers critical information flow paths from the teacher to the student, and it prunes teacher layers whenever their width differs from the student's. The result is a student that learns better from the teacher.
Contribution
It introduces a warmup stage for KD that concentrates on transferring critical information flow paths. It also uses pruning to match the widths of teacher and student layers, so the method also applies to teacher-student pairs whose layer widths differ.
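The width-matching idea can be illustrated with a small sketch. The pruning criterion is an assumption here: it keeps the k teacher neurons whose incoming weights have the largest L1 norm and drops the rest, so the pruned teacher features have exactly the student's width. A plain mean-squared-error loss then compares the two directly. The function names (select_neurons_by_l1, prune_activations, feature_mse) are hypothetical. The sketch is not the authors' implementation and only covers a single layer. It does not say how InDistill selects neurons or whether the pruned teacher is retrained.

```python
from typing import List, Sequence


def select_neurons_by_l1(weight_rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick the k teacher neurons to keep (hypothetical L1-magnitude criterion).

    Each row holds the incoming weights of one output neuron of a teacher layer.
    The kept indices are returned in ascending order, which preserves the
    channel order of the pruned teacher layer.
    """
    if not 0 < k <= len(weight_rows):
        raise ValueError("k must be between 1 and the teacher layer width")
    scores = [sum(abs(w) for w in row) for row in weight_rows]
    # Rank neurons from largest to smallest L1 norm and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def prune_activations(teacher_act: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Drop the pruned neurons from a teacher activation vector."""
    return [teacher_act[i] for i in keep]


def feature_mse(student_act: Sequence[float], teacher_act: Sequence[float]) -> float:
    """Mean squared error between equally wide student and pruned teacher features."""
    if len(student_act) != len(teacher_act):
        raise ValueError("student and teacher features must have the same width")
    return sum((s - t) ** 2 for s, t in zip(student_act, teacher_act)) / len(student_act)


# Toy example: a 4-neuron teacher layer distilled into a 2-neuron student layer.
teacher_weights = [[0.1, -0.2], [1.0, 0.5], [0.0, 0.05], [-0.8, 0.9]]
keep = select_neurons_by_l1(teacher_weights, k=2)  # → [1, 3]
pruned = prune_activations([0.2, 1.4, 0.0, -0.6], keep)  # → [1.4, -0.6]
loss = feature_mse([1.0, -0.5], pruned)
```

After pruning, the teacher and student layers have the same width, so no extra projection layer is needed between them. That is the "direct distillation" the method relies on.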
Findings
Consistently improves baseline KD performance across datasets.
Effective in classification and retrieval tasks.
Applicable to various teacher-student architectures.
Abstract
In this paper, we introduce InDistill, a method that serves as a warmup stage for enhancing Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. This is achieved via a training scheme based on curriculum learning that considers the distillation difficulty of each layer and the critical learning periods when the information flow paths are established. This procedure can lead to a student model that is better prepared to learn from the teacher. To ensure the applicability of InDistill across a wide range of teacher-student pairs, we also incorporate a pruning operation when there is a discrepancy in the width of the teacher and student layers. This pruning operation reduces the width of the teacher's intermediate layers to match those of the student, allowing direct distillation…
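The curriculum is described above only at a high level, so the following is a hypothetical sketch of one possible warmup schedule. Each layer has an assumed difficulty score (lower means easier). Layers are distilled one at a time, from easiest to hardest, for a fixed number of epochs each. The whole warmup stays within the earliest epochs, when the information flow paths are established. After the warmup, training switches to the regular KD objective. The difficulty scores, the one-layer-at-a-time ordering, and the names build_warmup_schedule and layer_for_epoch are all illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Optional


def build_warmup_schedule(difficulty: Dict[str, float], epochs_per_layer: int) -> List[str]:
    """Return the layer targeted at each warmup epoch, easiest layer first.

    `difficulty` maps a layer name to a hypothetical distillation-difficulty
    score (lower = easier); how InDistill measures difficulty is not modeled.
    """
    if epochs_per_layer < 1:
        raise ValueError("epochs_per_layer must be positive")
    # Stable sort: layers with equal scores keep their insertion order.
    order = sorted(difficulty, key=lambda name: difficulty[name])
    schedule: List[str] = []
    for name in order:
        schedule.extend([name] * epochs_per_layer)
    return schedule


def layer_for_epoch(schedule: List[str], epoch: int) -> Optional[str]:
    """Layer to distill at `epoch`, or None once the warmup has ended and
    training hands over to the regular KD objective."""
    if 0 <= epoch < len(schedule):
        return schedule[epoch]
    return None


difficulty = {"conv1": 0.2, "conv3": 0.9, "conv2": 0.5}
schedule = build_warmup_schedule(difficulty, epochs_per_layer=2)
# → ['conv1', 'conv1', 'conv2', 'conv2', 'conv3', 'conv3']
```

Here the warmup lasts len(schedule) epochs. From that point on, layer_for_epoch returns None, which signals that standard KD takes over.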
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Pruning · Knowledge Distillation
