InDistill: Information flow-preserving knowledge distillation for model compression

Ioannis Sarridis; Christos Koutlis; Giorgos Kordopatis-Zilos; Ioannis Kompatsiaris; Symeon Papadopoulos

arXiv:2205.10003·cs.CV·January 23, 2025·1 cites

TL;DR

InDistill is a novel knowledge distillation method that enhances model compression by preserving critical information flow paths through curriculum learning and adaptive pruning, leading to improved student model performance.

Contribution

It introduces a warmup stage for KD that focuses on critical information flow transfer and employs pruning to match teacher-student layer widths, broadening applicability.

Findings

01. Consistently improves baseline KD performance across datasets.

02. Effective in classification and retrieval tasks.

03. Applicable to various teacher-student architectures.

Abstract

In this paper, we introduce InDistill, a method that serves as a warmup stage for enhancing Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. This is achieved via a training scheme based on curriculum learning that considers the distillation difficulty of each layer and the critical learning periods when the information flow paths are established. This procedure can lead to a student model that is better prepared to learn from the teacher. To ensure the applicability of InDistill across a wide range of teacher-student pairs, we also incorporate a pruning operation when there is a discrepancy in the width of the teacher and student layers. This pruning operation reduces the width of the teacher's intermediate layers to match those of the student, allowing direct distillation…
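The two ideas in the abstract, pruning the teacher's intermediate layers to the student's width and distilling layer by layer under a curriculum, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not state how channels are selected or how training time is split across layers, so the L1-norm ranking in `select_channels` and the linear schedule in `layer_epochs` are hypothetical stand-ins, and all function names are invented here. The official repository may do this differently.

```python
# Illustrative sketch only. The channel-selection criterion (L1 norm) and the
# per-layer epoch schedule are ASSUMPTIONS, not details taken from the paper.

def l1_norm(weights):
    """Sum of absolute values of one channel's flattened filter weights."""
    return sum(abs(w) for w in weights)


def select_channels(teacher_filters, student_width):
    """Pick the `student_width` teacher channels with the largest L1 norm.

    Indices are returned in ascending order so the original channel order
    is preserved after pruning.
    """
    if student_width > len(teacher_filters):
        raise ValueError("student layer is wider than the teacher layer")
    ranked = sorted(
        range(len(teacher_filters)),
        key=lambda i: l1_norm(teacher_filters[i]),
        reverse=True,
    )
    return sorted(ranked[:student_width])


def prune_feature_map(feature_map, keep):
    """Keep only the selected channels of a teacher feature map."""
    return [feature_map[i] for i in keep]


def mse(a, b):
    """Mean squared error between two equally sized activation vectors."""
    if len(a) != len(b):
        raise ValueError("widths must match for direct distillation")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layer_epochs(num_layers, base_epochs, step):
    """Hypothetical curriculum: layer k is distilled for base_epochs + k * step
    epochs, so layers assumed to be harder (deeper) get more time."""
    return [base_epochs + k * step for k in range(num_layers)]


# Toy teacher layer with 4 channels, toy student layer with 2 channels.
teacher_filters = [[0.1, -0.2], [1.0, 0.5], [0.0, 0.05], [-0.7, 0.9]]
keep = select_channels(teacher_filters, student_width=2)

teacher_activations = [10.0, 20.0, 30.0, 40.0]  # one value per channel
student_activations = [18.0, 41.0]

pruned_teacher = prune_feature_map(teacher_activations, keep)
loss = mse(pruned_teacher, student_activations)
schedule = layer_epochs(num_layers=3, base_epochs=2, step=1)
```

Once the teacher layer is pruned, the two activation vectors have the same width, so they can be compared directly without an extra projection layer on the student side. This is the direct distillation the abstract refers to.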

Code & Models

Repositories

gsarridis/indistill
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Pruning · Knowledge Distillation