EA-KD: Entropy-based Adaptive Knowledge Distillation

Chi-Ping Su; Ching-Hsun Tseng; Bin Pu; Lei Zhao; Jiewen Yang; Zhuangzhuang Chen; Shin-Jye Lee

arXiv:2311.13621 · cs.CV · August 12, 2025 · 1 citation

TL;DR

EA-KD introduces an entropy-based adaptive weighting scheme in knowledge distillation that dynamically emphasizes valuable samples, leading to improved performance across various tasks with minimal additional computational cost.

Contribution

The paper proposes an entropy-based method that adaptively reweights samples in knowledge distillation, instead of weighting all samples uniformly as traditional approaches do.

Findings

01  Consistently improves performance across image classification, object detection, and LLM distillation.

02  Achieves state-of-the-art results with negligible computational overhead.

03  Effectively prioritizes high-entropy samples for better knowledge transfer.

Abstract

Knowledge distillation (KD) enables a smaller "student" model to mimic a larger "teacher" model by transferring knowledge from the teacher's output or features. However, most KD methods treat all samples uniformly, overlooking the varying learning value of each sample and thereby limiting their effectiveness. In this paper, we propose Entropy-based Adaptive Knowledge Distillation (EA-KD), a simple yet effective plug-and-play KD method that prioritizes learning from valuable samples. EA-KD quantifies each sample's learning value by strategically combining the entropy of the teacher and student output, then dynamically reweights the distillation loss to place greater emphasis on high-entropy samples. Extensive experiments across diverse KD frameworks and tasks -- including image classification, object detection, and large language model (LLM) distillation -- demonstrate that EA-KD…
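The reweighting idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not state how the teacher and student entropies are combined, so the sketch assumes a plain sum normalized to a batch mean of 1. The function names (`entropy_weighted_kd_loss` and its helpers) and the default temperature are likewise hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one sample's logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two probability vectors."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def entropy_weighted_kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Batch-mean distillation loss with per-sample entropy weights.

    ASSUMPTION: the weight is H(teacher) + H(student), rescaled so the
    weights average to 1 over the batch. The paper's actual combination
    rule may differ.
    """
    per_sample_kl, raw_weights = [], []
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_logits, temperature)
        p_student = softmax(s_logits, temperature)
        per_sample_kl.append(kl_divergence(p_teacher, p_student))
        raw_weights.append(entropy(p_teacher) + entropy(p_student))

    mean_weight = sum(raw_weights) / len(raw_weights)
    if mean_weight > 0.0:
        weights = [w / mean_weight for w in raw_weights]
    else:
        # Degenerate batch (all entropies zero): fall back to uniform weights.
        weights = [1.0] * len(raw_weights)

    loss = sum(w * kl for w, kl in zip(weights, per_sample_kl)) / len(weights)
    return loss, weights


if __name__ == "__main__":
    # Sample 0 is confidently predicted (low entropy); sample 1 is ambiguous.
    teacher = [[10.0, 0.0, 0.0], [1.0, 0.9, 0.8]]
    student = [[8.0, 0.0, 0.0], [0.5, 1.0, 0.2]]
    loss, weights = entropy_weighted_kd_loss(teacher, student, temperature=1.0)
    print(f"loss={loss:.4f}, weights={[round(w, 3) for w in weights]}")
```

In this sketch the ambiguous second sample receives the larger weight, which matches the abstract's stated emphasis on high-entropy samples. In an autograd framework the weights would typically be treated as constants (detached from the graph), which is also an assumption here.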

Taxonomy

Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks

Methods: Focus · Knowledge Distillation