Distilling Balanced Knowledge from a Biased Teacher
Seonghak Kim

TL;DR
This paper introduces Long-Tailed Knowledge Distillation (LTKD), a framework that distills balanced knowledge from a teacher biased by long-tailed training data. It does so by decomposing the distillation objective over class groups (head, medium, and tail) and rebalancing each part.
Contribution
LTKD reformulates knowledge distillation into cross-group and within-group components, addressing teacher bias and improving tail-class performance in long-tailed datasets.
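The split into cross-group and within-group components can be illustrated with the standard grouping (chain-rule) identity for KL divergence. The block below is a sketch under assumptions, not the paper's exact formulation: the symbols $p^T$, $p^S$, $P_g$, and $\tilde{p}_{i\mid g}$ are notation introduced here, and the temperature is omitted.

```latex
% Assumed notation: p^T, p^S are teacher and student class probabilities;
% \mathcal{G} = {head, medium, tail} partitions the classes.
\[
P^T_g = \sum_{i \in g} p^T_i, \qquad
\tilde{p}^T_{i \mid g} = \frac{p^T_i}{P^T_g}
\quad \text{(and likewise for } S\text{)}
\]
\[
\mathrm{KL}\!\left(p^T \,\|\, p^S\right)
= \underbrace{\sum_{g \in \mathcal{G}} P^T_g \log \frac{P^T_g}{P^S_g}}_{\text{cross-group}}
+ \underbrace{\sum_{g \in \mathcal{G}} P^T_g \sum_{i \in g}
  \tilde{p}^T_{i \mid g} \log \frac{\tilde{p}^T_{i \mid g}}{\tilde{p}^S_{i \mid g}}}_{\text{within-group}}
\]
```

Under this identity each within-group term is weighted by the teacher's group mass $P^T_g$, so a head-biased teacher gives the tail group very little weight. This is one way to read why both terms are candidates for rebalancing.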
Findings
LTKD outperforms existing methods on CIFAR-100-LT, TinyImageNet-LT, and ImageNet-LT.
It significantly improves tail-class accuracy.
The method effectively mitigates teacher bias in long-tailed distributions.
Abstract
Conventional knowledge distillation, designed for model compression, fails on long-tailed distributions because the teacher model tends to be biased toward head classes and provides limited supervision for tail classes. We propose Long-Tailed Knowledge Distillation (LTKD), a novel framework that reformulates the conventional objective into two components: a cross-group loss, capturing mismatches in prediction distributions across class groups (head, medium, and tail), and a within-group loss, capturing discrepancies within each group's distribution. This decomposition reveals the specific sources of the teacher's bias. To mitigate the inherited bias, LTKD introduces (1) a rebalanced cross-group loss that calibrates the teacher's group-level predictions and (2) a reweighted within-group loss that ensures equal contribution from all groups. Extensive experiments on CIFAR-100-LT,…
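As a minimal numerical sketch of the decomposition described above, the following NumPy snippet splits the KL divergence between a teacher and a student into a cross-group term and per-group within-group terms, then checks that they recombine into the full KL. Everything here is an assumption made for illustration: the function names, the toy 9-class setup, the simulated head bias, and especially the equal-weight variant at the end, which is only one possible reading of "equal contribution from all groups" and not the paper's confirmed loss.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))


def decompose_kd(p_t, p_s, groups):
    """Split KL(p_t || p_s) into a cross-group term and per-group within-group terms.

    `groups` is a list of index arrays partitioning the classes
    (hypothetical head / medium / tail split).
    """
    # Group-level distributions: total probability mass assigned to each group.
    g_t = np.array([p_t[idx].sum() for idx in groups])
    g_s = np.array([p_s[idx].sum() for idx in groups])
    cross = kl(g_t, g_s)
    # Within-group terms compare the distributions renormalized inside each group.
    within = np.array(
        [kl(p_t[idx] / g_t[k], p_s[idx] / g_s[k]) for k, idx in enumerate(groups)]
    )
    return cross, within, g_t


rng = np.random.default_rng(0)
num_classes = 9
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]  # head, medium, tail

# Toy teacher whose logits are shifted toward head classes (assumed bias pattern).
head_bias = np.array([3.0, 3.0, 3.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
p_t = softmax(rng.normal(size=num_classes) + head_bias)
p_s = softmax(rng.normal(size=num_classes))

cross, within, g_t = decompose_kd(p_t, p_s, groups)

# Standard KD: within-group terms are weighted by the teacher's group mass.
full_kl = kl(p_t, p_s)
standard = cross + float(np.dot(g_t, within))
assert np.isclose(full_kl, standard)

# Illustrative variant (assumption): give every group the same weight.
equal_weighted = cross + float(within.mean())
```

In the standard form the tail group's within-group term is scaled by `g_t[2]`, which is small when the teacher favors head classes; the equal-weight variant removes that dependence. How LTKD calibrates the cross-group term is not specified in the text above, so it is left out of this sketch.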
