Distilling Balanced Knowledge from a Biased Teacher

Seonghak Kim

arXiv:2506.18496·cs.CV·March 2, 2026

TL;DR

This paper introduces Long-Tailed Knowledge Distillation (LTKD), a framework that effectively distills balanced knowledge from biased teachers in long-tailed distributions by decomposing and rebalancing class group predictions.

Contribution

LTKD reformulates knowledge distillation into cross-group and within-group components, addressing teacher bias and improving tail-class performance in long-tailed datasets.
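A split of this kind follows from the chain rule for KL divergence. The identity below is a sketch in notation chosen here, not necessarily the paper's: $p^T$ and $p^S$ are the teacher and student class distributions, $\mathcal{G}$ is the set of class groups (head, medium, tail), $g_G$ is the probability mass a model assigns to group $G$, and $p_{i\mid G}$ is the class distribution renormalized within that group. Temperature scaling is omitted for brevity.

```latex
\[
g_G = \sum_{i \in G} p_i, \qquad p_{i \mid G} = \frac{p_i}{g_G}
\]
\[
\mathrm{KL}\!\left(p^T \,\|\, p^S\right)
  = \underbrace{\mathrm{KL}\!\left(g^T \,\|\, g^S\right)}_{\text{cross-group}}
  + \sum_{G \in \mathcal{G}} g^T_G \,
    \underbrace{\mathrm{KL}\!\left(p^T_{\cdot \mid G} \,\|\, p^S_{\cdot \mid G}\right)}_{\text{within-group}}
\]
```

Written this way, a teacher biased toward head classes affects the objective in two places: its group-level distribution $g^T$ is skewed toward the head group, and the weights $g^T_G$ on the within-group terms shrink the contribution of the tail group.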

Findings

01. LTKD outperforms existing methods on CIFAR-100-LT, TinyImageNet-LT, and ImageNet-LT.

02. It significantly improves tail-class accuracy.

03. The method effectively mitigates teacher bias in long-tailed distributions.

Abstract

Conventional knowledge distillation, designed for model compression, fails on long-tailed distributions because the teacher model tends to be biased toward head classes and provides limited supervision for tail classes. We propose Long-Tailed Knowledge Distillation (LTKD), a novel framework that reformulates the conventional objective into two components: a cross-group loss, capturing mismatches in prediction distributions across class groups (head, medium, and tail), and a within-group loss, capturing discrepancies within each group's distribution. This decomposition reveals the specific sources of the teacher's bias. To mitigate the inherited bias, LTKD introduces (1) a rebalanced cross-group loss that calibrates the teacher's group-level predictions and (2) a reweighted within-group loss that ensures equal contribution from all groups. Extensive experiments on CIFAR-100-LT,…
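The decomposition described in the abstract can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the toy logits, and the `group_prior` argument are hypothetical. In particular, the abstract does not say how the teacher's group-level predictions are calibrated, so dividing by an assumed group prior and renormalizing is a stand-in for that step, and "equal contribution" is read here as giving every within-group term a weight of 1.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def group_terms(p_t, p_s, groups):
    """Group-level masses and per-group within-group KL terms."""
    g_t = np.array([p_t[idx].sum() for idx in groups])
    g_s = np.array([p_s[idx].sum() for idx in groups])
    within = np.array([
        kl(p_t[idx] / g_t[k], p_s[idx] / g_s[k])
        for k, idx in enumerate(groups)
    ])
    return g_t, g_s, within

def standard_kd(p_t, p_s, groups):
    """Conventional KD loss written as cross-group + teacher-weighted within-group."""
    g_t, g_s, within = group_terms(p_t, p_s, groups)
    return kl(g_t, g_s) + float(np.dot(g_t, within))

def rebalanced_kd(p_t, p_s, groups, group_prior):
    """Illustrative rebalanced loss (assumed form, not the paper's exact one)."""
    g_t, g_s, within = group_terms(p_t, p_s, groups)
    # Assumption: calibrate teacher group masses by an estimated group prior.
    calibrated = g_t / np.asarray(group_prior, dtype=float)
    calibrated = calibrated / calibrated.sum()
    # Assumption: every group's within-group term gets the same weight (1).
    return kl(calibrated, g_s) + float(within.sum())

# Toy example: 6 classes split into head, medium, and tail groups.
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
p_t = softmax([4.0, 3.0, 1.0, 0.5, -1.0, -1.5])  # teacher, head-biased
p_s = softmax([2.0, 2.5, 1.0, 1.0, 0.0, -0.5])   # student

full_kl = kl(p_t, p_s)
decomposed = standard_kd(p_t, p_s, groups)
balanced = rebalanced_kd(p_t, p_s, groups, group_prior=[0.8, 0.15, 0.05])
```

Here `full_kl` and `decomposed` agree up to floating-point error, which shows that the two-component form is a rewriting of the conventional objective rather than a new loss. `rebalanced_kd` then changes only the two places where the teacher's group-level bias enters: the group distribution used as the cross-group target and the weights on the within-group terms.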
