REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency
Ondrej Tybl, Lukas Neumann

TL;DR
REDistill is a robust distillation framework that uses a power divergence loss to handle noisy teacher outputs, improving student accuracy across various architectures without extensive hyper-parameter tuning.
Contribution
The paper proposes a principled robust-estimator distillation method based on power divergence, improving the robustness and generalization of knowledge distillation.
Findings
Consistently improves student accuracy on CIFAR-100 and ImageNet-1k.
Requires no hyper-parameter tuning for different teacher-student pairs.
Seamlessly integrates into existing KD pipelines with negligible overhead.
Abstract
Knowledge Distillation (KD) transfers knowledge from a large teacher model to a smaller student by aligning their predictive distributions. However, conventional KD formulations - typically based on Kullback-Leibler divergence - assume that the teacher provides reliable soft targets. In practice, teacher predictions are often noisy or overconfident, and existing correction-based approaches rely on ad-hoc heuristics and extensive hyper-parameter tuning, which hinders generalization. We introduce REDistill (Robust Estimator Distillation), a simple yet principled framework grounded in robust statistics. REDistill replaces the standard KD objective with a power divergence loss, a generalization of KL divergence that adaptively downweights unreliable teacher output while preserving informative logit relationships. This formulation provides a unified and interpretable treatment of teacher…
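
To make "a power divergence loss, a generalization of KL divergence" concrete, here is a minimal sketch that uses the Cressie-Read family of power divergences as an assumed stand-in. The abstract does not give the exact parameterization, argument order, or weighting that REDistill uses, so the function names, the parameter `lam`, and the teacher-first argument order are illustrative assumptions, not the authors' implementation. The sketch only shows that the family reduces to KL divergence as `lam` approaches 0.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with the usual KD temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def power_divergence(p, q, lam):
    """Cressie-Read power divergence (assumed form, not taken from the paper):

        D_lam(p || q) = 1 / (lam * (lam + 1)) * sum_i p_i * ((p_i / q_i) ** lam - 1)

    The limit lam -> 0 is KL(p || q). lam = -1 is a singular point of this
    formula and is not handled here.
    """
    if abs(lam) < 1e-8:
        return kl_divergence(p, q)
    total = sum(pi * ((pi / qi) ** lam - 1.0) for pi, qi in zip(p, q) if pi > 0.0)
    return total / (lam * (lam + 1.0))


# Hypothetical teacher and student logits for one 3-class example.
teacher = softmax([4.0, 1.0, 0.5], temperature=2.0)
student = softmax([2.0, 1.5, 0.2], temperature=2.0)

kd_kl = kl_divergence(teacher, student)
kd_power = power_divergence(teacher, student, lam=0.5)
```

In an existing KD pipeline, a loss of this shape would take the place of the KL term between the teacher and student distributions. Which value of `lam` downweights unreliable teacher outputs, and by how much, is specified by the paper rather than by this sketch.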
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
