REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency

Ondrej Tybl; Lukas Neumann

arXiv:2602.04677·cs.LG·February 5, 2026

Open Access

TL;DR

REDistill introduces a robust distillation framework using power divergence loss to effectively handle noisy teacher outputs, improving student model accuracy without extensive hyper-parameter tuning across various architectures.

Contribution

It proposes a principled robust estimator distillation method based on power divergence, enhancing knowledge distillation robustness and generalization.

Findings

1. Consistently improves student accuracy on CIFAR-100 and ImageNet-1k.
2. Requires no hyper-parameter tuning for different teacher-student pairs.
3. Seamlessly integrates into existing KD pipelines with negligible overhead.

Abstract

Knowledge Distillation (KD) transfers knowledge from a large teacher model to a smaller student by aligning their predictive distributions. However, conventional KD formulations, typically based on Kullback-Leibler divergence, assume that the teacher provides reliable soft targets. In practice, teacher predictions are often noisy or overconfident, and existing correction-based approaches rely on ad-hoc heuristics and extensive hyper-parameter tuning, which hinders generalization. We introduce REDistill (Robust Estimator Distillation), a simple yet principled framework grounded in robust statistics. REDistill replaces the standard KD objective with a power divergence loss, a generalization of KL divergence that adaptively downweights unreliable teacher outputs while preserving informative logit relationships. This formulation provides a unified and interpretable treatment of teacher…
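
The abstract is truncated here and does not state the loss formula. As a minimal sketch only, assuming the Cressie-Read power divergence family (one well-known power-parameterized generalization of KL, and not necessarily the exact form REDistill uses), the pure-Python example below shows how such a divergence reduces to KL as its parameter approaches zero. The names softmax, kl_divergence, power_divergence, and the parameter lam are hypothetical and chosen for illustration; the logits and temperature are made-up values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def power_divergence(p, q, lam):
    """Cressie-Read power divergence between discrete distributions p and q.

    ASSUMPTION: this is the textbook Cressie-Read family, used here only as
    an illustration; the paper's own loss may be parameterized differently.
    Assumes p and q are strictly positive and each sums to 1.
    """
    if abs(lam) < 1e-12:
        return kl_divergence(p, q)   # limit lam -> 0 is KL(p || q)
    if abs(lam + 1.0) < 1e-12:
        return kl_divergence(q, p)   # limit lam -> -1 is KL(q || p)
    total = sum(pi * ((pi / qi) ** lam - 1.0) for pi, qi in zip(p, q))
    return total / (lam * (lam + 1.0))


# Hypothetical teacher and student logits for a 3-class problem.
teacher = softmax([2.0, 1.0, 0.1], temperature=4.0)
student = softmax([1.5, 1.2, 0.3], temperature=4.0)

for lam in (0.0, 0.5, 1.0):
    print(f"lam={lam}: {power_divergence(teacher, student, lam):.6f}")
```

In this family, lam = 1 gives half the Pearson chi-square statistic and lam = 0 recovers the usual KD objective KL(teacher || student). Any downweighting behaviour attributed to REDistill depends on the paper's own parameterization, which this excerpt does not show.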

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning