Parameter-Free Logit Distillation via Sorting Mechanism
Stephen Ekaputra Limantoro

TL;DR
This paper introduces a parameter-free logit distillation method that uses a sorting mechanism to correct teacher predictions and reorder logits, improving knowledge transfer in neural networks.
Contribution
It proposes a novel, plug-and-play logit processing scheme based on sorting to enhance existing knowledge distillation methods by fixing incorrect predictions and reordering logits.
Findings
Effective on CIFAR-100 and ImageNet datasets
Improves accuracy of student models
Compatible with existing KD methods
Abstract
Knowledge distillation (KD) transfers knowledge from a larger teacher model to a smaller student model via soft labels, yielding a more efficient neural network. In general, a model's performance is judged by its accuracy, which is measured against ground-truth labels. However, existing KD approaches usually use the teacher's original output distribution, overlooking the possibility that the teacher's prediction is incorrect. This can contradict the objective of hard-label learning through the cross-entropy loss and may lead to sub-optimal knowledge distillation on certain samples. To address this issue, we propose a novel logit processing scheme based on a sorting mechanism. Specifically, our method has a two-fold goal: (1) fixing the teacher's incorrect predictions based on the labels and (2) reordering the distribution in a natural way according to priority rank, both simultaneously. As an easy-to-use, plug-and-play…
