Parameter-Free Logit Distillation via Sorting Mechanism

Stephen Ekaputra Limantoro

arXiv:2508.16544·eess.SP·August 25, 2025


TL;DR

This paper introduces a parameter-free logit distillation method that uses a sorting mechanism to correct teacher predictions and reorder logits, improving knowledge transfer in neural networks.

Contribution

It proposes a plug-and-play logit processing scheme based on sorting that enhances existing knowledge distillation methods by correcting the teacher's incorrect predictions and reordering its logits.

Findings

01 Effective on CIFAR-100 and ImageNet datasets
02 Improves accuracy of student models
03 Compatible with existing KD methods

Abstract

Knowledge distillation (KD) aims to distill knowledge from a teacher (larger) model to a student (smaller) model via soft labels, yielding an efficient neural network. In general, the performance of a model is determined by accuracy, which is measured against labels. However, existing KD approaches usually use the teacher's original distribution, neglecting the possibility that the teacher's prediction is incorrect. This may contradict the motivation of hard-label learning through cross-entropy loss and may lead to sub-optimal knowledge distillation on certain samples. To address this issue, we propose a novel logit processing scheme via a sorting mechanism. Specifically, our method has a two-fold goal, achieved in a single step: (1) fixing the incorrect prediction of the teacher based on the labels and (2) reordering the distribution in a natural way according to priority rank. As an easy-to-use, plug-and-play…
