Enhancing Logits Distillation with Plug&Play Kendall's $\tau$ Ranking Loss

Yuchen Guan; Runxi Cheng; Kang Liu; Chun Yuan

arXiv:2409.17823·cs.CV·June 17, 2025

Open Access

TL;DR

This paper introduces a plug-and-play Kendall's tau ranking loss to improve logits distillation by balancing gradient updates, leading to better transfer of inter-class information and enhanced student model performance.

Contribution

The authors propose a novel auxiliary ranking loss based on Kendall's tau that complements existing distillation methods and addresses gradient imbalance issues.

Findings

01 Consistently improves performance across datasets and architectures.

02 Balances gradients towards low-probability channels effectively.

03 Enhances inter-class relational information transfer.

Abstract

Knowledge distillation typically minimizes the Kullback-Leibler (KL) divergence between teacher and student logits. However, optimizing the KL divergence can be challenging for the student and often leads to sub-optimal solutions. We further show that gradients induced by KL divergence scale with the magnitude of the teacher logits, thereby diminishing updates on low-probability channels. This imbalance weakens the transfer of inter-class information and in turn limits the performance improvements achievable by the student. To mitigate this issue, we propose a plug-and-play auxiliary ranking loss based on Kendall's $\tau$ coefficient that can be seamlessly integrated into any logit-based distillation framework. It supplies inter-class relational information while rebalancing gradients toward low-probability channels. We demonstrate that the proposed ranking loss is largely invariant to…
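To make the idea of a Kendall's $\tau$ ranking loss concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the exact form of their loss is not given in this summary. The sketch assumes a common smooth surrogate in which the sign function of the classical Kendall's $\tau$ is replaced by `tanh`, so that the score is differentiable in the student logits. The names `soft_kendall_tau`, `ranking_loss`, and the `steepness` parameter are hypothetical and chosen only for illustration.

```python
import math


def soft_kendall_tau(student, teacher, steepness=1.0):
    """Smooth surrogate of Kendall's tau between two logit vectors.

    Assumption (not from the paper summary): sign(x) in the classical
    coefficient is replaced by tanh(steepness * x) on both sides.
    Returns a value in [-1, 1]; larger means better rank agreement.
    """
    n = len(student)
    if n != len(teacher) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")

    total = 0.0
    # Sum over all unordered pairs of classes (i, j) with i < j.
    for i in range(n):
        for j in range(i + 1, n):
            s = math.tanh(steepness * (student[i] - student[j]))
            t = math.tanh(steepness * (teacher[i] - teacher[j]))
            # Positive when both models order the pair the same way.
            total += s * t

    num_pairs = n * (n - 1) / 2
    return total / num_pairs


def ranking_loss(student, teacher, steepness=1.0):
    """Hypothetical auxiliary loss: 1 - soft tau, so 0 is the best value."""
    return 1.0 - soft_kendall_tau(student, teacher, steepness)


if __name__ == "__main__":
    teacher_logits = [6.0, 4.0, 2.0]
    agreeing_student = [3.0, 2.0, 1.0]    # same class ordering as teacher
    reversed_student = [1.0, 2.0, 3.0]    # opposite class ordering

    print(ranking_loss(agreeing_student, teacher_logits))
    print(ranking_loss(reversed_student, teacher_logits))
```

In a distillation setup, a term like this would be added to the existing KL objective with some weighting coefficient, which is what makes it plug-and-play. Because each pair of classes contributes to the sum regardless of how small its probabilities are, low-probability channels still receive a ranking signal under this sketch.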

Taxonomy

Topics: Fuzzy Systems and Optimization

Methods: Softmax · Attention Is All You Need · Knowledge Distillation · Focus