Enhancing Logits Distillation with Plug&Play Kendall's $\tau$ Ranking Loss
Yuchen Guan, Runxi Cheng, Kang Liu, Chun Yuan

TL;DR
This paper introduces a plug-and-play Kendall's tau ranking loss to improve logits distillation by balancing gradient updates, leading to better transfer of inter-class information and enhanced student model performance.
Contribution
The authors propose a novel auxiliary ranking loss based on Kendall's tau that complements existing distillation methods and addresses gradient imbalance issues.
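To make the idea of a Kendall's tau ranking loss concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the authors' formulation: the function names, the `tanh` relaxation of the sign function, and the `steepness` parameter are all hypothetical choices made for this example. The exact coefficient depends only on the pairwise ordering of the channels, and the smooth surrogate is one common way to obtain something a gradient-based optimizer could use.

```python
import numpy as np

def kendall_tau(teacher_logits, student_logits):
    """Exact Kendall's tau (tau-a, no tie correction) between two 1-D logit vectors."""
    t = np.asarray(teacher_logits, dtype=float)
    s = np.asarray(student_logits, dtype=float)
    i, j = np.triu_indices(len(t), k=1)  # all channel pairs with i < j
    # +1 for a concordant pair, -1 for a discordant pair
    agreement = np.sign(t[i] - t[j]) * np.sign(s[i] - s[j])
    return agreement.mean()

def soft_kendall_loss(teacher_logits, student_logits, steepness=1.0):
    """Smooth surrogate (assumption): tanh replaces sign on the student side.

    Returns 1 - soft_tau, so a student whose ordering matches the teacher's
    gets a loss below 1 and a student with reversed ordering gets a loss above 1.
    """
    t = np.asarray(teacher_logits, dtype=float)
    s = np.asarray(student_logits, dtype=float)
    i, j = np.triu_indices(len(t), k=1)
    soft_agreement = np.sign(t[i] - t[j]) * np.tanh(steepness * (s[i] - s[j]))
    return 1.0 - soft_agreement.mean()

teacher = [3.0, 1.0, 2.0]
same_order = [30.0, 10.0, 20.0]   # different scale, same ranking
reversed_order = [1.0, 3.0, 2.0]  # every pair is discordant

tau_same = kendall_tau(teacher, same_order)
tau_reversed = kendall_tau(teacher, reversed_order)
loss_same = soft_kendall_loss(teacher, same_order)
loss_reversed = soft_kendall_loss(teacher, reversed_order)
```

In a real distillation setup such a term would be written in an autodiff framework and added to the existing logit-based loss with a weighting factor; that weighting is not specified here.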
Findings
Consistently improves performance across datasets and architectures.
Balances gradients towards low-probability channels effectively.
Enhances inter-class relational information transfer.
Abstract
Knowledge distillation typically minimizes the Kullback-Leibler (KL) divergence between teacher and student logits. However, optimizing the KL divergence can be challenging for the student and often leads to sub-optimal solutions. We further show that gradients induced by KL divergence scale with the magnitude of the teacher logits, thereby diminishing updates on low-probability channels. This imbalance weakens the transfer of inter-class information and in turn limits the performance improvements achievable by the student. To mitigate this issue, we propose a plug-and-play auxiliary ranking loss based on Kendall's coefficient that can be seamlessly integrated into any logit-based distillation framework. It supplies inter-class relational information while rebalancing gradients toward low-probability channels. We demonstrate that the proposed ranking loss is largely invariant to…
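A small numerical sketch can help illustrate why low-probability channels receive small updates under KL divergence. It relies on the standard identity that, at temperature 1, the gradient of KL(p_teacher || p_student) with respect to the student logits is p_student - p_teacher, so each channel's gradient magnitude is bounded by the larger of its two probabilities. The logit values and function names below are made up for illustration and are not taken from the paper, whose own analysis may be stated differently.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_grad_wrt_student_logits(teacher_logits, student_logits):
    """Gradient of KL(p_teacher || p_student) w.r.t. student logits (temperature 1)."""
    return softmax(student_logits) - softmax(teacher_logits)

# Hypothetical logits: the student swaps the ranking of channels 1 and 2.
teacher = np.array([8.0, 2.0, 1.0, -3.0])
student = np.array([5.0, 1.0, 2.0, -1.0])

p_teacher = softmax(teacher)
p_student = softmax(student)
grad = kl_grad_wrt_student_logits(teacher, student)

# Per-channel bound: |grad_i| <= max(p_teacher_i, p_student_i)
bound = np.maximum(p_teacher, p_student)

for c in range(len(teacher)):
    print(f"channel {c}: p_t={p_teacher[c]:.5f} p_s={p_student[c]:.5f} grad={grad[c]:+.5f}")
```

In this toy case the mis-ranked low-probability channels get gradients far smaller than the dominant channel, even though their relative order is wrong. A ranking term penalizes that ordering error directly, independent of how small the probabilities are.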
Taxonomy
Topics: Fuzzy Systems and Optimization
Methods: Softmax · Attention Is All You Need · Knowledge Distillation · Focus
