PLD: A Choice-Theoretic List-Wise Knowledge Distillation

Ejafa Bassam; Dawei Zhu; Kaigui Bian

arXiv:2506.12542 · cs.LG · May 12, 2026

TL;DR

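This paper introduces Plackett-Luce Distillation (PLD), a weighted list-wise ranking loss for knowledge distillation. PLD interprets teacher logits as Plackett-Luce "worth" scores, so the student is trained to reproduce the teacher's ranking over classes when compressing a larger model into a smaller one.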

Contribution

The paper recasts knowledge distillation from a choice-theoretic perspective under the Plackett-Luce model and introduces a convex, ranking-based loss for transferring teacher knowledge to the student.
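To make the ranking-based idea concrete, here is a minimal sketch of a Plackett-Luce negative log-likelihood. It is not the paper's exact loss: the sketch assumes the target ranking is obtained by sorting teacher logits and is then scored with the student's logits, and the optional `weights` argument is a hypothetical placeholder for whatever per-position weighting PLD uses. The function names are illustrative only.

```python
import math

def teacher_ranking(teacher_logits):
    """Class indices sorted from highest to lowest teacher logit (assumed target ranking)."""
    return sorted(range(len(teacher_logits)),
                  key=lambda j: teacher_logits[j], reverse=True)

def plackett_luce_nll(scores, ranking, weights=None):
    """Negative log-likelihood of `ranking` under a Plackett-Luce model.

    scores  : real-valued logits (log-worths), one per class
    ranking : class indices ordered from most to least preferred
    weights : hypothetical per-position weights; uniform when None
    """
    if weights is None:
        weights = [1.0] * len(ranking)
    nll = 0.0
    for k, cls in enumerate(ranking):
        # Classes still available at position k of the ranking.
        remaining = [scores[j] for j in ranking[k:]]
        # Numerically stable log-sum-exp over the remaining classes.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        # -log P(cls is chosen first among the remaining classes).
        nll += weights[k] * (log_norm - scores[cls])
    return nll

# Toy logits, chosen only for illustration.
teacher_logits = [2.0, 0.5, 1.0]
student_logits = [1.5, 0.2, 0.9]

ranking = teacher_ranking(teacher_logits)
loss = plackett_luce_nll(student_logits, ranking)
print(ranking, loss)
```

Each position contributes a log-sum-exp minus a linear term in the scores, so with non-negative weights the sum is convex in the scores, which is consistent with the convexity mentioned above.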

Findings

01. PLD achieves consistent performance improvements across multiple datasets.

02. It outperforms traditional divergence and correlation-based distillation methods.

03. The approach is effective for diverse architectures and teacher-student configurations.

Abstract

Knowledge distillation is a model compression technique in which a compact "student" network is trained to replicate the predictive behavior of a larger "teacher" network. In logit-based knowledge distillation, it has become the de facto approach to augment cross-entropy with a distillation term. Typically, this term is either a KL divergence that matches marginal probabilities or a correlation-based loss that captures intra- and inter-class relationships. In every case, it acts as an additional term to cross-entropy. This term has its own weight, which must be carefully tuned. In this paper, we adopt a choice-theoretic perspective and recast knowledge distillation under the Plackett-Luce model by interpreting teacher logits as "worth" scores. We introduce "Plackett-Luce Distillation (PLD)", a weighted list-wise ranking loss. In PLD, the teacher model transfers knowledge of its full…
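For context, the conventional setup the abstract contrasts against (cross-entropy plus a separately weighted KL term) can be sketched as follows. This is a generic illustration rather than code from the paper: the weight `alpha`, the `temperature`, and the temperature-squared scaling are assumptions borrowed from common practice in logit-based distillation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    """Standard cross-entropy of the student against the true label."""
    return -math.log(softmax(student_logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Cross-entropy plus a weighted KL distillation term.

    `alpha` is the extra weight that has to be tuned; both it and
    `temperature` are illustrative defaults, not values from the paper.
    """
    ce = cross_entropy(student_logits, label)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = kl_divergence(p_teacher, p_student)
    # The temperature**2 factor is a common convention for rescaling the soft-target term.
    return ce + alpha * temperature ** 2 * kl

# Toy logits, chosen only for illustration.
student = [1.0, 0.0, -1.0]
teacher = [0.0, 2.0, 0.0]
print(kd_loss(student, teacher, label=0))
```

The extra hyperparameter `alpha` is the per-term weight that the abstract says must be carefully tuned in this family of methods.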

