Improving Neural Ranking via Lossless Knowledge Distillation

Zhen Qin; Le Yan; Yi Tay; Honglei Zhuang; Xuanhui Wang; Michael Bendersky; Marc Najork

arXiv:2109.15285·cs.IR·April 7, 2022·1 cites

Open Access

TL;DR

This paper introduces Self-Distilled neural Rankers (SDR), in which a student ranker parameterized identically to its teacher is trained with a listwise distillation framework and a teacher score transformation. The student improves on the teacher's ranking performance without added model capacity and also surpasses gradient boosted decision trees.

Contribution

The paper proposes SDR, a neural ranking approach that improves performance without increasing model size, using a listwise distillation framework and a teacher score transformation designed specifically for ranking.

Findings

01 SDR outperforms teacher models on 7 of 9 key metrics.

02 SDR surpasses gradient boosted decision trees in ranking tasks.

03 Theoretical analysis explains the effectiveness of listwise distillation.

Abstract

We explore a novel perspective of knowledge distillation (KD) for learning to rank (LTR), and introduce Self-Distilled neural Rankers (SDR), where student rankers are parameterized identically to their teachers. Unlike existing ranking distillation work, which pursues a good trade-off between performance and efficiency, SDR is able to significantly improve ranking performance of students over the teacher rankers without increasing model capacity. The key success factors of SDR, which differ from common distillation techniques for classification, are: (1) an appropriate teacher score transformation function, and (2) a novel listwise distillation framework. Both techniques are specifically designed for ranking problems and are rarely studied in the existing knowledge distillation literature. Building upon the state-of-the-art neural ranking structure, SDR is able to push the limits of…
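
The two ingredients named in the abstract, a teacher score transformation and a listwise distillation loss, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a softmax cross-entropy loss computed over one query's document list, and the names `log_softmax`, `listwise_distillation_loss`, and the `transform` argument are hypothetical. The abstract does not state which transformation the authors use, so the scaling shown in the example is only a placeholder.

```python
import math
from typing import Callable, List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one query's document scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def listwise_distillation_loss(
    student_scores: Sequence[float],
    teacher_scores: Sequence[float],
    transform: Callable[[float], float] = lambda s: s,
) -> float:
    """Cross entropy between softmax(transform(teacher)) and softmax(student).

    Assumption: a softmax cross-entropy form of listwise distillation.
    `transform` is a hypothetical stand-in for the teacher score
    transformation; identity by default.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("student and teacher must score the same documents")
    # Soft targets: a distribution over the list derived from teacher scores.
    target = [math.exp(v) for v in log_softmax([transform(t) for t in teacher_scores])]
    student_log_probs = log_softmax(student_scores)
    return -sum(p * lq for p, lq in zip(target, student_log_probs))


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # teacher scores for three documents of one query
    agreeing = [2.0, 0.5, -1.0]  # student that reproduces the teacher's scores
    reversed_order = [-1.0, 0.5, 2.0]  # student that ranks in the opposite order

    print(listwise_distillation_loss(agreeing, teacher))
    print(listwise_distillation_loss(reversed_order, teacher))
    # Placeholder transformation (assumption): scale teacher scores by 2.
    print(listwise_distillation_loss(agreeing, teacher, transform=lambda s: 2.0 * s))
```

Because the loss is computed over the whole list, it depends only on score differences within a query: adding a constant to every student score leaves it unchanged, and a student that matches the transformed teacher distribution attains the minimum.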

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Neural Networks and Applications · Machine Learning and ELM

Methods: Knowledge Distillation