PILE: Pairwise Iterative Logits Ensemble for Multi-Teacher Labeled Distillation

Lianshang Cai; Linhao Zhang; Dehong Ma; Jun Fan; Daiting Shi; Yi Wu; Zhicong Cheng; Simiu Gu; Dawei Yin

arXiv:2211.06059·cs.IR·November 14, 2022

Open Access

TL;DR

This paper introduces PILE, an ensemble method for multi-teacher knowledge distillation of ranking models. PILE combines teacher logits iteratively under the supervision of label information and reports competitive results in offline and online search experiments.

Contribution

The paper proposes PILE, a unified algorithm that ensembles logits from multiple teachers and uses label information during distillation for ranking models.

Findings

01

Achieved competitive results in offline and online experiments.

02

Successfully deployed in a real-world commercial search system.

03

Demonstrated effectiveness of iterative ensemble with label supervision.

Abstract

Pre-trained language models have become a crucial part of ranking systems and have recently achieved impressive results. To maintain high performance while keeping computation efficient, knowledge distillation is widely used. In this paper, we focus on two key questions in knowledge distillation for ranking models: 1) how to ensemble knowledge from multiple teachers; 2) how to utilize the label information of the data in the distillation process. We propose a unified algorithm called Pairwise Iterative Logits Ensemble (PILE) to tackle these two questions simultaneously. PILE ensembles multi-teacher logits under the supervision of label information in an iterative way and achieves competitive performance in both offline and online experiments. The proposed method has been deployed in a real-world commercial search system.
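The abstract does not spell out the PILE update rule, so the following is only a rough sketch of the general idea it names: combining logits from several teachers iteratively while label information checks the pairwise order of documents. Everything here is an assumption made for illustration, not the authors' algorithm: the function names, the multiplicative weight update, the `step` parameter, and the stopping rule are all hypothetical.

```python
from itertools import permutations


def pairwise_violations(scores, labels):
    """Pairs (i, j) where the labels rank i above j but the scores do not."""
    return [
        (i, j)
        for i, j in permutations(range(len(labels)), 2)
        if labels[i] > labels[j] and scores[i] <= scores[j]
    ]


def weighted_ensemble(teacher_logits, weights):
    """Weighted sum of per-document logits across teachers."""
    n_docs = len(teacher_logits[0])
    return [
        sum(w * logits[k] for w, logits in zip(weights, teacher_logits))
        for k in range(n_docs)
    ]


def iterative_ensemble(teacher_logits, labels, max_iters=50, step=0.5):
    """Hypothetical scheme: reweight teachers until the ensemble respects the labels.

    Assumption for illustration only; the paper's actual procedure may differ.
    """
    weights = [1.0 / len(teacher_logits)] * len(teacher_logits)
    for _ in range(max_iters):
        scores = weighted_ensemble(teacher_logits, weights)
        bad = pairwise_violations(scores, labels)
        if not bad:
            break  # every labeled pair is ordered correctly
        # Boost teachers that order the currently violated pairs correctly.
        for t, logits in enumerate(teacher_logits):
            agree = sum(1 for i, j in bad if logits[i] > logits[j])
            weights[t] *= 1.0 + step * agree / len(bad)
        total = sum(weights)
        weights = [w / total for w in weights]
    return weighted_ensemble(teacher_logits, weights), weights


# One query with three documents; a higher label means more relevant.
teachers = [[2.0, 1.0, 0.0], [0.0, 1.5, 3.0]]
labels = [2, 1, 0]
ensembled, teacher_weights = iterative_ensemble(teachers, labels)
```

In this toy input the first teacher agrees with the labels and the second reverses them, so the loop shifts weight toward the first teacher until no labeled pair is misordered. The resulting `ensembled` scores would then serve as the distillation target for a student model, which is outside the scope of this sketch.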

Taxonomy

Topics: Text and Document Classification Technologies · Multimodal Machine Learning Applications · Advanced Graph Neural Networks

Methods: Knowledge Distillation