Weighted KL-Divergence for Document Ranking Model Refinement

Yingrui Yang; Yifan Qiao; Shanxiu He; Tao Yang

arXiv:2406.05977 · cs.IR · June 11, 2024

TL;DR

This paper contrastively reweights the KL-divergence terms used to distill transformer-based document ranking models. The reweighting improves student alignment with teacher models and raises search relevance on the MS MARCO and BEIR benchmarks.

Contribution

It proposes a contrastive reweighting of KL-divergence terms that prioritizes the student-teacher alignment needed to separate positive from negative documents in ranking tasks.
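
As an illustration of what per-term reweighting of a listwise KL divergence can look like, here is a minimal Python sketch for one query and its candidate documents. The weighting scheme (alpha on positive-document terms, beta on negative-document terms) and the names `weighted_kl`, `alpha`, and `beta` are hypothetical assumptions for illustration, not the paper's published formula.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one query's candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_kl(
    teacher_scores: Sequence[float],
    student_scores: Sequence[float],
    is_positive: Sequence[bool],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> float:
    """Compute sum_i w_i * p_i * log(p_i / q_i).

    p is the teacher distribution and q the student distribution over the
    same candidate list. Hypothetical weighting (an assumption, not the
    paper's scheme): w_i = alpha for positive documents, beta for negatives.
    """
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    total = 0.0
    for p_i, q_i, pos in zip(p, q, is_positive):
        w = alpha if pos else beta
        if p_i > 0.0:  # guard against underflow to exactly zero
            total += w * p_i * math.log(p_i / q_i)
    return total


if __name__ == "__main__":
    teacher = [2.0, 0.0, -1.0]
    student = [0.5, 0.3, 0.1]
    labels = [True, False, False]
    print(weighted_kl(teacher, student, labels))             # ordinary KL(p || q)
    print(weighted_kl(teacher, student, labels, alpha=2.0))  # up-weights the positive term
```

With alpha = beta = 1 the function reduces to ordinary KL(p || q), which is nonnegative. With unequal weights, individual terms p_i log(p_i / q_i) can be negative, so the weighted sum is no longer guaranteed to be nonnegative.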

Findings

01 Improved relevance of student models on the MS MARCO and BEIR datasets.

02 Closer alignment of student models with their teacher models.

03 Effective separation of positive and negative documents.

Abstract

Transformer-based retrieval and reranking models for text document search are often refined through knowledge distillation together with contrastive learning. A tight distribution match between the teacher and student models can be hard to achieve, and over-calibration to the teacher may degrade training effectiveness when the teacher does not perform well. This paper contrastively reweights KL divergence terms to prioritize the student-teacher alignment that matters for properly separating positive and negative documents. It analyzes and evaluates the proposed loss function on the MS MARCO and BEIR datasets, demonstrating its effectiveness in improving the relevance of the tested student models.
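
The baseline practice the abstract starts from, knowledge distillation combined with contrastive learning, is commonly written as a KL term plus a cross-entropy on the positive document. Below is a minimal Python sketch for one query, assuming a single labeled positive at `pos_index` and a hypothetical mixing weight `lam`. It shows the unweighted starting point, not the paper's proposed loss.

```python
import math
from typing import Sequence


def log_softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one query's candidate scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kd_plus_contrastive_loss(
    teacher_scores: Sequence[float],
    student_scores: Sequence[float],
    pos_index: int,
    lam: float = 1.0,
) -> float:
    """KL(teacher || student) plus lam * (-log q_pos).

    The second term is an InfoNCE-style contrastive loss that pushes the
    student's probability mass toward the single labeled positive.
    `lam` is a hypothetical mixing weight, not a value from the paper.
    """
    log_p = log_softmax(teacher_scores)
    log_q = log_softmax(student_scores)
    # Distillation term: sum_i p_i * (log p_i - log q_i).
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # Contrastive term: negative log-likelihood of the positive document.
    contrastive = -log_q[pos_index]
    return kl + lam * contrastive


if __name__ == "__main__":
    teacher = [2.0, 0.0, -1.0]
    student = [0.5, 0.3, 0.1]
    print(kd_plus_contrastive_loss(teacher, student, pos_index=0))
```

Every KL term here carries the same weight. That uniform weighting is what the paper's contrastive reweighting changes.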

Taxonomy

Topics: Rough Sets and Fuzzy Logic

Methods: Knowledge Distillation