Distillation versus Contrastive Learning: How to Train Your Rerankers

Zhichao Xu; Zhiqi Huang; Shengyao Zhuang; Vivek Srikumar

arXiv:2507.08336 · cs.CL · November 7, 2025

TL;DR

This paper empirically compares contrastive learning and knowledge distillation for training text rerankers. It finds that distilling from a larger, more performant teacher generally improves both in-domain and out-of-domain ranking performance, a result that can guide practical training choices.

Contribution

The paper provides an empirical comparison of contrastive learning and knowledge distillation for training cross-encoder rerankers across model sizes (0.5B to 7B) and architectures (Transformer and recurrent), with both methods trained on the same data.

Findings

01 Knowledge distillation from larger teachers improves reranker performance.

02 Distillation benefits are less pronounced when teachers are of similar capacity.

03 Contrastive learning remains a strong baseline when no large teacher is available.

Abstract

Training effective text rerankers is crucial for information retrieval. Two strategies are widely used: contrastive learning (optimizing directly on ground-truth labels) and knowledge distillation (transferring knowledge from a larger reranker). While both have been studied extensively, a clear comparison of their effectiveness for training cross-encoder rerankers under practical conditions is needed. This paper empirically compares these strategies by training rerankers of different sizes (0.5B, 1.5B, 3B, 7B) and architectures (Transformer, Recurrent) using both methods on the same data, with a strong contrastive learning model acting as the distillation teacher. Our results show that knowledge distillation generally yields better in-domain and out-of-domain ranking performance than contrastive learning when distilling from a more performant teacher model. This finding is consistent…
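
To make the two training signals concrete, here is a minimal sketch of how they differ for a single query with a list of candidate passages. The abstract does not give the loss functions, so this assumes a softmax cross-entropy (InfoNCE-style) loss for contrastive learning and a KL divergence between teacher and student score distributions for distillation. The function names and the example scores are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of relevance scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def contrastive_loss(student_scores, positive_index=0):
    """Assumed contrastive objective: cross-entropy against the ground-truth
    label, treating the labeled positive passage as the correct class."""
    return -log_softmax(student_scores)[positive_index]


def distillation_loss(student_scores, teacher_scores):
    """Assumed distillation objective: KL(teacher || student) over the
    candidate list, so the student matches the teacher's soft ranking."""
    log_p_teacher = log_softmax(teacher_scores)
    log_p_student = log_softmax(student_scores)
    return sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )


# Hypothetical scores for one query: index 0 is the labeled positive passage.
student = [2.0, 1.0, 0.5]
teacher = [3.0, 2.5, -1.0]

print("contrastive loss:", contrastive_loss(student))
print("distillation loss:", distillation_loss(student, teacher))
```

Under these assumptions, the contrastive loss only sees the hard label (which passage is the positive), whereas the distillation loss also uses the teacher's relative scores for the other candidates.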
