Reproducing and Comparing Distillation Techniques for Cross-Encoders

Victor Morand; Mathias Vast; Basile Van Cooten; Laure Soulier; Josiane Mothe; Benjamin Piwowarski

arXiv:2603.03010 · cs.IR · March 4, 2026


TL;DR

This paper systematically compares distillation techniques for cross-encoders across different transformer backbones, showing that relative comparison objectives significantly improve effectiveness, often by as much as switching to a larger backbone.

Contribution

The paper reproduces and compares distillation strategies for cross-encoders across multiple backbone architectures and datasets, highlighting the importance of the choice of training objective.

Findings

01. Relative comparison objectives outperform pointwise baselines.
02. The choice of objective can yield gains comparable to those of a larger backbone architecture.
03. Distillation strategies are effective across diverse transformer backbones.
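
The findings contrast pointwise and relative objectives, but this page does not give the paper's exact loss definitions. As an illustration only, the sketch below compares a pointwise MSE loss on teacher scores with Margin-MSE, the margin-based loss introduced by Hofstätter et al. (2020), in which the student matches the teacher's score difference between a positive and a negative passage. The function names and scores are hypothetical, and plain floats stand in for the tensors a real training loop would use.

```python
# Pointwise vs. relative (Margin-MSE) distillation losses, as a minimal sketch.
# Scores are plain floats here; a real training loop would use tensors and autograd.


def pointwise_mse(student, teacher):
    """Mean squared error between student and teacher scores, one document at a time."""
    if len(student) != len(teacher) or not student:
        raise ValueError("need two non-empty score lists of equal length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Margin-MSE: for each (query, positive, negative) triple, match the student's
    positive-minus-negative score margin to the teacher's margin."""
    n = len(student_pos)
    if n == 0 or not (n == len(student_neg) == len(teacher_pos) == len(teacher_neg)):
        raise ValueError("need four non-empty score lists of equal length")
    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        total += ((sp - sn) - (tp - tn)) ** 2
    return total / n


if __name__ == "__main__":
    # Hypothetical teacher scores for two (query, positive, negative) triples.
    teacher_pos, teacher_neg = [8.0, 6.5], [2.0, 3.5]
    # A student that ranks exactly like the teacher but whose scores are shifted by +5.
    student_pos = [t + 5.0 for t in teacher_pos]
    student_neg = [t + 5.0 for t in teacher_neg]

    print(pointwise_mse(student_pos + student_neg, teacher_pos + teacher_neg))  # 25.0
    print(margin_mse(student_pos, student_neg, teacher_pos, teacher_neg))  # 0.0
```

The shifted student is penalized by the pointwise loss even though it orders every pair exactly as the teacher does, while its margin loss is zero. This is one intuition for why relative objectives can suit re-ranking, where only the ordering of documents matters.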

Abstract

Recent advances in Information Retrieval have established transformer-based cross-encoders as a keystone in IR. Recent studies have focused on knowledge distillation and showed that, with the right strategy, traditional cross-encoders can match the effectiveness of LLM re-rankers. Yet, comparisons with previous training strategies, including distillation from strong cross-encoder teachers, remain unclear. In addition, few studies cover a comparable range of backbone encoders, even though substantial improvements have been made in this area since BERT. This lack of comprehensive studies in controlled environments makes it difficult to identify robust design choices. In this work, we reproduce the LLM-based distillation strategy of Schlatt et al. (2025) and compare it to the approach of Hofstätter et al. (2020), based on an ensemble of cross-encoder…
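
An LLM re-ranker used as a distillation teacher may provide only an ordering of candidates rather than calibrated relevance scores (RankGPT-style listwise re-rankers, for instance, return a permutation). A relative objective can then be built from the ordering alone. The sketch below shows one such objective, a RankNet-style pairwise logistic loss over every pair the teacher orders. It is an illustrative assumption, not necessarily the loss used in the strategy of Schlatt et al. (2025), and `ranknet_from_ranking`, `softplus`, and the example scores are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranknet_from_ranking(student_scores, teacher_ranking):
    """RankNet-style pairwise loss that uses only the teacher's ordering.

    student_scores: dict mapping document id -> student cross-encoder score.
    teacher_ranking: list of document ids, best first (e.g. a permutation
        returned by a listwise LLM re-ranker).
    For every pair where the teacher ranks doc i above doc j, the loss adds
    log(1 + exp(-(s_i - s_j))), which is small when the student agrees by a wide margin.
    Returns the mean over all pairs (0.0 if there are fewer than two documents).
    """
    total, pairs = 0.0, 0
    for a in range(len(teacher_ranking)):
        for b in range(a + 1, len(teacher_ranking)):
            diff = student_scores[teacher_ranking[a]] - student_scores[teacher_ranking[b]]
            total += softplus(-diff)
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Hypothetical teacher ordering and two hypothetical students.
    teacher_ranking = ["d1", "d2", "d3"]
    agreeing = {"d1": 3.0, "d2": 1.0, "d3": -1.0}
    reversed_scores = {"d1": -1.0, "d2": 1.0, "d3": 3.0}
    print(ranknet_from_ranking(agreeing, teacher_ranking))
    print(ranknet_from_ranking(reversed_scores, teacher_ranking))
```

The student that agrees with the teacher's ordering receives a lower loss than the one that reverses it. Computing the logistic term through a stable softplus avoids overflow in `math.exp` when score differences are large.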
