Reproducing and Comparing Distillation Techniques for Cross-Encoders
Victor Morand, Mathias Vast, Basile Van Cooten, Laure Soulier, Josiane Mothe, Benjamin Piwowarski

TL;DR
This paper systematically compares various distillation techniques for cross-encoders across different transformer backbones, revealing that relative comparison objectives significantly improve effectiveness, often matching the gains from larger models.
Contribution
It provides a comprehensive reproduction and comparison of distillation strategies for cross-encoders across multiple architectures and datasets, highlighting the importance of objective choice.
Findings
Relative comparison objectives outperform pointwise baselines.
Objective choice can match the benefits of larger backbone architectures.
Distillation strategies are effective across diverse transformer models.
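To make the contrast between pointwise and relative comparison objectives concrete, here is a minimal sketch of the objectives named in the released model list (Hinge, MarginMSE, DistillRankNET, infoNCE) next to a pointwise baseline. These are the textbook forms, written in plain Python on scalar scores; the function names, the default margin, and the averaging over pairs are assumptions for illustration and may differ from the paper's exact formulations.

```python
import math


def pointwise_bce(score, label):
    """Pointwise baseline: binary cross-entropy on a single (query, doc) score."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def hinge(s_pos, s_neg, margin=1.0):
    """Pairwise hinge: penalize when the positive does not beat the negative by `margin`."""
    return max(0.0, margin - (s_pos - s_neg))


def margin_mse(s_pos, s_neg, t_pos, t_neg):
    """MarginMSE: match the student's score margin to the teacher's score margin."""
    return ((s_pos - s_neg) - (t_pos - t_neg)) ** 2


def distill_ranknet(student_scores, teacher_ranking):
    """RankNet-style distillation: for every pair the teacher orders (i above j),
    apply the logistic loss log(1 + exp(s_j - s_i)) to the student scores.
    `teacher_ranking` lists document indices, best first (assumed input format)."""
    if len(teacher_ranking) < 2:
        raise ValueError("need at least two ranked documents")
    total, pairs = 0.0, 0
    for a in range(len(teacher_ranking)):
        for b in range(a + 1, len(teacher_ranking)):
            i, j = teacher_ranking[a], teacher_ranking[b]
            total += math.log1p(math.exp(student_scores[j] - student_scores[i]))
            pairs += 1
    return total / pairs


def info_nce(s_pos, s_negs):
    """InfoNCE: softmax cross-entropy of the positive against a list of negatives."""
    scores = [s_pos] + list(s_negs)
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - s_pos


if __name__ == "__main__":
    # Hypothetical scores for one query: student scores and a teacher ordering.
    student = [2.1, 0.3, 1.4]
    print("hinge:", hinge(student[0], student[1]))
    print("margin_mse:", margin_mse(student[0], student[1], t_pos=4.0, t_neg=1.0))
    print("distill_ranknet:", distill_ranknet(student, teacher_ranking=[0, 2, 1]))
    print("info_nce:", info_nce(student[0], student[1:]))
```

The pointwise loss looks at one document at a time, while the other four depend only on score differences between documents for the same query, which is what "relative comparison" refers to here.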
Abstract
Recent advances in Information Retrieval (IR) have established transformer-based cross-encoders as a keystone of the field. Recent studies have focused on knowledge distillation and showed that, with the right strategy, traditional cross-encoders can reach the effectiveness of LLM re-rankers. Yet how these strategies compare with previous training strategies, including distillation from strong cross-encoder teachers, remains unclear. In addition, few studies cover a similar range of backbone encoders, even though substantial improvements have been made in this area since BERT. This lack of comprehensive studies in controlled environments makes it difficult to identify robust design choices. In this work, we reproduce the LLM-based distillation strategy of Schlatt et al. (2025) and compare it to the approach of Hofstätter et al. (2020), based on an ensemble of cross-encoder…
Code & Models
- 🤗 xpmir/cross-encoder-ettin-150m-infoNCE (model, 71 downloads)
- 🤗 xpmir/cross-encoder-ELECTRA-MarginMSE (model, 72 downloads)
- 🤗 xpmir/cross-encoder-RoBERTa-Hinge (model, 69 downloads)
- 🤗 xpmir/cross-encoder-DeBERTav3-MarginMSE (model, 59 downloads)
- 🤗 xpmir/cross-encoder-DeBERTav3-Hinge (model, 56 downloads)
- 🤗 xpmir/cross-encoder-RoBERTa-DistillRankNET (model, 62 downloads)
- 🤗 xpmir/cross-encoder-ELECTRA-DistillRankNET (model, 55 downloads)
- 🤗 xpmir/cross-encoder-ettin-150m-Hinge (model, 62 downloads)
- 🤗 xpmir/cross-encoder-DeBERTav3-DistillRankNET (model, 57 downloads)
- 🤗 xpmir/cross-encoder-ELECTRA-Hinge (model, 58 downloads)
Taxonomy
Topics: Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI) · Face recognition and analysis
