Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25

Meng Lu; Catherine Chen; Carsten Eickhoff

arXiv:2502.04645·cs.IR·November 25, 2025

Open Access

TL;DR

This paper mechanistically analyzes how a cross-encoder, a commonly used information retrieval model, extracts and combines relevance signals, and finds that its internal components interact to implement a semantic variant of BM25.

Contribution

It provides the first detailed mechanistic interpretation of relevance estimation in cross-encoder IR models, connecting neural activations to traditional IR signals.

Findings

01

Traditional relevance signals such as term frequency are extracted in the early-to-middle layers.

02

Inverse document frequency is extracted in the early-to-middle layers as well.

03

Later layers combine signals in a manner similar to BM25.

Abstract

Mechanistic interpretation has greatly contributed to a more detailed understanding of generative language models, enabling significant progress in identifying structures that implement key behaviors through interactions between internal components. In contrast, interpretability in information retrieval (IR) remains relatively coarse-grained, and much is still unknown as to how IR models determine whether a document is relevant to a query. In this work, we address this gap by mechanistically analyzing how one commonly used model, a cross-encoder, estimates relevance. We find that the model extracts traditional relevance signals, such as term frequency and inverse document frequency, in early-to-middle layers. These concepts are then combined in later layers, similar to the well-known probabilistic ranking function, BM25. Overall, our analysis offers a more nuanced understanding of how…
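
For readers unfamiliar with the ranking function the abstract refers to, the following is a minimal sketch of classical (lexical) Okapi BM25, showing how term frequency and inverse document frequency are combined into a single score. It is not the paper's method and does not reproduce the semantic variant attributed to the cross-encoder. The function name `bm25_score`, the toy corpus, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF form are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Classical Okapi BM25 score of one tokenized document for a query.

    Illustrative only: k1, b, and the smoothed IDF form are common
    defaults, not values taken from the paper.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        # Document frequency: number of documents containing the term.
        df = sum(1 for d in corpus if term in d)
        # Smoothed IDF: rarer terms receive a larger weight.
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        freq = tf[term]
        # Term-frequency saturation with document-length normalization.
        norm = freq + k1 * (1.0 - b + b * len(doc_terms) / avg_len)
        score += idf * freq * (k1 + 1.0) / norm
    return score


# Hypothetical toy corpus of pre-tokenized documents.
corpus = [
    ["neural", "ranking", "models", "for", "search"],
    ["bm25", "is", "a", "ranking", "function"],
    ["cats", "sleep", "all", "day"],
]
query = ["bm25", "ranking"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

In this toy setup, the document that matches both query terms scores highest, the one that matches only the more common term scores lower, and the document with no matching terms scores zero. Exact string matching is what makes this version lexical; the paper's claim concerns a semantic analogue computed inside the model.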

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam