Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25
Meng Lu, Catherine Chen, Carsten Eickhoff

TL;DR
This paper mechanistically analyzes how cross-encoder information retrieval (IR) models extract and combine relevance signals, revealing that they implement a semantic variant of BM25 through interactions between internal components.
Contribution
It provides the first detailed mechanistic interpretation of relevance estimation in cross-encoder IR models, connecting neural activations to traditional IR signals.
Findings
Relevance signals like term frequency are extracted in early layers.
Inverse document frequency is captured in middle layers.
Later layers combine signals in a manner similar to BM25.
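For reference, here is a minimal sketch of the classic Okapi BM25 function that the findings compare against. It assumes pre-tokenized documents, the common defaults k1 = 1.2 and b = 0.75, and the smoothed IDF variant with +1 inside the logarithm (as used in Lucene). These choices are illustrative assumptions, not details taken from the paper. Each query term contributes its IDF multiplied by a saturating, length-normalized term frequency, which is the kind of TF and IDF combination the later layers are said to resemble.

```python
import math
from collections import Counter


def idf(term, corpus):
    """Smoothed BM25 inverse document frequency (Lucene-style, never negative)."""
    n_docs = len(corpus)
    df = sum(1 for doc in corpus if term in doc)  # number of documents containing the term
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))


def bm25(query, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query with Okapi BM25."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)  # average document length
    tf = Counter(doc)  # raw term frequencies in this document
    # Length normalization: longer-than-average documents get a larger denominator.
    norm = k1 * (1 - b + b * len(doc) / avgdl)
    score = 0.0
    for term in query:
        f = tf[term]
        # Saturating TF: grows with f but is bounded by k1 + 1, then weighted by IDF.
        score += idf(term, corpus) * f * (k1 + 1) / (f + norm)
    return score


corpus = [["cats", "chase", "mice"], ["dogs", "chase", "cats"], ["birds", "sing"]]
scores = [bm25(["cats", "mice"], doc, corpus) for doc in corpus]
print(scores)
```

In this toy corpus the first document scores highest because it matches both query terms. The third document matches neither and scores zero, since BM25 only rewards exact term matches.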
Abstract
Mechanistic interpretation has greatly contributed to a more detailed understanding of generative language models, enabling significant progress in identifying structures that implement key behaviors through interactions between internal components. In contrast, interpretability in information retrieval (IR) remains relatively coarse-grained, and much is still unknown as to how IR models determine whether a document is relevant to a query. In this work, we address this gap by mechanistically analyzing how one commonly used model, a cross-encoder, estimates relevance. We find that the model extracts traditional relevance signals, such as term frequency and inverse document frequency, in early-to-middle layers. These concepts are then combined in later layers, similar to the well-known probabilistic ranking function, BM25. Overall, our analysis offers a more nuanced understanding of how…
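The idea of a "semantic" variant of BM25 can be illustrated by replacing the exact-match count with a soft match. This sketch is purely hypothetical. The cosine similarity, the 0.5 threshold, the hand-made three-dimensional embeddings, the fixed IDF value, and the names `soft_tf` and `soft_bm25` are all assumptions chosen for illustration. They are not the mechanism the authors locate inside the cross-encoder. The example shows how a document containing "automobile" can still receive credit for the query term "car", whereas exact-match BM25 would give it zero for that term.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def soft_tf(q_term, doc, emb, threshold=0.5):
    """Hypothetical soft term frequency: sum of similarities above a threshold.

    An exact match contributes (about) 1.0, so this reduces to ordinary term
    frequency when only identical tokens clear the threshold.
    """
    return sum(s for s in (cosine(emb[q_term], emb[tok]) for tok in doc) if s >= threshold)


def soft_bm25(query, doc, idf, emb, avgdl, k1=1.2, b=0.75):
    """BM25-style scoring with soft_tf in place of exact-match counts (illustrative only)."""
    norm = k1 * (1 - b + b * len(doc) / avgdl)
    score = 0.0
    for term in query:
        f = soft_tf(term, doc, emb)
        score += idf[term] * f * (k1 + 1) / (f + norm)
    return score


# Toy, hand-made embeddings and IDF values (hypothetical, for illustration only).
emb = {
    "car": [1.0, 0.0, 0.0],
    "automobile": [0.9, 0.1, 0.0],
    "road": [0.3, 0.9, 0.0],
    "banana": [0.0, 0.0, 1.0],
}
idf = {"car": 1.5}
doc_a = ["automobile", "road"]  # no exact "car", but a close synonym
doc_b = ["banana", "road"]      # nothing similar enough to "car"
print(soft_bm25(["car"], doc_a, idf, emb, avgdl=2.0))
print(soft_bm25(["car"], doc_b, idf, emb, avgdl=2.0))
```

The threshold keeps weakly related tokens such as "road" from inflating the score. Without it, every token in a document would add a small amount of term frequency for every query term.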
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam
