Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models
Catherine Chen, Jack Merullo, Carsten Eickhoff

TL;DR
This paper applies causal interventions to reverse engineer neural retrieval models, revealing internal mechanisms such as attention heads that detect duplicate tokens and feed into relevance scoring.
Contribution
The paper proposes using causal interventions together with mechanistic interpretability methods to understand, at the level of individual model components, how neural rankers reach their decisions.
Findings
Attention heads detect duplicate tokens early in the model
Downstream attention heads integrate signals to determine relevance
Mechanistic analysis reveals how models satisfy term-frequency axioms
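To make the term-frequency idea concrete, here is a minimal sketch of a diagnostic check: build a document pair that differs only in one extra query-term occurrence, then test whether a scorer prefers the document with more occurrences. The scorer `toy_score` and the helpers `add_query_term` and `satisfies_tf_axiom` are hypothetical names invented for illustration. The scorer is a stand-in for a neural ranker, and the paper's actual perturbation scheme may differ.

```python
import math
from collections import Counter


def toy_score(query, document):
    """Hypothetical stand-in for a ranker: sum of log-scaled query-term counts."""
    counts = Counter(document.lower().split())
    return sum(math.log1p(counts[term]) for term in query.lower().split())


def add_query_term(document, term, position):
    """Overwrite the token at `position` with `term`, keeping document length fixed."""
    tokens = document.split()
    tokens[position] = term
    return " ".join(tokens)


def satisfies_tf_axiom(score_fn, query, baseline, perturbed):
    """True if the document with more query-term occurrences scores strictly higher."""
    return score_fn(query, perturbed) > score_fn(query, baseline)


query = "neural ranking"
baseline = "a survey of ranking methods for web search engines"
# Assumption: the perturbation swaps a non-query token for a query term,
# so both documents have the same length.
perturbed = add_query_term(baseline, "ranking", position=-1)

print(satisfies_tf_axiom(toy_score, query, baseline, perturbed))
```

Keeping the two documents the same length means any score difference comes from the added occurrence rather than from length effects.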
Abstract
Neural models have demonstrated remarkable performance across diverse ranking tasks. However, the processes and internal mechanisms along which they determine relevance are still largely unknown. Existing approaches for analyzing neural ranker behavior with respect to IR properties rely either on assessing overall model behavior or employing probing methods that may offer an incomplete understanding of causal mechanisms. To provide a more granular understanding of internal model decision-making processes, we propose the use of causal interventions to reverse engineer neural rankers, and demonstrate how mechanistic interpretability methods can be used to isolate components satisfying term-frequency axioms within a ranking model. We identify a group of attention heads that detect duplicate tokens in earlier layers of the model, then communicate with downstream heads to compute overall…
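The causal-intervention idea in the abstract can be illustrated with a toy activation-patching loop. It runs the model on a baseline input and a perturbed input, copies one component's activation from the perturbed run into the baseline run, and measures how much of the score gap that single component restores. Everything here is an assumption for illustration: the two "heads", their weights, and the function names are invented. A real study would patch attention-head outputs inside a transformer ranker, not hand-written functions.

```python
def duplicate_head(query, doc):
    """Toy component: counts document tokens that also appear in the query."""
    query_terms = set(query.split())
    return float(sum(tok in query_terms for tok in doc.split()))


def length_head(query, doc):
    """Toy component: responds only to document length."""
    return float(len(doc.split()))


# Hypothetical model: the score is a weighted sum of component activations.
COMPONENTS = {"duplicate_head": duplicate_head, "length_head": length_head}
WEIGHTS = {"duplicate_head": 1.0, "length_head": -0.1}


def run(query, doc, patch=None):
    """Return (score, activations); `patch` overrides named activations."""
    patch = patch or {}
    acts = {}
    for name, fn in COMPONENTS.items():
        acts[name] = patch[name] if name in patch else fn(query, doc)
    score = sum(WEIGHTS[name] * act for name, act in acts.items())
    return score, acts


def patching_effects(query, baseline, perturbed):
    """Fraction of the perturbed-minus-baseline score gap restored per component."""
    base_score, _ = run(query, baseline)
    pert_score, pert_acts = run(query, perturbed)
    gap = pert_score - base_score
    if gap == 0:
        raise ValueError("baseline and perturbed inputs score identically")
    effects = {}
    for name in COMPONENTS:
        patched_score, _ = run(query, baseline, patch={name: pert_acts[name]})
        effects[name] = (patched_score - base_score) / gap
    return effects


query = "ranking"
baseline = "a survey of ranking methods for web search engines"
perturbed = "a survey of ranking methods for web search ranking"
effects = patching_effects(query, baseline, perturbed)
print(effects)
```

In this toy setup the two inputs have equal length, so only `duplicate_head` changes between runs and patching it alone accounts for the whole score gap. Attributing an effect to a specific component in this way is what separates a causal intervention from a correlational probe.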
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning and Algorithms
