Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval

Tanya Chowdhury; Atharva Nijasure; James Allan

arXiv:2410.18527·cs.IR·July 23, 2025

TL;DR

This paper uses probing analysis to investigate the internal mechanisms of fine-tuned ranking LLMs, showing how they encode known IR features and how they respond to out-of-distribution data, with the aim of making these models more interpretable and reliable.

Contribution

The paper provides a detailed mechanistic analysis of ranking LLMs, identifying which IR features they encode and how those features generalize, a question the authors describe as previously unexplored.

Findings

01. Identification of known IR features in LLM activations
02. Detection of missing or underrepresented features
03. Analysis of model responses to out-of-distribution data

Abstract

Transformer networks, particularly those achieving performance comparable to GPT models, are well known for their robust feature extraction abilities. However, the nature of these extracted features and their alignment with human-engineered ones remain unexplored. In this work, we investigate the internal mechanisms of state-of-the-art, fine-tuned LLMs for passage reranking. We employ a probing-based analysis to examine neuron activations in ranking LLMs, identifying the presence of known human-engineered and semantic features. Our study spans a broad range of feature categories, including lexical signals, document structure, query-document interactions, and complex semantic representations, to uncover underlying patterns influencing ranking decisions. Through experiments on four different ranking LLMs, we identify statistical IR features that are prominently encoded in LLM…
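
To make the idea of probing concrete, here is a minimal sketch of a per-neuron linear probe. It is not the authors' setup: the activations are synthetic, the "feature" is a made-up stand-in for a statistical IR signal such as a term-match count, and the helper names `fit_linear_probe` and `probe_neurons` are hypothetical. It fits a least-squares line from each neuron's activation to the feature value and reports in-sample R² as a rough indication of how strongly that neuron encodes the feature.

```python
import random


def fit_linear_probe(xs, ys):
    """Least-squares fit ys ~= slope * xs + intercept.

    Returns (slope, intercept, r2), where r2 is the in-sample
    coefficient of determination.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        # A constant activation carries no information about the feature.
        return 0.0, mean_y, 0.0
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2


def probe_neurons(activations, feature_values):
    """Score every neuron by how well it linearly predicts the feature.

    activations: one row per query-document pair, one column per neuron.
    feature_values: one target value per query-document pair.
    """
    n_neurons = len(activations[0])
    scores = []
    for j in range(n_neurons):
        column = [row[j] for row in activations]
        _, _, r2 = fit_linear_probe(column, feature_values)
        scores.append(r2)
    return scores


# Synthetic stand-in data (assumption): neuron 0 tracks the feature,
# neurons 1 and 2 are pure noise.
rng = random.Random(0)
feature_values = [float(rng.randint(0, 10)) for _ in range(200)]
activations = [
    [2.0 * f + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)]
    for f in feature_values
]

scores = probe_neurons(activations, feature_values)
best = max(range(len(scores)), key=scores.__getitem__)
print("R^2 per neuron:", [round(s, 3) for s in scores])
print("most predictive neuron:", best)
```

In a real study the activation matrix would come from a chosen layer of the ranking model, and the probe would be scored on held-out pairs rather than in-sample; this sketch only illustrates the mechanics.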

Taxonomy

Topics: Rough Sets and Fuzzy Logic · Natural Language Processing Techniques

Methods: Linear Layer · Cosine Annealing · Multi-Head Attention · Linear Warmup With Cosine Annealing · Adam · Softmax · Dropout · Byte Pair Encoding · Layer Normalization