Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking

Haodong Chen, Shengyao Zhuang, Zheng Yao, Guido Zuccon, and Teerapong Leelanupab

arXiv:2602.22591·cs.IR·April 28, 2026

TL;DR

This paper investigates how internal attention signals are distributed across transformer layers in zero-shot re-ranking. It reveals a universal bell-curve distribution of relevance signals across layers and proposes Selective-ICR, a method that improves both efficiency and effectiveness.

Contribution

It provides a comprehensive layer-wise analysis of internal attention, identifies a universal relevance distribution across layers, and proposes Selective-ICR, a strategy that improves re-ranking efficiency without sacrificing performance.

Findings

01. A universal bell-curve distribution of relevance signals across transformer layers.

02. Selective-ICR reduces inference latency by 30%-50%.

03. A zero-shot 8B model matches or outperforms larger models and state-of-the-art methods.

Abstract

Zero-shot document re-ranking with Large Language Models (LLMs) has evolved from Pointwise methods to Listwise and Setwise approaches that optimize computational efficiency. Despite their success, these methods predominantly rely on generative scoring or output logits, which face bottlenecks in inference latency and result consistency. In-Context Re-ranking (ICR) has recently been proposed as an O(1) alternative method. ICR extracts internal attention signals directly, avoiding the overhead of text generation. However, existing ICR methods simply aggregate signals across all layers; layer-wise contributions and their consistency across architectures have been left unexplored. Furthermore, no unified study has compared internal attention with traditional generative and likelihood-based mechanisms across diverse ranking frameworks under consistent conditions. In this paper, we conduct…
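
To make the idea of layer-wise attention aggregation concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The tensor layout (layers, heads, query tokens, context tokens), the function names `attention_scores` and `rank`, and the use of a simple mean over layers and heads are all assumptions made for illustration. Passing `layers=None` aggregates over every layer, as the abstract says existing ICR methods do; passing a subset of layer indices illustrates the kind of layer selection that a method like Selective-ICR might perform.

```python
import numpy as np


def attention_scores(attn, doc_spans, layers=None):
    """Score documents by the attention mass their tokens receive from query tokens.

    attn      : array of shape (num_layers, num_heads, num_query_tokens, num_context_tokens)
                (assumed layout, hypothetical)
    doc_spans : list of (start, end) token index pairs, end exclusive, one per document
    layers    : layer indices to aggregate; None means all layers
    """
    attn = np.asarray(attn, dtype=float)
    if layers is None:
        layers = range(attn.shape[0])
    selected = attn[list(layers)]              # keep only the chosen layers
    per_token = selected.mean(axis=(0, 1, 2))  # average over layers, heads, query tokens
    # Sum the per-token attention inside each document's span.
    return np.array([per_token[start:end].sum() for start, end in doc_spans])


def rank(scores):
    """Return document indices ordered from highest to lowest score (stable on ties)."""
    return [int(i) for i in np.argsort(-np.asarray(scores), kind="stable")]
```

A caller would extract `attn` from a model that exposes its attention weights, build `doc_spans` from the prompt's token offsets, and compare `rank(attention_scores(attn, doc_spans))` against `rank(attention_scores(attn, doc_spans, layers=chosen))` for a chosen subset of layers.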

Code & Models

Repositories

ielab/Selective-ICR (GitHub)