HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads
Juyuan Wang, Chenxing Wang, Yuchen Fang, Huiyun Hu, Junwu Du, Aolin Li, Shunlin Rong, Haijun Wu, Jin Xu, Ligang Liu, Dongliang Liao

TL;DR
HeadRank is a decoding-free passage reranking method that reads relevance signals directly from LLM attention weights, ranking documents efficiently and accurately across 14 benchmarks and outperforming existing baselines.
Contribution
The paper presents a novel attention-based reranking framework that enhances preference discrimination in LLM attention weights, reducing inference complexity and improving ranking performance.
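The following minimal NumPy sketch makes the decoding-free idea concrete. It is not the paper's implementation. It assumes that the attention probabilities of one forward pass are stacked into an array of shape [layers, heads, seq_len, seq_len] (for example, the per-layer tensors a Hugging Face model returns with output_attentions=True, with the batch axis dropped). It also assumes that the prompt places the candidate passages before the query, and that a passage's score is the attention mass the query tokens assign to its tokens, averaged over a set of selected (layer, head) pairs. The function names, the aggregation rule, and the toy tensor are all hypothetical. The sketch also shows why depth truncation is possible: if no selected head sits above a given layer, the layers beyond it never need to run.

```python
import numpy as np


def passage_scores(attn, selected_heads, query_span, passage_spans):
    """Score candidate passages from attention weights (hypothetical aggregation).

    attn: [num_layers, num_heads, seq_len, seq_len] attention probabilities
          from one forward pass; row t is the attention distribution of token t.
    selected_heads: list of (layer, head) pairs used for scoring.
    query_span: (start, end) token range of the query, end exclusive.
    passage_spans: list of (start, end) token ranges, one per passage.
    """
    q0, q1 = query_span
    scores = np.zeros(len(passage_spans))
    for layer, head in selected_heads:
        rows = attn[layer, head, q0:q1, :]  # [query_len, seq_len]
        for k, (p0, p1) in enumerate(passage_spans):
            # Attention mass the query puts on passage k, averaged over query tokens
            scores[k] += rows[:, p0:p1].sum(axis=1).mean()
    return scores / len(selected_heads)


def layers_needed(selected_heads):
    """Depth truncation: layers above the deepest selected head are never read."""
    return max(layer for layer, _ in selected_heads) + 1


# Toy example: passage A (tokens 0-2), passage B (tokens 3-5), query (tokens 6-7).
L, H, T = 3, 2, 8
attn = np.full((L, H, T, T), 1.0 / T)   # uniform rows as a neutral baseline
attn[1, 0, 6:8, :] = 0.0
attn[1, 0, 6:8, 0:3] = 0.1              # 0.3 total on passage A
attn[1, 0, 6:8, 3:6] = 0.2              # 0.6 total on passage B
attn[1, 0, 6:8, 6:8] = 0.05             # 0.1 on the query itself

selected = [(1, 0)]
scores = passage_scores(attn, selected, (6, 8), [(0, 3), (3, 6)])
ranking = np.argsort(-scores)           # indices of passages, best first
print(scores, ranking, layers_needed(selected))
```

In the toy example, the selected head places 0.6 of the query's attention on passage B and 0.3 on passage A. Passage B therefore ranks first, and only the first 2 of the 3 layers need to run. Because this aggregation sums attention over tokens, it favors longer passages. A real system might normalize or calibrate the scores, which this sketch does not attempt.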
Findings
Achieves the highest average NDCG@10 across 14 benchmarks on three Qwen3 scales.
Substantially improves relevance discrimination among middle-zone documents.
Shows a 43-percentage-point relevance gap for top-ranked middle-zone documents.
Abstract
Decoding-free reranking methods that read relevance signals directly from LLM attention weights offer significant latency advantages over autoregressive approaches, yet suffer from attention score homogenization: middle-context documents receive near-identical scores, destroying the fine-grained distinctions required for ranking. We propose HeadRank, a framework that lifts preference optimization from discrete token space into the continuous attention domain through entropy-regularized head selection, hard adjacent-level preference pairs, and a distribution regularizer that jointly sharpen discriminability in the homogenized middle zone. Depth truncation at the deepest selected layer further reduces inference to forward passes. Across 14 benchmarks on three Qwen3 scales (0.6B–4B) using only 211 training queries, HeadRank achieves the highest average NDCG@10 at every…
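The abstract names the training ingredients but gives no formulas, so the following sketch shows one plausible reading. Every specific in it is an assumption, not the authors' objective. "Adjacent-level" pairs are taken to be passages whose graded labels differ by exactly one level. Preference is scored with a Bradley-Terry/DPO-style logistic loss on the difference of combined attention scores, scaled by a hypothetical temperature `beta`. Head selection is modeled as softmax weights over candidate heads, with an entropy penalty weighted by a hypothetical `lam`, so that minimizing the objective concentrates weight on a few heads. The distribution regularizer is omitted because its form is not stated. Plain NumPy only evaluates the objective; actual training would need an autodiff framework.

```python
import numpy as np


def adjacent_level_pairs(labels):
    """(preferred, dispreferred) index pairs whose graded labels differ by exactly 1."""
    return [(i, j)
            for i, li in enumerate(labels)
            for j, lj in enumerate(labels)
            if li - lj == 1]


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)


def head_weights(head_logits):
    # Softmax over candidate heads gives soft selection weights
    z = np.exp(head_logits - head_logits.max())
    return z / z.sum()


def objective(head_scores, head_logits, labels, beta=1.0, lam=0.1):
    """Pairwise preference loss on combined attention scores + lam * head entropy.

    head_scores: [num_heads, num_passages] attention score of each passage per head.
    head_logits: [num_heads] selection logits (hypothetical parameterization).
    labels: graded relevance label per passage.
    Minimizing lam * entropy pushes the weights toward a few heads.
    """
    w = head_weights(head_logits)
    s = w @ head_scores                      # combined scores, [num_passages]
    pairs = adjacent_level_pairs(labels)
    if pairs:
        pref = -np.mean([log_sigmoid(beta * (s[i] - s[j])) for i, j in pairs])
    else:
        pref = 0.0                           # no adjacent-level pairs in this query
    entropy = -np.sum(w * np.log(w + 1e-12))
    return pref + lam * entropy


labels = [2, 1, 1, 0]
good = [3.0, 2.0, 2.0, 0.0]   # head whose scores follow the labels
bad = [0.0, 2.0, 2.0, 3.0]    # head whose scores invert them
head_scores = np.array([good, bad])
print(adjacent_level_pairs(labels))
print(objective(head_scores, np.array([5.0, -5.0]), labels),
      objective(head_scores, np.array([-5.0, 5.0]), labels))
```

Weighting the head whose scores follow the labels yields a lower loss than weighting the inverted head. Restricting pairs to adjacent levels (2 vs 1, 1 vs 0) targets the fine distinctions the abstract describes, rather than easy 2-vs-0 comparisons.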
