Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking

Wuwei Zhang; Fangcong Yin; Howard Yen; Danqi Chen; Xi Ye

arXiv:2506.09944 · cs.CL · September 30, 2025

TL;DR

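This paper introduces QRHead, a set of attention heads chosen for how strongly they attend from the input query to relevant context, and QRRetriever, which scores parts of the context by the attention mass these heads assign to them. Selecting context with QRRetriever improves long-context reasoning, the same scores serve as a zero-shot re-ranker, and the heads offer interpretability insights.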

Contribution

The paper presents QRHead and QRRetriever, new techniques for enhancing retrieval and reasoning in long-context language models, with demonstrated improvements over existing methods.

Findings

01

Over 10% performance gains on the multi-hop reasoning tasks LongMemEval and CLIPPER

02

Outperforms strong dense retrievers in long-context reasoning

03

Achieves strong zero-shot re-ranking performance on BEIR
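The re-ranking and context-selection results above depend on scoring each passage by the attention that the selected heads place on it. The Python sketch below illustrates that idea under stated assumptions and is not the released implementation in princeton-pli/qrhead. It assumes attention weights are already available as a nested list `attn[head][query_token][context_token]`, that the selected head indices are given, and that each passage's token span is known. The names `passage_scores`, `rerank`, and `select_context` are hypothetical. Summing raw attention mass over query tokens and heads is a simplification, and the paper's exact normalization or calibration may differ.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: attn[h][q][t] is the weight head h assigns to
# context token t when processing query token q.
Attention = Sequence[Sequence[Sequence[float]]]


def passage_scores(attn: Attention,
                   heads: Sequence[int],
                   spans: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Score each passage by the attention mass the selected heads
    place on its token span [start, end), summed over query tokens and heads."""
    scores: Dict[str, float] = {}
    for pid, (start, end) in spans.items():
        total = 0.0
        for h in heads:
            for row in attn[h]:  # one row per query token
                total += sum(row[start:end])
        scores[pid] = total
    return scores


def rerank(attn: Attention,
           heads: Sequence[int],
           spans: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return passage ids sorted from highest to lowest attention mass."""
    scores = passage_scores(attn, heads, spans)
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)


def select_context(scores: Dict[str, float],
                   doc_order: Sequence[str],
                   k: int) -> List[str]:
    """Keep the k highest-scoring passages, restored to document order
    (an assumed design choice for building a pruned context)."""
    top = set(sorted(scores, key=lambda pid: scores[pid], reverse=True)[:k])
    return [pid for pid in doc_order if pid in top]


if __name__ == "__main__":
    # Toy example: 2 heads, 1 query token, 4 context tokens, 2 passages.
    attn = [[[0.1, 0.1, 0.4, 0.4]], [[0.3, 0.3, 0.2, 0.2]]]
    spans = {"a": (0, 2), "b": (2, 4)}
    print(rerank(attn, [0], spans))  # ['b', 'a']
```

For re-ranking, as in the BEIR setting, the sorted order is itself the output. For long-context reasoning, `select_context` keeps only the top-k passages but returns them in their original order, so the pruned context still reads in sequence.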

Abstract

Recent work has identified retrieval heads, a subset of attention heads responsible for retrieving salient information in long-context language models (LMs), as measured by their copy-paste behavior in Needle-in-a-Haystack tasks. In this paper, we introduce QRHead (Query-Focused Retrieval Head), an improved set of attention heads that enhance retrieval from long context. We identify QRHead by aggregating attention scores with respect to the input query, using a handful of examples from real-world tasks (e.g., long-context QA). We further introduce QRRetriever, an efficient and effective retriever that uses the accumulated attention mass of QRHead as retrieval scores. We use QRRetriever for long-context reasoning by selecting the parts of the context with the highest retrieval scores. On multi-hop reasoning tasks LongMemEval and CLIPPER, this yields over 10% performance gains over full…
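The abstract describes identifying QRHead by aggregating query-focused attention scores over a handful of real-world examples. A minimal Python sketch of that selection step follows. It assumes each labeled example supplies attention weights in a hypothetical `attn[head][query_token][context_token]` layout, together with the token span of the gold (relevant) passage. It also assumes that a head's score is its attention mass on the gold span averaged over examples. The function names `head_scores` and `select_heads` and this exact scoring rule are illustrative assumptions, not the paper's definition.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: attn[h][q][t] is the weight head h assigns to
# context token t when processing query token q.
Attention = Sequence[Sequence[Sequence[float]]]
# One labeled example: its attention weights plus the gold span [start, end).
Example = Tuple[Attention, Tuple[int, int]]


def head_scores(examples: Sequence[Example], num_heads: int) -> List[float]:
    """Average, over labeled examples, the attention mass each head sends
    from the query tokens to the gold passage's token span."""
    if not examples:
        raise ValueError("need at least one labeled example")
    totals = [0.0] * num_heads
    for attn, (start, end) in examples:
        for h in range(num_heads):
            totals[h] += sum(sum(row[start:end]) for row in attn[h])
    return [t / len(examples) for t in totals]


def select_heads(examples: Sequence[Example], num_heads: int, k: int) -> List[int]:
    """Return the indices of the k heads with the highest average score."""
    scores = head_scores(examples, num_heads)
    return sorted(range(num_heads), key=lambda h: scores[h], reverse=True)[:k]
```

The selected head indices can then be passed to any attention-mass scorer, such as the QRRetriever scoring described in the abstract. Only a few labeled examples are needed because the scores are averages over a fixed set of heads rather than trained parameters.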

Code & Models

Repositories

princeton-pli/qrhead
pytorch · Official

Taxonomy

Topics: Data Management and Algorithms · Semantic Web and Ontologies

Methods: Softmax · Attention Is All You Need · simple Copy-Paste · Sparse Evolutionary Training