Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye

TL;DR
This paper introduces QRHead, a set of attention heads selected by their query-focused attention on real-world tasks, and QRRetriever, which uses the attention mass of those heads as retrieval scores. Together they improve long-context reasoning and re-ranking in language models and offer interpretability insights.
Contribution
The paper presents QRHead, a query-focused alternative to previously identified retrieval heads, and QRRetriever, a retriever built on QRHead attention, and shows that they improve retrieval and reasoning in long-context language models over existing methods.
Findings
Over 10% performance improvement on LongMemEval and CLIPPER
Outperforms strong dense retrievers in long-context reasoning
Achieves strong zero-shot re-ranking performance on BEIR
Abstract
Recent work has identified retrieval heads, a subset of attention heads responsible for retrieving salient information in long-context language models (LMs), as measured by their copy-paste behavior in Needle-in-a-Haystack tasks. In this paper, we introduce QRHead (Query-Focused Retrieval Head), an improved set of attention heads that enhance retrieval from long context. We identify QRHead by aggregating attention scores with respect to the input query, using a handful of examples from real-world tasks (e.g., long-context QA). We further introduce QRRetriever, an efficient and effective retriever that uses the accumulated attention mass of QRHead as retrieval scores. We use QRRetriever for long-context reasoning by selecting the most relevant parts with the highest retrieval scores. On multi-hop reasoning tasks LongMemEval and CLIPPER, this yields over 10% performance gains over full…
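The two steps the abstract describes (picking heads by query-focused attention, then scoring context parts by accumulated attention mass) can be illustrated with a small sketch. This is not the authors' implementation: the data layout, the function names `select_heads`, `retrieval_scores`, and `top_k`, and the exact aggregation (sum over query tokens, mean over heads or examples) are assumptions made for illustration, and the attention values are toy numbers rather than model outputs.

```python
from typing import Dict, List, Sequence, Tuple

Head = Tuple[int, int]          # hypothetical (layer, head index) identifier
Span = Tuple[int, int]          # [start, end) token positions of one context part
# attn[head][q][c]: attention weight from query token q to context token c
AttnMap = Dict[Head, List[List[float]]]


def mass_on_span(rows: List[List[float]], span: Span) -> float:
    """Attention mass that all query tokens place on one context span."""
    start, end = span
    return sum(sum(row[start:end]) for row in rows)


def select_heads(examples: Sequence[Tuple[AttnMap, Span]], n: int) -> List[Head]:
    """Assumed selection rule: keep the n heads whose query-token attention
    lands most on the gold span, averaged over a handful of examples."""
    totals: Dict[Head, float] = {}
    for attn, gold_span in examples:
        for head, rows in attn.items():
            totals[head] = totals.get(head, 0.0) + mass_on_span(rows, gold_span)
    ranked = sorted(totals, key=lambda h: totals[h] / len(examples), reverse=True)
    return ranked[:n]


def retrieval_scores(attn: AttnMap, heads: Sequence[Head],
                     spans: Sequence[Span]) -> List[float]:
    """Assumed scoring rule: per-span attention mass, averaged over heads."""
    return [sum(mass_on_span(attn[h], s) for h in heads) / len(heads)
            for s in spans]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring spans, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy input: 2 heads, 2 query tokens, 4 context tokens split into 2 spans.
toy_attn: AttnMap = {
    (0, 0): [[0.1, 0.1, 0.6, 0.2], [0.0, 0.2, 0.7, 0.1]],
    (1, 3): [[0.25, 0.25, 0.25, 0.25], [0.1, 0.1, 0.4, 0.4]],
}
toy_spans: List[Span] = [(0, 2), (2, 4)]

toy_heads = select_heads([(toy_attn, (2, 4))], n=1)
toy_scores = retrieval_scores(toy_attn, [(0, 0), (1, 3)], toy_spans)
toy_ranking = top_k(toy_scores, k=1)
```

In this toy setup the second span receives most of the query attention, so it is ranked first. With a real model, the attention maps would come from a forward pass over the long context followed by the query; how the weights are extracted and normalized is outside the scope of this sketch.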
Taxonomy
Topics: Data Management and Algorithms · Semantic Web and Ontologies
Methods: Softmax · Attention Is All You Need · simple Copy-Paste · Sparse Evolutionary Training
