Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models

Yifu Qiu; Varun Embar; Yizhe Zhang; Navdeep Jaitly; Shay B. Cohen; Benjamin Han

arXiv:2501.08248 · cs.CL · June 10, 2025

TL;DR

This paper introduces ICR^2, a new benchmark that evaluates long-context language models on realistic retrieval and reasoning tasks, and proposes methods that significantly improve their performance, with the best method surpassing GPT-4-Turbo on most tasks.

Contribution

The paper presents a new benchmark, ICR^2, for more realistic evaluation of long-context language models (LCLMs) and introduces three methods to enhance their retrieval and reasoning capabilities.

Findings

1. The proposed methods significantly improve performance on the LOFT and ICR^2 benchmarks.

2. The best method outperforms GPT-4-Turbo on most tasks.

3. On LOFT, the best approach gains +17 and +15 Exact Match points over vanilla RAG and supervised fine-tuning, respectively.
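
The findings above are reported in Exact Match (EM) points. As a rough illustration of what such a score measures, here is a minimal sketch of an EM metric. The normalization steps (lowercasing, stripping punctuation and English articles) follow a common question-answering convention and are an assumption; this page does not state the exact scoring rules used by LOFT or ICR^2, and the function names are hypothetical.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumed normalization; the benchmarks may use different rules.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> float:
    """Return 1.0 if the prediction equals any reference after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def exact_match_score(predictions: list[str], references: list[list[str]]) -> float:
    """Average EM over a dataset, scaled to 0-100 so differences read as points."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Under this sketch, a reported gain of "+15 points" would correspond to a difference of 15 in `exact_match_score` between two systems evaluated on the same questions.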

Abstract

Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly -- a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation…
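
The abstract describes retrieval-attention-probing as using attention heads to filter and de-noise long contexts during decoding. The sketch below illustrates one way such filtering could work in principle: sum the attention mass that a set of heads places on each passage's token span, then keep the highest-scoring passages. This is not the authors' implementation. Which heads are used, how scores are aggregated, and the names `passage_scores` and `top_k_passages` are all assumptions made for illustration.

```python
def passage_scores(attn_heads: list[list[float]],
                   spans: list[tuple[int, int]]) -> list[float]:
    """Score each passage by the mean attention mass it receives across heads.

    attn_heads: one list of attention weights over context tokens per head
                (hypothetical input, e.g. taken from a single decoding step).
    spans:      (start, end) token index range per passage, end exclusive.
    """
    n_heads = len(attn_heads)
    scores = []
    for start, end in spans:
        # Sum the weights inside the passage span for every head, then average.
        mass = sum(sum(head[start:end]) for head in attn_heads) / n_heads
        scores.append(mass)
    return scores


def top_k_passages(attn_heads: list[list[float]],
                   spans: list[tuple[int, int]],
                   k: int) -> list[int]:
    """Return indices of the k highest-scoring passages, in document order."""
    scores = passage_scores(attn_heads, spans)
    ranked = sorted(range(len(spans)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy example: two heads, six context tokens, three two-token passages.
heads = [
    [0.10, 0.10, 0.50, 0.20, 0.05, 0.05],
    [0.00, 0.10, 0.60, 0.20, 0.05, 0.05],
]
spans = [(0, 2), (2, 4), (4, 6)]
kept = top_k_passages(heads, spans, k=2)
```

In this toy example the second passage receives most of the attention mass, so it is retained, while the lowest-scoring passage is filtered out before the model generates an answer.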

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Attention Is All You Need · Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Adam · Residual Connection · Dropout