ContextBench: A Benchmark for Context Retrieval in Coding Agents

Han Li; Letian Zhu; Bohan Zhang; Rili Feng; Jiaming Wang; Yue Pan; Earl T. Barr; Federica Sarro; Zhaoyang Chu; He Ye

arXiv:2602.05892 · cs.LG · February 12, 2026

TL;DR

ContextBench is a comprehensive benchmark designed to evaluate how effectively coding agents retrieve and utilize code context during problem solving, revealing gaps and guiding improvements in LLM-based coding tools.

Contribution

The paper introduces a process-oriented evaluation framework that scores agent trajectories against human-annotated gold contexts, covering eight programming languages and multiple agent types.

Findings

01. LLMs favor recall over precision in context retrieval

02. Sophisticated scaffolding yields marginal improvements

03. Significant gaps between explored and utilized context exist

Abstract

LLM-based coding agents have shown strong performance on automated issue resolution benchmarks, yet existing evaluations largely focus on final task success, providing limited insight into how agents retrieve and use code context during problem solving. We introduce ContextBench, a process-oriented evaluation of context retrieval in coding agents. ContextBench consists of 1,136 issue-resolution tasks from 66 repositories across eight programming languages, each augmented with human-annotated gold contexts. We further implement an automated evaluation framework that tracks agent trajectories and measures context recall, precision, and efficiency throughout issue resolution. Using ContextBench, we evaluate four frontier LLMs and five coding agents. Our results show that sophisticated agent scaffolding yields only marginal gains in context retrieval ("The Bitter Lesson" of coding agents),…
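To make the recall and precision measures concrete, here is a minimal sketch of how retrieved context could be scored against a gold context. It assumes contexts are represented as sets of (file path, line number) pairs; that representation, the function name `context_scores`, and the example paths are illustrative assumptions, not the paper's actual implementation or metric definitions.

```python
from typing import Set, Tuple

# Assumption: a context is a set of (file path, line number) pairs.
Line = Tuple[str, int]


def context_scores(retrieved: Set[Line], gold: Set[Line]) -> Tuple[float, float]:
    """Return (recall, precision) of retrieved lines against gold lines."""
    hits = len(retrieved & gold)
    # Recall: fraction of the gold context that the agent retrieved.
    recall = hits / len(gold) if gold else 0.0
    # Precision: fraction of the retrieved context that is in the gold context.
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: the gold context is 10 lines, and the agent reads
# a 40-line window that happens to cover all of them.
gold = {("src/parser.py", n) for n in range(10, 20)}
retrieved = {("src/parser.py", n) for n in range(5, 45)}

recall, precision = context_scores(retrieved, gold)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this hypothetical case the agent reaches full recall but low precision, which is the kind of recall-over-precision pattern the findings describe.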

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Advanced Software Engineering Methodologies