SWE Context Bench: A Benchmark for Context Learning in Coding

Jiayuan Zhu; Junde Wu; Minhao Hu; Shengda Zhu; Jiazhen Pan; Weixiang Shen; Yijun Yang; Fenglin Liu; Jianye Hao; Yueming Jin; Qirong Ho; Min Xu

arXiv:2602.08316·cs.SE·May 7, 2026

TL;DR

SWE-ContextBench is a new benchmark for evaluating how well coding agents understand and reuse context from related software engineering tasks, emphasizing retrieval accuracy and efficiency.

Contribution

The paper introduces a benchmark built from related real-world tasks and analyzes how context retrieval strategies affect coding agent performance.

Findings

01. Accurate context retrieval improves task resolution accuracy.

02. Proper context management reduces runtime and token costs.

03. Incorrect context selection can negatively impact performance.

Abstract

Large language models are increasingly used as coding agents for software engineering tasks. Current benchmarks mainly evaluate whether an agent can correctly fulfill a request or fix a bug. They largely treat tasks as independent and do not assess whether agents can reuse previous experience across related problems. As a result, the efficiency gains from reusing previous experience remain difficult to measure. We introduce SWE-ContextBench, a benchmark designed to explicitly evaluate context understanding and retrieval in coding agents. SWE-ContextBench consists of 1,100 base tasks and another 376 related tasks derived from real dependency and reference relationships among GitHub issues and pull requests. SWE-ContextBench groups base tasks and related tasks with shared context across 51 unique repositories and 9 programming languages. The benchmark evaluates how accurately…
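To make the base-task and related-task grouping concrete, here is a minimal Python sketch of one way retrieval accuracy over such groups could be scored. It is an illustration under stated assumptions, not the benchmark's actual schema or metric: the `Task` fields, the `group_id` notion, the sample repository names, and the `retrieval_accuracy` function are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # All fields are hypothetical; the benchmark's real schema is not shown here.
    task_id: str
    repo: str
    group_id: str   # assumed: tasks that share context carry the same group id
    is_base: bool   # True for a base task, False for a related task


def retrieval_accuracy(related, retrieved, tasks_by_id):
    """Fraction of related tasks whose retrieved task is a base task in the same group.

    `retrieved` maps a related task id to the id of the task an agent picked
    as context (assumed interface).
    """
    if not related:
        return 0.0
    hits = 0
    for task in related:
        picked = tasks_by_id.get(retrieved.get(task.task_id))
        if picked is not None and picked.is_base and picked.group_id == task.group_id:
            hits += 1
    return hits / len(related)


# Toy data, invented purely for illustration.
tasks = [
    Task("base-1", "org/repo-a", "g1", True),
    Task("base-2", "org/repo-b", "g2", True),
    Task("rel-1", "org/repo-a", "g1", False),
    Task("rel-2", "org/repo-b", "g2", False),
]
tasks_by_id = {t.task_id: t for t in tasks}
related = [t for t in tasks if not t.is_base]

# The agent picks the right context for rel-1 and the wrong one for rel-2.
retrieved = {"rel-1": "base-1", "rel-2": "base-1"}
accuracy = retrieval_accuracy(related, retrieved, tasks_by_id)
print(accuracy)
```

In this toy setup one of the two related tasks is matched with a base task from its own group, so the score is 0.5. A real evaluation would also need to track task resolution, runtime, and token cost, which this sketch leaves out.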
