LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth

Weihao Zeng; Yuzhen Huang; Junxian He

arXiv:2602.07962·cs.AI·February 10, 2026



TL;DR

LOCA-bench is a scalable benchmark that evaluates language agents as their context length is increased in a controlled way, targeting the problem of context rot in long-running tasks.

Contribution

This paper presents LOCA-bench, a novel benchmark that enables controlled, potentially infinite context growth to assess language agents' robustness and effectiveness in dynamic environments.

Findings

01. Advanced context management improves success rates.
02. Performance degrades with increasing environment complexity.
03. Benchmark is open-source for community use.

Abstract

Large language models (LLMs) are increasingly capable of carrying out long-running, real-world tasks. However, as the amount of context grows, their reliability often deteriorates, a phenomenon known as "context rot". Existing long-context benchmarks primarily focus on single-step settings that evaluate a model's ability to retrieve information from a long snippet. In realistic scenarios, however, LLMs often need to act as agents that explore environments, follow instructions and plans, extract useful information, and predict correct actions under a dynamically growing context. To assess language agents in such settings, we introduce LOCA-bench (a benchmark for LOng-Context Agents). Given a task prompt, LOCA-bench leverages automated and scalable control of environment states to regulate the agent's context length. This design enables LOCA-bench to extend the context length potentially…
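The idea of regulating context length through the environment state, rather than through the task prompt, can be illustrated with a toy sketch. This is not LOCA-bench's implementation: `ToyEnvironment`, `ToyAgent`, `measure`, the record format, and the whitespace word count used as a stand-in for tokens are all hypothetical names and assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnvironment:
    """Hypothetical environment whose state size is set by a single knob."""
    num_records: int

    def list_records(self, start: int, count: int) -> str:
        # A tool call returns one page of records as plain text.
        end = min(start + count, self.num_records)
        return "\n".join(
            f"record {i}: value={i * 7 % 100}" for i in range(start, end)
        )


@dataclass
class ToyAgent:
    """Hypothetical agent that appends every observation to its context."""
    context: list[str] = field(default_factory=list)

    def run(self, task_prompt: str, env: ToyEnvironment, page_size: int = 10) -> int:
        self.context.append(task_prompt)
        total = 0
        # Page through the whole environment; each page lands in the context.
        for start in range(0, env.num_records, page_size):
            observation = env.list_records(start, page_size)
            self.context.append(observation)
            total += sum(
                int(line.split("=")[1]) for line in observation.splitlines()
            )
        return total

    def context_length(self) -> int:
        # Whitespace word count as a crude stand-in for a token count.
        return sum(len(chunk.split()) for chunk in self.context)


def measure(num_records: int) -> tuple[int, int]:
    """Run the same task prompt against an environment of the given size."""
    agent = ToyAgent()
    answer = agent.run(
        "Sum the value field of every record.", ToyEnvironment(num_records)
    )
    return answer, agent.context_length()


if __name__ == "__main__":
    for n in (10, 100, 1000):
        answer, length = measure(n)
        print(f"records={n} answer={answer} context_words={length}")
```

In this sketch the task prompt is identical in every run; only `num_records` changes, and the accumulated context grows in proportion to it. That is the sense in which an environment-side knob can scale context length without rewriting the task.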


Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)