LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth
Weihao Zeng, Yuzhen Huang, Junxian He

TL;DR
LOCA-bench is a scalable benchmark that evaluates language agents as their context length is increased in a controlled way, targeting the problem of context rot in long-running tasks.
Contribution
This paper presents LOCA-bench, a novel benchmark that enables controlled, potentially infinite context growth to assess language agents' robustness and effectiveness in dynamic environments.
Findings
Advanced context management improves success rates.
Performance degrades with increasing environment complexity.
Benchmark is open-source for community use.
Abstract
Large language models (LLMs) are increasingly capable of carrying out long-running, real-world tasks. However, as the amount of context grows, their reliability often deteriorates, a phenomenon known as "context rot". Existing long-context benchmarks primarily focus on single-step settings that evaluate a model's ability to retrieve information from a long snippet. In realistic scenarios, however, LLMs often need to act as agents that explore environments, follow instructions and plans, extract useful information, and predict correct actions under a dynamically growing context. To assess language agents in such settings, we introduce LOCA-bench (a benchmark for LOng-Context Agents). Given a task prompt, LOCA-bench leverages automated and scalable control of environment states to regulate the agent's context length. This design enables LOCA-bench to extend the context length potentially…
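The idea of regulating context length through the environment state can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from LOCA-bench itself: the `ToyEnvironment` class, the `num_records` knob, the paginated `list_records` tool, and the whitespace-based `count_tokens` proxy are all hypothetical. The point is only that, with the task prompt held fixed, enlarging the environment forces the agent to accumulate more tool output in its context.

```python
from dataclasses import dataclass


@dataclass
class ToyEnvironment:
    """Hypothetical environment whose size is set by a single knob (num_records)."""
    num_records: int

    def list_records(self, start: int, page_size: int = 10) -> str:
        # Simulated tool call: returns one page of records as plain text.
        end = min(start + page_size, self.num_records)
        return "\n".join(
            f"record {i}: status=open owner=user{i % 7}" for i in range(start, end)
        )


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts whitespace-separated pieces.
    return len(text.split())


def run_episode(num_records: int, page_size: int = 10) -> int:
    """Explore the whole environment and return the accumulated context size."""
    env = ToyEnvironment(num_records)
    context = ["Task: count the open records."]  # the task prompt never changes
    for start in range(0, num_records, page_size):
        # Each tool observation is appended, so the context grows with the environment.
        context.append(env.list_records(start, page_size))
    return sum(count_tokens(chunk) for chunk in context)


# Same task, three environment sizes: only the environment state differs.
context_sizes = {n: run_episode(n) for n in (10, 100, 1000)}
```

In this sketch the accumulated context grows linearly with `num_records`, so a single environment parameter acts as the dial for context length. A real agent loop would also add model outputs and tool-call metadata to the context, which this toy version omits.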
Taxonomy
Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
