A Scalable Benchmark for Repository-Oriented Long-Horizon Conversational Context Management
Yang Liu, Li Zhang, Fang Liu, Ping Lin, Xinyi Li

TL;DR
This paper introduces LoCoEval, a benchmark for evaluating long-horizon conversational context management in repository-oriented development. It shows that existing methods struggle in this setting and proposes a unified memory approach that outperforms the baselines.
Contribution
It presents the first dedicated benchmark for repository-oriented conversations, evaluates existing methods, and proposes a novel unified memory approach to improve context management.
Findings
Existing methods struggle with long-horizon repository conversations.
The proposed unified memory approach outperforms baseline methods.
LoCoEval provides a realistic evaluation framework for future research.
Abstract
In recent years, large language models (LLMs) have advanced rapidly, substantially enhancing their code understanding and generation capabilities and giving rise to powerful code assistants. However, in practical repository development, excessively long-horizon conversational context may overwhelm models, causing the loss of critical information and degraded performance, thereby limiting the utility of code assistants. Existing context management methods proposed to mitigate this context dilemma primarily target general-purpose conversations, while repository-oriented solutions remain largely unexplored, mainly because reliable evaluation benchmarks are lacking. To bridge this gap, we present LoCoEval, the first long-horizon conversational context management benchmark tailored to repository-oriented development scenarios. Adhering to three key principles, LoCoEval is…
Taxonomy
Topics: Spreadsheets and End-User Computing · AI in Service Interactions · Topic Modeling
