CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges
Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, Zhi Jin

TL;DR
This paper introduces CodeAgent, an LLM-based agent framework that uses external tools to improve repo-level code generation, outperforming existing models and commercial tools on real-world Python projects.
Contribution
The paper presents a novel agent framework, CodeAgent, which integrates external tools to enhance LLM performance in complex, repo-level code generation tasks.
Findings
On CodeAgentBench, CodeAgent improves LLM performance by 18.1% to 250%.
CodeAgent outperforms GitHub Copilot in accuracy and efficiency.
Benchmark results demonstrate robustness across various code generation tasks.
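The 18.1% to 250% range above is easy to misread, so here is a minimal sketch of the arithmetic, assuming the percentages are relative gains over each model's baseline score. The function name and the pass rates are hypothetical, chosen only to reproduce the two endpoints; they are not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical pass rates (in percent), not figures from the paper.
low = relative_improvement(baseline=40.0, improved=47.24)   # roughly an 18.1% gain
high = relative_improvement(baseline=10.0, improved=35.0)   # a 250% gain

print(f"low end:  {low:.1f}%")
print(f"high end: {high:.1f}%")
```

Under this reading, a 250% improvement means the improved score is 3.5 times the baseline, not 2.5 times.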
Abstract
Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks such as generating standalone code units. Real-world software development, however, often involves complex code repositories (named repo) with complex dependencies and extensive documentation. To fill this gap, our research pivots towards evaluating LLMs in a more realistic setting -- real-world repo-level code generation. We introduce CodeAgentBench, a manually curated benchmark for repo-level code generation. This benchmark comprises five high-quality Python projects, encompassing a total of 101 samples. We assess nine leading LLMs on repo-level tasks and observe a decline in their performance. To tackle this, we present CodeAgent, a novel LLM-based agent framework that employs external tools for effective repo-level code generation. CodeAgent integrates five…
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
