CATCODER: Repository-Level Code Generation with Relevant Code and Type Context
Zhiyuan Pan, Xing Hu, Xin Xia, Xiaohu Yang

TL;DR
CatCoder is a new framework that improves repository-level code generation by integrating relevant code and type context, leading to significant performance gains across multiple models and languages.
Contribution
It introduces a static analysis-based approach to incorporate type dependencies into prompts, enhancing code generation accuracy for statically typed languages.
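As a rough illustration of the idea, the sketch below assembles a prompt from type context and retrieved code. It is not CatCoder's implementation: the `TypeInfo` record, the `build_prompt` function, and the comment-based prompt layout are all hypothetical, and the type information is assumed to have been produced beforehand by some static analyzer.

```python
from dataclasses import dataclass


@dataclass
class TypeInfo:
    """Hypothetical record for one type reported by a static analyzer."""
    name: str
    signatures: list[str]  # member signatures, without bodies


def build_prompt(type_context, relevant_code, target_signature):
    """Join type context, retrieved snippets, and the target signature.

    Context is emitted as line comments so the prompt ends with the
    signature the model is asked to complete. The layout is an assumption.
    """
    parts = []
    if type_context:
        parts.append("// Types available in this repository:")
        for info in type_context:
            parts.append(f"// {info.name}")
            parts.extend(f"//   {sig}" for sig in info.signatures)
    if relevant_code:
        parts.append("// Similar code from this repository:")
        for snippet in relevant_code:
            parts.extend(f"// {line}" for line in snippet.splitlines())
    parts.append(target_signature)
    return "\n".join(parts)


# Hypothetical inputs, for illustration only.
example_types = [TypeInfo("Account", ["long getBalance()", "void deposit(long amount)"])]
example_code = ["public void refund(Account a, long amt) {\n    a.deposit(amt);\n}"]
example_prompt = build_prompt(
    example_types, example_code, "public void credit(Account a, long amt) {"
)
```

Placing the signature last keeps the completion point at the end of the prompt, which suits left-to-right generation; other layouts are equally plausible.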
Findings
CatCoder outperforms the baseline by up to 17.35% in pass@k score.
It shows consistent improvements across various LLMs.
It scales effectively to large open-source repositories.
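For readers unfamiliar with the metric, pass@k is the probability that at least one of k sampled generations passes the tests. The summary above does not say how it was computed; the sketch below assumes the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per task and c of them pass. The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: samples generated, c: samples that pass, k: sampling budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark-level score is then the mean of this value over all tasks.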
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, repository-level code generation presents unique challenges, particularly due to the need to utilize information spread across multiple files within a repository. Specifically, successful generation depends on a solid grasp of both general, context-agnostic knowledge and specific, context-dependent knowledge. While LLMs are widely used for the context-agnostic aspect, existing retrieval-based approaches sometimes fall short as they are limited in obtaining a broader and deeper repository context. In this paper, we present CatCoder, a novel code generation framework designed for statically typed programming languages. CatCoder enhances repository-level code generation by integrating relevant code and type context. Specifically, it leverages static analyzers to extract type…
Taxonomy
Topics: Software Testing and Debugging Techniques · Web Applications and Data Management · Model-Driven Software Engineering Techniques
