CATCODER: Repository-Level Code Generation with Relevant Code and Type Context

Zhiyuan Pan; Xing Hu; Xin Xia; Xiaohu Yang

arXiv:2406.03283 · cs.SE · November 24, 2025 · 3 cites

Open Access

TL;DR

CatCoder is a framework for repository-level code generation in statically typed languages. It integrates relevant code and type context into the prompt, improving pass@k by up to 17.35% over the baseline and showing consistent gains across multiple LLMs and languages.

Contribution

It introduces a static-analysis-based approach that extracts type dependencies from the repository and incorporates them into prompts, improving code generation accuracy for statically typed languages.
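The idea of putting type dependencies into the prompt can be illustrated with a toy sketch. It assumes a static analyzer has already produced an index from type names to their declarations and to the types each declaration mentions; `TypeDef`, `collect_type_context`, and `build_prompt` are hypothetical names, not CatCoder's API. Starting from the types in the target function's signature, a breadth-first walk gathers their transitive dependencies (capped at `max_types`) and places those declarations ahead of the signature.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """One type declaration as a static analyzer might report it (hypothetical schema)."""
    name: str
    source: str  # declaration text to show the model
    depends_on: list[str] = field(default_factory=list)  # type names this declaration mentions


def collect_type_context(roots: list[str], index: dict[str, TypeDef], max_types: int = 10) -> list[TypeDef]:
    """Breadth-first walk from the signature's types to their transitive type dependencies."""
    seen: set[str] = set()
    queue = deque(roots)
    result: list[TypeDef] = []
    while queue and len(result) < max_types:
        name = queue.popleft()
        if name in seen or name not in index:
            continue  # skip repeats and types not defined in the repository (e.g. String)
        seen.add(name)
        result.append(index[name])
        queue.extend(index[name].depends_on)
    return result


def build_prompt(signature: str, types: list[TypeDef]) -> str:
    """Place the collected declarations ahead of the function the model should complete."""
    context = "\n".join(t.source for t in types)
    return f"// Relevant types\n{context}\n\n// Complete this function\n{signature}\n"


# Hypothetical Java-like repository index, as a static analyzer might produce it.
index = {
    "Order": TypeDef("Order", "class Order { Customer customer; List<Item> items; }", ["Customer", "Item"]),
    "Customer": TypeDef("Customer", "class Customer { String name; }"),
    "Item": TypeDef("Item", "class Item { int price; }"),
    "Unused": TypeDef("Unused", "class Unused {}"),
}
types = collect_type_context(["Order", "String"], index)
print([t.name for t in types])  # ['Order', 'Customer', 'Item']
print(build_prompt("int totalPrice(Order order) {", types))
```

Breadth-first order is one simple way to favor the types closest to the signature when the cap forces a cut; `Unused` never reaches the prompt because nothing in the signature's dependency chain refers to it.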

Findings

1. Outperforms the baseline by up to 17.35% in pass@k score.
2. Shows consistent improvements across various LLMs.
3. Scales effectively to large open-source repositories.
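pass@k figures like the one above are conventionally computed with the unbiased estimator of Chen et al. (2021): generate n samples per task, count the c samples that pass the tests, compute 1 - C(n-c, k) / C(n, k), and average over tasks. This page does not state the paper's sampling settings, so the sketch below shows only that standard formula; the function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples were generated and c of them passed the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_task: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as (n, c)."""
    return sum(pass_at_k(n, c, k) for n, c in per_task) / len(per_task)


print(pass_at_k(10, 3, 1))                        # about 0.3
print(mean_pass_at_k([(10, 3), (10, 0)], k=1))    # about 0.15
```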

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, repository-level code generation presents unique challenges, particularly due to the need to utilize information spread across multiple files within a repository. Specifically, successful generation depends on a solid grasp of both general, context-agnostic knowledge and specific, context-dependent knowledge. While LLMs are widely used for the context-agnostic aspect, existing retrieval-based approaches sometimes fall short as they are limited in obtaining a broader and deeper repository context. In this paper, we present CatCoder, a novel code generation framework designed for statically typed programming languages. CatCoder enhances repository-level code generation by integrating relevant code and type context. Specifically, it leverages static analyzers to extract type…
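The abstract contrasts CatCoder with retrieval-based approaches. To show what such retrieval looks like, here is a minimal sketch of a simple lexical retriever: repository files are split into overlapping line windows, and each window is ranked by the Jaccard similarity between its identifier set and that of the unfinished code. The window size, stride, tokenization, and all function names are assumptions chosen for illustration; this reproduces neither CatCoder's retriever nor any specific baseline's.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokens(code: str) -> set[str]:
    """Identifier-like tokens, used as a bag of words for lexical matching."""
    return set(TOKEN.findall(code))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if a or b else 0.0


def windows(source: str, size: int = 10, stride: int = 5) -> list[str]:
    """Split a file into overlapping windows of `size` lines, always covering the tail."""
    lines = source.splitlines()
    last_start = max(len(lines) - size, 0)
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)
    return ["\n".join(lines[s:s + size]) for s in starts]


def retrieve(query: str, files: dict[str, str], top_k: int = 2) -> list[tuple[float, str, str]]:
    """Return the top_k (score, path, window) triples most similar to the query."""
    q = tokens(query)
    scored = [(jaccard(q, tokens(w)), path, w) for path, src in files.items() for w in windows(src)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


files = {
    "Cart.java": "class Cart {\n  List<Item> items;\n  int total() { return items.stream().mapToInt(Item::price).sum(); }\n}",
    "Logger.java": "class Logger {\n  void log(String msg) { System.out.println(msg); }\n}",
}
query = "int total = cart.items.stream().mapToInt(Item::price).sum();"
for score, path, _ in retrieve(query, files, top_k=1):
    print(path, round(score, 2))  # Cart.java 0.62
```

Purely lexical matching finds code that looks like the query, but it can miss a type declared in another file unless that name already appears in the query. This is one way to read the abstract's point that retrieval alone is limited in obtaining broader and deeper repository context.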

Taxonomy

Topics: Software Testing and Debugging Techniques · Web Applications and Data Management · Model-Driven Software Engineering Techniques