On the Impacts of Contexts on Repository-Level Code Generation

Nam Le Hai; Dung Manh Nguyen; Nghi D. Q. Bui

arXiv:2406.11927 · cs.SE · February 11, 2025 · 2 cites

TL;DR

This paper introduces RepoExec, a benchmark for evaluating repository-level code generation that emphasizes context utilization, functional correctness, and debugging. It also introduces new datasets and metrics and reports findings on model performance.

Contribution

It presents RepoExec, a novel benchmark with datasets and metrics for assessing repository-level code generation, focusing on context handling and functional correctness.

Findings

1. Pretrained LLMs outperform instruction-tuned models in functional correctness.

2. Instruction-tuned models make better use of the provided context (code dependencies).

3. RepoExec effectively evaluates both code functionality and alignment with developer intent.

Abstract

CodeLLMs have gained widespread adoption for code generation tasks, yet their capacity to handle repository-level code generation with complex contextual dependencies remains underexplored. Our work underscores the critical importance of leveraging repository-level contexts to generate executable and functionally correct code. We present RepoExec, a novel benchmark designed to evaluate repository-level code generation, with a focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts. Our study examines a controlled scenario where developers specify essential code dependencies (contexts), challenging models to integrate them effectively. Additionally, we introduce an instruction-tuned dataset that enhances CodeLLMs' ability to leverage dependencies, along with a new metric, Dependency…
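The name and definition of the new dependency metric are cut off in the abstract above, so they are not reproduced here. As a purely hypothetical illustration of what "leveraging provided dependencies" can mean in practice, the sketch below assumes each dependency is identified by a bare function name and measures the fraction of those names that generated Python code actually calls, using the standard `ast` module. The helpers `invoked_names` and `dependency_usage_ratio` are invented for this example; they are not the paper's metric or the RepoExec API.

```python
import ast


def invoked_names(source: str) -> set[str]:
    """Hypothetical helper: collect the names of all functions called in `source`."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):  # plain call: foo(...)
                names.add(func.id)
            elif isinstance(func, ast.Attribute):  # attribute call: mod.foo(...)
                names.add(func.attr)
    return names


def dependency_usage_ratio(generated: str, dependencies: list[str]) -> float:
    """Hypothetical stand-in metric: share of provided dependencies the code calls.

    Assumption: dependencies are matched by bare name only, ignoring modules
    and signatures. An empty dependency list is treated as vacuously satisfied.
    """
    if not dependencies:
        return 1.0
    called = invoked_names(generated)
    used = sum(1 for dep in dependencies if dep in called)
    return used / len(dependencies)


# Usage: the model was given three dependencies but only called two of them.
generated = """
def load_config(path):
    text = read_file(path)
    return parse_yaml(text)
"""
deps = ["read_file", "parse_yaml", "validate_schema"]
print(dependency_usage_ratio(generated, deps))
```

Matching by bare name keeps the sketch short, but it would count a same-named local function as a dependency hit; a real metric would need to resolve imports and cross-file definitions.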

Code & Models

Repositories

FSoft-AI4Code/RepoExec (official)

Videos

On the Impacts of Contexts on Repository-Level Code Generation (Underline)

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques