RLCoder: Reinforcement Learning for Repository-Level Code Completion

Yanlin Wang; Yanli Wang; Daya Guo; Jiachi Chen; Ruikai Zhang; Yuchi Ma; Zibin Zheng

arXiv:2407.19487 · cs.SE · July 31, 2024 · 2 cites

TL;DR

RLCoder introduces a reinforcement learning framework for repository-level code completion that learns to retrieve relevant code snippets without labeled data, improving accuracy and generalization across languages.

Contribution

The paper presents RLCoder, a novel RL-based retrieval method that autonomously learns to select useful code snippets for completion without requiring labeled training data.

Findings

01. Outperforms state-of-the-art methods, with a 12.2% improvement in exact match (EM).

02. Generalizes across different programming languages.

03. Further improves previous methods such as RepoCoder.

Abstract

Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input sequence length. However, traditional lexical-based retrieval methods like BM25 struggle to capture code semantics, while model-based retrieval methods face challenges due to the lack of labeled data for training. Therefore, we propose RLCoder, a novel reinforcement learning framework, which can enable the retriever to learn to retrieve useful content for code completion without the need for labeled data. Specifically, we iteratively evaluate the usefulness of retrieved content based on the perplexity of the target code when provided with the retrieved content as additional context, and provide feedback to update the retriever parameters. This iterative…
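The perplexity-based feedback loop described in the abstract can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the RLCoder implementation: `toy_log_prob_fn` is a hypothetical stand-in for a frozen code language model, each retrieved candidate is assumed to be prepended to the unfinished code, and the lowest-perplexity candidate is assumed to serve as the positive target for a softmax cross-entropy loss over retriever scores. That loss is one simple way to turn the reward into a retriever update and is not necessarily the paper's exact objective.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: returns per-token log-probabilities of `target`
# given `context` under a frozen code language model.
LogProbFn = Callable[[str, str], List[float]]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum_i log p(y_i | context, y_<i))."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def score_candidates(log_prob_fn: LogProbFn, left_context: str,
                     target: str, candidates: Sequence[str]) -> List[float]:
    """Perplexity of the target code with each candidate prepended as extra context."""
    return [perplexity(log_prob_fn(cand + "\n" + left_context, target))
            for cand in candidates]


def reward_targets(ppls: Sequence[float]) -> List[float]:
    """Assumption: the lowest-perplexity candidate is the positive (one-hot target)."""
    best = min(range(len(ppls)), key=lambda i: ppls[i])
    return [1.0 if i == best else 0.0 for i in range(len(ppls))]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def retriever_loss_and_grad(scores: Sequence[float],
                            targets: Sequence[float]):
    """Cross-entropy between softmax(retriever scores) and the targets.

    The gradient w.r.t. each score is p_i - t_i; in a real retriever this
    would be backpropagated into the encoder parameters.
    """
    probs = softmax(scores)
    loss = -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)
    grad = [p - t for p, t in zip(probs, targets)]
    return loss, grad


def toy_log_prob_fn(context: str, target: str) -> List[float]:
    """Hypothetical toy LM: a target token is likely only if it appears in the context."""
    seen = set(context.split())
    return [math.log(0.9) if tok in seen else math.log(0.1)
            for tok in target.split()]


if __name__ == "__main__":
    candidates = ["PI = math.pi", "x = 1"]
    ppls = score_candidates(toy_log_prob_fn, "def area(r):",
                            "return math.pi * r * r", candidates)
    targets = reward_targets(ppls)  # the first candidate lowers perplexity
    loss, grad = retriever_loss_and_grad([0.0, 0.0], targets)
    print(ppls, targets, loss, grad)
```

A lower perplexity means the language model found the target code easier to predict with that candidate in its context, which is why perplexity can act as a usefulness signal without labeled data. Repeating the score, reward, and update steps over many completion examples gives the iterative loop the abstract describes.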

Code & Models

Repositories

DeepSoftwareAnalytics/RLCoder
PyTorch · Official

Models

nov3630/RLRetriever (Hugging Face model · 63 downloads · 4 likes)

Taxonomy

Topics: Software Engineering Research