RLCoder: Reinforcement Learning for Repository-Level Code Completion
Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

TL;DR
RLCoder introduces a reinforcement learning framework for repository-level code completion that learns to retrieve relevant code snippets without labeled data, improving accuracy and generalization across languages.
Contribution
The paper presents RLCoder, a novel RL-based retrieval method that autonomously learns to select useful code snippets for completion without requiring labeled training data.
Findings
Outperforms state-of-the-art methods, improving exact match (EM) by 12.2%.
Generalizes across different programming languages.
Can further improve previous methods such as RepoCoder.
Abstract
Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input sequence length. However, traditional lexical-based retrieval methods like BM25 struggle to capture code semantics, while model-based retrieval methods face challenges due to the lack of labeled data for training. Therefore, we propose RLCoder, a novel reinforcement learning framework, which can enable the retriever to learn to retrieve useful content for code completion without the need for labeled data. Specifically, we iteratively evaluate the usefulness of retrieved content based on the perplexity of the target code when provided with the retrieved content as additional context, and provide feedback to update the retriever parameters. This iterative…
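The perplexity-based feedback described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy token model, the whitespace tokenizer, and the names `toy_token_prob`, `perplexity`, and `rank_candidates` are all assumptions made for illustration, and a real system would score the target code with a code language model instead. The sketch only shows the idea that a retrieved snippet counts as more useful when it lowers the perplexity of the target code.

```python
import math


def toy_token_prob(token, context_tokens, vocab_size=1000):
    """Hypothetical stand-in for a language model: add-one smoothed
    frequency of `token` among the context tokens."""
    count = context_tokens.count(token)
    return (count + 1) / (len(context_tokens) + vocab_size)


def perplexity(target_tokens, context_tokens):
    """Perplexity of the target given the context:
    exp of the negative mean token log-probability."""
    log_prob = sum(
        math.log(toy_token_prob(tok, context_tokens)) for tok in target_tokens
    )
    return math.exp(-log_prob / len(target_tokens))


def rank_candidates(prefix, target, candidates):
    """Score each retrieved candidate by the perplexity of the target code
    when the candidate is prepended to the unfinished code (lower is better)."""
    target_tokens = target.split()
    scored = []
    for cand in candidates:
        context_tokens = cand.split() + prefix.split()
        scored.append((perplexity(target_tokens, context_tokens), cand))
    return sorted(scored)


if __name__ == "__main__":
    prefix = "total ="
    target = "compute_sum ( values )"
    candidates = [
        "class Logger : pass",
        "def compute_sum ( values ) : return sum ( values )",
    ]
    for ppl, cand in rank_candidates(prefix, target, candidates):
        print(f"{ppl:10.2f}  {cand}")
```

In this sketch the lowest-perplexity candidate would be treated as the useful one, and that ranking is the kind of signal that could be fed back to update a retriever. How the feedback is actually turned into parameter updates is not specified in the text above and is left out here.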
Taxonomy
Topics: Software Engineering Research
