Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

Wei Cheng; Yuhan Wu; Wei Hu

arXiv:2405.19782·cs.SE·May 31, 2024

TL;DR

This paper introduces DraCo, a dataflow-guided retrieval augmentation method for repository-level code completion. DraCo retrieves context from a repo-specific context graph and is reported to improve code exact match by 3.43% and identifier F1-score by 3.27%.

Contribution

DraCo is the first approach to use dataflow analysis for precise retrieval of relevant context, enabling more accurate code completion in private repositories.

Findings

01 DraCo improves code exact match by 3.43%.

02 DraCo increases identifier F1-score by 3.27%.

03 DraCo demonstrates superior accuracy and efficiency.
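
The two metrics named in the findings can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names `exact_match` and `identifier_f1`, the whitespace stripping, and the regex-based identifier extraction (which also picks up language keywords) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: an "identifier" is any name-like token; keywords are not filtered out.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def exact_match(prediction: str, reference: str) -> bool:
    """Return True when the completion equals the reference, ignoring outer whitespace."""
    return prediction.strip() == reference.strip()


def identifier_f1(prediction: str, reference: str) -> float:
    """Multiset F1 over the identifiers found in the prediction and the reference."""
    predicted = Counter(IDENTIFIER.findall(prediction))
    expected = Counter(IDENTIFIER.findall(reference))
    overlap = sum((predicted & expected).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted.values())
    recall = overlap / sum(expected.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("user.load_profile()", "user.load_profile()"))
    print(identifier_f1("user.save()", "user.load_profile()"))
```

Exact match rewards only fully correct lines, while an identifier-level score gives partial credit when a completion picks some of the right names.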

Abstract

Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories. Previous studies retrieve cross-file context based on import relations or text similarity, which is insufficiently relevant to completion targets. In this paper, we propose a dataflow-guided retrieval augmentation approach, called DraCo, for repository-level code completion. DraCo parses a private repository into code entities and establishes their relations through an extended dataflow analysis, forming a repo-specific context graph. Whenever triggering code completion, DraCo precisely retrieves relevant background knowledge from the repo-specific context graph and generates well-formed prompts to query code LMs. Furthermore, we construct a large Python…
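
To make the idea of a context graph concrete, here is a toy sketch using Python's standard `ast` module. It is not DraCo's implementation, and the paper's extended dataflow analysis is considerably richer; the helper names `build_context_graph` and `retrieve`, the sample `SOURCE`, and the single-file scope are hypothetical simplifications. The sketch records which names each assigned variable depends on, then walks those edges backwards from a completion target to collect candidate context.

```python
import ast
from collections import defaultdict

# Hypothetical sample file; "models" and "User" are made-up names.
SOURCE = '''
from models import User
user = User("ada")
profile = user.load_profile()
'''


def build_context_graph(source: str) -> dict[str, set[str]]:
    """Map each bound name to the names (or imported entities) it depends on."""
    graph: dict[str, set[str]] = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            # Every name read on the right-hand side flows into the targets.
            deps = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in node.targets:
                if isinstance(target, ast.Name):
                    graph[target.id] |= deps
        elif isinstance(node, ast.ImportFrom):
            # An imported name points at the cross-file entity it refers to.
            for alias in node.names:
                graph[alias.asname or alias.name].add(f"{node.module or ''}.{alias.name}")
    return dict(graph)


def retrieve(graph: dict[str, set[str]], name: str) -> list[str]:
    """Collect everything reachable from `name` by following dependency edges."""
    found: list[str] = []
    stack = [name]
    while stack:
        current = stack.pop()
        for dep in sorted(graph.get(current, ())):
            if dep != name and dep not in found:
                found.append(dep)
                stack.append(dep)
    return found


if __name__ == "__main__":
    graph = build_context_graph(SOURCE)
    print(retrieve(graph, "profile"))
```

In this toy setting, completing a line that uses `profile` would lead back through `user` to the imported `User` entity, whose definition in another file is the kind of background knowledge a prompt could include.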

Code & Models

Repositories

nju-websoft/DraCo (official)

Videos

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion (Underline)

Taxonomy

Topics: Scientific Computing and Data Management · Advanced Data Storage Technologies