Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion
Wei Cheng, Yuhan Wu, Wei Hu

TL;DR
This paper introduces DraCo, a dataflow-guided retrieval augmentation method for repository-level code completion. DraCo retrieves context from a repo-specific context graph and achieves higher completion accuracy than existing methods.
Contribution
DraCo is the first approach to use dataflow analysis for precise retrieval of relevant context, enabling more accurate code completion in private repositories.
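To make "dataflow-guided retrieval" concrete, here is a minimal sketch that assumes a flow-insensitive, single-file def-use analysis built on Python's `ast` module. It is not DraCo's extended dataflow analysis. The hypothetical helper `dataflow_sources` traces a variable back through plain assignments to the imported names it depends on. Those are the cross-file entities a retriever would then look up in the repository. The module and class names in the example (`utils.db`, `Database`, `Logger`) are made up. The sketch also assumes the code before the cursor parses, which real completion prefixes often do not.

```python
import ast


def dataflow_sources(source: str, target: str) -> set[str]:
    """Hypothetical helper: return the imported names that `target` depends on.

    Toy, flow-insensitive def-use analysis within one file. It only follows
    plain assignments and ignores control flow, attribute stores, and the
    side effects of calls.
    """
    tree = ast.parse(source)
    imported: set[str] = set()
    deps: dict[str, set[str]] = {}  # assigned name -> names read on its right-hand side
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `from m import x as y` binds `y`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Assign):
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for tgt in node.targets:
                for n in ast.walk(tgt):
                    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store):
                        deps.setdefault(n.id, set()).update(reads)
    # Walk def-use edges backwards from the target, collecting imported roots.
    seen: set[str] = set()
    stack, roots = [target], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in imported:
            roots.add(name)
        stack.extend(deps.get(name, ()))
    return roots


source = '''
from utils.db import Database
from utils.log import Logger
import json

log = Logger("app")
db = Database(url="sqlite://")
conn = db.connect()
payload = json.dumps({})
'''
print(sorted(dataflow_sources(source, "conn")))  # ['Database']
```

Completing after `conn.` therefore points to `Database` as the relevant cross-file entity. Text similarity alone might also surface `Logger` or `json`.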
Findings
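DraCo improves code exact match by 3.43% on average compared with the state-of-the-art approach.
DraCo improves identifier F1-score by 3.27% on average compared with the state-of-the-art approach.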
DraCo demonstrates superior accuracy and efficiency.
Abstract
Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories. Previous studies retrieve cross-file context based on import relations or text similarity, which is insufficiently relevant to completion targets. In this paper, we propose a dataflow-guided retrieval augmentation approach, called DraCo, for repository-level code completion. DraCo parses a private repository into code entities and establishes their relations through an extended dataflow analysis, forming a repo-specific context graph. Whenever triggering code completion, DraCo precisely retrieves relevant background knowledge from the repo-specific context graph and generates well-formed prompts to query code LMs. Furthermore, we construct a large Python…
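The pipeline described in the abstract has four steps: parse a repository into code entities, relate them in a graph, retrieve at completion time, and build a prompt. A toy sketch can show how these steps fit together. Everything here is hypothetical:

- the `REPO` contents
- the graph schema, which has only class-to-method "contains" edges
- the one-hop retrieval that follows a variable back to the constructor it was assigned from
- the prompt layout

It is a simplification under these assumptions, not DraCo's repo-specific context graph or its extended dataflow analysis.

```python
import ast

# Hypothetical toy repository: file path -> source text.
REPO = {
    "shop/models.py": '''
class Cart:
    """A shopping cart."""
    def __init__(self, owner: str):
        self.owner = owner
        self.items = []

    def add(self, sku: str, qty: int = 1) -> None:
        self.items.append((sku, qty))

    def total(self, prices: dict) -> float:
        return sum(prices[s] * q for s, q in self.items)
''',
    "shop/util.py": '''
def fmt_price(value: float) -> str:
    return f"${value:.2f}"
''',
}


def _signature(fn: ast.FunctionDef, indent: str = "") -> str:
    """Render a function header without its body."""
    ret = f" -> {ast.unparse(fn.returns)}" if fn.returns else ""
    return f"{indent}def {fn.name}({ast.unparse(fn.args)}){ret}: ..."


def build_context_graph(repo: dict[str, str]) -> dict[str, dict]:
    """Map qualified entity names to signatures plus 'contains' edges (class -> methods)."""
    graph: dict[str, dict] = {}
    for path, src in repo.items():
        module = path[:-3].replace("/", ".")  # "shop/models.py" -> "shop.models"
        for node in ast.parse(src).body:
            if isinstance(node, ast.ClassDef):
                cls = f"{module}.{node.name}"
                graph[cls] = {"sig": f"class {node.name}:", "children": []}
                for item in node.body:
                    if isinstance(item, ast.FunctionDef):
                        qn = f"{cls}.{item.name}"
                        graph[qn] = {"sig": _signature(item, "    "), "children": []}
                        graph[cls]["children"].append(qn)
            elif isinstance(node, ast.FunctionDef):
                graph[f"{module}.{node.name}"] = {"sig": _signature(node), "children": []}
    return graph


def retrieve(graph: dict[str, dict], prefix: str, target: str) -> list[str]:
    """Follow `target` one hop back to the imported entity it was constructed from."""
    imports: dict[str, str] = {}  # local name -> qualified name
    built_from: dict[str, str] = {}  # variable -> name of the callable that produced it
    for node in ast.walk(ast.parse(prefix)):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                imports[alias.asname or alias.name] = f"{node.module}.{alias.name}"
        elif (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)
              and isinstance(node.value.func, ast.Name)):
            for tgt in node.targets:
                if isinstance(tgt, ast.Name):
                    built_from[tgt.id] = node.value.func.id
    qn = imports.get(built_from.get(target, target))
    if qn not in graph:
        return []
    return [qn, *graph[qn]["children"]]


def build_prompt(graph: dict[str, dict], prefix: str, cursor_line: str, target: str) -> str:
    """Prepend retrieved signatures as comments, then the code up to the cursor."""
    context = ["# Repository context:"]
    context += ["# " + graph[qn]["sig"] for qn in retrieve(graph, prefix, target)]
    return "\n".join(context) + "\n" + prefix + cursor_line


prefix = '''from shop.models import Cart
from shop.util import fmt_price

cart = Cart("bob")
'''
print(build_prompt(build_context_graph(REPO), prefix, cursor_line="cart.", target="cart"))
```

The sketch puts only signatures into the prompt, which keeps it short. `fmt_price` is imported but does not flow into `cart`, so it is left out. A real system would presumably also rank entities and respect the code LM's context budget.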
