Impact-driven Context Filtering For Cross-file Code Completion
Yanzhou Li, Shangqing Liu, Kangjie Chen, Tianwei Zhang, Yang Liu

TL;DR
This paper introduces CODEFILTER, a framework that filters retrieved cross-file contexts by their measured impact on the target completion, improving both the accuracy and the efficiency of repository-level code completion.
Contribution
The paper proposes an impact-driven filtering method for retrieved cross-file contexts that improves code completion accuracy while reducing computational cost.
Findings
CODEFILTER improves accuracy across benchmarks.
It reduces input prompt length, increasing efficiency.
It generalizes well across different models.
Abstract
Retrieval-augmented generation (RAG) has recently demonstrated considerable potential for repository-level code completion, as it integrates cross-file knowledge with in-file preceding code to provide comprehensive contexts for generation. To better understand the contribution of the retrieved cross-file contexts, we introduce a likelihood-based metric to evaluate the impact of each retrieved code chunk on the completion. Our analysis reveals that, despite retrieving numerous chunks, only a small subset positively contributes to the completion, while some chunks even degrade performance. To address this issue, we leverage this metric to construct a repository-level dataset where each retrieved chunk is labeled as positive, neutral, or negative based on its relevance to the target completion. We then propose an adaptive retrieval context filtering framework, CODEFILTER, trained on this…
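The likelihood-based labeling described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the scorer `loglik`, the function names, the way a chunk is prepended to the in-file prefix, and the `threshold` value are all hypothetical, since the abstract does not specify the exact metric.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer: returns log P(target | prompt) under some code LM.
# Any real model call would be plugged in here; none is assumed.
LogLikelihood = Callable[[str, str], float]


def chunk_impact(chunk: str, in_file_prefix: str, target: str,
                 loglik: LogLikelihood) -> float:
    """Change in target log-likelihood when one retrieved chunk is added."""
    with_chunk = loglik(chunk + "\n" + in_file_prefix, target)
    without_chunk = loglik(in_file_prefix, target)
    return with_chunk - without_chunk


def label_from_impact(impact: float, threshold: float = 0.5) -> str:
    """Map an impact score to positive / neutral / negative.

    The threshold is an assumed value chosen for illustration only.
    """
    if impact > threshold:
        return "positive"
    if impact < -threshold:
        return "negative"
    return "neutral"


def label_chunks(chunks: List[str], in_file_prefix: str, target: str,
                 loglik: LogLikelihood,
                 threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Label every retrieved chunk; needs the ground-truth target,
    so this only applies when building a labeled dataset."""
    return [
        (c, label_from_impact(chunk_impact(c, in_file_prefix, target, loglik),
                              threshold))
        for c in chunks
    ]
```

Because the labeling needs the ground-truth completion, a sketch like this only covers dataset construction; at completion time the target is unknown, which is why the abstract describes training a filtering framework on the labeled data.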
Taxonomy
Topics: Mobile and Web Applications · Web Data Mining and Analysis · Power Systems and Technologies
