GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion
Baoyi Wang, Xingliang Wang, Guochang Li, Chen Zhi, Junxiao Han, Xinkui Zhao, Nan Wang, Shuiguang Deng, Jianwei Yin

TL;DR
This paper investigates the effectiveness of simple, index-free lexical retrieval methods like ripgrep for repository-level code completion, demonstrating that lightweight approaches can rival complex retrieval systems and proposing improvements to enhance their performance.
Contribution
It introduces GrepRAG, a novel lightweight retrieval framework that enhances lexical retrieval with re-ranking and deduplication, outperforming state-of-the-art methods in code completion tasks.
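To make the retrieve, re-rank, deduplicate idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`grep_retrieve`, `rerank`, `deduplicate`, `retrieve_context`), the term-overlap score, and the overlap-based deduplication rule are all assumptions chosen for illustration, and a plain in-memory line scan stands in for a tool such as ripgrep.

```python
import re

# Rough identifier pattern; an assumption, not the paper's tokenizer.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def grep_retrieve(files, query_terms, context=1):
    """Index-free scan: return (path, start, end, text) windows around matching lines."""
    patterns = [re.compile(rf"\b{re.escape(t)}\b") for t in query_terms]
    hits = []
    for path, text in files.items():
        lines = text.splitlines()
        for i, line in enumerate(lines):
            if any(p.search(line) for p in patterns):
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                hits.append((path, lo, hi, "\n".join(lines[lo:hi])))
    return hits


def rerank(hits, query_terms):
    """Sort snippets by the number of distinct query terms they contain (hypothetical score)."""
    terms = set(query_terms)
    return sorted(hits, key=lambda h: len(terms & set(IDENT.findall(h[3]))), reverse=True)


def deduplicate(hits):
    """Keep a snippet only if it does not overlap an already kept snippet from the same file."""
    kept = []
    for path, lo, hi, text in hits:
        overlaps = any(p == path and lo < khi and klo < hi for p, klo, khi, _ in kept)
        if not overlaps:
            kept.append((path, lo, hi, text))
    return kept


def retrieve_context(files, query_terms, top_k=5):
    """Full pipeline: lexical scan, then re-rank, then deduplicate, then truncate."""
    return deduplicate(rerank(grep_retrieve(files, query_terms), query_terms))[:top_k]
```

Calling `retrieve_context` with a dictionary that maps file paths to file contents, plus a list of identifiers taken from the code around the completion point, returns the highest-scoring non-overlapping snippets. Deduplicating after re-ranking means that, among overlapping windows, the best-scoring one is the one retained.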
Findings
Naive GrepRAG achieves comparable performance to complex graph-based baselines.
GrepRAG improves code exact match by 7.04-15.58% over SOTA methods.
Lexical retrieval effectiveness is linked to spatial proximity and lexical precision.
Abstract
Repository-level code completion remains challenging for large language models (LLMs) due to cross-file dependencies and limited context windows. Prior work addresses this challenge using Retrieval-Augmented Generation (RAG) frameworks based on semantic indexing or structure-aware graph analysis, but these approaches incur substantial computational overhead for index construction and maintenance. Motivated by common developer workflows that rely on lightweight search utilities (e.g., ripgrep), we revisit a fundamental yet underexplored question: how far can simple, index-free lexical retrieval support repository-level code completion before more complex retrieval mechanisms become necessary? To answer this question, we systematically investigate lightweight, index-free, intent-aware lexical retrieval through extensive empirical analysis. We first introduce Naive GrepRAG, a baseline…
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Web Data Mining and Analysis
