Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets

Timur Galimzyanov; Olga Kolomyttseva; Egor Bogomolov

arXiv:2510.20609 · cs.LG · October 24, 2025

Open Access

TL;DR

This paper evaluates retrieval strategies for code-focused generation tasks (code completion and bug localization) under compute constraints, comparing configurations along chunking strategy, similarity scoring, and splitting granularity across context window sizes.

Contribution

It provides systematic, evidence-based recommendations for retrieval design choices tailored to code-focused tasks and resource limitations.

Findings

01

For code completion (PL-PL retrieval), BM25 with word-level splitting is the most effective option and is an order of magnitude faster than dense alternatives.

02

For natural-language-to-code (NL-PL) retrieval, proprietary dense encoders consistently outperform sparse retrievers, but at about 100x the latency.

03

Chunk size should scale with available context: 32-64 line chunks work best at small budgets, and whole-file retrieval becomes competitive at 16000 tokens.
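
To make the first finding concrete, here is a minimal sketch of BM25 scoring over code chunks with word-level splitting. It is not the authors' implementation: the regex tokenizer, the function names split_words and bm25_rank, and the parameter defaults k1=1.2 and b=0.75 are all assumptions chosen for illustration, and the scoring follows the common Okapi BM25 formulation with a smoothed, non-negative IDF.

```python
import math
import re
from collections import Counter


def split_words(text):
    """Word-level splitting (assumed tokenizer): identifiers and integers, lowercased."""
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+", text)]


def bm25_rank(query, chunks, k1=1.2, b=0.75):
    """Return chunk indices ordered by descending Okapi BM25 score."""
    docs = [split_words(c) for c in chunks]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0

    # Document frequency: number of chunks containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = set(split_words(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization against the average chunk length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)

    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```

In a completion setting, the query would typically be the code preceding the cursor and the chunks would come from other files in the repository. No index is built here, so this sketch rescans every chunk per query; a practical setup would precompute the term statistics once.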

Abstract

We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena -- code completion and bug localization -- we systematically compare retrieval configurations across various context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. (1) For PL-PL, sparse BM25 with word-level splitting is the most effective and practical, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL, proprietary dense encoders (Voyager-3 family) consistently beat sparse retrievers, but at 100x higher latency. (3) Optimal chunk size scales with available context: 32-64 line chunks work best at small budgets, and whole-file retrieval becomes competitive at 16000 tokens. (4) Simple line-based chunking matches…
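
The interaction between chunk size and context budget described in point (3) can be illustrated with a small sketch of fixed-size line-based chunking followed by greedy packing into a token budget. This is an illustration under stated assumptions, not the paper's pipeline: the names chunk_lines and pack_context are hypothetical, the greedy skip-if-too-large policy is an assumed design, and the default token counter is a whitespace split standing in for a real model tokenizer.

```python
def chunk_lines(source, lines_per_chunk=32):
    """Split source text into consecutive chunks of at most lines_per_chunk lines."""
    lines = source.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def pack_context(ranked_chunks, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily keep chunks, in ranked order, while they fit in the token budget.

    count_tokens defaults to a whitespace count, which is only a rough
    stand-in (assumption) for the tokenizer of the target model.
    """
    picked, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # skip chunks that would overflow; smaller ones may still fit
        picked.append(chunk)
        used += cost
    return picked
```

With a small budget, large chunks are skipped outright and little context survives, which is one intuition for why shorter chunks suit tight budgets while whole files only become viable once the budget is large.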

Taxonomy

Topics: Software Engineering Research · Natural Language Processing Techniques · Web Data Mining and Analysis