Retrieval-augmented code completion for local projects using large language models

Marko Hostnik; Marko Robnik-Šikonja

arXiv:2408.05026 · cs.SE · June 17, 2025

Open Access

TL;DR

This paper explores using small, efficient language models combined with retrieval techniques to improve local code completion, addressing privacy and computational issues of larger models.

Contribution

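It trains two small (160M-parameter) open models, the generative GPT-2 and the retrieval-adapted RETRO, on open-source Python files for local code completion, and shows that In-context RAG, which retrieves code snippets by the Jaccard similarity of tokens, further improves their performance.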

Findings

01. In-context RAG improves code completion by over 26%.

02. RETRO enhances GPT-2 performance by 12%.

03. Proper tokenization is crucial for optimal results.

Abstract

The use of large language models (LLMs) is becoming increasingly widespread among software developers. However, privacy and computational requirements are problematic with commercial solutions and the use of LLMs. In this work, we focus on using relatively small and efficient LLMs with 160M parameters that are suitable for local execution and augmentation with retrieval from local projects. We train two open transformer-based models, the generative GPT-2 and the retrieval-adapted RETRO, on open-source Python files, and empirically compare them, confirming the benefits of embedding-based retrieval. Furthermore, we improve our models' performance with In-context retrieval-augmented generation (RAG), which retrieves code snippets using the Jaccard similarity of tokens. We evaluate In-context RAG on larger models and determine that, despite its simplicity, the approach is more suitable than…
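The abstract describes In-context RAG as retrieving code snippets by the Jaccard similarity of tokens. The following is a minimal sketch of that idea, not the authors' implementation: the regex tokenizer, the function names (`tokenize`, `jaccard`, `retrieve`, `build_prompt`), and the prompt layout are all assumptions made for illustration, and the paper may tokenize, chunk, and format the context differently.

```python
import re


def tokenize(code: str) -> set[str]:
    """Return the set of identifier and number tokens in a code string.

    Assumption: a simple regex tokenizer stands in for whatever
    tokenization the paper actually uses.
    """
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return the k snippets whose token sets are most similar to the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        snippets,
        key=lambda s: jaccard(query_tokens, tokenize(s)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, snippets: list[str], k: int = 2) -> str:
    """Prepend the retrieved snippets to the unfinished code (hypothetical layout)."""
    context = "\n\n".join(retrieve(query, snippets, k))
    return f"{context}\n\n{query}"
```

For example, given a project containing an `add(a, b)` function, a file-reading helper, and an unrelated class, the unfinished line `def add_three(a, b, c): return a + b +` shares the most tokens with `add`, so that snippet would be placed first in the context passed to the language model.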

Taxonomy

Topics: Software Engineering Research · Model-Driven Software Engineering Techniques · Software System Performance and Reliability

Methods: Linear Warmup With Linear Decay · BERT · BART · RAG · Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Multi-Head Attention