When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
Haojun Weng, Qianqian Yang, Hao Fu, Haobin Pan, Xinwei Lv

TL;DR
This study examines how outdated repository snippets can actively mislead code-generation models, and argues that robust retrieval-augmented code synthesis must account for the temporal validity of retrieved context.
Contribution
The paper provides a controlled diagnostic analysis showing that stale context can induce code that is incompatible with the current repository state, which underscores the need to account for temporal relevance in retrieval-augmented models.
Findings
Stale-only retrieval leads models to reference obsolete helpers on most samples: 15/17 for Qwen2.5-Coder-7B-Instruct and 13/17 for gpt-4.1-mini.
Without retrieval, models produce almost no stale references, but they also rarely succeed.
Adding current evidence can largely correct stale-induced errors.
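To make "stale reference" concrete, here is a minimal sketch of a call written against an old helper signature failing against the current one. The helper `load_config`, its parameters, and the checker `call_is_compatible` are hypothetical illustrations, not taken from the paper's repositories, and this check is not claimed to be the paper's metric.

```python
import inspect


def load_config(path, *, strict):
    """Hypothetical CURRENT helper: `strict` is now keyword-only.

    Assume the stale version was `load_config(path, strict)`.
    """
    return {"path": path, "strict": strict}


def call_is_compatible(func, args, kwargs):
    """Return True if the arguments bind to func's current signature."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# A call shaped like the stale signature (positional `strict`).
stale_call_ok = call_is_compatible(load_config, ("app.toml", True), {})

# A call shaped like the current signature (keyword `strict`).
current_call_ok = call_is_compatible(load_config, ("app.toml",), {"strict": True})

print(stale_call_ok, current_call_ok)
```

A model that sees only the old definition in its retrieved context has no signal that the positional form is obsolete, which is the failure mode the findings describe.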
Abstract
Context: Retrieval-augmented code generation relies on cross-file repository context, but retrieved snippets may come from obsolete project states. Objectives: We study whether temporally stale repository snippets act as harmless noise or actively induce current-state-incompatible code. Methods: We conduct a controlled diagnostic study on a curated 17-sample set of production-helper signature changes from five Python repositories. For each sample, we compare current-only, stale-only, no-retrieval, and mixed current/stale retrieval conditions under prompts that hide commit freshness and expected current signatures. Results: Under neutralized prompts, stale-only retrieval induces stale helper references on 15/17 Qwen2.5-Coder-7B-Instruct samples and 13/17 gpt-4.1-mini samples, corresponding to 88.2 and 76.5 percentage-point increases over current-only retrieval. No retrieval…
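The reported percentage-point increases can be reproduced from the sample counts in the abstract. The sketch below assumes the current-only condition produced 0/17 stale references for both models; that baseline is inferred from the fact that the stated increases equal the raw stale-only rates, and is not quoted directly in the text above. The function names are illustrative.

```python
def rate_pct(hits: int, total: int) -> float:
    """Share of samples with a stale helper reference, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


def pp_increase(stale_hits: int, baseline_hits: int, total: int) -> float:
    """Percentage-point difference between two conditions on the same samples."""
    return rate_pct(stale_hits, total) - rate_pct(baseline_hits, total)


TOTAL_SAMPLES = 17

# Assumption: inferred baseline, not stated explicitly in the abstract.
ASSUMED_CURRENT_ONLY_HITS = 0

# Stale-only counts as reported in the abstract.
stale_only_hits = {
    "Qwen2.5-Coder-7B-Instruct": 15,
    "gpt-4.1-mini": 13,
}

increases = {
    model: round(pp_increase(hits, ASSUMED_CURRENT_ONLY_HITS, TOTAL_SAMPLES), 1)
    for model, hits in stale_only_hits.items()
}

for model, delta in increases.items():
    print(f"{model}: +{delta} percentage points")
```

With only 17 samples, each sample shifts the rate by about 5.9 percentage points, so the two models differ by two samples.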
