Keeping Code-Aware LLMs Fresh: Full Refresh, In-Context Deltas, and Incremental Fine-Tuning
Pradeep Kumar Sharma, Ishaan Puri, Mantinder Jit Singh, Swapnil Shivaprasad, Hritvik Shrivastava

TL;DR
This paper explores methods to keep code-aware language models up-to-date with evolving codebases, comparing full retraining, in-context deltas, and incremental fine-tuning across multiple repositories.
Contribution
It introduces a systematic evaluation of update strategies for code models, proposing an alias-aware protocol and a forgetting probe to measure model freshness and retention.
Findings
Incremental fine-tuning with old-aware mixes balances freshness and retention.
In-context learning with English summaries offers rapid adaptation without retraining.
Full refresh provides the highest accuracy for maximum code changes.
Abstract
Modern codebases evolve continuously: files are renamed or deleted; public APIs drift; behavior shifts within otherwise familiar modules. A model trained yesterday to map a developer's natural-language question to the exact set of repository file paths that matter will degrade tomorrow, even if the questions themselves look unchanged. In this paper we study, at system scale and across several widely used repositories, how to keep such a model fresh without surrendering retention on earlier code. We frame freshness as a form of domain drift between a base snapshot and the current HEAD, and we compare three families of update strategies: (A) Full Refresh, retraining the entire model at the new snapshot; (B) In-Context Learning (ICL) that injects recent deltas (raw git diffs or concise English summaries) at inference; and (C) Incremental Fine-Tuning (Inc-FT) on delta-derived training sets,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Engineering Research · Software Testing and Debugging Techniques · Web Data Mining and Analysis
