Contextualized Code Pretraining for Code Generation

Chen Liu; Qingyuan Liang; Hanwen Zhang; Zeyu Sun; Yakun Zhang; Lu Zhang

arXiv:2605.17957 · cs.SE · May 19, 2026

TL;DR

This paper introduces a new invocation-aware pretraining framework for code models that leverages calling context to improve code generation in realistic repository scenarios.

Contribution

The paper proposes contextualized code pretraining, which uses static analysis to incorporate calling context, and introduces the CallerGen models and the CallerEval benchmark.

Findings

01. CallerGen outperforms comparable models on CallerEval

02. Models trained with invocation-aware objectives achieve higher pass@1 scores

03. Calling context significantly improves code generation performance
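
The summary does not say how pass@1 is computed. As a point of reference, a minimal sketch of the widely used unbiased pass@k estimator follows, assuming n samples are generated per problem and c of them pass the tests. The function name `pass_at_k` and the sample counts are illustrative and do not come from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples generated (assumed setup, not from the paper)
    c: number of samples that pass all tests
    k: evaluation budget
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for a single problem: 10 samples, 3 passing.
score = pass_at_k(n=10, c=3, k=1)
```

For k = 1 the estimate reduces to c / n, the fraction of generated samples that pass.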

Abstract

As code generation becomes increasingly central to improving software development efficiency, modern code models are largely trained and evaluated on code with natural-language descriptions. In real projects, developers often implement missing functions under limited project-specific artifacts, while the local call-site context is already available in the surrounding code. This usage context provides actionable cues about expected behavior, but existing models are not explicitly optimized to leverage it reliably, leading to implementations that may not integrate smoothly with surrounding usage in repository settings. In this work, we propose contextualized code pretraining, an invocation-aware framework that integrates calling context into both the training and evaluation of code models. Using static analysis, we automatically extract large-scale caller-callee pairs from real…
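
To make the idea of extracting caller-callee pairs with static analysis concrete, here is a minimal sketch built on Python's standard `ast` module. It is not the authors' pipeline: the function name `extract_caller_callee_pairs`, the restriction to a single file, and the handling of bare-name calls only are all assumptions made for illustration.

```python
import ast


def extract_caller_callee_pairs(source: str) -> list[tuple[str, str]]:
    """Return (caller, callee) name pairs among functions defined in `source`.

    Illustrative only: resolves direct calls by bare name within one file.
    """
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    # Names of every function defined anywhere in the file.
    defined = {n.name for n in ast.walk(tree) if isinstance(n, func_types)}

    pairs = []
    for func in ast.walk(tree):
        if not isinstance(func, func_types):
            continue
        for node in ast.walk(func):
            # Keep calls such as `helper(x)`; method calls like `obj.f()` are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callee = node.func.id
                if callee in defined and callee != func.name:
                    pairs.append((func.name, callee))
    return pairs


# Hypothetical input: `report` is the caller, `normalize` the callee.
example = '''
def normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]

def report(xs):
    weights = normalize(xs)
    return max(weights)
'''
pairs = extract_caller_callee_pairs(example)
```

In this toy input the body of `report` is the calling context for `normalize`: it shows that `normalize` takes a sequence and returns something `max` can consume. A repository-scale extractor would also need to resolve imports, methods, and cross-file calls, which this sketch omits.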
