Contextualized Code Pretraining for Code Generation
Chen Liu, Qingyuan Liang, Hanwen Zhang, Zeyu Sun, Yakun Zhang, Lu Zhang

TL;DR
This paper introduces a new invocation-aware pretraining framework for code models that leverages calling context to improve code generation in realistic repository scenarios.
Contribution
It proposes contextualized code pretraining, which uses static analysis to incorporate calling context, and introduces the CallerGen models and the CallerEval benchmark.
Findings
CallerGen outperforms comparable models on CallerEval
Models trained with invocation-aware objectives achieve higher pass@1 scores
Calling context significantly improves code generation performance
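For readers unfamiliar with the pass@1 metric mentioned in the findings, the sketch below shows the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. This summary does not say which estimator or sampling setup CallerEval uses, so the function name `pass_at_k` and the sample counts are illustrative assumptions, not the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples (assumed setup, not from the paper)
    c: number of those samples that pass all tests
    k: sample budget being evaluated
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 10 samples for one problem, 3 of which pass.
score_at_1 = pass_at_k(n=10, c=3, k=1)
score_at_5 = pass_at_k(n=10, c=3, k=5)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass; a benchmark score is then the mean over all problems.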
Abstract
As code generation becomes increasingly central to improving software development efficiency, modern code models are largely trained and evaluated on code with natural-language descriptions. In real projects, developers often implement missing functions under limited project-specific artifacts, while the local call-site context is already available in the surrounding code. This usage context provides actionable cues about expected behavior, but existing models are not explicitly optimized to leverage it reliably, leading to implementations that may not integrate smoothly with surrounding usage in repository settings. In this work, we propose contextualized code pretraining, an invocation-aware framework that integrates calling context into both the training and evaluation of code models. Using static analysis, we automatically extract large-scale caller-callee pairs from real…
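To make the idea of extracting caller-callee pairs with static analysis concrete, here is a minimal sketch that uses Python's standard `ast` module on a single file. The abstract does not describe the authors' tooling, target languages, or pair format, so the function `extract_caller_callee_pairs` and the toy source string are assumptions made purely for illustration.

```python
import ast

FunctionNode = (ast.FunctionDef, ast.AsyncFunctionDef)


def extract_caller_callee_pairs(source: str) -> list[tuple[str, str]]:
    """Return (caller, callee) name pairs for functions defined in `source`.

    Simplifications in this sketch: only direct calls by plain name are
    matched (no methods, imports, or aliases), and calls inside nested
    functions are also attributed to the enclosing function.
    """
    tree = ast.parse(source)
    # Names of every function defined anywhere in the file.
    defined = {n.name for n in ast.walk(tree) if isinstance(n, FunctionNode)}
    pairs = []
    for func in ast.walk(tree):
        if not isinstance(func, FunctionNode):
            continue
        for node in ast.walk(func):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in defined
            ):
                pairs.append((func.name, node.func.id))
    return pairs


# Toy input (hypothetical): `load` is the caller of `read` and `normalize`.
EXAMPLE = '''
def read(path):
    return open(path).read()

def normalize(text):
    return text.strip().lower()

def load(path):
    raw = read(path)
    return normalize(raw)
'''

pairs = extract_caller_callee_pairs(EXAMPLE)
```

In an invocation-aware setup of the kind the abstract describes, the body of a caller such as `load` would serve as the calling context when a model is asked to generate a callee such as `normalize`. A repository-scale pipeline would also need cross-file name resolution, which this sketch omits.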
