From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence

Jian Yang; Xianglong Liu; Weifeng Lv; Ken Deng; Shawn Guo; Lin Jing; Yizhi Li; Shark Liu; Xianzhen Luo; Yuyu Luo; Changzai Pan; Ensheng Shi; Yingshui Tan; Renshuai Tao; Jiajun Wu; Xianjie Wu; Zhenhe Wu; Daoguang Zan; Chenchen Zhang; Wei Zhang; He Zhu; Terry Yue Zhuo; Kerui Cao; Xianfu Cheng; Jun Dong; Shengjie Fang; Zhiwei Fei; Xiangyuan Guan; Qipeng Guo; Zhiguang Han; Joseph James; Tianqi Luo; Renyuan Li; Yuhang Li; Yiming Liang; Congnan Liu; Jiaheng Liu; Qian Liu; Ruitong Liu; Tyler Loakman; Xiangxin Meng; Chuang Peng; Tianhao Peng; Jiajun Shi; Mingjie Tang; Boyang Wang; Haowen Wang; Yunli Wang; Fanglin Xu; Zihan Xu; Fei Yuan; Ge Zhang; Jiayi Zhang; Xinhao Zhang; Wangchunshu Zhou; Hualei Zhu; King Zhu; Bryan Dai; Aishan Liu; Zhoujun Li; Chenghua Lin; Tianyu Liu; Chao Peng; Kai Shen; Libo Qin; Shuangyong Song; Zizheng Zhan; Jiajun Zhang; Jie Zhang; Zhaoxiang Zhang; Bo Zheng

arXiv:2511.18538 · cs.SE · December 9, 2025

TL;DR

This paper provides a comprehensive survey and practical guide on code foundation models, analyzing their development, techniques, and real-world applications in automated software development.

Contribution

It systematically examines the entire lifecycle of code LLMs, compares general and specialized models, and bridges the gap between academic research and practical deployment.

Findings

01. Success rates on benchmarks such as HumanEval have risen from single digits to over 95%.
02. Analysis of techniques, design decisions, and trade-offs in code LLMs.
03. Insights into the research-practice gap and future research directions.

Abstract

Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools such as GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). The field has evolved dramatically from rule-based systems to Transformer-based architectures, with success rates on benchmarks such as HumanEval rising from single digits to over 95%. In this work, we provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) on code LLMs, systematically examining the complete model life cycle, from data curation to post-training, through advanced prompting paradigms, code pre-training, supervised fine-tuning, reinforcement learning, and autonomous coding agents. We…
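Success rates on HumanEval-style benchmarks are conventionally reported as pass@k: the probability that at least one of k sampled programs passes a problem's unit tests. As a minimal sketch of how such a score can be computed, the code below implements the unbiased estimator introduced alongside HumanEval, 1 - C(n-c, k)/C(n, k), where n samples are drawn per problem and c of them pass. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical and illustrative; this is not code from the survey.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: evaluation budget (must satisfy k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failing samples exist, so every size-k draw
        # contains at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k).
    # Python integers are exact, so the binomials do not overflow.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark score: mean per-problem pass@k over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two hypothetical problems: 3 of 10 samples pass, then 0 of 10 pass.
    results = [(10, 3), (10, 0)]
    print(benchmark_pass_at_k(results, k=1))
    print(benchmark_pass_at_k(results, k=5))
```

With k = 1 the estimator reduces to the fraction of passing samples (c/n) per problem, which is why pass@1 is often read directly as a "success rate".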

Taxonomy

Topics: Software Engineering Research · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification