PanGu-Coder: Program Synthesis with Function-Level Language Modeling
Fenia Christopoulou, Gerasimos Lampouras, Milan Gritta, Guchun Zhang, Yinpeng Guo, Zhongqi Li, Qi Zhang, Meng Xiao, Bo Shen, Lin Li, Hao Yu, Li Yan, Pingyi Zhou, Xin Wang, Yuchi Ma, Ignacio Iacobacci, Yasheng Wang, Guangtai Liang, Jiansheng Wei, Xin Jiang, Qianxiang Wang

TL;DR
PanGu-Coder is a pretrained language model designed for text-to-code generation, utilizing a two-stage training process and fine-tuning on programming problems to produce functionally correct code.
Contribution
It introduces a novel two-stage training strategy and fine-tuning approach for program synthesis using a decoder-only language model architecture.
Findings
Achieves comparable or better performance than similarly sized models such as Codex.
Operates with a smaller context window and less training data.
Demonstrates effectiveness in generating functionally correct code.
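As a rough illustration of what "functionally correct" means in this setting, a generated program counts as correct only if it passes a set of test cases, not if it merely resembles a reference solution. The sketch below is a minimal, hypothetical checker: the `passes_tests` helper, the `add` task, and the test cases are assumptions made for illustration and are not the paper's evaluation harness.

```python
def passes_tests(candidate_source, entry_point, test_cases):
    """Return True if the function named entry_point, defined in
    candidate_source, returns the expected value for every test case."""
    namespace = {}
    try:
        # Assumption: candidates are trusted here. A real harness would run
        # generated code in a sandbox with time and memory limits.
        exec(candidate_source, namespace)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False


# Two hypothetical model outputs for the prompt "return the sum of a and b".
candidates = [
    "def add(a, b):\n    return a - b\n",  # plausible-looking but wrong
    "def add(a, b):\n    return a + b\n",  # functionally correct
]
tests = [((1, 2), 3), ((0, 0), 0)]

results = [passes_tests(src, "add", tests) for src in candidates]
```

Here the first candidate fails and the second passes, even though the two differ by a single character.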
Abstract
We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e. the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation and train on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally…
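The two objectives named in the abstract can be sketched at the level of how a training example is built. This is a generic illustration under stated assumptions, not the paper's recipe: whitespace tokenization, the `<mask>` token, the masking probability, and the choice to mask only the description tokens are all hypothetical, and the abstract does not specify how the two losses are combined in the second stage.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def clm_example(tokens):
    """Causal LM: predict each token from the tokens before it.
    Inputs are tokens[:-1]; targets are the same sequence shifted by one."""
    return tokens[:-1], tokens[1:]


def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM: hide a random subset of tokens and predict only those.
    Targets hold the original token at masked positions and None elsewhere."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets


# A toy (description, function) pair, tokenized on whitespace for simplicity.
description = "return the sum of a and b".split()
code = "def add ( a , b ) : return a + b".split()
pair = description + code

clm_inputs, clm_targets = clm_example(pair)
mlm_inputs, mlm_targets = mlm_example(description, mask_prob=0.5, seed=0)
```

In the causal case every position contributes a prediction target, whereas in the masked case only the hidden positions do, which is why the two objectives can emphasize different parts of a description-code pair.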
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling
