Unified Pre-training for Program Understanding and Generation
Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

TL;DR
PLBART is a sequence-to-sequence model pre-trained on Java and Python functions and their associated natural-language text. It outperforms or rivals state-of-the-art models on program understanding and generation benchmarks.
Contribution
The paper introduces PLBART, a unified pre-trained model capable of handling diverse code understanding and generation tasks with extensive cross-language capabilities.
Findings
PLBART outperforms or rivals state-of-the-art models in code summarization, generation, and translation.
PLBART effectively performs program understanding tasks like repair and clone detection.
PLBART learns program syntax, style, and logical flow, even with limited annotations.
Abstract
Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART's effectiveness in program understanding. Furthermore, analysis reveals that PLBART…
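The abstract names denoising autoencoding as the pre-training objective without spelling out the noise functions. As a rough illustration of the general idea only, the sketch below corrupts a token sequence by random masking and pairs it with the original sequence as the reconstruction target. The `<mask>` symbol, the `mask_tokens` helper, the masking probability, and the whitespace tokenization are all assumptions made for this example, not details taken from the paper.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, chosen for illustration


def mask_tokens(tokens, mask_prob=0.35, seed=0):
    """Build one (noisy_input, target) pair for denoising autoencoding.

    Each token is independently replaced by MASK with probability
    mask_prob; the target is the uncorrupted sequence that a
    sequence-to-sequence model would be trained to reconstruct.
    """
    rng = random.Random(seed)  # fixed seed keeps the example repeatable
    noisy = [MASK if rng.random() < mask_prob else tok for tok in tokens]
    return noisy, list(tokens)


# Whitespace-split function as a stand-in for a real subword tokenizer.
source = "def add ( a , b ) : return a + b".split()
noisy, target = mask_tokens(source)

print("encoder input :", " ".join(noisy))
print("decoder target:", " ".join(target))
```

In this setup the encoder sees the corrupted sequence and the decoder is trained to emit the clean one, so no labeled data is needed for pre-training.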
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
