CCT5: A Code-Change-Oriented Pre-Trained Model
Bo Lin, Shangwen Wang, Zhongxin Liu, Yepang Liu, Xin Xia, Xiaoguang Mao

TL;DR
This paper introduces CCT5, a model pre-trained specifically on code changes. It uses a large dataset and multiple pre-training tasks to improve performance on software maintenance tasks such as defect detection and code review.
Contribution
The paper presents a pre-training approach tailored to code changes, combining a large-scale dataset with multiple pre-training tasks to improve performance on code-change-related software engineering tasks.
Findings
CCT5 outperforms existing models on code change tasks.
Pre-training on code change data improves downstream task performance.
The approach effectively captures domain knowledge of code modifications.
Abstract
Software is constantly changing, requiring developers to perform several derived tasks in a timely manner, such as writing a description for the intention of the code change, or identifying the defect-prone code changes. Considering that the cost of dealing with these tasks can account for a large proportion (typically around 70 percent) of the total development expenditure, automating such processes will significantly lighten the burdens of developers. To achieve such a target, existing approaches mainly rely on training deep learning models from scratch or fine-tuning existing pretrained models on such tasks, both of which have weaknesses. Specifically, the former uses comparatively small-scale labelled data for training, making it difficult to learn and exploit the domain knowledge of programming language hidden in the large-amount unlabelled code in the wild; the latter is hard to…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
