TL;DR
This paper introduces BiCC-BERT, a bi-modal change representation model pre-trained on code and commit messages, significantly improving just-in-time defect prediction accuracy by capturing deeper semantic information.
Contribution
The paper proposes BiCC-BERT, a bi-modal pre-training model with a new Replaced Message Identification (RMI) objective, and integrates it into JIT defect prediction, where it outperforms existing methods.
Findings
JIT-BiCC achieves a 10.8% higher F1-score than the baselines.
BiCC-BERT effectively captures semantic relations between code changes and messages.
The approach demonstrates the importance of natural language semantics in defect prediction.
Abstract
To predict software defects at an early stage, researchers have proposed just-in-time defect prediction (JIT-DP), which identifies potential defects in code commits. The prevailing approaches train models to represent code changes in historical commits and use the learned representations to predict the presence of defects in the latest commit. However, existing models learn only the edits made to the source code, without considering the natural language intentions behind the changes. This limitation hinders their ability to capture deeper semantics. To address this, we introduce a novel bi-modal change pre-training model called BiCC-BERT. BiCC-BERT is pre-trained on a code change corpus to learn bi-modal semantic representations. To incorporate commit messages from the corpus, we design a novel pre-training objective called Replaced Message Identification (RMI), which learns the semantic…
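The abstract is cut off before it describes RMI in detail, so the following is only a rough sketch of how training pairs for a "replaced message" objective might be built. It assumes that each code change is paired either with its own commit message (label 0) or with a message drawn from a different commit (label 1). The function name `build_rmi_examples`, the `replace_prob` parameter, and the sampling strategy are hypothetical and are not taken from the paper.

```python
import random


def build_rmi_examples(commits, replace_prob=0.5, seed=0):
    """Build (code_change, message, label) triples for a replaced-message task.

    `commits` is a list of (code_change, message) pairs.
    label 0: the message is the commit's original message.
    label 1: the message was swapped in from another commit.
    This sampling scheme is an assumption, not the paper's procedure.
    """
    rng = random.Random(seed)
    examples = []
    for i, (change, message) in enumerate(commits):
        label = 0
        # A replacement needs at least one other commit to borrow from.
        if len(commits) > 1 and rng.random() < replace_prob:
            # Draw an index uniformly from all commits except i.
            j = rng.randrange(len(commits) - 1)
            if j >= i:
                j += 1
            message = commits[j][1]
            label = 1
        examples.append((change, message, label))
    return examples
```

A binary classifier over the joint encoding of the change and the message could then be trained on these triples. Note that this sketch does not check whether two commits happen to share the same message text, which a real pipeline would likely need to handle.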
