A study on the impact of pre-trained model on Just-In-Time defect prediction
Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, W.K. Chan, Bo Jiang

TL;DR
This paper systematically compares six transformer-based pre-trained models as backbones for JIT defect prediction, revealing their performance differences, input sensitivities, and effectiveness in low-data scenarios, providing insights for model optimization.
Contribution
The paper provides a comprehensive analysis of how different pre-trained models affect JIT defect prediction, covering performance, input sensitivity, and resource efficiency, which prior research lacked.
Findings
Models with similar pre-training backbones require comparable training resources.
Commit code significantly influences defect detection performance.
Transformer-based models excel in low-data, few-shot scenarios.
Abstract
Previous researchers conducting Just-In-Time (JIT) defect prediction tasks have primarily focused on the performance of individual pre-trained models, without exploring the relationship between different pre-trained models as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios. Our findings…
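The input ablation mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say how the commit message and commit code are encoded or joined, so everything here is an assumption for illustration: the `Commit` class, the `build_input` and `ablation_variants` helpers, and the `" </s> "` separator string are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical container for the two inputs named in the abstract."""
    message: str
    code: str


def build_input(commit, use_message=True, use_code=True, sep=" </s> "):
    """Join the enabled parts of a commit into one text sequence.

    The separator is an assumed placeholder; a real model would use
    whatever separator its own tokenizer defines.
    """
    parts = []
    if use_message:
        parts.append(commit.message)
    if use_code:
        parts.append(commit.code)
    return sep.join(parts)


def ablation_variants(commit):
    """Return the full input plus the two single-input ablations."""
    return {
        "message+code": build_input(commit),
        "message only": build_input(commit, use_code=False),
        "code only": build_input(commit, use_message=False),
    }


if __name__ == "__main__":
    example = Commit(message="fix null check", code="+ if (x != null) {")
    for name, text in ablation_variants(example).items():
        print(f"{name}: {text}")
```

Training and evaluating the same backbone on each variant, then comparing the scores, is one way to measure how sensitive a model is to each input.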
Taxonomy
Topics: Imbalanced Data Classification Techniques · Non-Destructive Testing Techniques · Machine Learning and Data Classification
