A study on the impact of pre-trained model on Just-In-Time defect prediction

Yuxiang Guo; Xiaopeng Gao; Zhenyu Zhang; W.K.Chan; Bo Jiang

arXiv:2309.02317·cs.SE·November 27, 2023

TL;DR

This paper systematically compares six transformer-based pre-trained models as backbones for JIT defect prediction, revealing their performance differences, input sensitivities, and effectiveness in low-data scenarios, providing insights for model optimization.

Contribution

The paper provides a comprehensive analysis of how different pre-trained models affect JIT defect prediction, covering performance, input sensitivity, and resource efficiency, a comparison that was lacking in prior research.

Findings

01. Models with similar pre-training backbones require comparable training resources.

02. Commit code significantly influences defect detection performance.

03. Transformer-based models excel in low-data, few-shot scenarios.

Abstract

Previous researchers conducting Just-In-Time (JIT) defect prediction tasks have primarily focused on the performance of individual pre-trained models, without exploring the relationship between different pre-trained models as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios. Our findings…
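
The input ablation and few-shot settings mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Commit` record, the `build_input` and `few_shot_subset` helpers, the plain-text ` [SEP] ` separator, and the 0/1 label encoding are all hypothetical names and assumptions chosen for illustration. A real pipeline would pass the resulting strings to the tokenizer of the chosen backbone, which has its own separator tokens.

```python
import random
from dataclasses import dataclass


@dataclass
class Commit:
    message: str  # commit message text
    code: str     # changed code (diff) text
    label: int    # assumed encoding: 1 = defect-inducing, 0 = clean


def build_input(commit, use_message=True, use_code=True, sep=" [SEP] "):
    """Join the selected commit fields into one string.

    Disabling a field mimics one arm of an input ablation
    (message only, code only, or both). The separator is a
    placeholder assumption, not a real tokenizer token.
    """
    parts = []
    if use_message:
        parts.append(commit.message)
    if use_code:
        parts.append(commit.code)
    if not parts:
        raise ValueError("at least one input field must be enabled")
    return sep.join(parts)


def few_shot_subset(commits, k, seed=0):
    """Sample up to k commits per label to emulate a few-shot training set."""
    rng = random.Random(seed)
    by_label = {}
    for c in commits:
        by_label.setdefault(c.label, []).append(c)
    subset = []
    for label in sorted(by_label):
        group = by_label[label]
        subset.extend(rng.sample(group, min(k, len(group))))
    return subset
```

Calling `build_input(commit, use_code=False)` yields the message-only variant, and `few_shot_subset(train, k)` returns at most `k` examples per class, so the same model can be trained on each variant and the results compared.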

Code & Models

Repositories

aresxd/jit_defect_prediciton (PyTorch, official)

Taxonomy

Topics: Imbalanced Data Classification Techniques · Non-Destructive Testing Techniques · Machine Learning and Data Classification