An Empirical Study on JIT Defect Prediction Based on BERT-style Model

Yuxiang Guo; Xiaopeng Gao; Bo Jiang

arXiv:2403.11158 · cs.SE · March 19, 2024 · 2 cites

TL;DR

This paper systematically investigates how fine-tuning configurations of BERT-style models affect JIT defect prediction, revealing key insights and proposing a cost-effective fine-tuning method with LoRA.

Contribution

The paper provides an empirical analysis of how fine-tuning settings (parameter freezing, parameter initialization, and optimizer strategy) affect BERT-style models on JIT defect prediction, and introduces a memory-efficient fine-tuning approach using LoRA.
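
To make the memory argument behind LoRA concrete, the sketch below counts trainable parameters for one weight matrix under full fine-tuning versus a low-rank update W + (alpha / r) * B A. It is an illustration only: the hidden size 768 (that of BERT-base), the rank 8, and the function names are assumptions, not settings taken from the paper.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # every entry of W is trained
    lora = rank * (d_out + d_in)   # only B (d_out x r) and A (r x d_in) are trained
    return full, lora


def matmul(x, y):
    """Plain-Python matrix product, adequate for tiny illustrative matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_effective_weight(w, b, a, alpha, rank):
    """Compute W + (alpha / rank) * B @ A with W kept frozen."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Assumed sizes: a 768 x 768 projection and rank 8 (both hypothetical choices).
full, lora = lora_param_counts(768, 768, rank=8)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.2%}")
```

Under these assumed sizes the low-rank factors hold roughly 2% of the parameters of the full matrix, which is where the savings in gradient and optimizer-state memory come from. When B is initialized to zero, as is common for LoRA, the effective weight starts out equal to the pre-trained W.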

Findings

01

The first encoder layer is crucial for model performance.

02

Results on individual projects are sensitive to parameter initialization settings.

03

Adding weight decay to the Adam optimizer slightly improves model performance.
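
The weight decay finding can be illustrated with a single-parameter Adam update. The sketch below uses decoupled weight decay in the style of AdamW; whether the paper uses the decoupled or the L2-coupled variant is not stated in this summary, so that choice, the function name, and the hyperparameter values are all assumptions.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """One Adam update for a scalar parameter, with optional decoupled decay.

    t is the 1-based step count. Returns the updated (theta, m, v).
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay (assumed variant): shrink the parameter directly
    # instead of adding weight_decay * theta to the gradient.
    theta = theta - lr * weight_decay * theta
    # Standard Adam step.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Compare one step with and without decay from the same starting point.
plain, _, _ = adam_step(1.0, grad=0.5, m=0.0, v=0.0, t=1)
decayed, _, _ = adam_step(1.0, grad=0.5, m=0.0, v=0.0, t=1, weight_decay=0.01)
print(plain, decayed)
```

With decay enabled, each step also pulls the parameter slightly toward zero, independent of the gradient, which acts as a mild regularizer during fine-tuning.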

Abstract

Previous works on Just-In-Time (JIT) defect prediction tasks have primarily applied pre-trained models directly, neglecting the configurations of their fine-tuning process. In this study, we perform a systematic empirical study to understand the impact of fine-tuning settings on BERT-style pre-trained models for JIT defect prediction. Specifically, we explore the impact of different parameter freezing settings, parameter initialization settings, and optimizer strategies on the performance of BERT-style models for JIT defect prediction. Our findings reveal the crucial role of the first encoder layer in the BERT-style model and the sensitivity of individual projects to parameter initialization settings. Another notable finding is that the addition of a weight decay strategy in the Adam optimizer can slightly improve model performance. Additionally, we compare performance using…
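
As a rough picture of what a parameter freezing setting involves, the sketch below marks which parameters stay trainable when chosen encoder layers are frozen. The parameter names follow the `encoder.layer.<i>.` pattern used by Hugging Face BERT implementations, which is an assumption about the model code; the helper `trainable_mask` and the example names are hypothetical, and the paper's actual freezing configurations are not reproduced here.

```python
import re

# Matches names such as "encoder.layer.3.attention.self.query.weight".
LAYER_PATTERN = re.compile(r"encoder\.layer\.(\d+)\.")


def trainable_mask(param_names, frozen_layers):
    """Map each parameter name to True (trainable) or False (frozen).

    Parameters outside the encoder stack (embeddings, classifier head)
    are left trainable in this sketch.
    """
    frozen = set(frozen_layers)
    mask = {}
    for name in param_names:
        match = LAYER_PATTERN.search(name)
        layer = int(match.group(1)) if match else None
        mask[name] = layer not in frozen
    return mask


# Hypothetical parameter names for a small illustration.
names = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.1.attention.self.query.weight",
    "classifier.weight",
]
mask = trainable_mask(names, frozen_layers=[0])
for name, trainable in mask.items():
    print(f"{name}: {'trainable' if trainable else 'frozen'}")
```

In a PyTorch model the same mask would typically be applied by setting `requires_grad = False` on each frozen parameter, so that the optimizer skips it and no gradient is stored for it.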

Taxonomy

Topics: Technology and Data Analysis