Does chronology matter in JIT defect prediction? A Partial Replication Study
Hadi Jahanshahi, Dhanya Jothimani, Ayşe Başar, Mucahit Cevik

TL;DR
This study investigates whether the chronological order of data affects JIT defect prediction models, finding that models perform consistently over time and emphasizing the importance of using recent data and weighted sampling for better accuracy.
Contribution
It provides a partial replication study showing that the chronological order can be disregarded, and recommends weighted sampling and frequent retraining to improve JIT defect prediction.
Findings
Model performance does not significantly change over time.
Using all available data yields similar results to using recent data.
Weighted sampling improves the relevance of change properties.
Abstract
Just-In-Time (JIT) models detect fix-inducing (or defect-inducing) changes. These models are designed on the assumption that past code change properties are similar to future ones. However, as the system evolves, the expertise of developers and/or the complexity of the system also change. In this work, we investigate the effect of code change properties on JIT models over time. We also study how using recent data, as opposed to all available data, affects the performance of JIT models. Further, we analyze the effect of weighted sampling on the performance of fix-inducing properties of JIT models. For this purpose, we used datasets from Eclipse JDT, Mozilla, Eclipse Platform, and PostgreSQL. We used five families of code change properties: size, diffusion, history, experience, and purpose. We used Random Forest to train and test the JIT model and…
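To make the idea of weighted sampling concrete, here is a minimal sketch of drawing a training set that favours recent changes. It is not the authors' procedure: the abstract does not state their weighting scheme, so the exponential decay, the 180-day half-life, the `Change` record, and the synthetic history are all illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Change:
    """A code change with a hypothetical age in days and a defect label."""
    change_id: int
    age_days: float
    fix_inducing: bool


def recency_weights(changes, half_life_days=180.0):
    """Exponential-decay weights: a change loses half its weight per half-life.

    The decay form and the 180-day default are assumptions for illustration.
    """
    return [0.5 ** (c.age_days / half_life_days) for c in changes]


def weighted_training_sample(changes, k, half_life_days=180.0, seed=0):
    """Draw k changes with replacement, favouring recent ones."""
    rng = random.Random(seed)
    weights = recency_weights(changes, half_life_days)
    return rng.choices(changes, weights=weights, k=k)


# Synthetic history: 1,000 changes spread uniformly over about three years.
_rng = random.Random(42)
history = [
    Change(i, age_days=_rng.uniform(0, 1095), fix_inducing=_rng.random() < 0.2)
    for i in range(1000)
]
sample = weighted_training_sample(history, k=500)

# Compare the average age of the full history with that of the weighted sample.
mean_age_all = sum(c.age_days for c in history) / len(history)
mean_age_sample = sum(c.age_days for c in sample) / len(sample)
```

A sample drawn this way could then be passed to a classifier such as Random Forest, the learner named in the abstract. Setting a very large `half_life_days` makes the weights nearly uniform, which approximates training on all available data without regard to age.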
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
