Iterative versus Exhaustive Data Selection for Cross Project Defect Prediction: An Extended Replication Study
Seyedrebvar Hosseini, Burak Turhan

TL;DR
This study replicates and extends previous research on data selection methods for cross-project defect prediction, demonstrating that iterative dataset selection improves recall and F-measure stability, making it a promising practical approach.
Contribution
It introduces an iterative dataset selection approach for cross-project defect prediction (CPDP), showing improved recall and F-measure stability over previous methods while remaining feasible in practice.
Findings
Iterative dataset selection (IDS) achieves higher recall (0.933) than previous methods.
IDS has lower precision but a comparable or better F-measure.
IDS is more stable across different test sets.
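The findings are stated in terms of precision, recall, F-measure, and stability across test sets. The following minimal Python sketch shows how these quantities relate, using the standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), and F-measure as their harmonic mean. The function names are hypothetical and not taken from the paper. Measuring stability as the standard deviation of F-measure over several test sets is an assumption made for illustration.

```python
from statistics import pstdev


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) from confusion-matrix counts.

    Defect-prone modules are the positive class. Undefined ratios
    (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_spread(f1_scores: list[float]) -> float:
    """Population standard deviation of F-measure across test sets.

    Hypothetical stability measure: a smaller value means more stable.
    """
    return pstdev(f1_scores)


if __name__ == "__main__":
    # Made-up counts for illustration only (not results from the paper):
    # most defects are caught (high recall) at the cost of false alarms.
    p, r, f = precision_recall_f1(tp=9, fp=11, fn=1)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With these made-up counts, precision is 0.45, recall is 0.90, and F-measure is 0.60. This illustrates how a predictor that favours recall can accept lower precision and still reach a reasonable F-measure.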
Abstract
Context: The effectiveness of data selection approaches in improving the performance of cross project defect prediction (CPDP) has been shown in multiple previous studies. Besides that, replication studies play an important role in supporting any valid study. Repeating a study using the same or different subjects can lead to a better understanding of the nature of the problem. Objective: We use an iterative dataset selection (IDS) approach to generate training datasets and evaluate them on a set of randomly created validation datasets in the context of CPDP, while allowing a greater degree of flexibility, which makes the approach more feasible in practice. Method: We replicate an earlier study and present some insights into the achieved results while pointing out some of the shortcomings of the original study. Using the lessons learned, we propose to use an alternative…
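The title contrasts iterative and exhaustive data selection. This summary does not spell out the paper's IDS procedure, so the following Python sketch only illustrates the general cost trade-off under stated assumptions. Candidate source projects are combined into a training set. `score` is a hypothetical callable standing in for evaluating a training set on the validation datasets, for example by mean F-measure. The iterative variant is assumed to be greedy forward selection. This is not the authors' algorithm.

```python
from itertools import combinations
from typing import Callable, Sequence

# Hypothetical scoring callable: maps a candidate training set (a set of
# source-project names) to a validation score where higher is better.
Score = Callable[[frozenset[str]], float]


def exhaustive_select(candidates: Sequence[str], score: Score) -> tuple[frozenset[str], float]:
    """Score every non-empty subset of candidates and return the best one."""
    best_set: frozenset[str] = frozenset()
    best_score = float("-inf")
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            subset = frozenset(combo)
            s = score(subset)
            if s > best_score:
                best_set, best_score = subset, s
    return best_set, best_score


def iterative_select(candidates: Sequence[str], score: Score) -> tuple[frozenset[str], float]:
    """Greedy forward selection: repeatedly add the candidate that most
    improves the score, stopping when no addition improves it."""
    selected: frozenset[str] = frozenset()
    best_score = float("-inf")
    while True:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Score each one-step extension of the current selection.
        scored = {c: score(selected | {c}) for c in remaining}
        best_candidate = max(scored, key=scored.__getitem__)
        if scored[best_candidate] <= best_score:
            break  # no candidate improves the validation score
        selected = selected | {best_candidate}
        best_score = scored[best_candidate]
    return selected, best_score


if __name__ == "__main__":
    # Toy, made-up per-project contributions; a real score would come from
    # training a model and evaluating it on validation data.
    weights = {"projA": 0.5, "projB": 0.3, "projC": -0.2}

    def toy_score(subset: frozenset[str]) -> float:
        return sum(weights[p] for p in subset)

    print(exhaustive_select(list(weights), toy_score))
    print(iterative_select(list(weights), toy_score))
```

With the toy weights, both functions select projA and projB. In general, exhaustive search scores all 2^n - 1 non-empty subsets of n candidates. The greedy loop needs at most n(n+1)/2 score evaluations, which makes an iterative scheme more affordable when n grows. The trade-off is that a greedy search may miss the best subset that exhaustive search would find.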
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Engineering Techniques and Practices
