Leveraging the Defects Life Cycle to Label Affected Versions and Defective Classes
Bailey Vandehei, Daniel Alencar da Costa, Davide Falessi

TL;DR
This study evaluates methods for identifying affected versions and defective classes in software projects, proposing a new automated approach that outperforms existing SZZ implementations in accuracy and consistency.
Contribution
The paper introduces a novel method for retrieving affected versions of defects, demonstrating its superior accuracy over traditional SZZ techniques across large open-source datasets.
Findings
The realistic method is usable for only 49% of defects.
The proposed method significantly outperforms the SZZ implementations in accuracy.
Affected version proportion remains stable within projects.
Abstract
Two recent studies explicitly recommend labeling defective classes in releases using the affected versions (AV) available in issue trackers. The aim of our study is threefold: 1) to measure the proportion of defects for which the realistic method is usable, 2) to propose a method for retrieving the AVs of a defect, thus making the realistic approach usable when AVs are unavailable, and 3) to compare the accuracy of the proposed method against three SZZ implementations. The assumption of our proposed method is that defects have a stable life cycle in terms of the proportion of the number of versions affected by the defects before discovering and fixing these defects. Results related to 212 open-source projects from the Apache ecosystem, featuring a total of about 125,000 defects, reveal that the realistic method cannot be used in the majority (51%) of defects. Therefore, it is important to…
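The stable-life-cycle assumption in the abstract can be illustrated with a small sketch. The abstract excerpt does not give a formula, so everything below is an assumption made for illustration: releases are represented as consecutive integer indexes, each defect has an introducing version (IV), an opening version (OV), and a fixing version (FV), and the proportion is taken to be (FV - IV) / (FV - OV). The function names are hypothetical and are not taken from the paper.

```python
from statistics import mean


def proportion(iv: int, ov: int, fv: int) -> float:
    """Ratio of the introduction-to-fix span to the discovery-to-fix span.

    Assumed form: (FV - IV) / (FV - OV). The denominator is floored at 1
    so that defects opened and fixed in the same release do not divide by zero.
    """
    return (fv - iv) / max(fv - ov, 1)


def estimate_iv(ov: int, fv: int, p: float) -> int:
    """Estimate the introducing version of a defect that has no AV recorded."""
    iv = fv - round(max(fv - ov, 1) * p)
    # Clamp: a defect cannot be introduced before release 1 or after it was opened.
    return max(1, min(iv, ov))


def affected_versions(iv: int, fv: int) -> list[int]:
    """Releases from IV (inclusive) up to FV (exclusive)."""
    return list(range(iv, fv))


# Hypothetical defects of one project whose AVs are known: (IV, OV, FV).
labeled = [(1, 3, 4), (2, 4, 5), (3, 4, 6)]
p = mean(proportion(*d) for d in labeled)

# A defect with no AV in the issue tracker, opened in release 7, fixed in release 9.
iv_hat = estimate_iv(ov=7, fv=9, p=p)
print(p, iv_hat, affected_versions(iv_hat, 9))
```

Under these assumptions, the average proportion learned from defects with known AVs is reused to back-date the introducing version of defects that lack AVs. How the proportion is actually averaged (per project, over time, or across projects) is a design choice this sketch does not settle.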
