PR-SZZ: How pull requests can support the tracing of defects in software repositories
Peter Bludau, Alexander Pretschner

TL;DR
This paper introduces PR-SZZ, a variant of the SZZ algorithm that uses pull requests to trace bug-fixing commits and their inducing commits in software repositories, mapping more bug tickets to fixes and improving precision.
Contribution
The paper presents an SZZ variant that incorporates pull request data. It outperforms existing SZZ variants at identifying bug fixes and reduces false positives.
Findings
18% more bug tickets mapped to fixing commits on average
Overall F-score improved by 40 percentage points
Precision increased by 16 percentage points
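For readers unfamiliar with the metrics above, the following is a minimal sketch of how precision, recall, and F-score are commonly computed for a set of predicted commits against a ground-truth set. It assumes the balanced F1 variant; the function name and the set-based inputs are illustrative assumptions, not taken from the paper.

```python
def precision_recall_f1(predicted, actual):
    """Compare predicted commit ids against ground-truth commit ids.

    Assumption: F-score here means the balanced F1 measure
    (harmonic mean of precision and recall).
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

An improvement stated in percentage points is the absolute difference between two such scores (for example, the difference between two F-scores multiplied by 100), not a relative change.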
Abstract
The SZZ algorithm represents a standard way to identify bug fixing commits as well as inducing counterparts. It forms the basis for data sets used in numerous empirical studies. Since its creation, multiple extensions have been proposed to enhance its performance. For historical reasons, related work relies on commit messages to map bug tickets to possibly related code with no additional data used to trace inducing commits from these fixes. Therefore, we present an updated version of SZZ utilizing pull requests, which are widely adopted today. We evaluate our approach in comparison to existing SZZ variants by conducting experiments and analyzing the usage of pull requests, inner commits, and merge strategies. We base our results on 6 open-source projects with more than 50k commits and 35k pull requests. With respect to bug fixing commits, on average 18% of bug tickets can be…
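The difference between message-based linking and pull-request-based linking can be illustrated with a small sketch. This is not the authors' implementation: the `Commit` and `PullRequest` classes, the `#123`-style ticket pattern, and both function names are hypothetical, chosen only to show why pull request text may link tickets that commit messages alone miss.

```python
import re
from dataclasses import dataclass, field

# Hypothetical ticket reference pattern such as "#123"; real projects vary.
TICKET_REF = re.compile(r"#(\d+)")


@dataclass
class Commit:
    sha: str
    message: str


@dataclass
class PullRequest:
    number: int
    title: str
    body: str
    commits: list = field(default_factory=list)  # inner commits of the PR


def link_by_commit_message(commits, bug_ids):
    """Message-only linking: map each bug ticket id to commit shas
    whose message mentions that ticket."""
    links = {}
    for commit in commits:
        for match in TICKET_REF.findall(commit.message):
            ticket = int(match)
            if ticket in bug_ids:
                links.setdefault(ticket, set()).add(commit.sha)
    return links


def link_by_pull_request(pull_requests, bug_ids):
    """PR-based linking (illustrative): if the PR title or body mentions
    a bug ticket, attribute all inner commits of that PR to the ticket."""
    links = {}
    for pr in pull_requests:
        text = f"{pr.title}\n{pr.body}"
        for match in TICKET_REF.findall(text):
            ticket = int(match)
            if ticket in bug_ids:
                shas = {c.sha for c in pr.commits}
                links.setdefault(ticket, set()).update(shas)
    return links
```

In this sketch, a commit whose message never mentions a ticket can still be linked when the pull request that contains it does. How the paper selects among inner commits or handles different merge strategies is not shown here.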
