A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits
Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher Ahmed Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi

TL;DR
This paper investigates the prevalence and nature of tangled commits in bug fixing changes, revealing significant noise and complexity that impact research accuracy and emphasizing the need for careful data validation.
Contribution
It provides a detailed, manually validated dataset and analysis of tangled commits, quantifying their prevalence and types within bug fixing changes.
Findings
17-32% of all changes in bug fixing commits modify the source code to fix the underlying problem
66-87% of changes to production code are directly related to bug fixes
3-47% of data may be noisy due to tangled commits
Abstract
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to…
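The consensus rule in the Methods can be illustrated with a minimal sketch. It assumes each line's labels arrive as a list of four strings; the function name `consensus_label`, the label names, and the example votes are hypothetical and are not taken from the paper's tooling or data.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agreement: int = 3) -> Optional[str]:
    """Return the label chosen by at least `min_agreement` participants, else None."""
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical votes: four participants label each changed line of a commit.
line_labels = {
    1: ["bugfix", "bugfix", "bugfix", "refactoring"],
    2: ["test", "test", "documentation", "whitespace"],
}

# Lines without consensus map to None.
consensus = {line: consensus_label(votes) for line, votes in line_labels.items()}
```

In this sketch, line 1 reaches consensus because three of four participants agree, while line 2 does not.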
