Traceability in the Wild: Automatically Augmenting Incomplete Trace Links
Michael Rath, Jacob Rendall, Jin L.C. Guo, Jane Cleland-Huang, and Patrick Maeder

TL;DR
This paper presents an automated method to identify and augment missing trace links between commits and issues in software projects, improving traceability completeness using machine learning techniques.
Contribution
It introduces a novel classifier that leverages process and text features to detect missing issue tags and augment trace links in open source projects.
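As a rough illustration of what "process and text features" for an issue-commit pair might look like, the sketch below computes one text feature and two process features. The specific features (`text_jaccard`, `same_person`, `hours_apart`) and the `Issue` and `Commit` types are hypothetical choices made for this example; the summary above does not list the paper's actual feature set, and no classifier is trained here.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Issue:
    summary: str
    assignee: str
    resolved: datetime


@dataclass
class Commit:
    message: str
    author: str
    committed: datetime


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def features(issue: Issue, commit: Commit) -> dict[str, float]:
    """Hypothetical feature vector for one candidate issue-commit link."""
    a, b = tokens(issue.summary), tokens(commit.message)
    union = a | b
    # Text feature (assumed): Jaccard overlap of issue summary and commit message.
    jaccard = len(a & b) / len(union) if union else 0.0
    # Process feature (assumed): time between issue resolution and commit.
    hours = abs((issue.resolved - commit.committed).total_seconds()) / 3600.0
    # Process feature (assumed): issue assignee is also the commit author.
    same = float(issue.assignee == commit.author)
    return {"text_jaccard": jaccard, "same_person": same, "hours_apart": hours}
```

Vectors of this kind, computed for known linked and unlinked pairs, could then be fed to any standard binary classifier to score candidate links.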
Findings
Achieved 96% recall in recommending missing issue links
Attained over 89% precision in augmenting existing trace links
Improved traceability completeness in open source projects
Abstract
Software and systems traceability is widely accepted as an essential element for supporting many software development tasks. Today's version control systems provide inbuilt features that allow developers to tag each commit with one or more issue IDs, thereby providing the building blocks from which project-wide traceability can be established between feature requests, bug fixes, commits, source code, and specific developers. However, our analysis of six open source projects showed that on average only 60% of the commits were linked to specific issues. Without these fundamental links the entire set of project-wide links will be incomplete, and therefore not trustworthy. In this paper we address the fundamental problem of missing links between commits and issues. Our approach leverages a combination of process and text-related features characterizing issues and code changes to train a…
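The "share of commits linked to issues" statistic mentioned in the abstract can be approximated by scanning commit messages for issue tags. The sketch below is a minimal version of that measurement, assuming Jira-style issue keys such as `PROJ-123`; the regular expression, the function names, and the sample messages are illustrative assumptions, not the authors' tooling or data.

```python
import re

# Assumption: issue IDs follow a Jira-style key, e.g. "PROJ-123".
ISSUE_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def issue_ids(message: str) -> list[str]:
    """Return all issue IDs tagged in a commit message."""
    return ISSUE_ID.findall(message)


def linked_fraction(messages: list[str]) -> float:
    """Fraction of commit messages that carry at least one issue ID."""
    if not messages:
        return 0.0
    linked = sum(1 for m in messages if issue_ids(m))
    return linked / len(messages)


# Made-up commit messages for illustration only.
commits = [
    "PROJ-12: fix null check in parser",
    "refactor logging setup",
    "PROJ-7 PROJ-9 add retry policy",
    "bump version",
    "update README",
]
print(linked_fraction(commits))
```

Commits for which `issue_ids` returns an empty list are the untagged ones that a link-recommendation approach would need to handle.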
