On Interpreting the Effectiveness of Unsupervised Software Traceability with Information Theory
David N. Palacio, Daniel Rodriguez-Cardenas, Denys Poshyvanyk, and Kevin Moran

TL;DR
This paper introduces an information theory-based approach, TraceXplainer, to evaluate and understand the fundamental limits of unsupervised software traceability techniques, revealing inherent information constraints affecting their effectiveness.
Contribution
It proposes the use of information theory metrics to assess traceability methods and demonstrates their application, uncovering information imbalances and limits in existing datasets and techniques.
Findings
Source code has 1.48 times more entropy than linked documentation.
Average mutual information is 4.81 bits, indicating limited link predictability.
Identified information-theoretic limits constrain the effectiveness of unsupervised traceability methods.
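The findings above are stated in terms of entropy and mutual information. As a rough illustration of what these quantities measure, here is a minimal sketch that estimates both from empirical counts. It is not the TraceXplainer implementation: the function names, the whitespace tokenization, and the way paired observations are formed are all assumptions made for this example, since this summary does not say how the paper builds its distributions.

```python
import math
from collections import Counter


def entropy(tokens):
    """Shannon entropy, in bits, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(pairs):
    """Mutual information I(X;Y), in bits, from a list of (x, y) observations."""
    joint = Counter(pairs)
    total = sum(joint.values())
    px = Counter(x for x, _ in pairs)  # marginal counts of X
    py = Counter(y for _, y in pairs)  # marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / total
        # p(x, y) / (p(x) * p(y)) simplifies to c * total / (px[x] * py[y])
        mi += p_xy * math.log2(c * total / (px[x] * py[y]))
    return mi


# Hypothetical toy artifacts, tokenized by whitespace (an assumption).
source_tokens = "def read file path open path return data".split()
doc_tokens = "the system shall read the file".split()

# Ratio of source entropy to documentation entropy for the toy inputs.
entropy_ratio = entropy(source_tokens) / entropy(doc_tokens)
```

A ratio above 1 would mean the source artifact carries more entropy than the documentation it is linked to, which is the kind of imbalance the first finding reports. Mutual information near 0 bits means that observing one variable says little about the other.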
Abstract
Traceability is a cornerstone of modern software development, ensuring system reliability and facilitating software maintenance. While unsupervised techniques leveraging Information Retrieval (IR) and Machine Learning (ML) methods have been widely used for predicting trace links, their effectiveness remains underexplored. In particular, these techniques often assume traceability patterns are present within textual data, a premise that may not hold universally. Moreover, standard evaluation metrics such as precision, recall, accuracy, or F1 measure can misrepresent model performance when underlying data distributions are not properly analyzed. Given that automated traceability techniques tend to struggle to establish links, we need further insight into the information limits related to traceability artifacts. In this paper, we propose an approach, TraceXplainer, for using…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Information and Cyber Security
