The Impact of Correlated Metrics on Defect Models
Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Ahmed E. Hassan

TL;DR
This study investigates how correlated metrics influence the interpretation of defect models. It finds that removing such metrics improves the consistency of the interpretation, and it recommends avoiding ANOVA Type-I for more reliable defect analysis.
Contribution
The paper provides empirical evidence on the impact of correlated metrics and evaluates the effects of removing them on defect model interpretation and performance.
Findings
Correlated metrics affect the ranking of top metrics across interpretation techniques.
Removing correlated metrics improves consistency of metric rankings.
Removing correlated metrics has a negligible impact on defect model performance.
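To make "removing correlated metrics" concrete, here is a minimal sketch of one possible filter. It is not the procedure used in the paper: the greedy strategy, the choice of Spearman rank correlation, the 0.7 threshold, and the metric names and values are all assumptions made for illustration.

```python
def ranks(values):
    """Return 1-based average ranks, giving tied values the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def drop_correlated(metrics, threshold=0.7):
    """Greedily keep metrics, skipping any that correlate with a kept one.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    kept = []
    for name in metrics:
        if all(abs(spearman(metrics[name], metrics[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-module metric values, invented for this example.
metrics = {
    "loc": [120, 340, 560, 80, 900, 410],
    "num_statements": [100, 300, 500, 70, 850, 390],
    "num_authors": [3, 1, 2, 4, 2, 1],
}
kept = drop_correlated(metrics)
```

With this sample data, `kept` is `["loc", "num_authors"]`: `num_statements` ranks the modules exactly as `loc` does, so it is dropped. A greedy filter depends on the order in which metrics are visited, so the retained set is one valid choice among several.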
Abstract
Defect models are analytical models that are used to build empirical theories that are related to software quality. Prior studies often derive knowledge from such models using interpretation techniques, such as ANOVA Type-I. Recent work raises concerns that prior studies rarely remove correlated metrics when constructing such models. Such correlated metrics may impact the interpretation of models. Yet, the impact of correlated metrics in such models has not been investigated. In this paper, we set out to investigate the impact of correlated metrics, and the benefits and costs of removing correlated metrics on defect models. Through a case study of 15 publicly-available defect datasets, we find that (1) correlated metrics impact the ranking of the highest ranked metric for all of the 9 studied model interpretation techniques. On the other hand, removing correlated metrics (2) improves…
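The concern about ANOVA Type-I can be illustrated with a small sketch. Type-I ANOVA uses sequential sums of squares, so each metric is credited only with the variation left unexplained by the metrics entered before it. The example below makes several simplifying assumptions that are not from the paper: it uses an ordinary linear model instead of a classifier, and the metric names (`loc`, `churn`) and all values are invented.

```python
def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def type1_ss(y, predictors):
    """Sequential (Type-I) sums of squares for a linear model.

    `predictors` is a list of (name, values) pairs; their order is the
    order in which terms enter the model.
    """
    yc = center(y)
    basis = []
    result = {}
    for name, values in predictors:
        q = center(values)
        # Remove the part of this predictor already explained by earlier ones.
        for b in basis:
            coef = dot(q, b) / dot(b, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        qq = dot(q, q)
        if qq > 1e-12:
            result[name] = dot(yc, q) ** 2 / qq
            basis.append(q)
        else:
            # Predictor is fully redundant given the earlier ones.
            result[name] = 0.0
    return result


# Hypothetical, strongly correlated metrics and an outcome equal to their sum.
loc = [1, 2, 3, 4, 5, 6]
churn = [2, 1, 4, 3, 6, 5]
defects = [a + b for a, b in zip(loc, churn)]

loc_first = type1_ss(defects, [("loc", loc), ("churn", churn)])
churn_first = type1_ss(defects, [("churn", churn), ("loc", loc)])
```

In this constructed data both metrics contribute equally to the outcome, yet whichever metric enters first receives about 58.5 of the total sum of squares of 64, and the other receives about 5.5. The apparent importance of a metric therefore depends on its position in the model specification when metrics are correlated.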
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
