Can Causality Cure Confusion Caused By Correlation (in Software Analytics)?
Amirali Rayegan, Tim Menzies

TL;DR
This paper explores whether integrating causality-aware split criteria into symbolic models enhances their stability and robustness in software engineering tasks, without sacrificing predictive accuracy, and compares automated models with human expert judgments.
Contribution
It introduces causality-aware decision trees for software analytics and demonstrates their improved stability over traditional correlation-based models, validated through extensive empirical evaluation.
Findings
Causality-aware trees show higher stability than correlation-based trees.
Human expert judgments are more stable than automated models.
Causality-aware models maintain comparable predictive performance.
Abstract
Background: Symbolic models, particularly decision trees, are widely used in software engineering for explainable analytics in defect prediction, configuration tuning, and software quality assessment. Most of these models rely on correlational split criteria, such as variance reduction or information gain, which identify statistical associations but cannot imply causation between X and Y. Recent empirical studies in software engineering show that both correlational models and causal discovery algorithms suffer from pronounced instability. This instability arises from two complementary issues: (1) correlation-based methods conflate association with causation; (2) causal discovery algorithms rely on heuristic approximations to cope with the NP-hard nature of structure learning, causing their inferred graphs to vary widely under minor input perturbations. Together, these issues undermine…
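To make the abstract's terms concrete, the following is a minimal sketch of a correlational split criterion (variance reduction) together with a simple check of how often the chosen split feature survives bootstrap resampling. It is not the paper's causality-aware criterion or its stability measure; the function names `variance_reduction`, `best_split`, and `split_agreement`, and the bootstrap-agreement score itself, are assumptions introduced only for illustration.

```python
import random
from statistics import pvariance


def variance_reduction(xs, ys, threshold):
    """Drop in the variance of y when rows are split at x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(ys)
    weighted = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
    return pvariance(ys) - weighted


def best_split(rows, ys):
    """Return (feature_index, threshold, gain) with the largest variance reduction."""
    best = (None, None, 0.0)
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        # The largest value is skipped: it would leave the right side empty.
        for t in sorted(set(col))[:-1]:
            gain = variance_reduction(col, ys, t)
            if gain > best[2]:
                best = (j, t, gain)
    return best


def split_agreement(rows, ys, repeats=20, seed=0):
    """Fraction of bootstrap resamples whose best split feature matches
    the feature chosen on the full data (an illustrative stability proxy)."""
    rng = random.Random(seed)
    reference = best_split(rows, ys)[0]
    hits = 0
    for _ in range(repeats):
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = best_split([rows[i] for i in idx], [ys[i] for i in idx])[0]
        hits += feature == reference
    return hits / repeats
```

The criterion rewards any feature that is statistically associated with the target, whether or not it is a cause of it. A low agreement score under resampling is one way the kind of instability described above could show up in practice.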
Taxonomy
Topics: Software Engineering Research · Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI)
