Decomposing Observational Multiplicity in Decision Trees: Leaf and Structural Regret
Mustafa Cavus

TL;DR
This paper introduces a formal framework to decompose and quantify the sources of variability in decision tree predictions caused by observational multiplicity, highlighting the dominance of structural regret and its implications for model safety.
Contribution
It defines leaf and structural regret for decision trees, providing a formal decomposition of observational multiplicity and demonstrating their practical utility in improving model safety.
Findings
Structural regret contributes more than 15 times as much variability as leaf regret.
The theoretical decomposition closely matches the observed empirical variance.
Using regret measures for abstention improves recall from 92% to 100%.
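The abstention finding can be illustrated with a minimal selective-prediction sketch. It assumes each test point already has a per-example regret score (the `regret` array and the `threshold` parameter are hypothetical), and that the model abstains whenever that score exceeds the threshold. Recall is then measured only on the points it keeps. The toy arrays are invented for illustration and are not the paper's data or its exact abstention rule.

```python
import numpy as np

def recall_with_abstention(y_true, y_pred, regret, threshold):
    """Return (recall on retained points, coverage), abstaining where regret > threshold.

    Hypothetical helper: `regret` is an assumed per-example score, not the paper's API.
    """
    keep = regret <= threshold                # points the model does NOT abstain on
    kept_positives = (y_true == 1) & keep     # true positives still answered by the model
    coverage = keep.mean()                    # fraction of points answered
    if kept_positives.sum() == 0:
        return float("nan"), coverage
    recall = (y_pred[kept_positives] == 1).mean()
    return recall, coverage

# Toy data (invented): the one missed positive also carries a high regret score.
y_true = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1])
regret = np.array([0.01, 0.02, 0.03, 0.40, 0.05, 0.30])

full_recall, full_cov = recall_with_abstention(y_true, y_pred, regret, np.inf)
sel_recall, sel_cov = recall_with_abstention(y_true, y_pred, regret, 0.1)
print(f"no abstention: recall={full_recall:.2f}, coverage={full_cov:.2f}")
print(f"abstain above 0.1: recall={sel_recall:.2f}, coverage={sel_cov:.2f}")
```

The trade-off is coverage. Recall rises only because the model declines to answer some points, so any reported recall gain should be read together with the fraction of inputs left unanswered.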
Abstract
Many machine learning tasks admit multiple models that perform almost equally well, a phenomenon known as predictive multiplicity. A fundamental source of this multiplicity is observational multiplicity, which arises from the stochastic nature of label collection: observed training labels represent only a single realization of the underlying ground-truth probabilities. While theoretical frameworks for observational multiplicity have been established for logistic regression, their implications for non-smooth, partition-based models like decision trees remain underexplored. In this paper, we introduce two complementary notions of observational multiplicity for decision tree classifiers: leaf regret and structural regret. Leaf regret quantifies the intrinsic variability of predictions within a fixed leaf due to finite-sample noise, while structural regret captures variability induced by…
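To make the leaf/structural split concrete, here is a toy sketch under stated assumptions. It is not the paper's formal definition. Labels are redrawn many times from assumed ground-truth probabilities `p`, which mimics observational multiplicity. A depth-1 decision tree (a stump, used here as a stand-in for a full tree) is refit on each draw. The variance of its predicted probability at a test point `x0` is then split with the law of total variance. The within-structure term (same split threshold, different leaf labels) plays the role of leaf regret. The between-structure term (the split itself moves) plays the role of structural regret. The functions `fit_stump`, `leaf_prediction`, and `decompose` are hypothetical names.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a depth-1 regression tree on 1-D x (squared error); return the split threshold."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs)
    csum = np.cumsum(ys)
    total = csum[-1]
    best_t, best_sse = None, np.inf
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical feature values
        left_sum, left_n = csum[i - 1], i
        right_sum, right_n = total - left_sum, n - i
        # For binary labels, the sum of squares equals the sum, so SSE = S - S^2 / n per side
        sse = (left_sum - left_sum**2 / left_n) + (right_sum - right_sum**2 / right_n)
        if sse < best_sse:
            best_sse, best_t = sse, 0.5 * (xs[i - 1] + xs[i])
    return best_t

def leaf_prediction(x, y, t, x0):
    """Predicted probability at x0: the mean label of the leaf that contains x0."""
    mask = (x <= t) if x0 <= t else (x > t)
    return y[mask].mean()

def decompose(x, p, x0, n_draws=500, seed=0):
    """Split Var(prediction at x0) into within-structure and between-structure parts."""
    rng = np.random.default_rng(seed)
    thresholds, preds = [], []
    for _ in range(n_draws):
        y = (rng.random(len(x)) < p).astype(float)  # one realization of the labels
        t = fit_stump(x, y)
        thresholds.append(t)
        preds.append(leaf_prediction(x, y, t, x0))
    thresholds, preds = np.array(thresholds), np.array(preds)
    mean_all = preds.mean()
    leaf_part = struct_part = 0.0
    for t in np.unique(thresholds):                  # group draws by learned structure
        g = preds[thresholds == t]
        w = len(g) / len(preds)
        leaf_part += w * g.var()                       # E[Var(pred | structure)]
        struct_part += w * (g.mean() - mean_all) ** 2  # Var(E[pred | structure])
    return preds.var(), leaf_part, struct_part

x = np.linspace(0, 1, 40)
p = 1 / (1 + np.exp(-12 * (x - 0.5)))  # assumed ground-truth probabilities
total, leaf, struct = decompose(x, p, x0=0.48)
print(f"total={total:.4f} leaf={leaf:.4f} structural={struct:.4f}")
```

Because both terms use population variances with empirical group weights, they sum exactly to the total variance. A test point near the decision boundary, such as `x0=0.48` here, is where the split location matters most, so it is where the between-structure term is expected to be largest.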
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques · Adversarial Robustness in Machine Learning
