Analyzing Error Sources in Global Feature Effect Estimation
Timo Heiß, Coco Bögel, Bernd Bischl, Giuseppe Casalicchio

TL;DR
This paper systematically analyzes the sources of bias and variance in global feature effect estimates like PD and ALE, providing insights into their reliability and guidance on estimation strategies.
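To make "PD and ALE estimates" concrete, here is a minimal NumPy sketch of the textbook Monte Carlo estimators. It assumes the model is exposed as a `predict` function on a 2-D array. The names `pd_estimate` and `ale_estimate`, the quantile-based interval edges, and the count-weighted centering are illustrative choices and not the paper's implementation.

```python
import numpy as np

def pd_estimate(predict, X, feature, grid):
    """Monte Carlo PD estimate: mean prediction with `feature` fixed at each grid value."""
    values = []
    for z in grid:
        X_mod = X.copy()
        X_mod[:, feature] = z  # overwrite the feature of interest in every row
        values.append(predict(X_mod).mean())
    return np.array(values)

def ale_estimate(predict, X, feature, edges):
    """First-order ALE evaluated at the interval `edges` (sorted, spanning the data)."""
    x = X[:, feature]
    n_int = len(edges) - 1
    # interval index per row: interval j covers (edges[j], edges[j+1]]; the first also covers edges[0]
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, n_int - 1)
    local = np.zeros(n_int)
    counts = np.zeros(n_int)
    for j in range(n_int):
        rows = X[idx == j]
        if len(rows) == 0:
            continue  # an empty interval contributes no local effect
        lower, upper = rows.copy(), rows.copy()
        lower[:, feature] = edges[j]
        upper[:, feature] = edges[j + 1]
        local[j] = (predict(upper) - predict(lower)).mean()  # average finite difference
        counts[j] = len(rows)
    uncentered = np.concatenate([[0.0], np.cumsum(local)])  # accumulate local effects
    # center so the count-weighted mean of the per-interval average ALE value is zero
    mids = (uncentered[:-1] + uncentered[1:]) / 2
    return uncentered - np.sum(counts * mids) / counts.sum()

# Usage with a known function standing in for a fitted model
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
model = lambda X_: X_[:, 0] ** 2 + X_[:, 1]
grid = np.linspace(-1, 1, 5)
pd_vals = pd_estimate(model, X, 0, grid)
edges = np.quantile(X[:, 0], np.linspace(0, 1, 11))  # 10 quantile-based intervals
ale_vals = ale_estimate(model, X, 0, edges)
```

The PD estimator averages over the whole sample at every grid value. ALE, by contrast, uses only the observations inside each interval, so each local effect rests on fewer points. This offers one intuition for why ALE estimates can be more sensitive to sample size.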
Contribution
It introduces a mean-squared-error decomposition that separates model bias, estimation bias, model variance, and estimation variance in feature effect estimation, and it validates the findings through extensive simulations.
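The four-way split follows from the standard identity MSE = Bias² + Variance. A sketch of one way to arrive at it is shown below. Here $\phi(x)$ denotes the true feature effect at $x$, $\phi_{\hat f}(x)$ the population-level effect of the fitted model $\hat f$, and $\hat\phi(x)$ the estimate computed from a finite sample. Expectations run over both model fitting and the estimation sample. This notation is assumed for illustration, and the paper's exact decomposition may differ.

```latex
\begin{aligned}
\operatorname{MSE}(x) &= \mathbb{E}\big[(\hat\phi(x) - \phi(x))^2\big]
  = \operatorname{Bias}(x)^2 + \operatorname{Var}\big(\hat\phi(x)\big),\\
\operatorname{Bias}(x) &= \underbrace{\mathbb{E}\big[\phi_{\hat f}(x)\big] - \phi(x)}_{\text{model bias}}
  + \underbrace{\mathbb{E}\big[\hat\phi(x) - \phi_{\hat f}(x)\big]}_{\text{estimation bias}},\\
\operatorname{Var}\big(\hat\phi(x)\big) &= \underbrace{\operatorname{Var}\big(\mathbb{E}[\hat\phi(x) \mid \hat f\,]\big)}_{\text{model variance}}
  + \underbrace{\mathbb{E}\big[\operatorname{Var}(\hat\phi(x) \mid \hat f\,)\big]}_{\text{estimation variance}}.
\end{aligned}
```

The bias split is a telescoping sum. The variance split is the law of total variance conditioned on the fitted model. When the estimator is conditionally unbiased, $\mathbb{E}[\hat\phi(x) \mid \hat f\,] = \phi_{\hat f}(x)$, and the model-variance term reduces to $\operatorname{Var}(\phi_{\hat f}(x))$.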
Findings
Using holdout data has negligible bias compared to training data.
Sample size significantly affects estimation variance, especially for ALE.
Cross-validation reduces model variance, which benefits models prone to overfitting.
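The sample-size finding can be illustrated with a small simulation. It assumes a fixed, known function in place of a fitted model, so model variance is zero by construction and only estimation variance remains. `pd_at`, `estimation_variance`, the uniform feature distribution, and the chosen sample sizes are all hypothetical. The sketch covers PD only and does not reproduce the paper's experiments.

```python
import numpy as np

def pd_at(predict, X, feature, z):
    """Monte Carlo PD estimate at a single value z of `feature`."""
    X_mod = X.copy()
    X_mod[:, feature] = z
    return predict(X_mod).mean()

# Hypothetical fixed model with an interaction; holding it fixed isolates estimation variance
model = lambda X_: X_[:, 0] * X_[:, 1] + X_[:, 1] ** 2

def estimation_variance(n, reps=200, z=0.5, seed=0):
    """Empirical variance of the PD estimate at z over repeated samples of size n."""
    rng = np.random.default_rng(seed)
    estimates = [pd_at(model, rng.uniform(-1, 1, size=(n, 2)), 0, z) for _ in range(reps)]
    return np.var(estimates, ddof=1)

for n in (50, 200, 1000):
    print(n, estimation_variance(n))
```

Here the PD estimate at `z` is a sample mean of i.i.d. terms `z * x2 + x2**2`, so its variance shrinks in proportion to 1/n.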
Abstract
Global feature effects such as partial dependence (PD) and accumulated local effects (ALE) plots are widely used to interpret black-box models. However, they are only estimates of true underlying effects, and their reliability depends on multiple sources of error. Despite the popularity of global feature effects, these error sources are largely unexplored. In particular, the practically relevant question of whether to use training or holdout data to estimate feature effects remains unanswered. We address this gap by providing a systematic, estimator-level analysis that disentangles sources of bias and variance for PD and ALE. To this end, we derive a mean-squared-error decomposition that separates model bias, estimation bias, model variance, and estimation variance, and analyze their dependence on model characteristics, data selection, and sample size. We validate our theoretical…
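The training-versus-holdout question raised in the abstract comes down to which sample the Monte Carlo average in the PD estimator runs over. The sketch below assumes a least-squares model on a hand-picked polynomial feature map as a stand-in for a black-box learner, and all names are hypothetical. It computes the PD of the first feature once on training rows and once on holdout rows. It also includes one possible cross-validated variant that refits per fold and averages the fold-wise PD curves. It shows the mechanics only and makes no claim about which choice the paper recommends.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = rng.uniform(-1, 1, size=(n, 2))
# Hypothetical ground truth with additive noise
y = X[:, 0] ** 2 + X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=n)

def features(X_):
    """Fixed polynomial feature map; the 'model' is linear least squares on it."""
    x0, x1 = X_[:, 0], X_[:, 1]
    return np.column_stack([np.ones(len(X_)), x0, x1, x0 ** 2, x0 * x1, x1 ** 2])

def fit(rows):
    """Fit on the given row indices and return a predict function."""
    coef, *_ = np.linalg.lstsq(features(X[rows]), y[rows], rcond=None)
    return lambda X_: features(X_) @ coef

def pd_estimate(predict, X_, feature, grid):
    """Monte Carlo PD estimate averaged over the rows of X_."""
    out = []
    for z in grid:
        X_mod = X_.copy()
        X_mod[:, feature] = z
        out.append(predict(X_mod).mean())
    return np.array(out)

grid = np.linspace(-1, 1, 9)
perm = rng.permutation(n)
train, hold = perm[: n // 2], perm[n // 2 :]
predict = fit(train)
pd_train = pd_estimate(predict, X[train], 0, grid)  # reuse the training rows
pd_hold = pd_estimate(predict, X[hold], 0, grid)    # fresh holdout rows, same model

# One possible CV variant: refit on K-1 folds, estimate PD on the left-out fold, average
folds = np.array_split(rng.permutation(n), 5)
pd_folds = []
for k, test_rows in enumerate(folds):
    train_rows = np.concatenate([fold for j, fold in enumerate(folds) if j != k])
    pd_folds.append(pd_estimate(fit(train_rows), X[test_rows], 0, grid))
pd_cv = np.mean(pd_folds, axis=0)
```

For this model family, the gap between `pd_train` and `pd_hold` is linear in the grid value. The reason is that only the sample averages of the terms involving the second feature change between the two row sets.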
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Computational and Text Analysis Methods
