On the Robustness of Global Feature Effect Explanations

Hubert Baniecki; Giuseppe Casalicchio; Bernd Bischl; Przemyslaw Biecek

arXiv:2406.09069·cs.LG·July 29, 2025

TL;DR

This paper investigates the robustness of global feature effect explanations, namely partial dependence plots and accumulated local effects. It provides theoretical bounds and an empirical analysis of how vulnerable these explanations are to data and model perturbations.

Contribution

The paper introduces theoretical bounds on the robustness of partial dependence plots and accumulated local effects, and empirically evaluates their stability on synthetic and real-world datasets.

Findings

01. Robustness of explanations varies significantly under perturbations.

02. Theoretical bounds help quantify interpretation stability.

03. Empirical results highlight potential misinterpretations in practice.

Abstract

We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bounds for evaluating the robustness of partial dependence plots and accumulated local effects. Our experimental results with synthetic and real-world datasets quantify the gap between the best and worst-case scenarios of (mis)interpreting machine learning predictions globally.
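To make the notion of a data perturbation concrete, here is a minimal sketch of a partial dependence computation in plain Python. It is not the paper's method and does not reproduce any of its bounds. The toy `model`, the `partial_dependence` helper, and the constant `shift` applied to one feature are all hypothetical choices made for illustration. The sketch only shows that when the model contains a feature interaction, changing the data over which predictions are averaged changes the resulting curve.

```python
import random


def model(x):
    # Hypothetical black-box model: features 0 and 1 interact, feature 2 is additive.
    return x[0] * x[1] + x[2]


def partial_dependence(predict, data, feature, grid):
    """Average prediction over `data` with `feature` fixed to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in data:
            modified = list(row)       # copy so the dataset is not mutated
            modified[feature] = value  # fix the feature of interest
            total += predict(modified)
        curve.append(total / len(data))
    return curve


rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]

# Hypothetical data perturbation: shift feature 1 by a constant.
shift = 0.5
perturbed = [[row[0], row[1] + shift, row[2]] for row in data]

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
pd_original = partial_dependence(model, data, 0, grid)
pd_perturbed = partial_dependence(model, perturbed, 0, grid)

# Largest pointwise change of the curve caused by the perturbation.
max_gap = max(abs(a - b) for a, b in zip(pd_original, pd_perturbed))
print(max_gap)
```

For this toy model the partial dependence of feature 0 at grid value v equals v times the mean of feature 1 plus the mean of feature 2, so shifting feature 1 by `shift` moves the curve by `shift * v`. The gap therefore vanishes at v = 0 and grows toward the edges of the grid.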

Code & Models

Repositories

hbaniecki/robust-feature-effects (PyTorch, official)
