Explainability of Machine Learning Models under Missing Data

Tuan L. Vo; Thu Nguyen; Luis M. Lopez-Ramos; Hugo L. Hammer; Michael A. Riegler; Pal Halvorsen

arXiv:2407.00411·cs.LG·January 23, 2025

TL;DR

This paper investigates how different data imputation methods affect the explainability of machine learning models, as measured by SHAP values. It highlights the biases that imputation choices introduce and how those biases change model interpretation.

Contribution

The paper analyzes how imputation strategies affect SHAP explanations, combining theoretical insights with practical guidelines for improving model interpretability when data are missing.

Findings

01 Imputation methods can bias Shapley values and affect explainability.

02 Lower prediction MSE does not guarantee lower MSE in Shapley values.

03 Using XGBoost directly on missing data can impair interpretability.
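The first finding can be illustrated with a minimal sketch. This is not the paper's experiment: it assumes synthetic Gaussian data, values missing completely at random, mean imputation, and a linear model, for which the Shapley value of feature j under feature independence has the closed form w_j * (x_j - mean_j). All function and variable names below are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (weights, bias)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def linear_shap(X, w):
    """Shapley values of f(x) = w @ x + b assuming independent features,
    using X itself as the background data: phi_ij = w_j * (x_ij - mean_j)."""
    return (X - X.mean(axis=0)) * w

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

rng = np.random.default_rng(0)
n, d = 500, 3
X_full = rng.normal(size=(n, d))
y = X_full @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Assumption: 30% of entries are missing completely at random (MCAR).
mask = rng.random((n, d)) < 0.3
X_miss = np.where(mask, np.nan, X_full)
X_imp = mean_impute(X_miss)

# Fit one model on the complete data and one on the imputed data.
w_full, _ = fit_linear(X_full, y)
w_imp, _ = fit_linear(X_imp, y)

phi_full = linear_shap(X_full, w_full)  # reference explanations
phi_imp = linear_shap(X_imp, w_imp)     # explanations after imputation

# Mean squared difference between the two sets of Shapley values.
shap_mse = np.mean((phi_full - phi_imp) ** 2)
print(f"MSE between Shapley values: {shap_mse:.4f}")
```

In this setup every mean-imputed entry receives a Shapley value of zero, because the imputed value equals the column mean used as the baseline, while the corresponding entry in the complete data generally has a nonzero attribution. That gap is one simple way in which an imputation choice can bias explanations.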

Abstract

Missing data is a prevalent issue that can significantly impair model performance and explainability. This paper briefly summarizes the development of the field of missing data with respect to Explainable Artificial Intelligence and experimentally investigates the effects of various imputation methods on SHAP (SHapley Additive exPlanations), a popular technique for explaining the output of complex machine learning models. We compare different imputation strategies and assess their impact on feature importance and interaction as determined by Shapley values. We also theoretically analyze the effects of missing values on Shapley values. Importantly, our findings reveal that the choice of imputation method can introduce biases that change the Shapley values, thereby affecting the explainability of the model. Moreover, we also show that a lower test…

Code & Models

Repositories

simulamet-host/SHAP (Official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)

Methods: Shapley Additive Explanations