Conditional Feature Importance for Mixed Data

Kristin Blesch; David S. Watson; Marvin N. Wright

arXiv:2210.03047 · stat.ML · May 3, 2023 · 1 citation

TL;DR

This paper introduces a method for testing conditional feature importance on mixed data (continuous and categorical features) that accounts for dependencies among features.

Contribution

The paper combines the conditional predictive impact (CPI) framework with sequential knockoff sampling, so that conditional feature importance (FI) can be tested on mixed data with complex feature dependencies.
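To make the idea concrete, here is a minimal sketch of a CPI-style test on toy data. It is not the authors' implementation: the data are continuous only, the model is a two-feature least-squares fit, and the hypothetical helper `conditional_surrogate` (a linear fit plus permuted residuals) merely stands in for the sequential knockoff sampler the paper uses for mixed data. All function and variable names are assumptions made for illustration. The sketch replaces one feature at a time with a surrogate drawn conditionally on the other feature, then tests whether the per-sample loss increases.

```python
import math
import random
import statistics


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def fit_ols2(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept), via 2x2 normal equations."""
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def conditional_surrogate(x_j, x_other, rng):
    """Hypothetical stand-in for a knockoff sampler (NOT sequential knockoffs):
    keep the part of x_j that is linearly explained by x_other, permute the rest."""
    slope = dot(x_j, x_other) / dot(x_other, x_other)
    fitted = [slope * v for v in x_other]
    resid = [a - f for a, f in zip(x_j, fitted)]
    rng.shuffle(resid)
    return [f + r for f, r in zip(fitted, resid)]


def cpi_statistic(x1, x2, y, b1, b2, j, rng):
    """Mean per-sample increase in squared error when feature j is replaced
    by its surrogate, together with a one-sample t statistic for that mean."""
    if j == 1:
        k1, k2 = conditional_surrogate(x1, x2, rng), x2
    else:
        k1, k2 = x1, conditional_surrogate(x2, x1, rng)
    delta = [
        (t - b1 * c - b2 * d) ** 2 - (t - b1 * a - b2 * b) ** 2
        for a, b, c, d, t in zip(x1, x2, k1, k2, y)
    ]
    mean = statistics.mean(delta)
    se = statistics.stdev(delta) / math.sqrt(len(delta))
    return mean, mean / se


# Toy data (an assumption of this sketch): x2 is correlated with x1,
# but y depends on x1 only.
rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + 0.5 * rng.gauss(0, 1) for a in x1]
y = [2 * a + rng.gauss(0, 1) for a in x1]

# Fit on the first half, evaluate importance on the held-out second half.
half = n // 2
b1, b2 = fit_ols2(x1[:half], x2[:half], y[:half])
cpi1, t1 = cpi_statistic(x1[half:], x2[half:], y[half:], b1, b2, 1, rng)
cpi2, t2 = cpi_statistic(x1[half:], x2[half:], y[half:], b1, b2, 2, rng)

# Marginal view: uncentered correlation between x2 and y (data are zero-mean by design).
marginal_r = dot(x2, y) / math.sqrt(dot(x2, x2) * dot(y, y))

print(f"marginal association of x2 with y: {marginal_r:.2f}")
print(f"CPI x1: {cpi1:.3f} (t = {t1:.2f})")
print(f"CPI x2: {cpi2:.3f} (t = {t2:.2f})")
```

In this setup `x2` is marginally associated with `y` only through `x1`, so a marginal measure would flag it while the conditional test statistic for `x2` should be far smaller than the one for `x1`. Handling categorical features would require a sampler designed for mixed data, which this sketch does not attempt.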

Findings

01 Controls type I error effectively

02 Achieves high statistical power

03 Aligns with existing conditional FI measures

Abstract

Despite the popularity of feature importance (FI) measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analyzing a variable's importance before and after adjusting for covariates, i.e., between marginal and conditional measures. Our work draws attention to this rarely acknowledged, yet crucial distinction and showcases its implications. Further, we reveal that for testing conditional FI, only a few methods are available and practitioners have hitherto been severely restricted in method application due to mismatching data requirements. Most real-world data exhibits complex feature dependencies and incorporates both continuous and categorical data (mixed data). Both properties are oftentimes neglected by conditional FI measures. To fill this gap, we…

Code & Models

Repositories

bips-hb/cfi_mixeddata (official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Statistical Methods and Inference