Comparing interpretability and explainability for feature selection
Jack Dunn, Luca Mingardi, Ying Daisy Zhuo

TL;DR
This paper evaluates how well feature importance methods, including SHAP and native importance scores, identify the relevant features across several machine learning models, and finds that interpretable models perform better at feature selection.
Contribution
It provides a comparative analysis showing that interpretable models outperform black-box models like XGBoost in feature selection accuracy.
Findings
XGBoost struggles to distinguish relevant from irrelevant features.
Interpretable models like CART and Optimal Trees perform better in feature selection.
SHAP and native importance scores are insufficient for reliable feature importance assessment.
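The brute-force sketch below makes concrete what a SHAP score measures. It computes exact Shapley values for a small function by enumerating every coalition of the other features. It makes two simplifying assumptions: a feature is treated as "absent" by swapping in a single baseline value, and the computation takes exponential time. It is not the shap library and not the TreeSHAP algorithm. The function name `shapley_values` and the toy function are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at point x (illustrative sketch, exponential time).

    Assumption: a feature outside a coalition takes its value from `baseline`
    (a single reference point, not a background distribution as in SHAP).
    """
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        # Features in the coalition come from x; all others come from the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy model: feature 3 is irrelevant (its coefficient is zero).
toy = lambda z: 3 * z[0] + 2 * z[1] * z[2] + 0 * z[3]
print(shapley_values(toy, [1, 1, 1, 5], [0, 0, 0, 0]))  # approximately [3, 1, 1, 0]
```

The values sum to f(x) minus f(baseline), which is 5 here. The interaction term is split evenly between features 1 and 2. Feature 3, which the function never uses, receives exactly zero. A fitted model that has picked up spurious dependence on an irrelevant feature would generally assign that feature a nonzero value instead.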
Abstract
A common approach for feature selection is to examine the variable importance scores for a machine learning model, as a way to understand which features are the most relevant for making predictions. Given the significance of feature selection, it is crucial for the calculated importance scores to reflect reality. Falsely overestimating the importance of irrelevant features can lead to false discoveries, while underestimating the importance of relevant features may lead us to discard important features, resulting in poor model performance. Additionally, black-box models like XGBoost provide state-of-the-art predictive performance, but cannot be easily understood by humans, and thus we rely on variable importance scores or methods for explainability like SHAP to offer insight into their behavior. In this paper, we investigate the performance of variable importance as a feature selection…
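Both failure modes named in the abstract can be made concrete when the truly relevant features are known, for example on synthetic data. The sketch below keeps every feature whose importance score exceeds a threshold. It then reports the irrelevant features that were kept (false discoveries) and the relevant features that were dropped. The threshold rule, the function names, and the example scores are all hypothetical illustrations, not the paper's evaluation protocol.

```python
def select_features(importances, threshold):
    """Keep the features whose importance score exceeds the threshold (hypothetical rule)."""
    return {name for name, score in importances.items() if score > threshold}


def selection_errors(importances, relevant, threshold):
    """Compare a threshold-based selection against the known set of relevant features.

    Returns (false_discoveries, discarded_relevant):
      false_discoveries  - irrelevant features kept because their score was too high
      discarded_relevant - relevant features dropped because their score was too low
    """
    selected = select_features(importances, threshold)
    false_discoveries = selected - relevant
    discarded_relevant = relevant - selected
    return false_discoveries, discarded_relevant


# Made-up importance scores: x1 and x2 are relevant, noise1 and noise2 are not.
scores = {"x1": 0.40, "x2": 0.05, "noise1": 0.30, "noise2": 0.01}
print(selection_errors(scores, {"x1", "x2"}, threshold=0.1))
# -> ({'noise1'}, {'x2'})
```

With these scores, an overestimated irrelevant feature (noise1) passes the threshold and an underestimated relevant one (x2) is discarded. This is the pattern the abstract warns against.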
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
Methods: Feature Selection · Shapley Additive Explanations
