An Additive Instance-Wise Approach to Multi-class Model Interpretation
Vy Vo, Van Nguyen, Trung Le, Quan Hung Tran, Gholamreza Haffari, Seyit Camtepe, Dinh Phung

TL;DR
This paper introduces a novel framework that combines attribution and selection methods to generate accurate and stable explanations of multi-class predictions made by black-box models, improving interpretability and consistency.
Contribution
It proposes a unified approach to multi-class local explanation that leverages the strengths of existing attribution and selection methods, improving the faithfulness and stability of feature-importance explanations.
Findings
Outperforms additive and instance-wise methods in faithfulness.
Produces more compact and comprehensible explanations.
Demonstrates stable feature selection across various datasets and models.
Abstract
Interpretable machine learning offers insights into what factors drive a particular prediction of a black-box system. A large number of interpretation methods focus on identifying explanatory input features, and these methods generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods for learning instance-specific explainers in an additive manner. Because a separate explainer must be fit for every instance, the process is inefficient and susceptible to poorly-conditioned samples. Meanwhile, many selection-based methods directly optimize local feature distributions in an instance-wise training framework, and can therefore leverage global information from other inputs. However, they can only interpret single-class predictions, and many suffer from inconsistency across different settings due to a strict reliance on a pre-defined number of selected features. This work…
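The contrast drawn in the abstract can be made concrete with a small sketch. This is not the paper's method; it assumes a hypothetical toy 3-class softmax model (`black_box`) and illustrates (a) a LIME-style additive attribution, which fits a proximity-weighted linear surrogate to one class's probability on random perturbations around a single input, and (b) a fixed-k selection step. The function names, the Gaussian perturbation scale, and the kernel width are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

# Hypothetical 3-class black box (an assumption for illustration only):
# softmax over fixed linear scores of 4 input features.
W = np.array([[2.0, -1.0, 0.0, 0.5],
              [-1.0, 2.0, 0.5, 0.0],
              [0.0, 0.0, -1.0, 2.0]])

def black_box(X):
    logits = X @ W.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def additive_local_explainer(f, x, target, n_samples=500, width=None, seed=0):
    """Attribution sketch: weighted linear surrogate of f(.)[target] around x.

    Must be refit for every instance x and every class `target`.
    """
    rng = np.random.default_rng(seed)
    d = x.size
    if width is None:
        width = 0.75 * np.sqrt(d)  # assumed kernel width
    # Sample a local neighborhood with Gaussian perturbations (assumed scale 1).
    Z = x + rng.normal(scale=1.0, size=(n_samples, d))
    y = f(Z)[:, target]
    # Exponential proximity kernel: closer samples get larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / width ** 2)
    # Weighted least squares: scale rows by sqrt(weight), intercept + features.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # one additive attribution per feature

def top_k_selection(scores, k):
    """Selection sketch: keep exactly k features, with k fixed in advance."""
    return np.sort(np.argsort(-np.abs(scores))[:k])

if __name__ == "__main__":
    x = np.zeros(4)
    phi = additive_local_explainer(black_box, x, target=0)
    print("attributions for class 0:", phi)
    print("selected features (k=2):", top_k_selection(phi, k=2))
```

Each call to `additive_local_explainer` queries the black box `n_samples` times for one instance and one class, so explaining every class of every input repeats this fit many times. `top_k_selection` returns exactly `k` features whether or not `k` suits the instance, mirroring the fixed-budget issue the abstract attributes to selection-based methods. Real instance-wise selectors train a shared selector model rather than ranking surrogate coefficients, which this sketch does not attempt.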
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
