An Additive Instance-Wise Approach to Multi-class Model Interpretation

Vy Vo; Van Nguyen; Trung Le; Quan Hung Tran; Gholamreza Haffari; Seyit Camtepe; Dinh Phung

arXiv:2207.03113 · cs.LG · June 2, 2023

TL;DR

This paper introduces a framework that combines attribution-based and selection-based methods to produce faithful, stable, multi-class local explanations for black-box models.

Contribution

It proposes a unified approach for multi-class local explanations that leverages strengths of existing methods, enhancing faithfulness and stability of feature importance explanations.

Findings

01 Outperforms additive and instance-wise methods in faithfulness.

02 Produces more compact and comprehensible explanations.

03 Demonstrates stable feature selection across various datasets and models.

Abstract

Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpreting methods focus on identifying explanatory input features, which generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods for learning instance-specific explainers in an additive manner. The process is thus inefficient and susceptible to poorly-conditioned samples. Meanwhile, many selection-based methods directly optimize local feature distributions in an instance-wise training framework, thereby being capable of leveraging global information from other inputs. However, they can only interpret single-class predictions and many suffer from inconsistency across different settings, due to a strict reliance on a pre-defined number of features selected. This work…

Code & Models

Repositories

isvy08/aim (PyTorch, official)

Videos

An Additive Instance-Wise Approach to Multi-class Model Interpretation · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning