The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

Peter Hase; Harry Xie; Mohit Bansal

arXiv:2106.00786 · cs.LG · October 29, 2021 · 5 cites

TL;DR

This paper investigates the out-of-distribution issues in feature importance explanations, proposes training modifications for better alignment, compares feature removal methods, and introduces a new search algorithm that outperforms existing baselines.

Contribution

The paper highlights the out-of-distribution (OOD) problem in feature importance (FI) explanations, proposes a training adjustment that improves the social alignment of explanations, and introduces a search-based explanation method that outperforms existing approaches.

Findings

01 Model training adjustments improve explanation alignment.

02 Some feature removal methods produce more OOD counterfactuals than others.

03 Parallel Local Search outperforms other explanation search methods.

Abstract

Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, in the standard Sufficiency metric, only the top-k most important tokens are kept. In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation. First, we advance a new argument for why it can be problematic to remove features from an input when creating or evaluating explanations: the fact that these counterfactual inputs are out-of-distribution (OOD) to models implies that the resulting explanations are socially misaligned. The crux of the problem is that the model prior and random weight initialization influence the explanations (and explanation metrics) in unintended…
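The Sufficiency metric mentioned in the abstract can be illustrated with a minimal sketch. The assumptions here are ours, not the paper's: removed tokens are replaced with a placeholder `[MASK]` string (only one of several possible removal methods), the score is taken as the full-input confidence minus the confidence on the reduced input, and `toy_confidence`, `keep_top_k`, and `sufficiency` are hypothetical names standing in for a real classifier and evaluation code.

```python
from typing import Callable, List, Sequence

MASK = "[MASK]"  # hypothetical placeholder standing in for a removed token


def keep_top_k(tokens: Sequence[str], importances: Sequence[float], k: int,
               mask_token: str = MASK) -> List[str]:
    """Keep the k tokens with the highest importance; mask every other position."""
    ranked = sorted(range(len(tokens)), key=lambda i: importances[i], reverse=True)
    kept = set(ranked[:k])
    return [tok if i in kept else mask_token for i, tok in enumerate(tokens)]


def sufficiency(confidence: Callable[[List[str]], float], tokens: Sequence[str],
                importances: Sequence[float], k: int) -> float:
    """Confidence on the full input minus confidence on the top-k-only input.

    Under this (assumed) sign convention, a value near zero means the
    top-k tokens alone nearly reproduce the model's original confidence.
    """
    return confidence(list(tokens)) - confidence(keep_top_k(tokens, importances, k))


# Toy stand-in for a trained classifier (an assumption for illustration only):
# confidence in the "positive" class grows with the number of positive words.
POSITIVE = {"great", "fun"}


def toy_confidence(tokens: List[str]) -> float:
    return min(1.0, 0.5 + 0.1 * sum(tok in POSITIVE for tok in tokens))


tokens = ["the", "movie", "was", "great", "and", "fun"]
importances = [0.0, 0.1, 0.0, 0.9, 0.0, 0.6]  # made-up FI estimates

score_k1 = sufficiency(toy_confidence, tokens, importances, k=1)
score_k2 = sufficiency(toy_confidence, tokens, importances, k=2)
print(keep_top_k(tokens, importances, 2))
print(score_k1, score_k2)
```

The masked sequence returned by `keep_top_k` is exactly the kind of counterfactual input the abstract describes as out-of-distribution: a model trained only on complete sentences never saw inputs made mostly of placeholder tokens, so its confidence on them may reflect its prior and initialization rather than the kept features.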

Code & Models

Repositories

peterbhase/ExplanationSearch (PyTorch, official)

Videos

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning and Data Classification

Methods: Counterfactual Explanations · Random Search · Local Interpretable Model-Agnostic Explanations