Robust and Stable Black Box Explanations

Himabindu Lakkaraju; Nino Arsov; Osbert Bastani

arXiv:2011.06169 · cs.LG · November 13, 2020 · 5 cites


TL;DR

This paper introduces an adversarial training framework for generating post hoc explanations of black box models that remain robust and stable under distribution shifts, making them more reliable in real-world deployments.

Contribution

To the authors' knowledge, it is the first method for constructing post hoc explanations that are robust to a broad class of adversarial perturbations, which it achieves by solving a minimax optimization problem.
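One generic way to write a minimax objective of this kind is shown below. The notation is introduced here for illustration and is not taken from the paper: $B$ is the black box, $E$ an explanation drawn from a class $\mathcal{E}$ (for example linear models or decision sets), $\mathcal{D}$ the data distribution, $\Delta$ an assumed set of perturbations $\delta$ applied to the inputs, and $\ell$ a loss measuring disagreement between $E$ and $B$ (lower loss means higher fidelity). The paper's exact parameterization of the perturbation set may differ.

$$
E^{*} \in \arg\min_{E \in \mathcal{E}} \; \max_{\delta \in \Delta} \; \mathbb{E}_{x \sim \mathcal{D}}\Big[\ell\big(E(x + \delta),\, B(x + \delta)\big)\Big]
$$

Dropping the inner maximization and fixing $\delta = 0$ recovers an ordinary (non-robust) explanation that is fit only to the unperturbed data.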

Findings

1. Significantly improves the robustness of explanations to adversarial perturbations

2. Maintains high fidelity of explanations on the original data

3. Instantiated for explanations in the form of linear models and decision sets

Abstract

As machine learning black boxes are increasingly being deployed in real-world applications, there has been a growing interest in developing post hoc explanations that summarize the behaviors of these black boxes. However, existing algorithms for generating such explanations have been shown to lack stability and robustness to distribution shifts. We propose a novel framework for generating robust and stable explanations of black box models based on adversarial training. Our framework optimizes a minimax objective that aims to construct the highest fidelity explanation with respect to the worst-case over a set of adversarial perturbations. We instantiate this algorithm for explanations in the form of linear models and decision sets by devising the required optimization procedures. To the best of our knowledge, this work makes the first attempt at generating post hoc explanations that are…
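The minimax idea in the abstract can be illustrated with a small sketch for the linear-model case. This is not the authors' optimization procedure. Several assumptions are made here. The black box (`black_box`) is a hypothetical toy function. The perturbation set is assumed to be the zero shift plus the corners of an L-infinity box of radius `eps`, applied as a mean shift to every input. Fidelity is measured by squared error. The minimax problem is solved with plain subgradient descent on the worst-case loss. Because each per-shift loss is convex in the explanation's parameters, the gradient of the currently worst loss is a valid subgradient of the maximum. All function and parameter names are illustrative.

```python
import itertools

import numpy as np


def black_box(X):
    # Hypothetical nonlinear model standing in for the black box being explained.
    return np.tanh(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2


def candidate_shifts(dim, eps):
    # Assumed perturbation set: the zero shift plus every corner of the
    # L-infinity box of radius eps, applied as a mean shift to all inputs.
    corners = [np.array(c) for c in itertools.product([-eps, eps], repeat=dim)]
    return [np.zeros(dim)] + corners


def fidelity_loss(w, b, X, y):
    # Mean squared disagreement between the linear explanation and the black box.
    return float(np.mean((X @ w + b - y) ** 2))


def worst_case_loss(w, b, shifted):
    # Loss of the explanation under the least favorable candidate shift.
    return max(fidelity_loss(w, b, Xs, ys) for Xs, ys in shifted)


def robust_linear_explanation(X, eps=0.3, lr=0.05, steps=2000):
    n, d = X.shape
    # Query the black box once per candidate shift.
    shifted = [(X + s, black_box(X + s)) for s in candidate_shifts(d, eps)]
    w, b = np.zeros(d), 0.0
    best = (w.copy(), b, worst_case_loss(w, b, shifted))
    for _ in range(steps):
        # Inner max: find the shift under which the explanation is least faithful.
        losses = [fidelity_loss(w, b, Xs, ys) for Xs, ys in shifted]
        Xs, ys = shifted[int(np.argmax(losses))]
        # Outer min: a gradient step on the active loss (a subgradient of the max).
        resid = Xs @ w + b - ys
        w = w - lr * 2.0 * Xs.T @ resid / n
        b = b - lr * 2.0 * float(resid.mean())
        # Subgradient steps are not monotone, so keep the best iterate seen so far.
        loss = worst_case_loss(w, b, shifted)
        if loss < best[2]:
            best = (w.copy(), b, loss)
    return best


# Usage: explain the toy black box on standard-normal inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w, b, worst = robust_linear_explanation(X)
clean = fidelity_loss(w, b, X, black_box(X))
print("weights:", w, "intercept:", b)
print("clean loss:", clean, "worst-case loss:", worst)
```

Because the zero shift is part of the candidate set, the reported worst-case loss is always at least the loss on the original data. The gap between the two loss values gives a rough measure of how much fidelity the explanation gives up in exchange for robustness. With a finite candidate set, the inner maximization is exact. Richer perturbation sets would need a proper inner solver.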


Videos

Robust and Stable Black Box Explanations (SlidesLive)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification