Towards the Unification and Robustness of Perturbation and Gradient Based Explanations

Sushant Agarwal; Shahin Jabbari; Chirag Agarwal; Sohini Upadhyay; Zhiwei Steven Wu; Himabindu Lakkaraju

arXiv:2102.10618·cs.LG·July 20, 2021·5 cites

TL;DR

This paper connects a gradient-based explanation method (SmoothGrad) and a perturbation-based one (a variant of LIME), showing that both converge to the same explanation in expectation and using that connection to establish robustness properties. The theoretical analysis is supported by empirical validation on synthetic and real datasets.

Contribution

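The paper derives explicit closed-form expressions for the explanations output by SmoothGrad and a variant of LIME, shows that the two converge to the same explanation in expectation, uses this connection to establish robustness, and derives finite sample complexity bounds on the number of perturbations these methods require.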

Findings

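1. SmoothGrad and a variant of LIME converge to the same explanation in expectation, i.e., when the number of perturbed samples is large.

2. The connection between the two methods is used to establish other desirable properties, such as robustness, for both.

3. Finite sample complexity bounds are derived for the number of perturbations these methods require.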

Abstract

As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad, which is a gradient-based method, and a variant of LIME, which is a perturbation-based method. More specifically, we derive explicit closed-form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods…
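The convergence claim in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes Gaussian perturbations around the input and takes the "variant of LIME" to be an unweighted, unregularized least-squares linear fit on those perturbed samples. The black box `f`, its gradient `grad_f`, and the function names `smoothgrad` and `lime_variant` are hypothetical and chosen only for illustration.

```python
import numpy as np


def f(x):
    # Hypothetical smooth black box on 3 features (an assumption for this sketch).
    return np.sin(x[..., 0]) + x[..., 1] ** 2 + 0.5 * x[..., 0] * x[..., 2]


def grad_f(x):
    # Analytic gradient of the hypothetical black box above.
    g = np.empty_like(x)
    g[..., 0] = np.cos(x[..., 0]) + 0.5 * x[..., 2]
    g[..., 1] = 2.0 * x[..., 1]
    g[..., 2] = 0.5 * x[..., 0]
    return g


def smoothgrad(x0, sigma, n, rng):
    # Average the gradient over n Gaussian perturbations of x0.
    noise = rng.normal(0.0, sigma, size=(n, x0.size))
    return grad_f(x0 + noise).mean(axis=0)


def lime_variant(x0, sigma, n, rng):
    # Assumed LIME variant: ordinary least squares on Gaussian perturbations,
    # with an intercept and no sample weighting or regularization.
    noise = rng.normal(0.0, sigma, size=(n, x0.size))
    X = x0 + noise
    y = f(X)
    A = np.hstack([X, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1]  # drop the intercept, keep feature attributions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.array([0.3, -1.0, 2.0])
    for n in (100, 10_000, 100_000):
        sg = smoothgrad(x0, 0.5, n, rng)
        lm = lime_variant(x0, 0.5, n, rng)
        print(n, np.abs(sg - lm).max())
```

Under these assumptions, the gap between the two attribution vectors should shrink as the number of perturbed samples grows, which is the behaviour the abstract describes. The sketch says nothing about the paper's actual closed-form expressions or sample complexity bounds.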

Videos

Towards the Unification and Robustness of Perturbation and Gradient Based Explanations · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare

Methods: High-Order Consensuses · Local Interpretable Model-Agnostic Explanations