Towards the Unification and Robustness of Perturbation and Gradient Based Explanations
Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju

TL;DR
This paper connects a gradient-based explanation method (SmoothGrad) with a perturbation-based one (a variant of LIME). It shows that the two converge to the same explanation in expectation and uses this connection to establish robustness properties, supported by theoretical analysis and empirical validation on synthetic and real datasets.
Contribution
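It derives closed-form expressions for the explanations output by SmoothGrad and a variant of LIME, proves that both converge to the same explanation in expectation, uses this connection to establish robustness, and gives finite sample complexity bounds on the number of perturbations these methods require.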
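Why the two explanations can coincide in expectation can be sketched with a short derivation. The assumptions here are ours, for illustration: perturbations are Gaussian, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$; the model $f$ is differentiable with an integrable gradient; and the LIME variant fits an unweighted least-squares linear model with an intercept. The symbols $\phi_{\mathrm{SG}}$, $\beta_0$, and $\beta$ are introduced here, and this route via Stein's lemma is not necessarily the proof used in the paper.

```latex
\begin{aligned}
\phi_{\mathrm{SG}}(x) &= \mathbb{E}_{\varepsilon}\big[\nabla f(x+\varepsilon)\big], \\
(\beta_0^{*}, \beta^{*}) &= \arg\min_{\beta_0,\,\beta}\ \mathbb{E}_{\varepsilon}\big[\big(f(x+\varepsilon) - \beta_0 - \beta^{\top}\varepsilon\big)^2\big], \\
\beta^{*} &= \operatorname{Cov}(\varepsilon)^{-1}\operatorname{Cov}\big(\varepsilon, f(x+\varepsilon)\big)
  = \tfrac{1}{\sigma^{2}}\,\mathbb{E}_{\varepsilon}\big[\varepsilon\, f(x+\varepsilon)\big]
  = \mathbb{E}_{\varepsilon}\big[\nabla f(x+\varepsilon)\big]
  = \phi_{\mathrm{SG}}(x).
\end{aligned}
```

The second equality uses $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Cov}(\varepsilon) = \sigma^2 I$; the third is Stein's lemma, $\mathbb{E}[\varepsilon\, g(\varepsilon)] = \sigma^2\,\mathbb{E}[\nabla g(\varepsilon)]$, applied to $g(\varepsilon) = f(x+\varepsilon)$.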
Findings
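SmoothGrad and a variant of LIME converge to the same explanation in expectation, i.e., when the number of perturbed samples is large.
The connection between the two methods is used to establish robustness properties for both.
Finite sample complexity bounds quantify how many perturbations the methods require to produce reliable explanations.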
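The convergence finding can be checked numerically on a toy problem. This is a minimal sketch of our own, not the authors' code: the model f, the noise scale sigma, and the helper names (smoothgrad, lime_gaussian, expected_explanation) are hypothetical, and the LIME variant is assumed to use Gaussian perturbations with an unweighted least-squares fit. Because this toy model's smoothed gradient has a closed form, both Monte Carlo estimates can be compared against it as the number of samples n grows.

```python
import math
import random

# Hypothetical toy "black box" (an assumption for illustration), chosen so the
# expected explanation has a closed form.
def f(x0, x1):
    return math.sin(x0) + x0 * x1 + 0.5 * x1 ** 2

def grad_f(x0, x1):
    """Analytic gradient of the toy model."""
    return (math.cos(x0) + x1, x0 + x1)

def smoothgrad(x, sigma, n, rng):
    """SmoothGrad: average the gradient over n Gaussian perturbations of x."""
    g0 = g1 = 0.0
    for _ in range(n):
        d0, d1 = grad_f(x[0] + rng.gauss(0.0, sigma), x[1] + rng.gauss(0.0, sigma))
        g0 += d0
        g1 += d1
    return (g0 / n, g1 / n)

def lime_gaussian(x, sigma, n, rng):
    """Assumed LIME variant: Gaussian perturbations, unweighted least squares
    with an intercept. The OLS slope equals the inverse sample covariance of
    the perturbations times their sample covariance with the model output."""
    e0s, e1s, ys = [], [], []
    for _ in range(n):
        e0, e1 = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        e0s.append(e0)
        e1s.append(e1)
        ys.append(f(x[0] + e0, x[1] + e1))
    m0, m1, my = sum(e0s) / n, sum(e1s) / n, sum(ys) / n
    s00 = sum((a - m0) ** 2 for a in e0s) / n
    s11 = sum((b - m1) ** 2 for b in e1s) / n
    s01 = sum((a - m0) * (b - m1) for a, b in zip(e0s, e1s)) / n
    c0 = sum((a - m0) * (y - my) for a, y in zip(e0s, ys)) / n
    c1 = sum((b - m1) * (y - my) for b, y in zip(e1s, ys)) / n
    det = s00 * s11 - s01 ** 2
    # Solve [[s00, s01], [s01, s11]] @ (beta0, beta1) = (c0, c1) by Cramer's rule.
    return ((s11 * c0 - s01 * c1) / det, (s00 * c1 - s01 * c0) / det)

def expected_explanation(x, sigma):
    """Closed-form E[grad f(x + eps)], eps ~ N(0, sigma^2 I), for the toy model:
    E[cos(x0 + e0)] = cos(x0) * exp(-sigma^2 / 2); the other terms are linear."""
    return (math.cos(x[0]) * math.exp(-sigma ** 2 / 2) + x[1], x[0] + x[1])

if __name__ == "__main__":
    rng = random.Random(0)
    x, sigma = (0.5, -1.0), 0.3
    target = expected_explanation(x, sigma)
    for n in (100, 1_000, 10_000, 100_000):
        sg = smoothgrad(x, sigma, n, rng)
        lm = lime_gaussian(x, sigma, n, rng)
        err_sg = max(abs(a - t) for a, t in zip(sg, target))
        err_lm = max(abs(a - t) for a, t in zip(lm, target))
        print(f"n={n:>7}  SmoothGrad error={err_sg:.4f}  LIME-variant error={err_lm:.4f}")
```

Both errors should shrink roughly like 1/sqrt(n), as expected for Monte Carlo estimates; the exact numbers depend on the random draws. The toy model is kept in closed form only so the target is known; any differentiable model with an available gradient could be substituted.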
Abstract
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad which is a gradient based method, and a variant of LIME which is a perturbation based method. More specifically, we derive explicit closed form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods…
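The kind of robustness the abstract refers to can be illustrated, under assumptions of our own, with the one-dimensional non-smooth model f(x) = |x|. Its raw gradient jumps from -1 to +1 at x = 0, whereas SmoothGrad with Gaussian noise of scale sigma has the closed form erf(x / (sigma * sqrt(2))), whose slope never exceeds sqrt(2/pi) / sigma. This toy bound is our own example, not the paper's theorem or constants, and the function names are hypothetical.

```python
import math
import random

# Toy illustration (our assumption): f(x) = |x|, a model with a kink at 0.

def raw_gradient(x):
    """Gradient of |x|, taken as 0 at the kink; jumps from -1 to +1 at x = 0."""
    return (x > 0) - (x < 0)

def smoothgrad_abs(x, sigma):
    """Closed-form SmoothGrad for |x|: E[sign(x + eps)] with eps ~ N(0, sigma^2),
    which equals 2 * Phi(x / sigma) - 1 = erf(x / (sigma * sqrt(2)))."""
    return math.erf(x / (sigma * math.sqrt(2.0)))

def smoothgrad_abs_mc(x, sigma, n, rng):
    """Monte Carlo SmoothGrad estimate: average raw gradient over n perturbations."""
    return sum(raw_gradient(x + rng.gauss(0.0, sigma)) for _ in range(n)) / n

def lipschitz_bound(sigma):
    """Largest slope of erf(x / (sigma * sqrt(2))), attained at x = 0."""
    return math.sqrt(2.0 / math.pi) / sigma

if __name__ == "__main__":
    sigma, a, b = 0.5, -1e-3, 1e-3
    print(raw_gradient(b) - raw_gradient(a))  # 2: the raw gradient jumps across the kink
    print(smoothgrad_abs(b, sigma) - smoothgrad_abs(a, sigma))  # small, smooth change
    print(lipschitz_bound(sigma) * (b - a))  # upper bound on that change
    print(smoothgrad_abs_mc(b, sigma, 100_000, random.Random(0)))  # near smoothgrad_abs(b, sigma)
```

The bound scales as 1/sigma, so in this toy setting larger noise gives explanations that change more slowly between nearby inputs, at the cost of averaging over a wider neighborhood.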
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
Methods: High-Order Consensuses · Local Interpretable Model-Agnostic Explanations
