Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Anton Xue; Rajeev Alur; Eric Wong

arXiv:2307.05902 · cs.LG · October 30, 2023

Open Access

TL;DR

This paper introduces Multiplicative Smoothing (MuS), a smoothing technique that makes a classifier Lipschitz with respect to the masking of features, so that feature attributions computed on it carry formal stability guarantees.

Contribution

We propose MuS, a smoothing method that guarantees relaxed variants of stability for feature attributions, overcomes the theoretical limitations of standard smoothing techniques, and can be integrated with any classifier and feature attribution method.

Findings

01

MuS provides formal stability guarantees for feature attributions.

02

On vision and language models, MuS endows feature attributions with non-trivial stability guarantees.

03

MuS can be integrated with any classifier and feature attribution method, including LIME and SHAP.

Abstract

Explanation methods for machine learning models tend not to provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. We develop a smoothing method called Multiplicative Smoothing (MuS) to achieve such a model. We show that MuS overcomes the theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with various feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
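
The smoothing idea in the abstract can be illustrated with a rough sketch. The abstract does not spell out the construction, so the following makes its own assumptions: the classifier is smoothed by averaging its output over random keep/drop masks that multiply the attribution mask, with each feature kept independently with probability `lam`. The names `smoothed_classifier`, `toy_classifier`, `lam`, and `n_samples` are hypothetical, and this Monte Carlo version is an illustration only, not the authors' implementation.

```python
import numpy as np


def smoothed_classifier(f, x, alpha, lam=0.25, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_s[f(x * alpha * s)] with s_i ~ Bernoulli(lam).

    Assumption: independent Bernoulli keep masks; the paper's actual noise
    distribution may differ.
    """
    rng = np.random.default_rng(seed)
    # Each row is one random keep/drop mask over the features.
    s = rng.random((n_samples, x.shape[0])) < lam
    # Multiply the input by the attribution mask and by each random mask.
    masked_inputs = x * alpha * s  # shape: (n_samples, n_features)
    return np.mean([f(z) for z in masked_inputs], axis=0)


def toy_classifier(z):
    # Hypothetical two-class scorer: softmax over (sum, -sum) of the features.
    logits = np.array([z.sum(), -z.sum()])
    e = np.exp(logits - logits.max())
    return e / e.sum()


x = np.array([1.0, -2.0, 0.5, 3.0])
alpha = np.array([1.0, 1.0, 0.0, 1.0])      # attribution mask: keep features 0, 1, 3
alpha_flipped = np.array([0.0, 1.0, 0.0, 1.0])  # same mask with feature 0 removed

p = smoothed_classifier(toy_classifier, x, alpha)
q = smoothed_classifier(toy_classifier, x, alpha_flipped)
print("class probabilities:", p)
print("max change after removing one feature:", np.max(np.abs(p - q)))
```

Because both calls share the same seed, the two masks produce different inputs only on samples where the random mask keeps feature 0, which is roughly a `lam` fraction of them. This gives some intuition for why averaging over multiplicative masks limits how much the output can move when one feature is masked, which is the kind of Lipschitz behavior the abstract refers to.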

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Bayesian Modeling and Causal Inference

Methods: Shapley Additive Explanations (SHAP) · Local Interpretable Model-Agnostic Explanations (LIME)