On the Robustness of Removal-Based Feature Attributions

Chris Lin; Ian Covert; Su-In Lee

arXiv:2306.07462 · cs.LG · November 1, 2023 · 1 citation


TL;DR

This paper analyzes, theoretically and empirically, how robust removal-based feature attribution methods are to input and model perturbations, and shows that improving a model's Lipschitz regularity makes its attributions more stable.

Contribution

It offers a unified theoretical framework for understanding removal-based attribution robustness and demonstrates practical methods to improve it through Lipschitz regularity.

Findings

01. Derived upper bounds for attribution differences under perturbations

02. Validated theoretical bounds with empirical experiments

03. Showed that improving Lipschitz regularity enhances attribution robustness

Abstract

To explain predictions made by complex machine learning models, many feature attribution methods have been developed that assign importance scores to input features. Some recent work challenges the robustness of these methods by showing that they are sensitive to input and model perturbations, while other work addresses this issue by proposing robust attribution methods. However, previous work on attribution robustness has focused primarily on gradient-based feature attributions, whereas the robustness of removal-based attribution methods is not currently well understood. To bridge this gap, we theoretically characterize the robustness properties of removal-based feature attributions. Specifically, we provide a unified analysis of such methods and derive upper bounds for the difference between intact and perturbed attributions, under settings of both input and model perturbations. Our…
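The idea of bounding the gap between intact and perturbed attributions can be illustrated with a minimal sketch. It is not the paper's analysis: it assumes the simplest removal-based method (leave-one-out with a fixed baseline), a toy model `tanh(w · x)`, and an elementary bound worked out for this example only. If the model is L-Lipschitz in the l2 norm, each leave-one-out score changes by at most 2·L·‖δ‖ under an input perturbation δ, so the attribution vector changes by at most 2·L·√d·‖δ‖. All names here (`occlusion_attributions`, `make_model`, `attribution_shift`) are hypothetical.

```python
import numpy as np

def occlusion_attributions(f, x, baseline):
    """Leave-one-out scores: a_i = f(x) - f(x with feature i set to baseline_i)."""
    full = f(x)
    attrs = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        x_removed = x.copy()
        x_removed[i] = baseline[i]  # "remove" feature i by replacing it with the baseline
        attrs[i] = full - f(x_removed)
    return attrs

def make_model(weights, scale=1.0):
    """Toy model tanh(scale * w . x); tanh is 1-Lipschitz, so L <= scale * ||w||_2."""
    w = scale * weights
    return lambda x: float(np.tanh(w @ x))

rng = np.random.default_rng(0)
d = 8
weights = rng.normal(size=d)
x = rng.normal(size=d)
baseline = np.zeros(d)

# Input perturbation with l2 norm 0.1 (an arbitrary choice for illustration).
delta = rng.normal(size=d)
delta *= 0.1 / np.linalg.norm(delta)

def attribution_shift(scale):
    """l2 distance between attributions at x and at x + delta."""
    f = make_model(weights, scale)
    a = occlusion_attributions(f, x, baseline)
    a_pert = occlusion_attributions(f, x + delta, baseline)
    return float(np.linalg.norm(a - a_pert))

for scale in (0.5, 1.0, 2.0):
    lipschitz = scale * np.linalg.norm(weights)
    bound = 2 * lipschitz * np.sqrt(d) * np.linalg.norm(delta)
    print(f"scale={scale}: shift={attribution_shift(scale):.4f}, bound={bound:.4f}")
```

Shrinking `scale` lowers the Lipschitz constant and therefore the worst-case bound, which is the intuition behind the Lipschitz-regularity finding. The bound above is loose and specific to this toy setting; the paper's own bounds cover a broader family of removal-based methods and also model perturbations.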


Videos

On the Robustness of Removal-Based Feature Attributions · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)