On the Robustness of Removal-Based Feature Attributions
Chris Lin, Ian Covert, Su-In Lee

TL;DR
This paper provides a theoretical and empirical analysis of the robustness of removal-based feature attribution methods, highlighting their sensitivity to perturbations and proposing ways to enhance their stability.
Contribution
It offers a unified theoretical framework for understanding the robustness of removal-based attributions and shows that improving a model's Lipschitz regularity is a practical way to make these attributions more robust.
Findings
Derived upper bounds for attribution differences under perturbations
Validated theoretical bounds with empirical experiments
Showed that improving Lipschitz regularity enhances attribution robustness
Abstract
To explain predictions made by complex machine learning models, many feature attribution methods have been developed that assign importance scores to input features. Some recent work challenges the robustness of these methods by showing that they are sensitive to input and model perturbations, while other work addresses this issue by proposing robust attribution methods. However, previous work on attribution robustness has focused primarily on gradient-based feature attributions, whereas the robustness of removal-based attribution methods is not currently well understood. To bridge this gap, we theoretically characterize the robustness properties of removal-based feature attributions. Specifically, we provide a unified analysis of such methods and derive upper bounds for the difference between intact and perturbed attributions, under settings of both input and model perturbations. Our…
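As a rough illustration of the setting the abstract describes, the sketch below computes one common removal-based attribution, exact Shapley values where "removed" features are replaced by fixed baseline values, and compares intact and perturbed attributions for a small Lipschitz model. It is an assumption-laden sketch, not the authors' method: the helper name `shapley_baseline`, the toy model `tanh(w @ z)`, the zero baseline, and the perturbation size are all hypothetical choices. The bound checked at the end is a naive one that follows directly from the Lipschitz constant (each Shapley value is a weighted average of marginal contributions whose weights sum to 1); it is not the paper's bound.

```python
import itertools
import math

import numpy as np


def shapley_baseline(f, x, baseline):
    """Hypothetical helper: exact Shapley values with baseline removal.

    Features outside a subset S are replaced by `baseline` values.
    Cost is exponential in the number of features, so use small d only.
    """
    d = len(x)
    phi = np.zeros(d)

    def value(subset):
        # Keep features in `subset` from x; fill the rest from the baseline.
        z = baseline.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in itertools.combinations(others, k):
                # Shapley weight |S|! (d - |S| - 1)! / d!
                weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


# Toy model (assumption): f(z) = tanh(w @ z), which is L-Lipschitz with L = ||w||_2.
rng = np.random.default_rng(0)
d = 4
w = rng.normal(size=d)
f = lambda z: float(np.tanh(w @ z))
L = np.linalg.norm(w)

x = rng.normal(size=d)
baseline = np.zeros(d)
delta = 0.01 * rng.normal(size=d)  # small input perturbation

phi = shapley_baseline(f, x, baseline)
phi_pert = shapley_baseline(f, x + delta, baseline)

# Naive per-feature bound: |phi_i - phi'_i| <= 2 * L * ||delta||_2.
max_diff = np.max(np.abs(phi - phi_pert))
naive_bound = 2 * L * np.linalg.norm(delta)
print(max_diff, naive_bound)
```

Because a smaller Lipschitz constant `L` shrinks the right-hand side of this naive bound, the example also hints at why improving Lipschitz regularity can make removal-based attributions less sensitive to input perturbations.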
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
