Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods
Panagiota Kiourti, Anu Singh, Preeti Duraipandian, Weichao Zhou, and Wenchao Li

TL;DR
This paper proposes a new framework for evaluating the robustness of feature attribution methods in neural networks, emphasizing the need for objective metrics that accurately reflect attribution method weaknesses.
Contribution
It introduces a novel robustness metric, a new definition of similar inputs, and a GAN-based method for generating these inputs, improving evaluation accuracy.
Findings
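Existing robustness metrics largely ignore differences in the model's outputs, so they can expose weaknesses of the neural network rather than of the attribution method.
The proposed robustness metric better captures weaknesses of the attribution method itself.
Generating similar inputs with a GAN improves the evaluation of attribution methods.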
Abstract
This paper studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness that largely ignores the difference in the model's outputs and introduces a new way of evaluating the robustness of attribution methods. Specifically, we propose a new definition of similar inputs, a new robustness metric, and a novel method based on generative adversarial networks to generate these inputs. In addition, we present a comprehensive evaluation with existing metrics and state-of-the-art attribution methods. Our findings highlight the need for a more objective metric that reveals the weaknesses of an attribution method rather than that of the neural network, thus providing a more accurate evaluation of the robustness of attribution methods.
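To make the abstract's central objection concrete, the sketch below compares the attributions of an input and a slightly perturbed copy using a top-k intersection score, and reports the change in the model's output alongside it. This is not the paper's metric, its definition of similar inputs, or its GAN-based generator. The tiny tanh network, the gradient-times-input attribution, the Gaussian perturbation, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def model(x, W1, w2):
    # Hypothetical two-layer network: f(x) = w2 . tanh(W1 x)
    return float(w2 @ np.tanh(W1 @ x))

def grad_times_input(x, W1, w2):
    # Analytic gradient of f with respect to x, multiplied elementwise by x.
    h = np.tanh(W1 @ x)
    grad = W1.T @ (w2 * (1.0 - h ** 2))
    return grad * x

def topk_intersection(a, b, k):
    # Fraction of the k largest-magnitude features shared by two attribution vectors.
    top_a = set(np.argsort(-np.abs(a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(b))[:k].tolist())
    return len(top_a & top_b) / k

def compare(x, x_pert, W1, w2, k=3):
    # Report attribution similarity together with the output change,
    # which a similarity score on its own does not take into account.
    a = grad_times_input(x, W1, w2)
    b = grad_times_input(x_pert, W1, w2)
    return {
        "topk_intersection": topk_intersection(a, b, k),
        "output_difference": abs(model(x, W1, w2) - model(x_pert, W1, w2)),
    }

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 10))
w2 = rng.normal(size=8)
x = rng.normal(size=10)
x_pert = x + 0.05 * rng.normal(size=10)  # assumed stand-in for a "similar" input

report = compare(x, x_pert, W1, w2)
print(report)
```

Read together, the two numbers show why a low similarity score is ambiguous on its own: if the output also changed substantially, the shift in attributions may reflect the network's behavior rather than a flaw in the attribution method.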
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
