"Is your explanation stable?": A Robustness Evaluation Framework for Feature Attribution
Yuyou Gan, Yuhao Mao, Xuhong Zhang, Shouling Ji, Yuwen Pu, Meng Han, Jianwei Yin, Ting Wang

TL;DR
This paper introduces MeTFA, a model-agnostic framework that quantifies the uncertainty of feature attribution explanations for neural networks and makes them more stable and robust.
Contribution
It proposes a novel median test-based method that assesses whether each feature's importance is statistically significant and provides confidence intervals, improving explanation stability and robustness against noise and adversarial attacks.
Findings
MeTFA significantly reduces explanation instability.
It improves the visual quality and faithfulness of explanations.
MeTFA enhances robustness against explanation attacks.
Abstract
Understanding the decision process of neural networks is hard. One vital method for explanation is to attribute a network's decision to pivotal features. Although many algorithms have been proposed, most of them solely improve faithfulness to the model. However, real environments contain much random noise, which may lead to large fluctuations in the explanations. More seriously, recent works show that explanation algorithms are vulnerable to adversarial attacks. All of these make the explanation hard to trust in real scenarios. To bridge this gap, we propose a model-agnostic method, Median Test for Feature Attribution (MeTFA), to quantify the uncertainty and increase the stability of explanation algorithms with theoretical guarantees. MeTFA has the following two functions: (1) examine whether one feature is significantly important or unimportant and generate a MeTFA-significant map…
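The idea of testing whether a feature is "significantly important or unimportant" can be illustrated with a generic one-sided sign test on the median of attribution scores collected under input noise. The sketch below is an assumption-laden illustration, not the authors' implementation: the function names (`binom_sf`, `median_sign_test`, `toy_attribution`), the linear toy explainer, the Gaussian noise model, the threshold of 0.5, and the significance level are all hypothetical choices made for this example.

```python
import math
import numpy as np


def binom_sf(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def median_sign_test(samples, threshold, alpha=0.05):
    """Sign test on the per-feature median of sampled attribution scores.

    samples: array of shape (n_samples, n_features).
    Returns one label per feature:
      +1 if the median is significantly above the threshold,
      -1 if it is significantly below, 0 if neither can be concluded.
    """
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    above = (samples > threshold).sum(axis=0)
    below = (samples < threshold).sum(axis=0)
    labels = np.zeros(d, dtype=int)
    for j in range(d):
        # Under H0 (median == threshold) each count is at most Binomial(n, 0.5).
        if binom_sf(int(above[j]), n) < alpha:
            labels[j] = 1
        elif binom_sf(int(below[j]), n) < alpha:
            labels[j] = -1
    return labels


def toy_attribution(x):
    """Hypothetical explainer: input-times-weight scores of a fixed linear model."""
    w = np.array([2.0, 0.0, -1.0])
    return w * x


rng = np.random.default_rng(0)
x = np.array([1.0, 1.0, 1.0])
# Assumed noise model: 30 Gaussian perturbations of the input.
noisy_inputs = x + rng.normal(scale=0.1, size=(30, 3))
attributions = np.array([toy_attribution(z) for z in noisy_inputs])
labels = median_sign_test(attributions, threshold=0.5)
print(labels)
```

In this toy setting the first feature is flagged as significantly important and the other two as significantly unimportant. Features whose scores straddle the threshold across noisy samples would receive the label 0, which is one simple way to surface attribution uncertainty.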
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications
Methods: Test
