Graph Neural Network Explanations are Fragile
Jiate Li, Meng Pang, Yun Dong, Jinyuan Jia, Binghui Wang

TL;DR
This paper demonstrates that explanations generated by GNN explainers are highly fragile and can be drastically altered by minimal adversarial graph perturbations, raising concerns about their robustness.
Contribution
This is the first study of the robustness of GNN explainers under adversarial attack. It proposes two practical attack methods (one loss-based, one deduction-based) and shows that existing explainers are fragile.
Findings
GNN explainers are highly sensitive to small graph perturbations
Proposed attack methods successfully alter explanations without changing model predictions
GNN explainers' explanations are not robust under adversarial conditions
Abstract
Explainable Graph Neural Networks (GNNs) have emerged recently to foster trust in the use of GNNs. Existing GNN explainers are developed from various perspectives to enhance explanation performance. We take the first step toward studying GNN explainers under adversarial attack: we find that an adversary who slightly perturbs the graph structure can ensure that the GNN model still makes correct predictions, while the GNN explainer yields a drastically different explanation on the perturbed graph. Specifically, we first formulate the attack problem under a practical threat model (i.e., the adversary has limited knowledge about the GNN explainer and a restricted perturbation budget). We then design two methods (one loss-based and the other deduction-based) to realize the attack. We evaluate our attacks on various GNN explainers, and the results show that these explainers are fragile.
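To make the attack setting in the abstract concrete, here is a minimal sketch, not the paper's loss-based or deduction-based method. It uses a toy one-layer GNN, a toy occlusion-style explainer, and a brute-force greedy search that flips at most `budget` edges, keeps the predicted label fixed, and tries to reduce the overlap between the original and perturbed top-k explanatory edges. Everything in it is an assumption made for illustration: the names `predict`, `explain`, and `attack`, the random weights, the example graph, and the choice to leave the original explanatory edges untouched.

```python
import itertools
import numpy as np

def predict(adj, feats, weight):
    """Toy GNN (assumption): mean-aggregate neighbours plus self, tanh, mean-pool to graph logits."""
    a_hat = adj + np.eye(len(adj))
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    return np.tanh(a_norm @ feats @ weight).mean(axis=0)

def explain(adj, feats, weight, k):
    """Toy occlusion explainer (assumption): top-k edges by drop in the predicted-class logit."""
    logits = predict(adj, feats, weight)
    label = int(np.argmax(logits))
    scores = {}
    for i, j in zip(*np.nonzero(np.triu(adj, 1))):
        occluded = adj.copy()
        occluded[i, j] = occluded[j, i] = 0.0
        scores[(int(i), int(j))] = logits[label] - predict(occluded, feats, weight)[label]
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return set(ranked[:k])

def attack(adj, feats, weight, k, budget):
    """Greedy sketch: flip up to `budget` edges, keep the label, shrink top-k explanation overlap."""
    label = int(np.argmax(predict(adj, feats, weight)))
    original = explain(adj, feats, weight, k)
    current, flipped = adj.copy(), []
    best_overlap = len(original)
    for _ in range(budget):
        best_flip = None
        for i, j in itertools.combinations(range(len(adj)), 2):
            # Sketch-specific choice: never touch the original explanatory edges.
            if (i, j) in original or (i, j) in flipped:
                continue
            trial = current.copy()
            trial[i, j] = trial[j, i] = 1.0 - trial[i, j]
            if int(np.argmax(predict(trial, feats, weight))) != label:
                continue  # the prediction must be preserved
            overlap = len(original & explain(trial, feats, weight, k))
            if overlap < best_overlap:
                best_overlap, best_flip = overlap, (i, j)
        if best_flip is None:
            break  # no remaining flip reduces the overlap further
        i, j = best_flip
        current[i, j] = current[j, i] = 1.0 - current[i, j]
        flipped.append(best_flip)
    return current, flipped, best_overlap

# Hypothetical 6-node graph with random features and weights.
rng = np.random.default_rng(0)
n = 6
adj = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]:
    adj[i, j] = adj[j, i] = 1.0
feats = rng.normal(size=(n, 4))
weight = rng.normal(size=(4, 2))
k, budget = 3, 2
perturbed, flipped, overlap = attack(adj, feats, weight, k, budget)
print("flipped edges:", flipped, "| top-k overlap with original:", overlap, "of", k)
```

The sketch queries the explainer directly for every candidate flip, which is only feasible on tiny graphs. The threat model in the abstract instead assumes limited knowledge of the explainer, so this should be read as an illustration of the objective (same prediction, different explanation, bounded perturbation) rather than of the attack algorithms themselves.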
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
Methods: Graph Neural Network
