Explainable Graph Neural Networks Under Fire
Zhong Li, Simon Geisler, Yuhang Wang, Stephan Günnemann, Matthijs van Leeuwen

TL;DR
This paper shows that current post-hoc explanation methods for graph neural networks are highly vulnerable to adversarial attacks, which calls their reliability into question and motivates robustness evaluation.
Contribution
It introduces GXAttack, the first optimization-based white-box adversarial attack method targeting GNN explanations, highlighting the fragility of existing explanation techniques.
Findings
Existing GNN explanation methods are easily fooled by small perturbations of the graph structure.
GXAttack effectively disrupts explanations without changing model predictions.
The results call for adversarial robustness testing of GNN explanation methods.
Abstract
Predictions made by graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs. In an attempt to tackle this, many GNN explanation methods have emerged. Their goal is to explain a model's predictions and thereby obtain trust when GNN models are deployed in decision critical applications. Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes. In this paper we demonstrate that these explanations can unfortunately not be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations. That is, even small perturbations of the original graph structure that preserve the model's predictions may yield drastically different explanations. This calls into question the trustworthiness and…
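To make the claim in the abstract concrete, the following is a minimal toy sketch of a prediction-preserving attack on an explanation. It is not GXAttack: the paper's attack is optimization-based, whereas this sketch simply brute-forces every single edge flip. The two-layer mean-aggregation model, the occlusion explainer, the example graph, and all function names (propagate, logit, predict, explain, attack) are assumptions made up for illustration and do not come from the paper.

```python
from itertools import combinations
import math


def propagate(n, edges, h):
    """One round of mean aggregation over each node's neighbours plus itself."""
    nbrs = {v: {v} for v in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return [sum(h[u] for u in nbrs[v]) / len(nbrs[v]) for v in range(n)]


def logit(n, edges, x, target):
    """Hypothetical 2-layer model: aggregate, tanh, aggregate again."""
    h = [math.tanh(2.0 * z) for z in propagate(n, edges, x)]
    return propagate(n, edges, h)[target]


def predict(n, edges, x, target):
    """Binary prediction for the target node (sign of the logit)."""
    return int(logit(n, edges, x, target) > 0)


def explain(n, edges, x, target, k):
    """Toy occlusion explainer: top-k edges by logit change when removed."""
    base = logit(n, edges, x, target)
    score = {e: abs(base - logit(n, edges - {e}, x, target)) for e in edges}
    ranked = sorted(edges, key=lambda e: (-score[e], e))  # ties broken by edge id
    return set(ranked[:k])


def attack(n, edges, x, target, k):
    """Brute-force stand-in for an attack: try every single edge flip and keep
    the prediction-preserving one whose explanation overlaps least with the
    original. Falls back to the unperturbed graph if no flip helps."""
    label = predict(n, edges, x, target)
    original = explain(n, edges, x, target, k)
    best_edges, best_overlap = edges, len(original)
    for e in combinations(range(n), 2):
        flipped = edges ^ {e}  # remove the edge if present, add it otherwise
        if predict(n, flipped, x, target) != label:
            continue  # the perturbation must not change the prediction
        overlap = len(original & explain(n, flipped, x, target, k))
        if overlap < best_overlap:
            best_edges, best_overlap = flipped, overlap
    return best_edges, best_overlap


# Made-up example graph: edges are (u, v) tuples with u < v.
n, target, k = 6, 2, 2
edges = frozenset({(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)})
x = [1.0, 0.8, 0.5, -0.6, -0.9, -1.0]

adv_edges, overlap = attack(n, edges, x, target, k)
print("original explanation:", sorted(explain(n, edges, x, target, k)))
print("flipped edge(s):", sorted(adv_edges ^ edges))
print("perturbed explanation:", sorted(explain(n, adv_edges, x, target, k)))
print("overlap with original:", overlap)
```

The sketch only illustrates the attack objective (same prediction, different top-k explanation) under a one-edge budget. A brute-force search like this does not scale beyond tiny graphs, which is one reason an optimization-based formulation is attractive in practice.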
Taxonomy
Topics: Advanced Graph Neural Networks
