GNN Explanations that do not Explain and How to find Them
Steve Azzolin, Stefano Teso, Bruno Lepri, Andrea Passerini, Sagar Malhotra

TL;DR
This paper investigates the failure modes of self-explainable GNNs, revealing that explanations can be unfaithful despite optimal model performance, and proposes a new metric to detect such degenerate explanations.
Contribution
The work characterizes failure cases of SE-GNN explanations, demonstrates their potential malicious and natural emergence, and introduces a new faithfulness metric to identify unfaithful explanations.
Findings
Degenerate explanations can be unfaithful despite optimal model risk.
Most faithfulness metrics fail to detect these degenerate explanations.
A new metric reliably marks unfaithful explanations in various scenarios.
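To make the first finding concrete, here is a minimal toy sketch of a degenerate explanation. It is not the paper's construction or its metric. The color-based labeling rule, the two anchor node names, and all function names are hypothetical and chosen only for illustration. The extractor looks at the whole graph, works out the label, and encodes it in its choice of anchor node; the classifier only decodes that choice. The pipeline is perfectly accurate, yet the explanation never contains the nodes that actually determine the label.

```python
from typing import Dict, FrozenSet, List, Tuple

# A graph is a mapping node id -> colour. Edges are omitted for brevity
# (a simplifying assumption for this toy example).
Graph = Dict[str, str]

# Hypothetical anchor nodes that appear in every graph.
ANCHOR_POS, ANCHOR_NEG = "a_pos", "a_neg"


def true_label(graph: Graph) -> int:
    """Ground truth (assumed rule): 1 if the graph contains a red node."""
    return int(any(colour == "red" for colour in graph.values()))


def extractor(graph: Graph) -> FrozenSet[str]:
    """Degenerate extractor: infers the label from the whole graph, then
    returns a single anchor node whose identity encodes that label."""
    label = int(any(colour == "red" for colour in graph.values()))
    return frozenset({ANCHOR_POS if label else ANCHOR_NEG})


def classifier(explanation: FrozenSet[str]) -> int:
    """Classifier that sees only the explanation and decodes the anchor."""
    return int(ANCHOR_POS in explanation)


def predict(graph: Graph) -> Tuple[int, FrozenSet[str]]:
    explanation = extractor(graph)
    return classifier(explanation), explanation


def make_graph(extra: Graph) -> Graph:
    graph = {ANCHOR_POS: "grey", ANCHOR_NEG: "grey"}
    graph.update(extra)
    return graph


DATASET: List[Graph] = [
    make_graph({"n1": "red", "n2": "blue"}),
    make_graph({"n1": "blue", "n2": "blue"}),
    make_graph({"n1": "green", "n2": "red", "n3": "red"}),
    make_graph({"n1": "green"}),
]


def accuracy(dataset: List[Graph]) -> float:
    correct = sum(predict(g)[0] == true_label(g) for g in dataset)
    return correct / len(dataset)


def motif_overlap(dataset: List[Graph]) -> float:
    """Fraction of graphs whose explanation contains at least one red node."""
    hits = sum(any(g[n] == "red" for n in predict(g)[1]) for g in dataset)
    return hits / len(dataset)


if __name__ == "__main__":
    print("accuracy:", accuracy(DATASET))
    print("explanations containing a red node:", motif_overlap(DATASET))
```

In this sketch the explanation works purely as a channel for the label, which is why prediction accuracy alone cannot reveal that it is unrelated to the evidence the model actually relies on.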
Abstract
Explanations provided by Self-explainable Graph Neural Networks (SE-GNNs) are fundamental for understanding the model's inner workings and for identifying potential misuse of sensitive attributes. Although recent works have highlighted that these explanations can be suboptimal and potentially misleading, a characterization of their failure cases is unavailable. In this work, we identify a critical failure of SE-GNN explanations: explanations can be unambiguously unrelated to how the SE-GNNs infer labels. We show that, on the one hand, many SE-GNNs can achieve optimal true risk while producing these degenerate explanations, and on the other, most faithfulness metrics can fail to identify these failure modes. Our empirical analysis reveals that degenerate explanations can be maliciously planted (allowing an attacker to hide the use of sensitive attributes) and can also emerge naturally,…
Peer Reviews
Decision·ICLR 2026 Poster
**1.** This paper is well presented, which made it enjoyable to read. The flow of questions raised, the illustration of the problem, the well-designed figures, and the complete experiments express the motivation and findings of the paper clearly. **2.** The explanation for SE-GNNs' unfaithfulness given in Theorem 1 is interesting and convincing. Its assumption that some common anchor nodes serve as a bridge between the real pattern and the classification attention is novel and also makes sense to me. The ex
**1.** The design of the new metric is somewhat limited in novelty. The main difference between the proposed metric and other metrics is that it enlarges the range of samples, i.e., it randomizes over both nodes and edges. The metric is designed to approximate the set of all subgraphs, but the equivalence is only assumed to hold as the number of samples increases, which may incur high complexity. While this may not be a critical point, since evaluation complexity is not such a serious problem, a
1. Identifies a fundamental failure mode with serious implications for trustworthy AI in graph domains. The insight that explanations can serve as label-encoding channels is important for SE-GNN practitioners. 2. Theorem 1 formalizes when optimal risk coincides with degenerate explanations for SE-GNNs. The proof technique (constructing explicit e and g pairs) is interesting. Extension to Theorem 2 connecting EST with formal explanation notions strengthens the contribution. 3. Empirical valid
1. Theorem 1 requires hard explanation extractors, excluding the soft/continuous scores common in practice, and it assumes |R| > 0, which limits generality. Also, the anchor set definition (single-node subgraphs in all graphs) is restrictive (the discussion mentions generalizations but gives no formal treatment). 2. The attack requires training access (a strong assumption), though the authors acknowledge this fits MLaaS scenarios. The stopping criteria (appendix D.1.2) are manually tuned per dataset which is
(1) The problem is well defined; (2) The paper is well organised; (3) Most statements/claims are supported empirically with experiments; (4) This paper provides strong insights into the limitations of existing self-explainable GNNs.
(1) The theoretical analysis focuses on using isolated nodes as explanations, which may not be practical; as far as I know, most GNN explanations provide subgraphs (rather than isolated nodes) as explanations; (2) This paper focuses on graph classification, and node classification is not discussed.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
