GNN Explanations that do not Explain and How to find Them

Steve Azzolin; Stefano Teso; Bruno Lepri; Andrea Passerini; Sagar Malhotra

arXiv:2601.20815·cs.LG·March 3, 2026

Open Access · 3 Reviews

TL;DR

This paper investigates the failure modes of self-explainable GNNs, revealing that explanations can be unfaithful despite optimal model performance, and proposes a new metric to detect such degenerate explanations.

Contribution

The work characterizes failure cases of SE-GNN explanations, demonstrates their potential malicious and natural emergence, and introduces a new faithfulness metric to identify unfaithful explanations.

Findings

01

Degenerate explanations can be unfaithful despite optimal model risk.

02

Most faithfulness metrics fail to detect these degenerate explanations.

03

A new metric reliably marks unfaithful explanations in various scenarios.

Abstract

Explanations provided by Self-explainable Graph Neural Networks (SE-GNNs) are fundamental for understanding the model's inner workings and for identifying potential misuse of sensitive attributes. Although recent works have highlighted that these explanations can be suboptimal and potentially misleading, a characterization of their failure cases is unavailable. In this work, we identify a critical failure of SE-GNN explanations: explanations can be unambiguously unrelated to how the SE-GNNs infer labels. We show that, on the one hand, many SE-GNNs can achieve optimal true risk while producing these degenerate explanations, and on the other, most faithfulness metrics can fail to identify these failure modes. Our empirical analysis reveals that degenerate explanations can be maliciously planted (allowing an attacker to hide the use of sensitive attributes) and can also emerge naturally,…
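The failure mode described in the abstract can be illustrated with a toy sketch. Everything below is hypothetical and not taken from the paper: the graph encoding, the names `make_graph`, `extractor`, `classifier`, and `flip_rate`, the anchor nodes `a` and `b`, and the 0.5 drop probability are all assumptions made for illustration. The flip-rate check is a generic perturbation test, not the metric the authors propose.

```python
import random

def make_graph(has_red):
    """Toy graph: a node -> colour map plus an edge set (hypothetical encoding)."""
    nodes = {"a": "grey", "b": "grey", "c": "grey"}
    edges = {("a", "b"), ("b", "c")}
    if has_red:
        nodes["r"] = "red"          # the pattern that actually decides the label
        edges.add(("c", "r"))
    return {"nodes": nodes, "edges": edges}

def extractor(graph):
    """Degenerate extractor: reads the whole graph, then encodes the label
    in the choice of anchor node instead of returning the red node."""
    label = any(colour == "red" for colour in graph["nodes"].values())
    anchor = "a" if label else "b"
    return {anchor} & set(graph["nodes"])

def classifier(explanation):
    """Classifier that only decodes the label from the anchor it receives."""
    return 1 if "a" in explanation else 0

def model(graph):
    return classifier(extractor(graph))

def perturb_outside(graph, explanation, rng, p_drop=0.5):
    """Randomly drop nodes and edges that lie outside the explanation."""
    nodes = {n: c for n, c in graph["nodes"].items()
             if n in explanation or rng.random() >= p_drop}
    edges = {(u, v) for (u, v) in graph["edges"]
             if u in nodes and v in nodes and rng.random() >= p_drop}
    return {"nodes": nodes, "edges": edges}

def flip_rate(graph, n_samples=200, seed=0):
    """Fraction of perturbations outside the explanation that change the prediction."""
    rng = random.Random(seed)
    explanation = extractor(graph)
    baseline = model(graph)
    flips = sum(model(perturb_outside(graph, explanation, rng)) != baseline
                for _ in range(n_samples))
    return flips / n_samples

positive = make_graph(has_red=True)
print(model(positive), extractor(positive), flip_rate(positive))
```

In this toy setup the model classifies both graph types correctly, yet the explanation for a positive graph never contains the red node. Perturbing only the part of the graph outside the explanation still changes the prediction, which is the kind of signal a faithfulness check is meant to surface. The edge drops have no effect on this particular toy model and are included only to mirror the idea of randomizing over both nodes and edges.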

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

**1.** The paper is well presented and enjoyable to read. The flow of questions, the illustration of the problem, the well-designed figures, and the complete experiments convey the motivation and findings clearly. **2.** The explanation for SE-GNNs' unfaithfulness given in Theorem 1 is interesting and convincing. The assumption that some common anchor nodes serve as a bridge between the real pattern and the classification attention is novel and makes sense to me. The ex

Weaknesses

**1.** The new metric is somewhat limited in novelty. It differs from other metrics mainly in enlarging the range of samples, i.e., randomizing over both nodes and edges. The metric is designed to approximate all subgraphs, but it simply assumes the two become equivalent as the number of samples increases, which may lead to high complexity. While this may not be a critical point, since evaluation complexity is not a serious problem, a

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. Identifies a fundamental failure mode with serious implications for trustworthy AI in graph domains. The insight that explanations can serve as label-encoding channels is important for SE-GNN practitioners. 2. Theorem 1 formalizes when optimal risk coincides with degenerate explanations for SE-GNNs. The proof technique (constructing explicit e and g pairs) is interesting. Extension to Theorem 2 connecting EST with formal explanation notions strengthens the contribution. 3. Empirical valid

Weaknesses

1. Theorem 1 requires hard explanation extractors, excluding the soft/continuous scores common in practice, and it assumes |R| > 0, which limits generality. Also, the anchor set definition (single-node subgraphs in all graphs) is restrictive (the discussion mentions generalizations but gives no formal treatment). 2. The attack requires training access (a strong assumption), though the authors acknowledge this fits MLaaS scenarios. The stopping criteria (appendix D.1.2) are manually tuned per dataset which is

Reviewer 03 · Rating 6 · Confidence 3

Strengths

(1) The problem is well defined; (2) The paper is well organised; (3) Most statements/claims are supported empirically with experiments; (4) This paper provides strong insights into the limitations of existing self-explainable GNNs.

Weaknesses

(1) The theoretical analysis focuses on using isolated nodes as explanations, which may not be practical; as far as I know, most GNN explanations provide subgraphs (rather than isolated nodes) as explanations; (2) This paper focuses on graph classification, and node classification is not discussed.

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks