TL;DR
HISTOGRAPH is a novel attention-based pooling method for GNNs that leverages intermediate layer activations to improve graph classification, especially in deep architectures.
Contribution
Introduces HISTOGRAPH, a two-stage attention mechanism that utilizes historical activations across layers for enhanced graph pooling.
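To make the two-stage idea concrete, here is a minimal NumPy sketch, assuming a layer-wise attention step followed by a node-wise attention step. It is not the authors' implementation: the function `history_attention_pool` and the scoring vectors `w_layer` and `w_node` are hypothetical, and a simple learned scoring vector stands in for whatever attention form the paper actually uses (the reviews describe it as global self-attention).

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def history_attention_pool(history, w_layer, w_node):
    """Pool a stack of per-layer node activations into one graph vector.

    history: array of shape (L, N, d), activations of N nodes at L layers.
    w_layer, w_node: arrays of shape (d,), hypothetical scoring vectors
    (assumed for illustration; not taken from the paper).
    """
    # Stage 1 (layer-wise): each node attends over its own L historical states.
    layer_scores = history @ w_layer                 # (L, N)
    layer_alpha = softmax(layer_scores, axis=0)      # sums to 1 over layers
    node_repr = (layer_alpha[..., None] * history).sum(axis=0)  # (N, d)

    # Stage 2 (node-wise): attend over nodes to form the graph descriptor.
    node_scores = node_repr @ w_node                 # (N,)
    node_alpha = softmax(node_scores, axis=0)        # sums to 1 over nodes
    graph_repr = node_alpha @ node_repr              # (d,)
    return graph_repr, layer_alpha, node_alpha


rng = np.random.default_rng(0)
history = rng.normal(size=(3, 4, 5))  # 3 layers, 4 nodes, 5 features
graph_repr, layer_alpha, node_alpha = history_attention_pool(
    history, rng.normal(size=5), rng.normal(size=5)
)
print(graph_repr.shape, layer_alpha.shape, node_alpha.shape)
```

With all-zero scoring vectors both attention stages become uniform, so the sketch reduces to mean pooling over layers and nodes; last-layer pooling corresponds to putting all layer-wise weight on the final layer.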
Findings
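Consistently improves performance over traditional pooling methods across the evaluated datasets, with both GIN and GCN backbones.
Remains robust in deep GNN architectures, where over-smoothing degrades last-layer features.
Enhances node representations by modeling how activations evolve across layers.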
Abstract
Graph Neural Networks (GNNs) have demonstrated remarkable success in various domains such as social networks, molecular chemistry, and more. A crucial component of GNNs is the pooling procedure, in which the node features calculated by the model are combined to form an informative final descriptor to be used for the downstream task. However, previous graph pooling schemes rely on the last GNN layer features as an input to the pooling or classifier layers, potentially under-utilizing important activations of previous layers produced during the forward pass of the model, which we regard as historical graph activations. This gap is particularly pronounced in cases where a node's representation can shift significantly over the course of many graph neural layers, and worsened by graph-specific challenges such as over-smoothing in deep architectures. To bridge this gap, we introduce…
Peer Reviews
Decision: ICLR 2026 Poster
1. Novel perspective: The paper introduces a clear and well-motivated idea of learning from the historical trajectory of node activations, addressing the common limitation of relying solely on the last GNN layer.
2. Comprehensive experiments: Evaluations across multiple datasets (TU, OGB, node classification, and link prediction) with both GIN and GCN backbones show consistent improvements.
3. Well-written and well-positioned: The paper situates HISTOGRAPH clearly within prior works on pooling…
1. Limited interpretability of learned attention weights: While attention is used layer-wise and node-wise, the paper could benefit from deeper analysis of what the model learns, e.g., visualization of layer weights across datasets.
2. The attention mechanism itself is widely adopted and not novel. However, the paper should further clarify why the proposed method achieves such notable performance gains. A deeper analytical discussion and illustrative case studies would substantially strengthen th…
- It is clear that the authors have spent a lot of effort in the experimental section as they compare against a large number of baselines and consider a large number of datasets.
- The proposed method can be easily included into existing architectures (at the cost of some training for the new parameters).
- Global self-attention is quadratic in the number of nodes, which makes the method impractical for large graphs.
- Caching in memory the activations at all layers for all nodes can become prohibitively expensive. Together with the above, this makes the proposed method very impractical for large graphs.
- Section 4 is not very convincing as the arguments are too general. Regarding oversmoothing, Proposition 1 is obvious, and in practice different nodes might perform better with different alphas…
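The two scaling concerns raised above can be illustrated with back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: float32 values, a single dense attention matrix over all nodes, and illustrative sizes (16 layers, 64 hidden dimensions) that are not taken from the paper. The helper names `history_cache_bytes` and `attention_matrix_bytes` are hypothetical.

```python
def history_cache_bytes(num_layers, num_nodes, hidden_dim, bytes_per_value=4):
    """Memory to keep every node's activation at every layer (linear in nodes)."""
    return num_layers * num_nodes * hidden_dim * bytes_per_value


def attention_matrix_bytes(num_nodes, bytes_per_value=4):
    """Memory for one dense node-by-node attention matrix (quadratic in nodes)."""
    return num_nodes * num_nodes * bytes_per_value


# Assumed, illustrative sizes: 16 layers, 64 hidden dimensions, float32.
for num_nodes in (1_000, 100_000):
    cache_gb = history_cache_bytes(16, num_nodes, 64) / 1e9
    attn_gb = attention_matrix_bytes(num_nodes) / 1e9
    print(f"N={num_nodes}: history cache {cache_gb:.2f} GB, attention {attn_gb:.2f} GB")
```

Under these assumed sizes, 100,000 nodes give roughly 0.41 GB of cached history versus 40 GB for a dense attention matrix, so the quadratic attention term dominates well before the activation cache does.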
1. The motivation is clear; the authors propose leveraging historical representations to mitigate over-smoothing, which is reasonable and well-justified.
2. The experiments are comprehensive, thoroughly validating the effectiveness of their method across various tasks.
1. Lacks comparison with some more recent baselines [1].
2. No experimental comparisons were conducted on larger graphs, such as those in the OGB [2] suite. How does the time efficiency compare to the baseline when the graph size increases?
3. How are the historical representations specifically utilized? What are the theoretical advantages of the gating mechanism?
4. Lacks a theoretical analysis of the method's effectiveness.
[1] Wang Y, Liu S, Zheng T, et al. Unveiling global interactive patte…
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Machine Learning in Healthcare
