Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning
Haitz Sáez de Ocáriz Borde, Artem Lukoianov, Anastasis Kratsios, Michael Bronstein, Xiaowen Dong

TL;DR
This paper introduces SMPNNs, a scalable GNN architecture that replaces attention with convolutional message passing, achieving competitive performance on large graphs without the high computational costs of attention mechanisms.
Contribution
The paper presents a novel scalable GNN architecture that avoids attention, enabling deep message passing and improved performance on large graphs, supported by new theoretical insights.
Findings
SMPNNs outperform the best existing Graph Transformers on large-graph transductive learning tasks.
Residual connections are essential for preserving universality in graph convolutions.
The architecture scales to large graphs and allows deep message passing.
Abstract
We propose Scalable Message Passing Neural Networks (SMPNNs) and demonstrate that, by integrating standard convolutional message passing into a Pre-Layer Normalization Transformer-style block instead of attention, we can produce high-performing deep message-passing-based Graph Neural Networks (GNNs). This modification yields results competitive with the state-of-the-art in large graph transductive learning, particularly outperforming the best Graph Transformers in the literature, without requiring the otherwise computationally and memory-expensive attention mechanism. Our architecture not only scales to large graphs but also makes it possible to construct deep message-passing networks, unlike simple GNNs, which have traditionally been constrained to shallow architectures due to oversmoothing. Moreover, we provide a new theoretical analysis of oversmoothing based on universal…
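As a rough illustration of the block described in the abstract, the sketch below places a GCN-style graph convolution where a Pre-Layer Normalization Transformer block would normally use attention, and keeps the residual connections around both sub-layers. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the SiLU activation, the single-matrix feed-forward step, the symmetric normalization with self-loops, and all names (`smpnn_style_block`, `w_conv`, `w_ff`) are hypothetical choices made for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each node's feature vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def normalized_adjacency(adj):
    """Symmetric GCN-style normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def silu(x):
    """SiLU activation x * sigmoid(x), written with tanh for numerical stability."""
    return x * 0.5 * (1.0 + np.tanh(0.5 * x))


def smpnn_style_block(x, a_norm, w_conv, w_ff):
    """One Pre-LN block with graph convolution in place of attention (assumed form).

    x:      (num_nodes, dim) node features
    a_norm: (num_nodes, num_nodes) normalized adjacency
    w_conv: (dim, dim) convolution weights (hypothetical parameter)
    w_ff:   (dim, dim) feed-forward weights (hypothetical parameter)
    """
    # Sub-layer 1: LayerNorm -> message passing -> activation, plus residual.
    h = x + silu(a_norm @ layer_norm(x) @ w_conv)
    # Sub-layer 2: LayerNorm -> pointwise feed-forward -> activation, plus residual.
    return h + silu(layer_norm(h) @ w_ff)


# Small usage example on a 4-node path graph with random features.
rng = np.random.default_rng(0)
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
x = rng.normal(size=(4, 8))
w_conv = 0.1 * rng.normal(size=(8, 8))
w_ff = 0.1 * rng.normal(size=(8, 8))

a_norm = normalized_adjacency(adj)
out = x
for _ in range(16):  # stack several blocks; the residual path carries x forward
    out = smpnn_style_block(out, a_norm, w_conv, w_ff)
print(out.shape)
```

The dense `a_norm @ ...` product is used only to keep the sketch short. On large graphs a sparse adjacency would be needed, which is where the cost advantage over all-pairs attention would come from.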
Peer Reviews
Decision · Submitted to ICLR 2025
- The proposed method is simple.
- The paper focuses on over-smoothing, an important problem in graph learning.
- The paper provides both empirical and theoretical analyses.
- The contribution of this paper is weak. The main focus is replacing the attention module in the transformer with a message-passing module and using the residual connections to alleviate the over-smoothing problem. However, the use of residual connections to address over-smoothing has already been explored in DeepGCN [1], which this paper does not mention or compare. Additionally, the implementation of deep GNNs has been studied in several other works [1]-[4].
- The paper contains a substantial
The paper is well-written with a clear motivation. The methodology is easy to follow, and the experiment section is well-structured.
The experiments are not strong enough. For instance, all SMPNN variants should be included consistently across tables and figures in the section. The same applies to baselines, unless there are reasonable explanations for exclusions. Additionally, the distinctions among SMPNN variants and the strengths of SMPNN are not clear. For example, SMPNN uses significantly more GPU memory than SGFormer, yet the paper still claims that it does not use more memory than the baselines. Some presentation needs
1. The paper provides solid theoretical support, particularly on residual connections and universal approximation, which strengthens the SMPNN design and its claims.
2. By replacing attention with scalable message-passing, SMPNN achieves efficient performance and good experiment results on large graphs, offering a notable advancement for scalable GNN applications.
1. In Section 3.2, the authors label their proposed block as a "transformer block." However, the SMPNN framework lacks any attention mechanism, which is a core component of transformers. Consequently, SMPNN functions more like a deep GCN with residual connections rather than a genuine transformer model. This categorization is misleading, as attention mechanisms fundamentally distinguish transformers by enhancing scalability and representation capacity in large graph models. Existing models such
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Neural Networks and Applications
