Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning

Haitz Sáez de Ocáriz Borde; Artem Lukoianov; Anastasis Kratsios; Michael Bronstein; Xiaowen Dong

arXiv:2411.00835·cs.LG·March 11, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces SMPNNs, a scalable GNN architecture that replaces attention with convolutional message passing, achieving competitive performance on large graphs without the high computational costs of attention mechanisms.

Contribution

The paper presents a novel scalable GNN architecture that avoids attention, enabling deep message passing and improved performance on large graphs, supported by new theoretical insights.

Findings

1. SMPNNs outperform existing Graph Transformers in large graph tasks.
2. Residual connections are essential for preserving universality in graph convolutions.
3. The architecture scales to large graphs and allows deep message passing.

Abstract

We propose Scalable Message Passing Neural Networks (SMPNNs) and demonstrate that, by integrating standard convolutional message passing into a Pre-Layer Normalization Transformer-style block instead of attention, we can produce high-performing deep message-passing-based Graph Neural Networks (GNNs). This modification yields results competitive with the state-of-the-art in large graph transductive learning, particularly outperforming the best Graph Transformers in the literature, without requiring the otherwise computationally and memory-expensive attention mechanism. Our architecture not only scales to large graphs but also makes it possible to construct deep message-passing networks, unlike simple GNNs, which have traditionally been constrained to shallow architectures due to oversmoothing. Moreover, we provide a new theoretical analysis of oversmoothing based on universal…
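The block described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a GCN-style symmetric normalized adjacency, a parameter-free layer norm, a ReLU feed-forward sublayer, and a dense toy graph; the function names (`layer_norm`, `normalized_adjacency`, `smpnn_style_block`) and all weight shapes are hypothetical choices made only for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each node's feature vector (no learned scale/shift, for brevity).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def normalized_adjacency(adj):
    # GCN-style propagation matrix D^{-1/2} (A + I) D^{-1/2} (an assumption here).
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def smpnn_style_block(x, a_norm, w_conv, w_ff1, w_ff2):
    # Pre-LN sublayer 1: graph convolution takes the place of attention.
    h = x + a_norm @ layer_norm(x) @ w_conv
    # Pre-LN sublayer 2: pointwise feed-forward network with a residual.
    ff = np.maximum(layer_norm(h) @ w_ff1, 0.0) @ w_ff2
    return h + ff

# Toy path graph with 4 nodes and 8 features per node (illustrative values).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
a_norm = normalized_adjacency(adj)
dim = 8
x = rng.normal(size=(4, dim))
w_conv = rng.normal(scale=0.1, size=(dim, dim))
w_ff1 = rng.normal(scale=0.1, size=(dim, 2 * dim))
w_ff2 = rng.normal(scale=0.1, size=(2 * dim, dim))

# Stack 16 blocks (weights shared here only to keep the sketch short).
out = x
for _ in range(16):
    out = smpnn_style_block(out, a_norm, w_conv, w_ff1, w_ff2)
```

The dense matrix product is used only to keep the example short. On a large graph the same propagation would typically be a sparse operation whose cost grows with the number of edges, whereas full attention scales quadratically in the number of nodes.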

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 3

Strengths

- The proposed method is simple.
- The paper focuses on over-smoothing, an important problem in graph learning.
- The paper provides both empirical and theoretical analyses.

Weaknesses

- The contribution of this paper is weak. The main focus is replacing the attention module in the transformer with a message-passing module and using the residual connections to alleviate the over-smoothing problem. However, the use of residual connections to address over-smoothing has already been explored in DeepGCN [1], which this paper does not mention or compare. Additionally, the implementation of deep GNNs has been studied in several other works [1]-[4].
- The paper contains a substantial

Reviewer 02 · Rating 5 · Confidence 4

Strengths

The paper is well-written with a clear motivation. The methodology is easy to follow, and the experiment section is well-structured.

Weaknesses

The experiments are not strong enough. For instance, all SMPNN variants should be included consistently across tables and figures in the section. The same applies to baselines, unless there are reasonable explanations for exclusions. Additionally, the distinctions among SMPNN variants and the strengths of SMPNN are not clear. For example, SMPNN uses significantly more GPU memory than SGFormer, yet the paper still claims that it does not use more memory than the baselines. Some presentation needs

Reviewer 03 · Rating 5 · Confidence 3

Strengths

1. The paper provides solid theoretical support, particularly on residual connections and universal approximation, which strengthens the SMPNN design and its claims.
2. By replacing attention with scalable message-passing, SMPNN achieves efficient performance and good experiment results on large graphs, offering a notable advancement for scalable GNN applications.

Weaknesses

1. In Section 3.2, the authors label their proposed block as a "transformer block." However, the SMPNN framework lacks any attention mechanism, which is a core component of transformers. Consequently, SMPNN functions more like a deep GCN with residual connections rather than a genuine transformer model. This categorization is misleading, as attention mechanisms fundamentally distinguish transformers by enhancing scalability and representation capacity in large graph models. Existing models such

Taxonomy

Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Neural Networks and Applications