Learning Long Range Dependencies on Graphs via Random Walks
Dexiong Chen, Till Hendrik Schulz, Karsten Borgwardt

TL;DR
This paper introduces a novel graph neural network architecture that combines random walks with sequence models to effectively capture long-range dependencies, outperforming existing methods on multiple benchmarks.
Contribution
It proposes a flexible framework integrating random walks and sequence models to enhance long-range dependency modeling in graphs, surpassing prior approaches.
Findings
Achieves up to 13% performance improvement on key benchmarks.
Demonstrates effectiveness across 19 graph and node datasets.
Provides a versatile framework compatible with various GNN and GT architectures.
Abstract
Message-passing graph neural networks (GNNs) excel at capturing local relationships but struggle with long-range dependencies in graphs. In contrast, graph transformers (GTs) enable global information exchange but often oversimplify the graph structure by representing graphs as sets of fixed-length vectors. This work introduces a novel architecture that overcomes the shortcomings of both approaches by combining the long-range information of random walks with local message passing. By treating random walks as sequences, our architecture leverages recent advances in sequence models to effectively capture long-range dependencies within these walks. Based on this concept, we propose a framework that offers (1) more expressive graph representations through random walk sequences, (2) the ability to utilize any sequence model for capturing long-range dependencies, and (3) the flexibility by…
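The idea in the abstract, treating random walks as sequences and feeding them to a sequence model, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`sample_random_walks`, `encode_walk`, `aggregate_to_nodes`), the uniform walk sampling, and the toy path graph are all assumptions made for illustration. The causal moving average is only a stand-in for a real sequence model, and the local message-passing step is omitted. Average pooling back onto nodes follows the walk aggregator mentioned in the reviews.

```python
import random


def sample_random_walks(adj, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks; adj maps node -> list of neighbours (assumed format)."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def encode_walk(walk, features, window=3):
    """Stand-in sequence model: causal moving average of node features along the walk."""
    dim = len(features[walk[0]])
    encoded = []
    for i in range(len(walk)):
        span = walk[max(0, i - window + 1): i + 1]
        encoded.append([sum(features[v][d] for v in span) / len(span) for d in range(dim)])
    return encoded


def aggregate_to_nodes(walks, encoded, nodes, num_dims):
    """Average-pool every walk-position embedding back onto the node at that position."""
    sums = {v: [0.0] * num_dims for v in nodes}
    counts = {v: 0 for v in nodes}
    for walk, emb in zip(walks, encoded):
        for v, e in zip(walk, emb):
            for d in range(num_dims):
                sums[v][d] += e[d]
            counts[v] += 1
    return {v: [s / counts[v] for s in sums[v]] if counts[v] else sums[v] for v in nodes}


# Toy path graph 0-1-2-3-4-5 with a single scalar feature per node (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
features = {v: [float(v)] for v in adj}

walks = sample_random_walks(adj, walk_length=4, walks_per_node=2)
encoded = [encode_walk(w, features) for w in walks]
node_repr = aggregate_to_nodes(walks, encoded, nodes=adj, num_dims=1)
```

In this sketch a node's representation mixes information from every walk that visits it, so a longer `walk_length` lets information travel further in one step than a single round of neighbour aggregation would. Swapping `encode_walk` for a learned sequence model is where the framework's claimed flexibility would enter.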
Peer Reviews
Decision: ICLR 2025 Poster
1. The issue of long-range dependencies in GNNs is an important and valuable area of exploration.
2. The proposed method is versatile and can be integrated with various models.
3. The authors offer theoretical insights, including the expressiveness of the proposed method.
1. The novelty of this method is limited, as random walks, combining local and global information, and SSMs are established techniques. For example, Graph-Mamba [1] also uses random walks and Mamba on GNNs, and graph transformers typically combine global and local data.
2. The datasets selected for node classification are mainly heterophilic. Including commonly used datasets like OGB would provide a more comprehensive evaluation.
3. The source of the performance gains is unclear: whether they s
- The paper is overall well-written.
- Incorporating random walks in graph learning is an important direction, mainly due to their efficiency and ability to learn long-range dependencies.
- The theoretical results provide detailed discussions of the expressive power of NeuralWalker and motivate its design.
- The training details needed for reproducibility are reported, which can help future studies better understand the weaknesses and strengths of NeuralWalker.
- My main concern is that several claims in the paper remain unsupported or unclear:
  - The authors claim that "our approach achieves significant performance improvements on 19 graph and node benchmark datasets". Based on the reported results, NeuralWalker underperforms baselines and several state-of-the-art methods that are missing from the comparison. Even against the current baselines, NeuralWalker does not provide performance improvements on all 19 datasets!
  - The authors claim that "CNNs
1. Existing graph transformer works typically incorporate random walks as positional encodings; directly integrating them into token design is an interesting approach.
2. The theoretical analysis provided is thorough and comprehensive.
3. Extensive experiments are conducted, with detailed results presented both in the main text and the appendix.
1. The primary concern is the unclear motivation. Why do the authors choose to integrate random walks into token design, and how could this approach benefit over graph transformers, which can already use positional encodings to capture structural information? A preliminary study exploring this choice would be helpful.
2. Building on this, the purpose of incorporating multiple techniques, such as the walk aggregator, also lacks clarity. For instance, why use average pooling in the walk aggregato
Taxonomy
Topics: Advanced Graph Neural Networks · Text and Document Classification Technologies · Machine Learning and Algorithms
Methods: Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Multi-Head Attention
