Revisiting Random Walks for Learning on Graphs

Jinwoo Kim; Olga Zaghen; Ayhan Suleymanzade; Youngmin Ryou; Seunghoon Hong

arXiv:2407.01214·cs.LG·March 6, 2025

TL;DR

This paper introduces Random Walk Neural Networks (RWNNs), a simple yet powerful graph learning model that uses random walk records processed by neural networks, capable of universal approximation and leveraging language models for graph tasks.

Contribution

The paper presents RWNNs as an isomorphism-invariant, universal graph function approximator, and demonstrates their advantages over message passing neural networks, including alleviating over-smoothing.

Findings

01

RWNNs can be designed to be isomorphism invariant in probability while remaining capable of universal approximation of graph functions in probability.

02

Random walk records, even plain text, can be used effectively with language models.

03

Empirical results show RWNNs outperform traditional methods on certain graph tasks.

Abstract

We revisit a simple model class for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. We call these stochastic machines random walk neural networks (RWNNs), and through principled analysis, show that we can design them to be isomorphism invariant while capable of universal approximation of graph functions in probability. A useful finding is that almost any kind of record of random walks guarantees probabilistic invariance as long as the vertices are anonymized. This enables us, for example, to record random walks in plain text and adopt a language model to read these text records to solve graph tasks. We further establish a parallelism to message passing neural networks using tools from Markov chain theory, and show that…
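The anonymization idea in the abstract can be illustrated with a minimal sketch. It assumes a uniform random walk on an adjacency-list graph and a hyphen-separated text format; the function names (`random_walk`, `anonymize`, `text_record`) and the record format are hypothetical choices for illustration, not the authors' implementation. Vertices are renamed by order of first visit, so the record depends only on the walk's structure and not on the original vertex labels.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of `length` steps on an adjacency-list graph (assumed walk strategy)."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def anonymize(walk):
    """Rename vertices by order of first appearance: 1, 2, 3, ..."""
    names = {}
    return [names.setdefault(v, len(names) + 1) for v in walk]


def text_record(walk):
    """Plain-text record of an anonymized walk (hypothetical format)."""
    return "-".join(str(i) for i in anonymize(walk))


# A small graph and a relabeled (isomorphic) copy of it.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
perm = {0: "d", 1: "a", 2: "c", 3: "b"}
adj_relabeled = {perm[v]: [perm[u] for u in nbrs] for v, nbrs in adj.items()}

# With the same random seed, the two walks visit corresponding vertices,
# so their anonymized text records coincide.
record_a = text_record(random_walk(adj, 0, 8, random.Random(0)))
record_b = text_record(random_walk(adj_relabeled, perm[0], 8, random.Random(0)))
```

Here the two records agree because the relabeled adjacency lists keep the same neighbor order and the seed is shared. In general the invariance holds only at the level of the distribution over records, which is the probabilistic invariance the abstract refers to. A text record of this kind is what a language model would read as the reader network.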

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 8 · Confidence 4

Strengths

The whole part about random walk strategies and cover times is novel and interesting. The suggested method of compiling the walk information (including anonymization and neighborhood information) into a string to be fed into an LLM is also new, although it builds upon existing work (this could be more explicit when describing the model). The theoretical results seem sound too.

Weaknesses

A lot of the theory has been implicitly used, e.g., in CRaWl, which uses 1D CNNs as the reader NN. The anonymization and neighborhood recording is also already done there (with the difference that it outputs a matrix instead of a string). There is also quite a bit of literature on anonymization, e.g., showing that anonymization alone is enough to recover the graph, which I would have expected to be discussed in that context. The expressive power section is quite lengthy while not saying much - by now we

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The problem is interesting, and using LLMs to process outputs from the recording function makes the results more practical and significant.

Weaknesses

The readability of the paper can be improved. The paper is quite dense and has many theoretical results, while the experiments appear only at the end. Perhaps moving the corresponding experiments closer to the relevant sections would improve readability. Generally speaking, I find it confusing that the authors emphasize the invariance of the random walk algorithm, when this is instead a probabilistic invariance, that is, an invariance of the distribution. While the authors discuss that they acce

Reviewer 03 · Rating 8 · Confidence 4

Strengths

* The work provides multiple novel theoretical insights on random-walk-based GNNs and a general framework for modeling and studying such approaches further.
* Applying pre-trained language models to text-based representations of random walks to perform in-context learning is a novel idea, and the presented results seem promising.

Weaknesses

While I think the theoretical contributions are significant, I think the experiment section lacks some details that would improve the paper:
* What is the runtime of the compared methods during inference? While the RWNN-Llama model requires no training, I would expect its computational cost during inference to be substantially larger than that of a standard GNN. This by itself is not a problem as LLMs are expected to be costly, and RWNN also allows for more efficient choices for the reader mode

Code & Models

Repositories

jw9730/random-walk
PyTorch · Official

Taxonomy

Topics: Advanced Graph Neural Networks · Machine Learning and Algorithms · Text and Document Classification Technologies