Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks
Ziyuan Tang, Jie Chen

TL;DR
This paper introduces an approach to pre-training a graph foundation model that combines the Transformer architecture with random walks, enabling graph representation and reasoning across diverse datasets.
Contribution
It proposes a new method of representing nodes with random walks and develops a context prediction loss, advancing the creation of scalable, versatile graph foundation models.
Findings
The pre-trained graph Transformer achieves strong performance on downstream tasks.
Random walk-based node representations effectively capture graph structure.
Theoretical analysis shows expressive power in distinguishing neighborhoods and graphs.
Abstract
A foundation model like GPT elicits many emergent abilities, owing to the pre-training with broad inclusion of data and the use of the powerful Transformer architecture. While foundation models in natural languages are prevalent, can we build similar models for graphs? This paper describes an approach toward a graph foundation model that is pre-trained with diverse graph datasets by adapting the Transformer backbone. A central challenge toward this end is how a sequence model encodes graphs of varying sizes and from different domains. We propose representing a node as multiple random walks, such that the Transformer can extract node representations from sequences, which in turn form edge and graph representations. We develop a novel context prediction loss for these random walks and theoretically analyze their expressive power in distinguishing neighborhoods and graphs. We also…
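To make the idea of "representing a node as multiple random walks" concrete, here is a minimal sketch of how several walks rooted at one node could be sampled and handed to a sequence model as token sequences. The uniform neighbor choice, the walk count and length, and the names `sample_walks` and `adj` are illustrative assumptions; the excerpt above does not specify the paper's actual sampling scheme.

```python
import random

def sample_walks(adj, start, num_walks, walk_length, seed=None):
    """Sample `num_walks` uniform random walks of up to `walk_length` steps from `start`.

    `adj` maps each node to a list of its neighbors (hypothetical input format).
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        walk = [start]
        node = start
        for _ in range(walk_length):
            neighbors = adj.get(node, [])
            if not neighbors:
                break  # dead end: stop this walk early
            node = rng.choice(neighbors)  # assumption: uniform transition probabilities
            walk.append(node)
        walks.append(walk)
    return walks

# Toy undirected graph given as adjacency lists.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Node 0 is described by four walks, each a sequence of node ids.
walks = sample_walks(adj, start=0, num_walks=4, walk_length=5, seed=0)
```

Each walk is a variable-content but fixed-format sequence, which is what lets a sequence model consume graphs of different sizes; how the sequences are embedded and aggregated into node, edge, and graph representations is left to the paper.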
Taxonomy
Topics: Topic Modeling
Methods: Dropout · Dense Connections · Cosine Annealing · Absolute Position Encodings · Layer Normalization · Linear Warmup With Cosine Annealing · Attention Dropout · Discriminative Fine-Tuning · GPT
