Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-training
Yufei He, Zhenyu Hou, Yukuo Cen, Jun Hu, Feng He, Xu Cheng, Jie Tang, Bryan Hooi

TL;DR
This paper introduces PGT, a scalable transformer-based graph pre-training framework that generalizes across diverse graphs and tasks, demonstrating state-of-the-art results on large-scale datasets and real-world industrial data.
Contribution
The paper presents a novel pre-training strategy using a masked autoencoder architecture with a decoder for feature augmentation, enabling generalization to unseen nodes and graphs.
Findings
Achieves state-of-the-art performance on the ogbn-papers100M dataset.
Successfully pre-trains on real-world graphs with over 540 million nodes.
Demonstrates effective generalization across static and dynamic downstream tasks.
Abstract
Graph pre-training has been concentrated on graph-level tasks involving small graphs (e.g., molecular graphs) or learning node representations on a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen new nodes and even new graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called PGT (Pre-trained Graph Transformer). Based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other for reconstructing local structures. Unlike the original autoencoder architecture where the pre-trained decoder is discarded, we propose a novel strategy…
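The feature-reconstruction task described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mask_node_features`, `masked_mse`), the zero vector used as a mask token, the masking rate, and the mean-squared-error loss are all assumptions chosen for illustration, and the structure-reconstruction task and the transformer encoder/decoder are left out entirely.

```python
import random


def mask_node_features(features, mask_rate, seed=0):
    """Zero out the feature vectors of a random subset of nodes.

    Hypothetical helper: the zero vector stands in for a mask token.
    Returns the corrupted copy and the sorted indices of masked nodes.
    """
    rng = random.Random(seed)
    num_nodes = len(features)
    num_masked = max(1, int(round(mask_rate * num_nodes)))
    masked_idx = sorted(rng.sample(range(num_nodes), num_masked))
    masked_set = set(masked_idx)
    corrupted = [
        [0.0] * len(row) if i in masked_set else list(row)
        for i, row in enumerate(features)
    ]
    return corrupted, masked_idx


def masked_mse(original, reconstructed, masked_idx):
    """Mean squared error over the masked nodes only (assumed loss)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for x, y in zip(original[i], reconstructed[i]):
            total += (x - y) ** 2
            count += 1
    return total / count


# Toy graph with four nodes and two features per node.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
corrupted, masked_idx = mask_node_features(features, mask_rate=0.5)

# A real model would encode `corrupted` and decode a reconstruction;
# here the corrupted input itself serves as a trivial "prediction".
loss = masked_mse(features, corrupted, masked_idx)
```

In a masked autoencoder the loss is typically computed only on the masked positions, as above, so the model cannot score well by copying its visible input.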
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Machine Learning and Data Classification
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Residual Connection · Byte Pair Encoding · Layer Normalization · Laplacian EigenMap · Label Smoothing · Adam
