Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-training

Yufei He; Zhenyu Hou; Yukuo Cen; Jun Hu; Feng He; Xu Cheng; Jie Tang; Bryan Hooi

arXiv:2407.03953 · cs.LG · November 7, 2025 · 1 citation

TL;DR

This paper introduces PGT, a scalable transformer-based graph pre-training framework that generalizes across diverse graphs and tasks, demonstrating state-of-the-art results on large-scale datasets and real-world industrial data.

Contribution

The paper presents a novel pre-training strategy using a masked autoencoder architecture with a decoder for feature augmentation, enabling generalization to unseen nodes and graphs.

Findings

01. Achieves state-of-the-art performance on the ogbn-papers100M dataset.

02. Successfully pre-trains on real-world graphs with over 540 million nodes.

03. Demonstrates effective generalization across static and dynamic downstream tasks.

Abstract

Graph pre-training has been concentrated on graph-level tasks involving small graphs (e.g., molecular graphs) or learning node representations on a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen new nodes and even new graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called PGT (Pre-trained Graph Transformer). Based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other for reconstructing local structures. Unlike the original autoencoder architecture where the pre-trained decoder is discarded, we propose a novel strategy…
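
To make the feature-reconstruction task concrete, here is a minimal sketch of masked node-feature reconstruction in NumPy. It is an illustration under stated assumptions, not the PGT implementation: the helper names (`mask_node_features`, `reconstruction_loss`), the all-zero mask token, the 50% mask ratio, the mean-squared-error loss, and the random linear maps standing in for the transformer encoder and decoder are all hypothetical choices that the abstract does not specify. The structure-reconstruction task is not shown.

```python
import numpy as np


def mask_node_features(x, mask_ratio, mask_token, rng):
    """Replace a random subset of node feature rows with a mask token.

    x has shape (num_nodes, dim). Returns the corrupted copy and the
    indices of the masked nodes. (Hypothetical helper, not from the paper.)
    """
    num_nodes = x.shape[0]
    num_masked = max(1, int(round(mask_ratio * num_nodes)))
    masked_idx = rng.choice(num_nodes, size=num_masked, replace=False)
    x_corrupted = x.copy()
    x_corrupted[masked_idx] = mask_token
    return x_corrupted, masked_idx


def reconstruction_loss(x_true, x_pred, masked_idx):
    """Mean squared error over the masked nodes only (assumed loss form)."""
    diff = x_true[masked_idx] - x_pred[masked_idx]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)

# Toy input: 8 nodes with 4-dimensional features.
x = rng.normal(size=(8, 4))
mask_token = np.zeros(4)  # assumption: an all-zero mask token
x_corrupted, masked_idx = mask_node_features(x, 0.5, mask_token, rng)

# Stand-in encoder and decoder: random linear maps with a tanh in between.
# A real model would use a transformer here; this only shows the data flow.
w_enc = rng.normal(size=(4, 16))
w_dec = rng.normal(size=(16, 4))
x_pred = np.tanh(x_corrupted @ w_enc) @ w_dec

loss = reconstruction_loss(x, x_pred, masked_idx)
print(f"masked nodes: {sorted(masked_idx.tolist())}, loss: {loss:.4f}")
```

Computing the loss only on masked nodes is the usual masked-autoencoder design choice: the model gets no credit for copying features it was shown, so it has to infer the hidden ones from context.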

Taxonomy

Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Machine Learning and Data Classification

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Residual Connection · Byte Pair Encoding · Layer Normalization · Laplacian EigenMap · Label Smoothing · Adam