GraphFM: A generalist graph transformer that learns transferable representations across diverse domains
Divyansha Lachi, Mehdi Azabou, Vinam Arora, Eva Dyer

TL;DR
GraphFM introduces a scalable, multi-graph pretraining framework using a Perceiver-based encoder, enabling transferability and improved performance across diverse graph datasets and tasks.
Contribution
We propose GraphFM, a novel multi-graph pretraining method that generalizes across diverse graph domains using learned latent tokens and scalable training techniques.
Findings
Training on 152 datasets improves transferability.
Pretraining with synthetic and real graphs enhances stability.
Achieves competitive results on node classification tasks.
Abstract
Graph neural networks (GNNs) are often trained on individual datasets, requiring specialized models and significant hyperparameter tuning due to the unique structures and features of each dataset. This approach limits the scalability and generalizability of GNNs, as models must be tailored for each specific graph type. To address these challenges, we introduce GraphFM, a scalable multi-graph pretraining approach designed for learning across diverse graph datasets. GraphFM uses a Perceiver-based encoder with learned latent tokens to compress domain-specific features into a shared latent space, enabling generalization across graph domains. We propose new techniques for scaling up graph training on datasets of different sizes, allowing us to train GraphFM on 152 distinct graph datasets, containing a total of 7.4 million nodes and 189 million edges. This allows us to study the effect of…
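The compression step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single linear layer per dataset in place of the per-graph MLP, single-head cross-attention with no learned key or value projections, and randomly initialised rather than trained weights. The dataset names, dimensions, and helper functions are all hypothetical.

```python
import math
import random


def linear(x, w):
    """Multiply row vectors x (n x d_in) by a weight matrix w (d_in x d_out)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, tokens):
    """Each latent token queries every node token (keys = values = tokens)."""
    d = len(latents[0])
    out = []
    for q in latents:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return out


def random_matrix(rows, cols, rng):
    """Small Gaussian matrix standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
SHARED_DIM, NUM_LATENTS = 8, 4  # assumed sizes, chosen only for illustration

# Hypothetical datasets whose raw node features have different widths.
feature_dims = {"graph_a": 5, "graph_b": 12}

# One input projection per dataset (assumption: a linear layer, not a full MLP).
input_proj = {name: random_matrix(d, SHARED_DIM, rng) for name, d in feature_dims.items()}

# Latent tokens shared by every dataset.
latents = random_matrix(NUM_LATENTS, SHARED_DIM, rng)


def encode(name, node_features):
    """Project dataset-specific features, then compress them into the latents."""
    tokens = linear(node_features, input_proj[name])
    return cross_attend(latents, tokens)


z_a = encode("graph_a", random_matrix(7, 5, rng))    # 7 nodes, 5 features each
z_b = encode("graph_b", random_matrix(30, 12, rng))  # 30 nodes, 12 features each
```

In this sketch both graphs end up as `NUM_LATENTS` vectors of width `SHARED_DIM`, regardless of node count or raw feature size. That fixed-size output is the property that lets later layers be shared across datasets.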
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The authors develop a multi-graph pretraining approach to learn GraphFM, enabling an ability to handle diverse data across a variety of domains.
The contributions of the paper are fairly limited, as the authors have not established the premise of developing their foundation models vis-a-vis some of the existing works, and the evaluation is also not convincing. First, I recommend that the authors consider the survey paper, "A Survey on Self-Supervised Graph Foundation Models: Knowledge-Based Perspective," and also the tutorial on Graph Foundation Models in WWW'24. The authors have not cited the former, and also not compared and contrast
1. The entire paper is well presented, and the authors give details on the experiments. 2. The code is available, so reproducibility should be good.
1. The novelty of this work is not high. It mainly uses a graph transformer combined with some engineering effort, such as distributed training. 2. The paper claims that most GNNs are trained on individual graphs, which is not true. GNNs can be trained on multiple graphs as well: i) GPT-GNN: Generative Pre-Training of Graph Neural Networks, KDD 2020 ii) GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training, KDD 2020 3. Training one model for multiple graphs is not new, in this work, it seems
- GraphFM focuses on learning transferable representations by pretraining on diverse graph datasets from various domains, which helps GraphFM to generalize well across different types of graphs without the need for tuning for each new task. - The DistributedSSSampler proposed in GraphFM can improve the efficiency of sampling in large-scale graph learning by distributing the sampling process across multiple devices, which reduces memory bottlenecks and accelerates training.
- Some notations are not clearly defined. For example, the expression $\tilde{\mathbf{u}}_i=\operatorname{MLP}_g\left(\mathbf{u}_i\right)$ appears only once in the paper, and the meaning of $\tilde{\mathbf{u}}_i$ is unclear. Additionally, the calculation of the position encoding $\mathbf{p}_i$ and how $\mathbf{x}_i$ concatenates a projection of the node features are not sufficiently explained. - The novelty of GraphFM architecture is limited. GraphFM builds upon transformer-based architectures l
1. The writing is clear.
1. How does the model handle more challenging graph tasks, such as link prediction, given its emphasis on node classification? 2. The paper’s focus on node classification limits its scope and raises concerns about the broader applicability of the approach, especially given the minimal evaluation on other graph tasks. 3. The method is not novel. The paper is more like a technical report.
1. The paper is well-written and easy to understand. 2. To my knowledge, this is the first time a single graph encoder has been trained on 152 different graph datasets and evaluated for its effectiveness—a significant and commendable achievement. 3. The evaluation of the model is comprehensive.
1. A primary limitation of the model is its dependency on unique initial MLPs and final predictors for each graph dataset, which necessitates fine-tuning for every new dataset or task. This requirement significantly hinders the model’s practicality in real-world applications. 2. The pre-training results were largely anticipated, given the model’s supervised training approach. However, in real-world scenarios, labeled data is often scarce, making these results difficult to scale for large-scale
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Software Testing and Debugging Techniques · Teaching and Learning Programming
Methods: Sparse Evolutionary Training
