Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
Li Sun, Zhenhao Huang, Silei Chen, Lanxu Yang, Junda Ye, Sen Su, Philip S. Yu

TL;DR
This paper introduces a Riemannian geometry approach to unify multiple graph datasets into a smooth manifold, improving knowledge transfer and performance in graph foundation models through a novel theoretical framework and the GraphGlue system.
Contribution
It presents the neural manifold gluing theory and the GraphGlue framework, enabling systematic knowledge integration and transfer across diverse graph domains.
Findings
GraphGlue outperforms existing methods across multiple domains.
Pre-training on more datasets yields a smoother manifold and better transferability.
The geometric scaling law is validated empirically.
Abstract
Multi-domain graph pre-training integrates knowledge from diverse domains to enhance performance in the target domains, which is crucial for building graph foundation models. Despite initial success, existing solutions often fall short of answering a fundamental question: how is knowledge integrated or transferred across domains? This theoretical limitation motivates us to rethink the consistency and transferability between model pre-training and domain adaptation. In this paper, we propose a fresh Riemannian geometry perspective, whose core idea is to merge any graph dataset into a unified, smooth Riemannian manifold, enabling a systematic understanding of knowledge integration and transfer. To achieve this, our key contribution is the theoretical establishment of neural manifold gluing, which first characterizes local geometry using an adaptive orthogonal frame and then "glues" the…
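The abstract says local geometry is characterized with an adaptive orthogonal frame, but it does not give the construction. The sketch below substitutes a standard technique, local PCA. It estimates an orthonormal d-dimensional frame at a node from the displacements to its neighbors' embeddings. It is not the paper's AOF. The function name `local_orthogonal_frame` and the parameter `d` are hypothetical.

```python
import numpy as np


def local_orthogonal_frame(center, neighbors, d):
    """Hypothetical sketch (local PCA, NOT the paper's AOF).

    Returns a D x d matrix whose orthonormal columns span the top-d principal
    directions of the displacement vectors from `center` to its `neighbors`.
    """
    # Displacements from the center embedding to each neighbor, shape (k, D).
    diffs = np.asarray(neighbors, dtype=float) - np.asarray(center, dtype=float)
    if d > min(diffs.shape):
        raise ValueError("need at least d neighbors and d ambient dimensions")
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:d].T
```

Suppose the neighbors all lie in the xy-plane of R^3 and d=2. The returned frame then spans that plane, and its z-components are numerically zero. The number of neighbors k controls how local the estimate is.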
Peer Reviews
Decision: ICLR 2026 Oral
- This paper is good in originality and introduces a principled and powerful theoretical framework from differential geometry. This Neural Manifold Gluing concept provides a new, systematic, and theoretically sound language to model knowledge integration in graphs, which is a conceptual advance for the GFM field. - The experiments provide strong support for the theory. The Geometric Scaling Law experiment shows that 1-shot accuracy improves and transfer loss decreases as more datasets are added,
- The proposed theory assumes that all source and target domains can be glued into a single smooth manifold. However, it is unclear how the model would perform if a new domain is fundamentally geometrically incompatible. For example, what if a new domain possesses a markedly different intrinsic dimensionality? Therefore, further discussion on the limitations and potential failure cases would be valuable for practical use. - Several key design choices in the GraphGlue framework are not ablated.
1. The paper introduces a novel approach to multi-domain graph pre-training by treating the graphs as local pieces of a larger, unified Riemannian manifold. This fresh perspective allows for a more systematic understanding of knowledge transfer across domains, which is a critical issue in graph foundation models. 2. The concept of “neural manifold gluing” is well-formulated, using differential geometry to tie together multiple domains. The method employs an Adaptive Orthogonal Frame (AOF) to mod
1. The method heavily relies on triangle-based holonomy for graph gluing. However, in sparse graphs or those with few cycles, this assumption may not hold, limiting the approach's applicability. Further analysis of the method’s behavior on sparse or acyclic graphs is needed. 2. The paper introduces the AOF for local geometry estimation, but does not provide sufficient analysis on how sensitive the results are to the choice of hyperparameters like perturbation strength and neighborhood size. 3. S
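The reviews mention triangle-based holonomy and an edge tangent translation operator. To illustrate the reviewer's concern, the sketch below assumes that translating along an edge (i, j) means the orthogonal Procrustes alignment between the two endpoint frames. This is a common discrete connection, not necessarily the paper's operator. Holonomy is the composition of these translations around a triangle i -> j -> k -> i, and its deviation from the identity measures curvature. The helper names are hypothetical. The `triangles` helper also makes the sparsity concern concrete: a graph with no 3-cycles yields no loops, so there is no holonomy term to compute.

```python
import numpy as np


def frame_transport(u_src, u_dst):
    """Hypothetical edge translation: orthogonal map from coordinates in frame
    u_src to coordinates in frame u_dst (both D x d with orthonormal columns)."""
    # Coordinates a in u_src represent u_src @ a; projecting onto u_dst gives
    # u_dst.T @ u_src @ a. Take the nearest orthogonal matrix (polar factor).
    m = u_dst.T @ u_src
    left, _, right_t = np.linalg.svd(m)
    return left @ right_t


def triangle_holonomy(frames, i, j, k):
    """Compose translations around the loop i -> j -> k -> i."""
    r_ij = frame_transport(frames[i], frames[j])
    r_jk = frame_transport(frames[j], frames[k])
    r_ki = frame_transport(frames[k], frames[i])
    return r_ki @ r_jk @ r_ij


def holonomy_deviation(h):
    """Frobenius distance from the identity; zero means a flat loop."""
    return float(np.linalg.norm(h - np.eye(h.shape[0]), ord="fro"))


def triangles(adj):
    """List each triangle (u, v, w) with u < v < w once; adj maps node -> set of neighbors."""
    found = []
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            for w in adj[u] & adj[v]:
                if w > v:
                    found.append((u, v, w))
    return found
```

Frames that are rotations of one another within the same plane give an identity holonomy, so the deviation is numerically zero. A path graph returns an empty triangle list.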
1. Strong theoretical foundation and unification. The paper provides a mathematically solid perspective, casting multi-domain pretraining and domain adaptation as a manifold-gluing problem; this is novel and helps unify several previously disparate ideas about metric compatibility, holonomy, and transferability. Theorem-level results and carefully defined operators (e.g., edge tangent translation) make the theoretical contribution convincing. 2. Practical scaling via an EMA strategy. The use of
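The review excerpt is cut off before it says what the EMA strategy averages. The sketch below shows only the generic exponential moving average update, running <- beta * running + (1 - beta) * current. It assumes the update is applied to a dictionary of parameter arrays, which is one common use and not confirmed by the source. The function name `ema_update` is hypothetical.

```python
import numpy as np


def ema_update(running, current, beta=0.99):
    """Hypothetical generic EMA over a dict of arrays (not the paper's exact rule):
    each entry becomes beta * running + (1 - beta) * current."""
    return {name: beta * running[name] + (1.0 - beta) * current[name] for name in running}
```

A beta close to 1 makes the running copy change slowly and smoothly. A smaller beta tracks the current values more closely.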
1. The idea of “manifold gluing” may depend on the number of domains and on how many pairwise/collective gluing operations are needed. The paper briefly remarks that a QR-based subroutine reduces complexity, but it lacks a clear, end-to-end complexity and empirical runtime/memory analysis showing how cost scales with the number of domains, graph size, and manifold dimension. 2. The paper states (line ~1532) “For pretraining, we extract the 2-hop ego-graph with 10 neighbors each hop for single gr
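The reviewer quotes the pre-training setting "the 2-hop ego-graph with 10 neighbors each hop". The sketch below extracts such an ego-graph under one assumption: at each hop, every frontier node samples up to 10 neighbors uniformly without replacement, GraphSAGE-style. The paper's actual extraction may differ. The function name `sample_ego_graph` is hypothetical.

```python
import random


def sample_ego_graph(adj, seed, fanouts=(10, 10), rng=None):
    """Hypothetical sketch: sample a multi-hop ego-graph around `seed`.

    adj maps node -> iterable of neighbors; fanouts[h] caps how many neighbors
    each frontier node samples at hop h. Returns (nodes, directed edges).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducible sampling
    nodes = {seed}
    edges = set()
    frontier = [seed]
    for fanout in fanouts:
        next_frontier = []
        for u in frontier:
            nbrs = sorted(adj.get(u, ()))
            for v in rng.sample(nbrs, min(fanout, len(nbrs))):
                edges.add((u, v))
                if v not in nodes:  # only newly reached nodes expand at the next hop
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, edges
```

Take a star graph whose center has 50 leaves, seeded at the center. The first hop keeps 10 leaves, and the second hop only leads back to the center, so the ego-graph has 11 nodes.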
Taxonomy
Topics: Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning · Topic Modeling
