Out-of-Distribution Graph Models Merging
Yidi Wang, Ziyue Qiao, Jiawei Gu, Xubin Zheng, Pengyang Wang, Xiaobing Pei, Xiao Luo

TL;DR
This paper merges graph models pre-trained on different domains into a single generalized model. It generates graphs that instantiate the mixture distribution of the domains, then merges and fine-tunes the models with an MoE module and a masking mechanism, without requiring source or target domain data.
Contribution
It proposes a domain-agnostic graph merging framework that effectively consolidates heterogeneous GNNs into a generalized model, addressing domain discrepancy challenges.
Findings
Effective in merging models from multiple domains
Improves generalization without source/target domain data
Theoretically validated and experimentally successful
Abstract
This paper studies a novel problem of out-of-distribution graph models merging, which aims to construct a generalized model from multiple graph models pre-trained on different domains with distribution discrepancy. This problem is challenging because of the difficulty in learning domain-invariant knowledge implicitly in model parameters and consolidating expertise from potentially heterogeneous GNN backbones. In this work, we propose a graph generation strategy that instantiates the mixture distribution of multiple domains. Then, we merge and fine-tune the pre-trained graph models via a MoE module and a masking mechanism for generalized adaptation. Our framework is architecture-agnostic and can operate without any source/target domain data. Both theoretical analysis and experimental results demonstrate the effectiveness of our approach in addressing the model generalization problem.
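As a rough illustration of the merging step described above, the sketch below combines the outputs of several pre-trained models with softmax gate weights and applies an element-wise mask to a parameter list. It is a minimal stand-in under stated assumptions, not the authors' OGMM implementation: the function names `softmax`, `moe_merge`, and `apply_mask`, the flat-list representation of logits and parameters, and the toy numbers are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_merge(expert_logits, gate_scores):
    """Combine per-expert class logits using softmax gate weights.

    expert_logits: list of K lists, each holding C class logits
                   (one list per pre-trained model; hypothetical layout).
    gate_scores:   list of K unnormalized gate scores for one input graph.
    """
    weights = softmax(gate_scores)
    num_classes = len(expert_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, expert_logits))
        for c in range(num_classes)
    ]


def apply_mask(params, mask):
    """Element-wise mask on a flat parameter list.

    A simplified stand-in for a masking mechanism (assumption):
    entries with mask 0 are zeroed, entries with mask 1 are kept.
    """
    return [p * m for p, m in zip(params, mask)]


# Two hypothetical experts and two classes; equal gate scores
# reduce the merge to a plain average of the expert logits.
merged = moe_merge([[2.0, 0.0], [0.0, 4.0]], [0.0, 0.0])
masked = apply_mask([1.0, 2.0, 3.0], [1, 0, 1])
```

Because the gate only reads each expert's output, a scheme of this shape does not depend on the experts sharing an architecture, which is in the spirit of the architecture-agnostic claim in the abstract. How the gate scores and masks are actually learned is left to the paper.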
Peer Reviews
Decision: ICLR 2026 Poster
- The proposed problem of Out-of-Distribution Graph Models Merging is novel.
- The proposed method, OGMM, has a solid theoretical foundation. Generating synthetic data to extract domain knowledge makes sense.
- The experiments demonstrate the effectiveness of OGMM and the contributions of each component.
- The scenarios considered are limited. More results on other node classification graph datasets could be provided. Additionally, the paper only focuses on the OOD scenario within a single graph, without considering cross-dataset or cross-domain scenarios.
- The proposed OGMM relies on a mixture distribution assumption, which is not likely to hold in more complex scenarios.
- The proposed OGMM seems to rely on many hyperparameters, some of which significantly impact model performance according to t
1) The integration of graph generation and MoE-based model fusion is conceptually coherent, enabling domain knowledge transfer at both data and model levels.
2) The mixture distribution assumption and accompanying error bound provide a formal justification for the merging process.
3) The framework is applicable to heterogeneous GNNs, enhancing its generality and practical relevance.
1) The motivation for merging multiple pre-trained GNNs is not sufficiently justified. It remains unclear why model-level merging is preferable to retraining on aggregated data or simply using the best domain-specific model. No empirical or application-level evidence is given to show that scenarios requiring model-level merging without data access are common or practically constrained.
2) Methodological novelty appears incremental, as the proposed approach largely builds upon existing techniques
1. The paper introduces a novel challenge of merging out-of-distribution graph models without needing to retrain from scratch, which is both practical and impactful for real-world applications where data is scarce.
2. OGMM consistently outperforms previous fusion methods and demonstrates robustness on large-scale datasets like REDDIT-B and NCI1, showing that it can handle diverse graph domains with different GNN architectures.
3. The authors provide a solid theoretical framework for the problem
1. The two-stage process for merging out-of-distribution models involves multiple steps, including fine-tuning and the use of the Mixture-of-Experts (MoE) module. While effective, the overall time complexity of this process could be quite high, especially as the number of pre-trained models and the size of the graphs increase.
2. The experiments primarily focus on datasets such as REDDIT-B and NCI1, which are not necessarily representative of the most commonly encountered graph types in real-wor
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning
Methods: Mixture of Experts
