InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion
Yuanyi Wang, Zhaoyi Yan, Yiming Zhang, Qi Zhou, Yanggan Gu, Fei Wu, Hongxia Yang

TL;DR
InfiGFusion introduces a structure-aware model fusion method that explicitly captures semantic dependencies between vocabulary tokens using a graph-based distillation approach, significantly improving performance on reasoning tasks.
Contribution
The paper presents InfiGFusion, a novel graph-on-logits distillation framework with an efficient Gromov-Wasserstein approximation for scalable, structure-aware model fusion.
Findings
Outperforms state-of-the-art models and baselines across 11 benchmarks.
Achieves +35.6 on Multistep Arithmetic and +37.06 on Causal Judgement.
Enhances reasoning, coding, and mathematics capabilities of fused models.
Abstract
Recent advances in large language models (LLMs) have intensified efforts to fuse heterogeneous open-source models into a unified system that inherits their complementary strengths. Existing logit-based fusion methods maintain inference efficiency but treat vocabulary dimensions independently, overlooking semantic dependencies encoded by cross-dimension interactions. These dependencies reflect how token types interact under a model's internal reasoning and are essential for aligning models with diverse generation behaviors. To explicitly model these dependencies, we propose InfiGFusion, the first structure-aware fusion framework with a novel Graph-on-Logits Distillation (GLD) loss. Specifically, we retain the top-k logits per output and aggregate their outer products across sequence positions to form a global co-activation graph, where nodes represent vocabulary…
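The graph construction described in the abstract (keep the top-k logits at each position, then sum their outer products over the sequence) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `topk_sparsify` and `coactivation_graph` are hypothetical, the raw logits are used without any normalization (the paper may normalize them), and the dense V x V matrix is only practical for a toy vocabulary.

```python
import numpy as np


def topk_sparsify(logits, k):
    """Keep the k largest logits in each row and zero out the rest.

    logits: array of shape (T, V), one row per sequence position.
    Assumes 1 <= k <= V.
    """
    logits = np.asarray(logits, dtype=float)
    # Column indices of the k largest entries per row (unordered within the top-k).
    idx = np.argpartition(logits, -k, axis=-1)[:, -k:]
    sparse = np.zeros_like(logits)
    np.put_along_axis(sparse, idx, np.take_along_axis(logits, idx, axis=-1), axis=-1)
    return sparse


def coactivation_graph(logits, k):
    """Sum the outer products of the top-k-sparsified logit vectors.

    Returns a symmetric (V, V) matrix whose (i, j) entry accumulates
    z_t[i] * z_t[j] over all positions t. This is an assumed reading of
    the abstract's "global co-activation graph".
    """
    sparse = topk_sparsify(logits, k)
    # sparse.T @ sparse equals the sum over t of outer(z_t, z_t).
    return sparse.T @ sparse


if __name__ == "__main__":
    toy_logits = [[3.0, 1.0, 2.0],
                  [0.0, 5.0, 4.0]]
    print(coactivation_graph(toy_logits, k=2))
```

In the toy run, position 0 keeps tokens 0 and 2 and position 1 keeps tokens 1 and 2, so token 2 is linked to both of the others while tokens 0 and 1 share no edge.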
Taxonomy
Topics: Scientific Computing and Data Management
Methods: Shrink and Fine-Tune
