InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion

Yuanyi Wang; Zhaoyi Yan; Yiming Zhang; Qi Zhou; Yanggan Gu; Fei Wu; Hongxia Yang

arXiv:2505.13893 · cs.CL · May 21, 2025


TL;DR

InfiGFusion introduces a structure-aware model fusion method that explicitly captures semantic dependencies between vocabulary tokens using a graph-based distillation approach, significantly improving performance on reasoning tasks.

Contribution

The paper presents InfiGFusion, a novel graph-on-logits distillation framework with an efficient Gromov-Wasserstein approximation for scalable, structure-aware model fusion.
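This page does not describe how the efficient Gromov-Wasserstein (GW) approximation works, so the sketch below is not the paper's method. It illustrates, under stated assumptions, why a GW discrepancy between two weighted graphs can be bracketed cheaply. It assumes both graphs have the same n nodes with uniform node weights, and that node i in one graph corresponds to node i in the other (for example, a shared vocabulary id). The GW objective evaluated at the identity coupling gives an upper bound. Sorting all edge weights and taking the squared 1-D Wasserstein distance between the two sorted lists gives a standard lower bound in O(n² log n). The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def gw_identity_cost(c1: Matrix, c2: Matrix) -> float:
    """Squared-loss GW objective sum_{i,j,k,l} (c1[i][k] - c2[j][l])**2 * T[i][j] * T[k][l]
    evaluated at the identity coupling T = I / n (node i <-> node i).
    Any feasible coupling yields an upper bound on the GW discrepancy."""
    if len(c1) != len(c2):
        raise ValueError("graphs must have the same number of nodes")
    n = len(c1)
    total = 0.0
    for i in range(n):
        for k in range(n):
            total += (c1[i][k] - c2[i][k]) ** 2
    return total / (n * n)


def gw_sorted_lower_bound(c1: Matrix, c2: Matrix) -> float:
    """Squared 1-D Wasserstein distance between the empirical distributions of
    all n*n edge weights (uniform node weights). Every coupling T induces a
    coupling of these two distributions, so this lower-bounds the GW objective."""
    if len(c1) != len(c2):
        raise ValueError("graphs must have the same number of nodes")
    a = sorted(x for row in c1 for x in row)
    b = sorted(x for row in c2 for x in row)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    g1 = [[0.0, 1.0], [1.0, 0.0]]
    g2 = [[2.0, 0.0], [0.0, 2.0]]
    lb = gw_sorted_lower_bound(g1, g2)
    ub = gw_identity_cost(g1, g2)
    print(lb, ub)  # 0.5 2.5
```

The gap between the two bounds shows what an approximation has to trade off. Sorting ignores which nodes the edges attach to, while a fixed coupling ignores better node matchings.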

Findings

01. Outperforms state-of-the-art models and baselines across 11 benchmarks.

02. Achieves +35.6 on Multistep Arithmetic and +37.06 on Causal Judgement.

03. Enhances reasoning, coding, and mathematics capabilities of fused models.

Abstract

Recent advances in large language models (LLMs) have intensified efforts to fuse heterogeneous open-source models into a unified system that inherits their complementary strengths. Existing logit-based fusion methods maintain inference efficiency but treat vocabulary dimensions independently, overlooking semantic dependencies encoded by cross-dimension interactions. These dependencies reflect how token types interact under a model's internal reasoning and are essential for aligning models with diverse generation behaviors. To explicitly model these dependencies, we propose InfiGFusion, the first structure-aware fusion framework with a novel Graph-on-Logits Distillation (GLD) loss. Specifically, we retain the top-k logits per output and aggregate their outer products across sequence positions to form a global co-activation graph, where nodes represent vocabulary…
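The graph construction described in the abstract can be sketched as follows: keep the top-k logits at each sequence position, take their outer product, and sum over positions. This is a minimal illustration, not the authors' implementation. The abstract is cut off before it specifies normalization or edge weighting, so raw logits are assumed here. The helper names `top_k_sparse` and `co_activation_graph` and the toy inputs are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Sparse graph: (vocab_id_u, vocab_id_v) -> accumulated co-activation weight.
Graph = Dict[Tuple[int, int], float]


def top_k_sparse(logits: List[float], k: int) -> Dict[int, float]:
    """Keep only the k largest logits of one position as {vocab_id: logit}."""
    top = heapq.nlargest(k, range(len(logits)), key=lambda v: logits[v])
    return {v: logits[v] for v in top}


def co_activation_graph(seq_logits: List[List[float]], k: int) -> Graph:
    """Sum, over sequence positions, the outer product z_t z_t^T of each
    position's top-k logit vector z_t (all other entries treated as zero).
    Pairs that never appear together in a top-k set are absent (implicitly 0)."""
    graph: Graph = {}
    for logits in seq_logits:
        z = top_k_sparse(logits, k)
        for u, zu in z.items():
            for v, zv in z.items():
                graph[(u, v)] = graph.get((u, v), 0.0) + zu * zv
    return graph


if __name__ == "__main__":
    # Two positions over a toy 4-token vocabulary.
    seq = [[3.0, 1.0, 2.0, 0.0], [0.0, 1.0, 4.0, 2.0]]
    g = co_activation_graph(seq, k=2)
    print(g[(2, 2)])  # 20.0 (token 2 is in the top-2 at both positions: 2*2 + 4*4)
```

A sparse dictionary is used because the top-k truncation means only co-occurring pairs get nonzero weight. Memory therefore grows with the number of positions times k² rather than with the full vocabulary size squared.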


Code & Models

Repositories

reallm-labs/infigfusion (official)

Models

🤗 InfiX-ai/InfiGFusion-14B (model; 8 downloads, 6 likes)

Videos

InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion (SlidesLive)

Taxonomy

Topics: Scientific Computing and Data Management

Methods: Shrink and Fine-Tune