Can Heterogeneous Language Models Be Fused?
Shilian Chen, Jie Zhou, Qin Chen, Wen Wu, Xin Li, Qi Feng, Liang He

TL;DR
This paper introduces HeteroFusion, a novel method for merging heterogeneous language models from different architectures by aligning modules and reducing conflicts, enabling effective multi-source fusion.
Contribution
HeteroFusion is the first approach to successfully fuse heterogeneous language models through topology-based alignment and conflict-aware denoising.
Findings
HeteroFusion outperforms existing merging and ensemble baselines.
It demonstrates robustness in noisy-source and cross-family scenarios.
It provides analytical justification for stable transfer processes.
Abstract
Model merging aims to integrate multiple expert models into a single model that inherits their complementary strengths without incurring the inference-time cost of ensembling. Recent progress has shown that merging can be highly effective when all source models are homogeneous, i.e., derived from the same pretrained backbone and therefore share aligned parameter coordinates or compatible task vectors. Yet this assumption is increasingly unrealistic in open model ecosystems, where useful experts are often built on different families such as Llama, Qwen, and Mistral. In such heterogeneous settings, direct weight-space fusion becomes ill-posed due to architectural mismatch, latent basis misalignment, and amplified cross-source conflict. We address this problem with HeteroFusion for heterogeneous language model fusion, which consists of two key components:…
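To make the homogeneous assumption concrete, the following is a minimal sketch of task-vector merging, the kind of baseline the abstract contrasts against. It is not HeteroFusion. The function name `merge_task_vectors`, the toy parameter name `proj`, and the matrix sizes are all hypothetical choices for illustration.

```python
import numpy as np

def merge_task_vectors(base, experts, scale=1.0):
    """Homogeneous merge: base + scale * mean(expert - base), per parameter.

    Assumes every expert was fine-tuned from `base`, so parameters with the
    same name have the same shape and live in the same coordinate system.
    """
    merged = {}
    for name, w0 in base.items():
        deltas = []
        for expert in experts:
            w = expert[name]
            if w.shape != w0.shape:
                # Architectural mismatch: the task vector is not even defined.
                raise ValueError(
                    f"shape mismatch for {name!r}: {w.shape} vs {w0.shape}"
                )
            deltas.append(w - w0)  # task vector of this expert
        merged[name] = w0 + scale * np.mean(deltas, axis=0)
    return merged

rng = np.random.default_rng(0)
base = {"proj": rng.normal(size=(4, 4))}

# Two toy "experts" derived from the same base (homogeneous case).
expert_a = {"proj": base["proj"] + 0.1}
expert_b = {"proj": base["proj"] - 0.3}
merged = merge_task_vectors(base, [expert_a, expert_b])

# A toy expert from a different family with a different hidden size.
other_family = {"proj": rng.normal(size=(6, 6))}
try:
    merge_task_vectors(base, [expert_a, other_family])
    heterogeneous_ok = True
except ValueError:
    heterogeneous_ok = False
```

In this toy setup the homogeneous merge succeeds, while the cross-family merge fails on shapes alone. Matching shapes would still not guarantee that two models share a latent basis, which is the misalignment problem the abstract names separately.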
