ProFuser: Progressive Fusion of Large Language Models

Tianyuan Shi; Fanqi Wan; Canbin Huang; Xiaojun Quan; Chenliang Li; Ming Yan; Ji Zhang; Minhua Huang; Wu Kai

arXiv:2408.04998 · cs.CL · November 18, 2025

TL;DR

ProFuser is a progressive fusion method for large language models that judges each source model's advantage in both training mode and inference mode, which yields more effective model integration and improved performance in knowledge, reasoning, and safety.

Contribution

The paper presents a fusion approach that measures model advantage through both training-mode cross entropy and inference-mode outputs, and transitions progressively from inference mode to training mode to combine the two.

Findings

01

Enhanced model performance in knowledge, reasoning, and safety.

02

Successful fusion of Vicuna, Llama-2, and MPT models.

03

ProFuser outperforms baseline fusion methods.

Abstract

While fusing the capacities and advantages of various large language models offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select the advantageous model during training. Existing fusion methods primarily focus on the training mode, which uses cross entropy on the ground truth in a teacher-forcing setup to measure a model's advantage and may therefore provide limited insight into that advantage. In this paper, we introduce a novel approach that enhances the fusion process by incorporating both the training and inference modes. Our method evaluates model advantage not only through cross entropy during training but also by considering inference outputs, providing a more comprehensive assessment. To combine the two modes effectively, we introduce ProFuser to progressively transition from inference mode to training mode. To validate ProFuser's…
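
The two ways of measuring model advantage described in the abstract, and a gradual shift from one to the other, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-output quality score used for inference mode, and the linear schedule are all assumptions made for illustration only.

```python
import math


def cross_entropy(token_probs):
    """Mean negative log-likelihood a model assigns to the ground-truth tokens
    under teacher forcing (lower is better)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def select_training_mode(gt_token_probs):
    """Training-mode advantage: pick the source model with the lowest
    cross entropy on the ground truth.

    gt_token_probs maps a model name to the probabilities that model assigns
    to each ground-truth token.
    """
    return min(gt_token_probs, key=lambda name: cross_entropy(gt_token_probs[name]))


def select_inference_mode(output_scores):
    """Inference-mode advantage: pick the source model whose own generated
    output receives the highest quality score.

    The scorer is hypothetical; the abstract does not say how outputs are judged.
    """
    return max(output_scores, key=output_scores.get)


def inference_weight(step, total_steps):
    """Assumed linear schedule: 1.0 (pure inference mode) at step 0,
    decaying to 0.0 (pure training mode) at the final step."""
    return max(0.0, 1.0 - step / total_steps)


# Toy example with two hypothetical source models.
gt_token_probs = {"model_a": [0.9, 0.8], "model_b": [0.5, 0.4]}
output_scores = {"model_a": 0.3, "model_b": 0.7}

best_by_training = select_training_mode(gt_token_probs)
best_by_inference = select_inference_mode(output_scores)
```

In this toy case the two modes disagree about which model is advantageous, which is the situation where looking at both signals gives more information than cross entropy alone.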
