InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models

Yanggan Gu; Yuanyi Wang; Zhaoyi Yan; Yiming Zhang; Qi Zhou; Fei Wu; Hongxia Yang

arXiv:2505.13878·cs.LG·October 23, 2025

TL;DR

InfiFPO introduces an implicit model fusion method that leverages preference optimization to enhance large language models by synthesizing multi-source probabilities, outperforming existing fusion techniques across various benchmarks.

Contribution

The paper proposes InfiFPO, a novel preference optimization approach that fuses multiple models implicitly by synthesizing probabilities, addressing limitations of previous methods that only used response outputs.

Findings

01  InfiFPO outperforms existing methods on 11 benchmarks.

02  With Phi-4 as the pivot model, InfiFPO improves the average score across the 11 benchmarks from 79.95 to 83.33.

03  InfiFPO enhances capabilities in mathematics, coding, and reasoning tasks.

Abstract

Model fusion combines multiple Large Language Models (LLMs) with different strengths into a more powerful, integrated model through lightweight training methods. Existing work on model fusion focuses primarily on supervised fine-tuning (SFT), leaving preference alignment (PA), a critical phase for enhancing LLM performance, largely unexplored. The few existing fusion methods for the PA phase, such as WRPO, simplify the process by using only the response outputs of the source models and discarding their probability information. To address this limitation, we propose InfiFPO, a preference optimization method for implicit model fusion. InfiFPO replaces the reference model in Direct Preference Optimization (DPO) with a fused source model that synthesizes multi-source probabilities at the sequence level, circumventing the complex vocabulary alignment challenges of previous works while maintaining…
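
As a rough illustration of the idea in the abstract, the sketch below computes the standard DPO loss for one preference pair, with the reference log-probabilities supplied by a fused score over several source models. The helper names (sequence_logprob, fuse_logprobs, dpo_loss), the weighted-average fusion rule, and all numbers are assumptions made for illustration; the abstract as excerpted here does not specify the paper's actual fusion strategy. The point it shows is that sequence-level log-probabilities are single scalars per response, so source models with different tokenizers can be combined without aligning vocabularies.

```python
import math

def sequence_logprob(token_logprobs):
    """Sequence-level log-probability: the sum of per-token log-probabilities."""
    return sum(token_logprobs)

def fuse_logprobs(source_logprobs, weights=None):
    """Hypothetical fusion rule (an assumption, not the paper's method):
    a weighted average of sequence log-probs from several source models."""
    if weights is None:
        weights = [1.0 / len(source_logprobs)] * len(source_logprobs)
    return sum(w * lp for w, lp in zip(weights, source_logprobs))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probs."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Two source models tokenize the same responses differently (2 vs. 3 tokens),
# yet each still yields one scalar per response. All values are made up.
src_chosen = [sequence_logprob([-0.2, -0.3]), sequence_logprob([-0.1, -0.1, -0.4])]
src_rejected = [sequence_logprob([-1.0, -1.5]), sequence_logprob([-0.9, -0.8, -1.1])]

ref_chosen = fuse_logprobs(src_chosen)      # fused "reference" for the chosen response
ref_rejected = fuse_logprobs(src_rejected)  # fused "reference" for the rejected response

# Pivot (policy) model sequence log-probs, also made up.
loss = dpo_loss(-0.4, -3.0, ref_chosen, ref_rejected, beta=0.1)
print(f"loss = {loss:.4f}")
```

In a real training loop the scalar inputs would be batched tensors produced by the pivot and source models, and the fusion rule would be whatever the paper defines.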

Peer Reviews

Decision·NeurIPS 2025 spotlight

Reviewer 01 · Rating 5 · Confidence 4

Strengths

[S1] Integrating preference optimization and model fusion through KL-divergence is technically sound and elegant. The derivation of the loss is clearly presented.
[S2] Compared with multi-stage methods, simultaneous fusion and preference optimization can simplify the implementation and thus enhance practical applicability. Convincing empirical results show that InfiFPO can effectively achieve the two goals.

Weaknesses

[W1] The data for preference optimization is generated by eith…

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- The implementation details are disclosed comprehensively, which makes potential reproduction feasible and straightforward for readers.
- The derivation of the InfiFPO objective is clear and technically sound.
- The preliminaries on model fusion and DPO are well-explained and effectively set the stage for the proposed method, making the paper easy to follow.

Weaknesses

- The diversity of experiments appears quite limited. The main results are conducted solely using Phi-4 as the piv…

Reviewer 03 · Rating 4 · Confidence 5

Strengths

1. A novel approach to model fusion with a practical design. The paper proposes integrating model fusion into the preference alignment phase, which is a departure from common practice. The method operates at the sequence-probability level, which effectively addresses the vocabulary alignment issue often encountered in token-level fusion. This design makes the framework applicable to a range of models with different tokenizers without requiring complex alignment heuristics.
2. The p…

Code & Models

Repositories

reallm-labs/infifpo
Official

Models

🤗
InfiX-ai/InfiFPO-14B
model · 6 downloads · ♡ 7

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Focus · ALIGN