InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Yanggan Gu, Yuanyi Wang, Zhaoyi Yan, Yiming Zhang, Qi Zhou, Fei Wu, Hongxia Yang

TL;DR
InfiFPO introduces an implicit model fusion method that leverages preference optimization to enhance large language models by synthesizing multi-source probabilities, outperforming existing fusion techniques across various benchmarks.
Contribution
The paper proposes InfiFPO, a novel preference optimization approach that fuses multiple models implicitly by synthesizing probabilities, addressing limitations of previous methods that only used response outputs.
Findings
InfiFPO outperforms existing methods on 11 benchmarks.
With Phi-4 as the pivot model, InfiFPO improves average performance from 79.95 to 83.33 (+3.38 points).
InfiFPO enhances capabilities in mathematics, coding, and reasoning tasks.
Abstract
Model fusion combines multiple Large Language Models (LLMs) with different strengths into a more powerful, integrated model through lightweight training methods. Existing works on model fusion focus primarily on supervised fine-tuning (SFT), leaving preference alignment (PA), a critical phase for enhancing LLM performance, largely unexplored. The few existing fusion methods for the PA phase, such as WRPO, simplify the process by using only the response outputs of the source models and discarding their probability information. To address this limitation, we propose InfiFPO, a preference optimization method for implicit model fusion. InfiFPO replaces the reference model in Direct Preference Optimization (DPO) with a fused source model that synthesizes multi-source probabilities at the sequence level, circumventing the complex vocabulary alignment challenges in previous works and meanwhile maintaining…
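The core idea in the abstract, swapping the DPO reference model for a fused source model that works on sequence-level probabilities, can be illustrated with a minimal sketch. This is not the paper's implementation: the fusion rule used here (a weighted average of the source models' sequence log-probabilities) is an assumption made for illustration, since the excerpt does not state the actual rule, and the names `fused_ref_logprob` and `dpo_loss_with_fused_ref` are hypothetical. Each source model is assumed to have already scored the whole response with its own tokenizer, which is why no vocabulary alignment appears in the code.

```python
import math


def fused_ref_logprob(source_logps, weights=None):
    """Combine sequence-level log-probs from several source models.

    ASSUMPTION: a weighted average of log-probs (a weighted geometric
    mean of probabilities). The paper's actual fusion rule may differ.
    """
    if weights is None:
        weights = [1.0 / len(source_logps)] * len(source_logps)
    return sum(w * lp for w, lp in zip(weights, source_logps))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss_with_fused_ref(pi_logp_w, pi_logp_l,
                            src_logps_w, src_logps_l,
                            beta=0.1, weights=None):
    """Standard DPO loss for one preference pair, with the reference
    log-probs replaced by fused source-model log-probs.

    pi_logp_w / pi_logp_l: pivot-model log-prob of the chosen / rejected response.
    src_logps_w / src_logps_l: per-source-model log-probs of the same responses.
    """
    ref_w = fused_ref_logprob(src_logps_w, weights)
    ref_l = fused_ref_logprob(src_logps_l, weights)
    margin = beta * ((pi_logp_w - ref_w) - (pi_logp_l - ref_l))
    return -log_sigmoid(margin)


# Toy values (made up): two source models score each response.
loss = dpo_loss_with_fused_ref(
    pi_logp_w=-12.0, pi_logp_l=-25.0,
    src_logps_w=[-10.0, -20.0], src_logps_l=[-20.0, -30.0],
)
```

Because only one scalar log-probability per response is needed from each source model, the source models can use different tokenizers; only the way those scalars are combined would change under a different fusion rule.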
Peer Reviews
Decision·NeurIPS 2025 spotlight
Strengths: [S1] Integrating preference optimization and model fusion through KL-divergence is technically sound and elegant. The derivation of the loss is clearly presented. [S2] Compared with multi-stage methods, simultaneous fusion and preference optimization can simplify the implementation and thus enhance practical applicability. Convincing empirical results show that InfiFPO can effectively achieve the two goals. Weaknesses: [W1] The data for preference optimization is generated by eith
Strengths: - The implementation details are disclosed comprehensively, which makes potential reproduction feasible and straightforward for readers. - The derivation of the InfiFPO objective is clear and technically sound. - The preliminaries on model fusion and DPO are well-explained and effectively set the stage for the proposed method, making the paper easy to follow. Weaknesses: - The diversity of experiments appears quite limited. The main results are conducted solely using Phi-4 as the piv
Strengths: 1. A novel approach to model fusion with a practical design. The paper proposes integrating model fusion into the preference alignment phase, which is a departure from common practice. The method operates at the sequence-probability level, which effectively addresses the vocabulary alignment issue often encountered in token-level fusion. This design makes the framework applicable to a range of models with different tokenizers without requiring complex alignment heuristics. 2. The p
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Focus · ALIGN
