Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions

Mingyang Song; Mao Zheng

arXiv:2603.09938·cs.CL·March 31, 2026

TL;DR

This survey covers the methods, applications, and open challenges of model merging for large language models, with emphasis on theoretical foundations, algorithmic techniques, and practical uses such as multi-task learning and federated systems.

Contribution

The survey provides a comprehensive taxonomy and systematic review of model merging techniques, theoretical insights, and application scenarios in the context of large language models.

Findings

01. Established theoretical foundations including loss landscape geometry and mode connectivity.

02. Reviewed diverse merging algorithms such as weight averaging, task vector arithmetic, and mixture-of-experts.

03. Identified open challenges and future research directions in model merging for LLMs.

Abstract

Model merging combines the parameters of multiple neural networks into a single model without additional training. As fine-tuned large language models (LLMs) proliferate, merging offers a computationally efficient alternative to ensembles and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey examines model merging in the LLM era through the FUSE taxonomy, organized along Foundations, Unification Strategies, Scenarios, and Ecosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry and mode connectivity, then systematically review the algorithmic space spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization. We further examine downstream applications…
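As a rough illustration of two of the algorithm families the abstract names (weight averaging and task vector arithmetic), the sketch below merges toy checkpoints without any training. It is not taken from the survey. It assumes each checkpoint is a plain dict mapping parameter names to lists of floats, with identical names and shapes across models, and the function names `average_weights`, `task_vector`, and `apply_task_vectors` are hypothetical. The task-vector step uses the commonly described form, base weights plus a scaled sum of (fine-tuned minus base) differences.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat weights.
StateDict = Dict[str, List[float]]


def average_weights(models: List[StateDict]) -> StateDict:
    """Uniform weight averaging: element-wise mean of every parameter."""
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def task_vector(finetuned: StateDict, base: StateDict) -> StateDict:
    """Task vector: fine-tuned weights minus the shared base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def apply_task_vectors(
    base: StateDict, vectors: List[StateDict], scale: float = 1.0
) -> StateDict:
    """Add the scaled sum of task vectors back onto the base weights."""
    merged = {name: list(vals) for name, vals in base.items()}
    for vec in vectors:
        for name, delta in vec.items():
            merged[name] = [w + scale * d for w, d in zip(merged[name], delta)]
    return merged


# Toy example: one base model and two fine-tuned variants (made-up numbers).
base = {"w": [1.0, 2.0]}
ft_a = {"w": [2.0, 2.0]}
ft_b = {"w": [1.0, 4.0]}

averaged = average_weights([ft_a, ft_b])           # {"w": [1.5, 3.0]}
vectors = [task_vector(ft_a, base), task_vector(ft_b, base)]
summed = apply_task_vectors(base, vectors)         # {"w": [2.0, 4.0]}
halved = apply_task_vectors(base, vectors, 0.5)    # {"w": [1.5, 3.0]}
```

In this toy case, scaling two task vectors by 0.5 gives the same result as averaging the two fine-tuned models, since both reduce to the base plus the mean of the differences. Real checkpoints would hold tensors rather than lists, but the arithmetic is the same per parameter.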
