A Systematic Study of In-the-Wild Model Merging for Large Language Models

Oğuz Kağan Hitit; Leander Girrbach; Zeynep Akata

arXiv:2511.21437·cs.CL·March 31, 2026

TL;DR

This paper systematically evaluates model merging techniques for fine-tuned LLM checkpoints trained on overlapping or conflicting objectives, finding that simple Task Arithmetic often outperforms more complex methods in these in-the-wild scenarios.

Contribution

The paper provides a large-scale empirical comparison of six merging methods across four open-weight LLMs and multiple benchmarks, highlighting the effectiveness of simple Task Arithmetic in heterogeneous settings.

Findings

01

Task Arithmetic reliably improves performance in in-the-wild merging scenarios.

02

Most advanced merging methods do not outperform the base models in heterogeneous settings.

03

Current merging techniques struggle to extract useful updates from conflicting models.
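For readers unfamiliar with Task Arithmetic, the toy sketch below illustrates the general idea: each expert contributes a task vector (its parameters minus the base parameters), and the scaled sum of those vectors is added back to the base. This is an illustration only, not the paper's code. The names `task_vector`, `task_arithmetic`, and `scale`, the dictionary-of-lists parameter format, and the numeric values are all assumptions made for the example; this page does not state the paper's exact configuration or scaling coefficient.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Per-parameter difference between a fine-tuned checkpoint and its base."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def task_arithmetic(base: Params, experts: List[Params], scale: float = 1.0) -> Params:
    """Add the scaled sum of all expert task vectors to the base parameters."""
    vectors = [task_vector(expert, base) for expert in experts]
    merged: Params = {}
    for name, base_values in base.items():
        # Sum the task vectors element-wise across experts.
        summed = [sum(v[name][i] for v in vectors) for i in range(len(base_values))]
        merged[name] = [b + scale * s for b, s in zip(base_values, summed)]
    return merged


# Hypothetical base model and two experts with partly conflicting updates.
base = {"w": [1.0, 2.0], "b": [0.0]}
expert_a = {"w": [1.5, 2.0], "b": [0.25]}
expert_b = {"w": [1.0, 1.0], "b": [0.25]}

merged = task_arithmetic(base, [expert_a, expert_b], scale=0.5)
print(merged)  # {'w': [1.25, 1.5], 'b': [0.25]}
```

The `scale` coefficient is a hyperparameter: with `scale=1.0` every expert's update is applied in full, while smaller values dampen the combined update, which matters when experts push the same parameter in different directions.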

Abstract

Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for settings where all merged experts have distinct roles and are tuned on clearly separated tasks also hold in settings where the merged experts do not have clearly distinct roles, but are trained on overlapping or even conflicting objectives. To evaluate this setting, we present a large-scale, systematic evaluation of "in-the-wild" model merging of heterogeneous experts that may have been trained on overlapping or conflicting objectives. Concretely, we evaluate six state-of-the-art merging methods, including recent subspace methods, across four open-weight LLMs, twelve fine-tuned checkpoints per base model, and sixteen standard LLM…

Code & Models

Models

ManniX-ITA/Qwen3.5-27B-Omnimerge (model, 8 downloads)
