TL;DR
This paper systematically evaluates six model merging methods for large language models fine-tuned on overlapping or conflicting objectives, and finds that simple task arithmetic often outperforms more complex methods in these in-the-wild settings.
Contribution
It provides a large-scale, empirical comparison of six merging methods across multiple LLMs and benchmarks, highlighting the effectiveness of simple task arithmetic in heterogeneous settings.
Findings
Task Arithmetic reliably improves performance in in-the-wild merging scenarios.
Most advanced merging methods do not outperform the base models in heterogeneous settings.
Current merging techniques struggle to extract useful updates from conflicting models.
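For readers unfamiliar with the term, the following is a minimal sketch of task arithmetic as it is commonly described: each fine-tuned checkpoint contributes a "task vector" (its weights minus the base weights), and the scaled sum of those vectors is added back to the base model. The function name, the `scale` parameter, and the toy weights are illustrative assumptions; the paper's actual configuration is not given in this summary.

```python
import numpy as np

def task_arithmetic_merge(base, experts, scale=1.0):
    """Merge fine-tuned checkpoints by adding scaled task vectors to the base.

    base:    dict mapping parameter name -> np.ndarray
    experts: list of dicts with the same keys and shapes as `base`
    scale:   hypothetical coefficient applied to the summed task vectors
    """
    merged = {}
    for name, base_w in base.items():
        # Task vector for one expert = fine-tuned weights minus base weights.
        task_sum = sum(expert[name] - base_w for expert in experts)
        merged[name] = base_w + scale * task_sum
    return merged

# Toy example: two "experts" that moved different coordinates of one tensor.
base = {"w": np.array([1.0, 2.0, 3.0])}
expert_a = {"w": np.array([1.5, 2.0, 3.0])}
expert_b = {"w": np.array([1.0, 2.0, 2.0])}

merged = task_arithmetic_merge(base, [expert_a, expert_b], scale=0.5)
print(merged["w"])
```

When experts push the same parameter in opposite directions, their task vectors partly cancel in the sum, which is one simple way to picture the conflicting-objectives setting the findings refer to.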
Abstract
Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for settings where all merged experts have distinct roles and are tuned on clearly separated tasks also hold in settings where the merged experts do not have clearly distinct roles, but are trained on overlapping or even conflicting objectives. To evaluate this setting, we present a large-scale, systematic evaluation of "in-the-wild" model merging of heterogeneous experts that may have been trained on overlapping or conflicting objectives. Concretely, we evaluate six state-of-the-art merging methods, including recent subspace methods, across four open-weight LLMs, twelve fine-tuned checkpoints per base model, and sixteen standard LLM…
