SimMerge: Learning to Select Merge Operators from Similarity Signals
Oliver Bolton, Aakanksha, Arash Ahmadian, Sara Hooker, Marzieh Fadaee, Beyza Ermis

TL;DR
SimMerge predicts which merge operator, model subset, and merge order will perform best from inexpensive similarity signals between models, so large language models can be merged at scale without exhaustive merge-and-evaluate searches.
Contribution
SimMerge introduces a predictive approach, built on task-agnostic similarity signals, for selecting merge operators and model subsets. It outperforms the best fixed merge operator and generalizes across models and merge scenarios.
Findings
Outperforms fixed merge operators on 7B-parameter LLMs
Generalizes to multi-way merges and larger models without retraining
Supports online addition of tasks and operators with a bandit variant
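To make the last finding concrete, here is a minimal sketch of how a bandit can keep choosing among merge operators while new ones are added online. It uses plain, non-contextual UCB1 as a stand-in; the summary does not say which bandit algorithm SimMerge uses, so the class name `UCBOperatorSelector`, the operator names, and the reward definition (for example, a normalized benchmark score of the merged model) are all assumptions made for illustration.

```python
import math


class UCBOperatorSelector:
    """Illustrative UCB1 bandit over merge operators (not the paper's algorithm)."""

    def __init__(self, operators, c=1.0):
        self.c = c  # exploration strength (assumed hyperparameter)
        self.counts = {op: 0 for op in operators}
        self.means = {op: 0.0 for op in operators}

    def add_operator(self, op):
        # A new operator starts untried, so select() will try it next.
        self.counts.setdefault(op, 0)
        self.means.setdefault(op, 0.0)

    def select(self):
        # Try every operator once before trusting the estimates.
        for op, n in self.counts.items():
            if n == 0:
                return op
        total = sum(self.counts.values())

        def ucb(op):
            bonus = self.c * math.sqrt(2.0 * math.log(total) / self.counts[op])
            return self.means[op] + bonus

        return max(self.counts, key=ucb)

    def update(self, op, reward):
        # Incremental mean of the observed rewards for this operator.
        self.counts[op] += 1
        self.means[op] += (reward - self.means[op]) / self.counts[op]
```

In use, each round calls `select()`, performs and scores the chosen merge, then calls `update()` with the observed score. A contextual variant would additionally condition the choice on similarity features of the model pair.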
Abstract
Model merging combines multiple models into a single model with aggregated capabilities, making it a powerful tool for large language model (LLM) development. However, scaling model merging is challenging: performance depends on the choice of merge operator, model subset, and merge order, often requiring expensive merge-and-evaluate searches. In this work, we introduce SimMerge, a predictive merge-selection method that identifies high-performing merges using inexpensive, task-agnostic similarity signals between models. Given a small set of unlabeled probes, SimMerge extracts functional and structural features to predict the performance of candidate two-way merges, enabling merge operator, order and model subset selection without iterative evaluation. We show that SimMerge consistently outperforms the best fixed merge operator across 7B-parameter LLMs and generalizes to multi-way merges…
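The pipeline described in the abstract above (compute cheap similarity signals for a model pair, then predict which merge operator will do best) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the four features, the linear scorers, and the operator names `linear` and `slerp` are placeholders chosen here, and the function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def similarity_features(weights_a, weights_b, logits_a, logits_b):
    """Assumed feature vector for a pair of models.

    weights_*: flattened parameter vectors (structural signal).
    logits_*:  (n_probes, vocab) outputs on unlabeled probes (functional signal).
    """
    weight_cos = cosine(weights_a, weights_b)
    weight_l2 = float(np.linalg.norm(weights_a - weights_b))
    pa, pb = softmax(logits_a), softmax(logits_b)
    # Mean KL divergence between the two predictive distributions.
    kl = float(np.mean(np.sum(pa * (np.log(pa + 1e-12) - np.log(pb + 1e-12)), axis=1)))
    # Fraction of probes on which both models pick the same top token.
    agree = float(np.mean(logits_a.argmax(axis=1) == logits_b.argmax(axis=1)))
    return np.array([weight_cos, weight_l2, kl, agree])


def select_operator(features, predictors):
    """Score each candidate operator with its (w, b) linear predictor; return the best."""
    scores = {name: float(features @ w + b) for name, (w, b) in predictors.items()}
    return max(scores, key=scores.get), scores
```

In practice the per-operator predictors would be fitted on previously evaluated merges, so that a new pair only needs the probe forward passes and a weight comparison rather than a full merge-and-evaluate cycle.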
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques
