MergeBench: A Benchmark for Merging Domain-Specialized LLMs

Yifei He; Siqi Zeng; Yuzheng Hu; Rui Yang; Tong Zhang; Han Zhao

arXiv:2505.10833 · cs.LG · October 21, 2025


TL;DR

MergeBench is a comprehensive evaluation suite for assessing the effectiveness of model merging techniques on large, domain-specific language models across multiple tasks, providing insights and guidelines for future research.

Contribution

The paper introduces MergeBench, a standardized benchmarking framework for large-scale model merging, covering multiple domains and evaluating various merging methods with extensive experiments.

Findings

1. Model merging performs better on stronger base models.

2. Techniques like coefficient tuning and sparsification improve knowledge retention.

3. Challenges include high computational costs and performance gaps.
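
To make the terms "sparsification" and "coefficient tuning" concrete, here is a minimal sketch in plain Python. It assumes a generic magnitude-based trimming rule and a single scalar coefficient; the helper names (`sparsify_by_magnitude`, `apply_scaled`) and the toy numbers are hypothetical and are not taken from the MergeBench code or from any specific method the paper evaluates.

```python
from typing import List


def sparsify_by_magnitude(delta: List[float], keep_fraction: float) -> List[float]:
    """Keep only the largest-magnitude entries of a parameter delta, zeroing the rest.

    This is a generic illustration of sparsification, not a specific published method.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, round(keep_fraction * len(delta)))
    # Indices of the k entries with the largest absolute value.
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def apply_scaled(base: List[float], delta: List[float], coefficient: float) -> List[float]:
    """Add a delta to the base parameters, scaled by a tunable coefficient."""
    return [b + coefficient * d for b, d in zip(base, delta)]


# Toy example: a four-parameter "model" and the change introduced by finetuning.
base = [1.0, 1.0, 1.0, 1.0]
delta = [0.125, -1.0, 0.25, 0.5]

sparse = sparsify_by_magnitude(delta, keep_fraction=0.5)
tuned = apply_scaled(base, sparse, coefficient=0.5)
```

In practice the coefficient would be chosen by evaluating several candidate values on held-out data; that selection loop is omitted here because it depends on an evaluation harness this summary does not describe.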

Abstract

Model merging provides a scalable alternative to multi-task training by combining specialized finetuned models through parameter arithmetic, enabling efficient deployment without the need for joint training or access to all task data. While recent methods have shown promise, existing evaluations are limited in both model scale and task diversity, leaving open questions about their applicability to large, domain-specialized LLMs. To tackle the challenges, we introduce MergeBench, a comprehensive evaluation suite designed to assess model merging at scale. MergeBench builds on state-of-the-art open-source language models, including Llama and Gemma families at 2B to 9B scales, and covers five key domains: instruction following, mathematics, multilingual understanding, coding and safety. We standardize finetuning and evaluation protocols, and assess eight representative merging methods…
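
As an illustration of what "parameter arithmetic" can look like, the sketch below implements task arithmetic, one well-known merging scheme, on toy parameter dictionaries. The excerpt above does not name the eight methods evaluated, so treat this as an assumed example rather than the benchmark's own implementation; `StateDict`, `task_vector`, `merge_task_arithmetic`, and the toy models are hypothetical names introduced here.

```python
from typing import Dict, List

# Toy stand-in for a model's parameters: name -> flat list of weights.
StateDict = Dict[str, List[float]]


def task_vector(base: StateDict, finetuned: StateDict) -> StateDict:
    """Per-parameter difference between a finetuned model and its base model."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge_task_arithmetic(
    base: StateDict, finetuned_models: List[StateDict], scale: float = 1.0
) -> StateDict:
    """Add the scaled sum of all task vectors to the base parameters."""
    merged = {name: list(values) for name, values in base.items()}
    for model in finetuned_models:
        for name, delta in task_vector(base, model).items():
            merged[name] = [m + scale * d for m, d in zip(merged[name], delta)]
    return merged


# Two hypothetical specialists finetuned from the same base model.
base = {"w": [1.0, 2.0, 3.0]}
math_model = {"w": [1.5, 2.0, 3.0]}
code_model = {"w": [1.0, 2.0, 2.0]}

merged = merge_task_arithmetic(base, [math_model, code_model], scale=0.5)
print(merged)
```

The merge needs only the checkpoints themselves, which is the property the abstract highlights: no joint training and no access to the original task data.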

Code & Models

Repositories

uiuctml/mergebench (PyTorch, official)

Videos

MergeBench: A Benchmark for Merging Domain-Specialized LLMs · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Balanced Selection · LLaMA