Merge to Mix: Mixing Datasets via Model Merging

Zhixu Silvia Tao; Kasper Vinken; Hao-Wei Yeh; Avi Cooper; Xavier Boix

arXiv:2505.16066·cs.LG·May 23, 2025

Open Access

TL;DR

The paper introduces Merge to Mix, a method that uses model merging to select dataset mixtures for fine-tuning large models without running a separate fine-tuning job for every candidate mixture, replacing much of the usual heuristic trial-and-error.

Contribution

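The paper proposes treating a merge of models that were each fine-tuned on a single dataset as a surrogate for a model fine-tuned on the whole mixture, which speeds up dataset mixture selection. The authors report that the method outperforms existing dataset selection techniques.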

Findings

01. Merge to Mix outperforms state-of-the-art dataset selection methods.
02. Model merging effectively approximates fine-tuning on dataset mixtures.
03. The approach reduces computational costs in dataset composition.

Abstract

Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring multiple fine-tuning runs to achieve the desired outcome. We propose a novel method, Merge to Mix, that accelerates composing dataset mixtures through model merging. Model merging is a recent technique that combines the abilities of multiple individually fine-tuned LMs into a single LM by using a few simple arithmetic operations. Our key insight is that merging models individually fine-tuned on each dataset in a mixture can effectively serve as a surrogate for a model fine-tuned on the entire mixture. Merge to Mix leverages this insight to accelerate selecting dataset mixtures without requiring full fine-tuning on each candidate mixture. Our…
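The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the merge is a uniform average of parameters (the abstract only says "a few simple arithmetic operations"), that candidate mixtures are all non-empty subsets of the available datasets, and that a score on the target task is available. The names `merge_models`, `select_mixture`, and `toy_score`, and the toy weight dictionaries, are hypothetical and exist only for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Toy stand-in for model weights: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def merge_models(models: Sequence[Params]) -> Params:
    """Uniformly average the parameters of models with identical shapes.

    Assumption: uniform averaging stands in for the merging operation.
    """
    if not models:
        raise ValueError("need at least one model to merge")
    merged: Params = {}
    for name in models[0]:
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def select_mixture(
    finetuned: Dict[str, Params],
    score: Callable[[Params], float],
) -> Tuple[Tuple[str, ...], float]:
    """Score each non-empty dataset subset via its merged surrogate model.

    `finetuned` maps a dataset name to the model fine-tuned on that dataset
    alone, so no additional fine-tuning happens inside this loop.
    """
    best_subset: Tuple[str, ...] = ()
    best_score = float("-inf")
    names = sorted(finetuned)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            surrogate = merge_models([finetuned[n] for n in subset])
            s = score(surrogate)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score


# Toy data: three "models", each fine-tuned on one hypothetical dataset.
finetuned = {
    "A": {"w": [1.0, 0.0]},
    "B": {"w": [0.0, 1.0]},
    "C": {"w": [4.0, 4.0]},
}
target = [0.5, 0.5]


def toy_score(params: Params) -> float:
    """Hypothetical task score: negative squared distance to a target."""
    return -sum((x - t) ** 2 for x, t in zip(params["w"], target))


best_subset, best_score = select_mixture(finetuned, toy_score)
print(best_subset, best_score)
```

In this toy setting the search needs only one fine-tuned model per dataset, and each candidate mixture costs one merge plus one evaluation. In practice the selected subset would then be used for an actual fine-tuning run on the chosen mixture; exhaustive enumeration grows exponentially with the number of datasets, so it is shown here only because the example is tiny.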

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis