Merge to Mix: Mixing Datasets via Model Merging
Zhixu Silvia Tao, Kasper Vinken, Hao-Wei Yeh, Avi Cooper, Xavier Boix

TL;DR
The paper introduces Merge to Mix, a method that uses model merging to select dataset mixtures for fine-tuning large models (LMs) efficiently, reducing the trial-and-error of repeated fine-tuning runs.
Contribution
It proposes merging models that were individually fine-tuned on each dataset and using the merged model as a surrogate for a model fine-tuned on the whole mixture, so candidate mixtures can be compared without a full fine-tuning run for each one.
Findings
Merge to Mix outperforms state-of-the-art dataset selection methods.
Model merging effectively approximates fine-tuning on dataset mixtures.
The approach reduces the computational cost of composing dataset mixtures by avoiding a full fine-tuning run for each candidate mixture.
Abstract
Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring multiple fine-tuning runs to achieve the desired outcome. We propose a novel method, Merge to Mix, that accelerates composing dataset mixtures through model merging. Model merging is a recent technique that combines the abilities of multiple individually fine-tuned LMs into a single LM by using a few simple arithmetic operations. Our key insight is that merging models individually fine-tuned on each dataset in a mixture can effectively serve as a surrogate for a model fine-tuned on the entire mixture. Merge to Mix leverages this insight to accelerate selecting dataset mixtures without requiring full fine-tuning on each candidate mixture. Our…
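The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "simple arithmetic operations" are a uniform element-wise average of parameters, that each model is a flat list of floats, and that a scoring function for a merged model is available. The names `merge_uniform`, `select_mixture`, and the toy `score` are hypothetical and exist only for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a model is represented as a flat list of parameters.
Params = List[float]


def merge_uniform(models: Sequence[Params]) -> Params:
    """Merge models by averaging their parameters element-wise.

    Uniform averaging is one simple merging rule, assumed here;
    the source only says merging uses simple arithmetic operations.
    """
    n = len(models)
    return [sum(values) / n for values in zip(*models)]


def select_mixture(
    finetuned: Dict[str, Params],
    score: Callable[[Params], float],
) -> Tuple[Tuple[str, ...], float]:
    """Score every non-empty subset of datasets via its merged surrogate.

    `finetuned` maps a dataset name to the model fine-tuned on that
    dataset alone. No further fine-tuning happens inside this loop.
    """
    names = sorted(finetuned)
    best_subset: Tuple[str, ...] = ()
    best_score = float("-inf")
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            merged = merge_uniform([finetuned[name] for name in subset])
            current = score(merged)
            if current > best_score:
                best_subset, best_score = subset, current
    return best_subset, best_score


# Toy usage: the "task" rewards parameters close to a made-up target.
target = [0.5, 0.5]


def score(params: Params) -> float:
    # Higher is better: negative squared distance to the toy target.
    return -sum((p - t) ** 2 for p, t in zip(params, target))


finetuned_models = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [4.0, 4.0],
}
best_subset, best_score = select_mixture(finetuned_models, score)
print(best_subset, best_score)
```

In this sketch fine-tuning is needed only once per dataset, while the number of candidate subsets grows as 2^N - 1 for N datasets, so each candidate costs one merge and one evaluation. Once a subset is chosen, a model would still be fine-tuned on the actual selected mixture.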
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis
