Evolutionary Optimization of Model Merging Recipes

Takuya Akiba; Makoto Shing; Yujin Tang; Qi Sun; David Ha

arXiv:2403.13187 · cs.NE · January 28, 2025 · 3 cites

Open Access · 1 Repo · 10 Models · 3 Datasets

TL;DR

This paper introduces an evolutionary method that automatically merges diverse open-source models, producing state-of-the-art Japanese language and vision-language models without extensive training and advancing automated model composition.

Contribution

The paper presents an evolutionary approach to model merging that operates in both parameter space and data flow space, enabling cross-domain model creation without extensive training.

Findings

01. Japanese Math LLM achieved state-of-the-art performance on benchmarks.

02. Japanese VLM outperformed previous models in culture-specific tasks.

03. Method enables efficient, automated model merging across domains.

Abstract

Large language models (LLMs) have become increasingly capable, but their development often requires substantial computational resources. While model merging has emerged as a promising, cost-effective approach for creating new models by combining existing ones, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our…
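
The idea of searching for merging recipes in parameter space can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the "models" are plain dictionaries of floats, the merge is per-layer linear interpolation, the `fitness` function is a hypothetical stand-in for a task benchmark, and the search is a simple greedy evolution strategy. It is not the authors' implementation, and the merging operator and optimizer used in the paper may differ.

```python
import random


def merge(model_a, model_b, weights):
    """Per-layer linear interpolation: w * a + (1 - w) * b (illustrative merge rule)."""
    return {
        name: [w * a + (1.0 - w) * b for a, b in zip(model_a[name], model_b[name])]
        for name, w in zip(model_a, weights)
    }


def fitness(model, target):
    """Hypothetical task score: negative squared distance to a reference model."""
    return -sum(
        (p - t) ** 2
        for name in model
        for p, t in zip(model[name], target[name])
    )


def evolve(model_a, model_b, target, generations=200, children=8, sigma=0.1, seed=0):
    """Greedy evolution strategy over one mixing weight per layer."""
    rng = random.Random(seed)
    best = [0.5] * len(model_a)  # start from a plain average
    best_score = fitness(merge(model_a, model_b, best), target)
    for _ in range(generations):
        for _ in range(children):
            # Mutate the current best recipe and clip each weight to [0, 1].
            cand = [min(1.0, max(0.0, w + rng.gauss(0.0, sigma))) for w in best]
            score = fitness(merge(model_a, model_b, cand), target)
            if score > best_score:  # keep a child only if it scores higher
                best, best_score = cand, score
    return best, best_score


# Toy "source models" with two layers each (made-up numbers).
model_a = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
model_b = {"layer0": [3.0, 0.0], "layer1": [2.0, 0.0]}
# The toy task is solved best by a recipe near [0.25, 0.9].
target = merge(model_a, model_b, [0.25, 0.9])

best_weights, best_score = evolve(model_a, model_b, target)
print(best_weights, best_score)
```

The point of the sketch is that the recipe (here, one mixing weight per layer) is found by search against a task score rather than chosen by hand. In a realistic setting the fitness call would evaluate the merged model on held-out task data.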

Code & Models

Repositories

sakanaai/evolutionary-model-merge (PyTorch, Official)

Taxonomy

Topics: Scheduling and Optimization Algorithms · Model-Driven Software Engineering Techniques · Assembly Line Balancing Optimization