Evolutionary Optimization of Model Merging Recipes
Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha

TL;DR
This paper introduces an evolutionary method for automatically merging diverse open-source models, resulting in state-of-the-art Japanese language and vision models without extensive training, advancing automated model composition.
Contribution
It presents a novel evolutionary approach for model merging that operates in parameter and data flow space, enabling cross-domain model creation without extensive training.
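To make the parameter-space half of this idea concrete, here is a minimal sketch under stated assumptions. The text above does not spell out the merging operator or the optimizer, so the sketch substitutes plain weighted averaging of parameter vectors and a simple elitist evolution strategy. The names `merge`, `fitness`, and `evolve_merge_weights`, the toy vectors, and the distance-based score are all hypothetical stand-ins, not the authors' implementation.

```python
import random


def merge(models, weights):
    """Weighted average of parameter vectors; weights are normalized to sum to 1."""
    total = sum(weights)
    norm = [w / total for w in weights]
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(norm, models)) for i in range(dim)]


def fitness(params, target):
    """Toy stand-in for a benchmark score: negative squared distance to a target."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def evolve_merge_weights(models, target, generations=60, pop_size=16,
                         elite=4, sigma=0.2, seed=0):
    """Search for merge weights with a simple elitist evolution strategy (assumed)."""
    rng = random.Random(seed)
    n = len(models)

    def score(weights):
        return fitness(merge(models, weights), target)

    # Seed the population with the uniform merge plus random candidates.
    population = [[1.0] * n]
    population += [[rng.uniform(0.05, 1.0) for _ in range(n)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        # Keep the best candidates unchanged, so the best score never decreases.
        population.sort(key=score, reverse=True)
        parents = population[:elite]
        children = []
        while len(children) < pop_size - elite:
            parent = rng.choice(parents)
            # Gaussian mutation, clamped so every weight stays positive.
            children.append([max(1e-3, w + rng.gauss(0.0, sigma)) for w in parent])
        population = parents + children

    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Three toy "models", each a 3-dimensional parameter vector (hypothetical data).
    toy_models = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    toy_target = [0.6, 0.3, 0.1]
    weights, best_score = evolve_merge_weights(toy_models, toy_target)
    print("merge weights:", weights)
    print("score:", best_score)
```

In a real setting the score would come from evaluating the merged model on a task benchmark rather than from a known target vector, and the search would run over per-layer or per-model merge settings. Merging in data flow space, which the paper also covers, is not shown here.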
Findings
The Japanese Math LLM achieved state-of-the-art performance on benchmarks.
The Japanese VLM outperformed previous models on culture-specific tasks.
The method enables efficient, automated model merging across domains.
Abstract
Large language models (LLMs) have become increasingly capable, but their development often requires substantial computational resources. While model merging has emerged as a promising, cost-effective approach for creating new models by combining existing ones, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our…
Code & Models
- 🤗 SakanaAI/EvoVLM-JP-v1-7B · model · 15 dl · ♡ 38
- 🤗 SakanaAI/EvoLLM-JP-v1-7B · model · 57 dl · ♡ 34
- 🤗 SakanaAI/EvoLLM-JP-A-v1-7B · model · 14 dl · ♡ 13
- 🤗 SakanaAI/EvoLLM-JP-v1-10B · model · 71 dl · ♡ 40
- 🤗 SakanaAI/EvoSDXL-JP-v1 · model · 2 dl · ♡ 42
- 🤗 SakanaAI/Llama-3-EvoVLM-JP-v2 · model · 19 dl · ♡ 21
- 🤗 RichardErkhov/SakanaAI_-_EvoLLM-JP-v1-7B-gguf · model · 61 dl
- 🤗 RichardErkhov/SakanaAI_-_EvoLLM-JP-A-v1-7B-gguf · model · 38 dl
- 🤗 RichardErkhov/SakanaAI_-_EvoLLM-JP-v1-7B-8bits · model
- 🤗 RichardErkhov/SakanaAI_-_EvoLLM-JP-v1-7B-awq · model · 1 dl
Taxonomy
Topics: Scheduling and Optimization Algorithms · Model-Driven Software Engineering Techniques · Assembly Line Balancing Optimization
