Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vlad Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz

TL;DR
This paper introduces MergeKit, an open-source toolkit that enables the merging of large language models to create versatile multitask models without additional training, addressing challenges like catastrophic forgetting.
Contribution
The paper presents MergeKit, a comprehensive framework for merging large language models, facilitating multitask capabilities and improving model performance without retraining.
Findings
Thousands of models merged using MergeKit
Creation of some of the world's most powerful open-source checkpoints
Enhanced multitask learning without additional training
Abstract
The rapid expansion of the open-source language model landscape presents an opportunity to merge the competencies of these model checkpoints by combining their parameters. Advances in transfer learning, the process of fine-tuning pretrained models for specific tasks, have resulted in the development of a vast number of task-specific models, typically specialized in individual tasks and unable to utilize each other's strengths. Model merging facilitates the creation of multitask models without the need for additional training, offering a promising avenue for enhancing model performance and versatility. By preserving the intrinsic capabilities of the original models, model merging addresses complex challenges in AI, including the difficulties of catastrophic forgetting and multitask learning. To support this expanding area of research, we introduce MergeKit, a comprehensive, open-source…
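To make "combining their parameters" concrete, here is a minimal sketch of one of the simplest merging strategies, a weighted average of matching parameters. It is an illustration only and does not use MergeKit's API: the `linear_merge` function, the `Checkpoint` type, and the toy parameter names are all hypothetical, and real checkpoints hold tensors rather than short lists of floats.

```python
from typing import Dict, List

# Hypothetical toy "checkpoint": parameter name -> flat list of weights.
# Real checkpoints store tensors; plain lists keep the sketch dependency-free.
Checkpoint = Dict[str, List[float]]


def linear_merge(checkpoints: List[Checkpoint], weights: List[float]) -> Checkpoint:
    """Return the weighted average of parameters shared by all checkpoints.

    Assumes every checkpoint has the same parameter names and shapes,
    i.e. the models share one architecture.
    """
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    reference = checkpoints[0]
    for ckpt in checkpoints[1:]:
        if set(ckpt) != set(reference):
            raise ValueError("checkpoints have different parameter names")

    merged: Checkpoint = {}
    for name, ref_values in reference.items():
        if any(len(ckpt[name]) != len(ref_values) for ckpt in checkpoints):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all checkpoints.
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(len(ref_values))
        ]
    return merged


# Two toy fine-tuned models that share an architecture.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = linear_merge([model_a, model_b], [0.5, 0.5])
print(merged)
```

No gradient step is involved: the merged parameters are computed directly from the existing checkpoints, which is why this family of methods needs no additional training.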
Code & Models
- 🤗 6DammK9/AstolfoMix-XL (model) · 601 dl · ♡ 10
- 🤗 CombinHorizon/YiSM-blossom5.1-34B-SLERP (model) · 7.6k dl
- 🤗 kainatq/kainaticulous-rp-7b (model) · 6 dl · ♡ 1
- 🤗 kainatq/kainaticulous-rp-7b-gguf (model) · 63 dl · ♡ 2
- 🤗 arcee-ai/Meraj-Mini (model) · 49 dl · ♡ 18
- 🤗 QuantFactory/Meraj-Mini-GGUF (model) · 135 dl · ♡ 3
- 🤗 PKU-DS-LAB/FairyR1-32B (model) · 22 dl · ♡ 102
- 🤗 Mungert/FairyR1-32B-GGUF (model) · 67 dl · ♡ 2
- 🤗 PKU-DS-LAB/FairyR1-14B-Preview (model) · 11 dl · ♡ 20
- 🤗 spartan8806/ATLES-Merge-Paper (model)
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Lib
