Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement

Le Yu; Bowen Yu; Haiyang Yu; Fei Huang; Yongbin Li

arXiv:2408.03092·cs.CL·August 7, 2024

TL;DR

This paper introduces WIDEN, a weight disentanglement method that extends model merging from fine-tuned (FT) to pre-trained (PT) large language models, so that models sharing a backbone can be merged even when their parameter changes differ greatly in scale.

Contribution

We propose a weight disentanglement approach that allows merging of both fine-tuned and pre-trained LLMs, broadening the applicability of model merging techniques.

Findings

01 WIDEN successfully merges fine-tuned (FT) and pre-trained (PT) LLMs, injecting new abilities into the merged model.

02 Existing merging methods often fail when a PT LLM is involved, and the merged model loses capabilities.

03 WIDEN achieves balanced skill integration in experiments.

Abstract

Merging Large Language Models (LLMs) aims to amalgamate multiple homologous LLMs into one that has all of their capabilities. Ideally, any LLMs sharing the same backbone should be mergeable, irrespective of whether they are Fine-Tuned (FT) with minor parameter changes or Pre-Trained (PT) with substantial parameter shifts. However, existing methods often assign model importance manually, which makes them feasible only for LLMs with similar parameter alterations, such as multiple FT LLMs. The differing ranges of parameter change between FT and PT LLMs make it hard for current solutions to determine the optimal combination empirically. In this paper, we make a pioneering effort to broaden the applicability of merging techniques from FT to PT LLMs. We first examine the efficacy of current methods in merging FT and PT LLMs and find that they struggle to deal with PT LLMs. Subsequently, we…
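
To make the abstract's point about manually assigned importance concrete, here is a minimal sketch of a task-arithmetic-style merge that applies a single hand-picked coefficient to every model. It is not the paper's WIDEN method and is not taken from the MergeLLM repository. The function names, the four-element toy "weights", and the value `scale=0.5` are all hypothetical, chosen only to show why one coefficient suits models with similar parameter changes but not an FT/PT pair.

```python
# Illustrative only: task-arithmetic-style merging with one hand-picked coefficient.
# All names and numbers are hypothetical; this is NOT the paper's WIDEN method.

def delta(model, base):
    """Parameter change of a model relative to the shared backbone."""
    return [m - b for m, b in zip(model, base)]

def merge(base, models, scale):
    """Add the scaled sum of every model's parameter change to the backbone."""
    merged = list(base)
    for model in models:
        for i, d in enumerate(delta(model, base)):
            merged[i] += scale * d
    return merged

def l1_norm(vec):
    """Sum of absolute values, used here as a crude size of a parameter change."""
    return sum(abs(v) for v in vec)

# Toy "weights": the FT model barely moves, the PT model moves a lot.
base = [0.0, 0.0, 0.0, 0.0]
ft_model = [0.01, -0.02, 0.0, 0.01]
pt_model = [0.5, 0.4, -0.6, 0.3]

# The same manually chosen importance is applied to both models.
merged = merge(base, [ft_model, pt_model], scale=0.5)

# Fraction of the total parameter change that comes from the FT model.
ft_norm = l1_norm(delta(ft_model, base))
pt_norm = l1_norm(delta(pt_model, base))
ft_share = ft_norm / (ft_norm + pt_norm)
print(f"FT share of total parameter change: {ft_share:.3f}")
```

In this toy setup the FT model contributes only about 2% of the total parameter change, so under a shared coefficient the merged weights are driven almost entirely by the PT model. This is one way to picture the mismatch in parameter-change ranges that the abstract describes.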

Code & Models

Repositories

yule-BUAA/MergeLLM (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques