Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

Zhenyi Lu; Chenghao Fan; Wei Wei; Xiaoye Qu; Dangyang Chen; Yu Cheng

arXiv:2406.15479 · cs.CL · October 15, 2024

TL;DR

Twin-Merging introduces a dynamic, modular approach to model merging that separates shared and exclusive knowledge, significantly improving performance and adaptability across diverse language and vision tasks.

Contribution

The paper proposes Twin-Merging, a novel method that modularizes and dynamically merges shared and exclusive knowledge, reducing interference and enhancing efficiency in model merging.

Findings

01 Achieves a 28.34% average improvement in absolute normalized score on discriminative tasks.

02 Surpasses the fine-tuned upper bound on generative tasks.

03 Handles heterogeneous test data effectively in model merging.

Abstract

In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these issues. Additionally, a one-size-fits-all model lacks flexibility for diverse test data, leading to performance degradation. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance…
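The split into shared and exclusive knowledge described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain averaging used for the shared component, the top-k magnitude pruning used as a stand-in for compression, and the externally supplied gate weights are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def shared_model(experts: List[Weights]) -> Weights:
    """Approximate the shared component by averaging expert weights (assumption)."""
    n = len(experts)
    return {
        name: [sum(e[name][i] for e in experts) / n for i in range(len(vals))]
        for name, vals in experts[0].items()
    }


def exclusive_part(expert: Weights, shared: Weights, keep: int) -> Weights:
    """Expert minus shared, keeping only the `keep` largest-magnitude entries
    per tensor. This pruning is a stand-in for compression (assumption)."""
    out: Weights = {}
    for name, vals in expert.items():
        diff = [v - s for v, s in zip(vals, shared[name])]
        order = sorted(range(len(diff)), key=lambda i: abs(diff[i]), reverse=True)
        kept = set(order[:keep])
        out[name] = [d if i in kept else 0.0 for i, d in enumerate(diff)]
    return out


def dynamic_merge(shared: Weights, exclusives: List[Weights], gate: List[float]) -> Weights:
    """Return shared + sum_t gate[t] * exclusives[t]; the gate is given by the caller."""
    return {
        name: [
            v + sum(g * ex[name][i] for g, ex in zip(gate, exclusives))
            for i, v in enumerate(vals)
        ]
        for name, vals in shared.items()
    }


# Two toy "experts" with a single three-element parameter.
expert_a = {"w": [4.0, 1.0, 0.0]}
expert_b = {"w": [0.0, 1.0, 2.0]}
shared = shared_model([expert_a, expert_b])
exclusives = [exclusive_part(e, shared, keep=1) for e in (expert_a, expert_b)]
merged_for_a = dynamic_merge(shared, exclusives, gate=[1.0, 0.0])
```

In this toy setup, raising `keep` to the full tensor size and putting all gate weight on one expert reproduces that expert's weights exactly, while a smaller `keep` trades fidelity for a sparser exclusive component.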

Code & Models

Repositories

LZY-the-boys/Twin-Merging (PyTorch, official)

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Business Process Modeling and Analysis · Multi-Agent Systems and Negotiation