Training-free Heterogeneous Model Merging

Zhengqi Xu; Han Zheng; Jie Song; Li Sun; Mingli Song

arXiv:2501.00061 · cs.LG · January 3, 2025

TL;DR

This paper introduces a novel training-free framework for merging heterogeneous models with different architectures, using layer alignment and elastic neuron zipping, enabling effective model reuse across vision and NLP tasks.

Contribution

It proposes innovative methods for merging models with differing depths and widths without retraining, expanding the applicability of model merging techniques.

Findings

01 Heterogeneous model merging achieves comparable performance to homogeneous merging.

02 Layer alignment effectively handles depth discrepancies.

03 Elastic neuron zipping manages width heterogeneity without performance loss.

Abstract

Model merging has attracted significant attention as a powerful paradigm for model reuse, facilitating the integration of task-specific models into a singular, versatile framework endowed with multifarious capabilities. Previous studies, predominantly utilizing methods such as Weight Average (WA), have shown that model merging can effectively leverage pretrained models without the need for laborious retraining. However, the inherent heterogeneity among models poses a substantial constraint on its applicability, particularly when confronted with discrepancies in model architectures. To overcome this challenge, we propose an innovative model merging framework designed for heterogeneous models, encompassing both depth and width heterogeneity. To address depth heterogeneity, we introduce a layer alignment strategy that harmonizes model layers by segmenting deeper models, treating…
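To make the constraint named in the abstract concrete, here is a minimal sketch of plain Weight Average (WA) for two models with identical architectures. It is not the paper's method: the function name `weight_average`, the interpolation factor `alpha`, and the dict-of-nested-lists stand-in for a model's parameters are all assumptions chosen for illustration. The sketch shows that element-wise averaging is only defined when both models have the same layers (depth) and the same parameter shapes (width), which is the gap a heterogeneous merging framework has to close.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
StateDict = Dict[str, Matrix]  # hypothetical stand-in for a model's parameters


def shape(m: Matrix) -> Tuple[int, int]:
    """Return (rows, cols), assuming every row has the same length."""
    return (len(m), len(m[0]) if m else 0)


def weight_average(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Merge two models element-wise as alpha * a + (1 - alpha) * b.

    Raises ValueError when the models differ in depth (layer names)
    or width (parameter shapes), since the average is undefined there.
    """
    if a.keys() != b.keys():
        raise ValueError("depth mismatch: the models have different layers")
    merged: StateDict = {}
    for name in a:
        if shape(a[name]) != shape(b[name]):
            raise ValueError(f"width mismatch in layer {name!r}")
        merged[name] = [
            [alpha * x + (1 - alpha) * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a[name], b[name])
        ]
    return merged


# Two toy single-layer models with matching shapes merge without trouble.
model_a = {"fc1": [[1.0, 2.0], [3.0, 4.0]]}
model_b = {"fc1": [[3.0, 4.0], [5.0, 6.0]]}
merged = weight_average(model_a, model_b)

# A wider or deeper model cannot be averaged directly.
wider = {"fc1": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}
deeper = {"fc1": [[1.0, 2.0], [3.0, 4.0]], "fc2": [[1.0, 0.0], [0.0, 1.0]]}
```

Calling `weight_average(model_a, wider)` or `weight_average(model_a, deeper)` raises `ValueError`, which is the failure mode that motivates handling width and depth heterogeneity explicitly.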

Code & Models

Repositories

zju-vipa/training_free_heterogeneous_model_merging
PyTorch · Official

Taxonomy

Topics: Model Reduction and Neural Networks

Methods: Softmax · Attention Is All You Need