Training-free Heterogeneous Model Merging
Zhengqi Xu, Han Zheng, Jie Song, Li Sun, Mingli Song

TL;DR
This paper introduces a training-free framework for merging models with different architectures. It combines layer alignment (for depth mismatch) with elastic neuron zipping (for width mismatch), enabling model reuse across vision and NLP tasks.
Contribution
It proposes methods for merging models of differing depths and widths without retraining, extending model merging beyond models that share an identical architecture.
Findings
Heterogeneous model merging achieves performance comparable to homogeneous merging.
Layer alignment effectively handles depth discrepancies.
Elastic neuron zipping manages width heterogeneity without performance loss.
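To make the depth-alignment idea concrete, here is a minimal sketch of grouping a deeper model's layers into as many contiguous segments as the shallower model has layers, so that each segment can be paired with one shallow layer. The function name segment_layers and the uniform split are assumptions made for illustration only; the summary above does not specify how the paper chooses segment boundaries, and its actual criterion may differ.

```python
from __future__ import annotations


def segment_layers(deep_depth: int, shallow_depth: int) -> list[list[int]]:
    """Split layer indices 0..deep_depth-1 into shallow_depth contiguous segments.

    Hypothetical helper: segment sizes are made as equal as possible.
    This uniform rule is an illustrative assumption, not the paper's criterion.
    """
    if not 0 < shallow_depth <= deep_depth:
        raise ValueError("need 0 < shallow_depth <= deep_depth")

    base, extra = divmod(deep_depth, shallow_depth)
    segments: list[list[int]] = []
    start = 0
    for i in range(shallow_depth):
        # The first `extra` segments take one additional layer each.
        size = base + (1 if i < extra else 0)
        segments.append(list(range(start, start + size)))
        start += size
    return segments


# Pair each segment of a 6-layer model with one layer of a 4-layer model.
for shallow_idx, deep_indices in enumerate(segment_layers(6, 4)):
    print(f"shallow layer {shallow_idx} <-> deep layers {deep_indices}")
```

Once every shallow layer has a matching segment, the two models have the same number of units to merge, which is the precondition a depth-alignment step has to establish.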
Abstract
Model merging has attracted significant attention as a powerful paradigm for model reuse, facilitating the integration of task-specific models into a singular, versatile framework endowed with multifarious capabilities. Previous studies, predominantly utilizing methods such as Weight Average (WA), have shown that model merging can effectively leverage pretrained models without the need for laborious retraining. However, the inherent heterogeneity among models poses a substantial constraint on its applicability, particularly when confronted with discrepancies in model architectures. To overcome this challenge, we propose an innovative model merging framework designed for heterogeneous models, encompassing both depth and width heterogeneity. To address depth heterogeneity, we introduce a layer alignment strategy that harmonizes model layers by segmenting deeper models, treating…
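The Weight Average (WA) baseline mentioned in the abstract can be sketched in a few lines, which also shows why architectural heterogeneity is a problem for it. This is an illustrative sketch, not code from the paper: the function name weight_average, the alpha parameter, and the representation of a model as a dict of flattened parameter lists are all assumptions.

```python
from __future__ import annotations


def weight_average(
    params_a: dict[str, list[float]],
    params_b: dict[str, list[float]],
    alpha: float = 0.5,
) -> dict[str, list[float]]:
    """Element-wise interpolation of two models' parameters.

    Hypothetical representation: each model is a dict mapping a parameter
    name to a flat list of floats. Raises ValueError when the two models do
    not share the same parameter names and sizes.
    """
    if params_a.keys() != params_b.keys():
        raise ValueError("models have different layers (depth mismatch)")

    merged: dict[str, list[float]] = {}
    for name, wa in params_a.items():
        wb = params_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"size mismatch in {name!r} (width mismatch)")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(wa, wb)]
    return merged


model_a = {"fc.weight": [1.0, 3.0]}
model_b = {"fc.weight": [3.0, 5.0]}
print(weight_average(model_a, model_b))
```

The two ValueError branches correspond to the two kinds of heterogeneity the abstract names: a different set of layers (depth) and differently sized layers (width). Plain averaging has no answer to either case.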
Taxonomy
Topics: Model Reduction and Neural Networks
Methods: Softmax · Attention Is All You Need
