SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging
Zijun Chen, Zhanpeng Zhou, Bo Zhang, Weinan Zhang, Xi Sun, Junchi Yan

TL;DR
SE-Merging is a novel framework that enhances dynamic model merging by leveraging task differentiation and adaptive rescaling, achieving improved multi-task performance without extra training.
Contribution
The paper introduces a self-enhanced merging method that dynamically identifies the task of each sample and adjusts the merging coefficients accordingly, advancing both the understanding and the capabilities of model merging.
Findings
Significant performance improvements over existing merging methods
Effective task differentiation and adaptive rescaling in the merging process
Compatibility with various existing model merging techniques
Abstract
Model merging has gained increasing attention due to its intriguing property: interpolating the parameters of different task-specific fine-tuned models leads to multi-task abilities. However, despite its empirical success, the underlying mechanisms of model merging remain poorly understood. In this work, we delve into the mechanism behind model merging from a representation perspective. Our analysis reveals that model merging achieves multi-task abilities through two key capabilities: i) distinguishing samples from different tasks, and ii) adapting to the corresponding expert model for each sample. These two capabilities allow the merged model to retain task-specific expertise, enabling efficient multi-task adaptation. Building on these insights, we propose SE-Merging, a self-enhanced model merging framework that leverages these two characteristics to dynamically identify the…
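The abstract is cut off before it states the actual rescaling rule, so the following is only an illustrative sketch of per-sample dynamic merging in general, not the authors' algorithm. It assumes a task-arithmetic style merge (pretrained weights plus scaled task vectors) and a hypothetical rule that gives larger coefficients to experts whose representation of the current sample is closer to the merged model's representation. All function names, the `base` and `temperature` parameters, and the softmax-over-distance rule are assumptions made for illustration.

```python
import math

def task_vectors(pretrained, experts):
    """Task vector per expert: fine-tuned weights minus pretrained weights."""
    return [[e - p for e, p in zip(expert, pretrained)] for expert in experts]

def merge(pretrained, vectors, coeffs):
    """Return pretrained + sum_i coeffs[i] * vectors[i], elementwise."""
    merged = list(pretrained)
    for c, vec in zip(coeffs, vectors):
        for j, v in enumerate(vec):
            merged[j] += c * v
    return merged

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_coeffs(merged_repr, expert_reprs, base=0.3, temperature=1.0):
    """Hypothetical per-sample rescaling (an assumption, not from the paper):
    experts whose representation is closer to the merged model's
    representation of this sample receive larger coefficients.
    The coefficients sum to base * number_of_experts."""
    dists = [math.dist(merged_repr, r) for r in expert_reprs]
    weights = softmax([-d / temperature for d in dists])
    return [base * len(expert_reprs) * w for w in weights]
```

In this sketch, a sample that all experts represent at the same distance from the merged model receives the uniform coefficient `base` for every expert, which reduces to a static merge; a sample that clearly belongs to one task shifts weight toward that task's expert.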
