Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
Yongxian Wei, Anke Tang, Li Shen, Zixuan Hu, Chun Yuan, Xiaochun Cao

TL;DR
This paper introduces a novel adaptive projective gradient descent method for multi-task model merging, focusing on minimizing task conflicts and preserving shared knowledge, leading to superior performance in vision and NLP tasks.
Contribution
It formulates model merging as a constrained optimization problem and proposes a data-free gradient projection approach with adaptive merging coefficients, so that a multi-task model can be built without access to the experts' original training data.
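The idea of projecting merging updates away from shared directions can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration, not the authors' implementation. The helper names (`shared_subspace`, `project_out`, `merge`) and parameters (`lam`, `k`, `lr`) are hypothetical. The "shared knowledge" subspace is taken to be the top-`k` left singular vectors of the stacked task vectors. The data-free "gap" is replaced by a squared parameter distance to each expert. The merging coefficient `lam` is fixed rather than adaptive.

```python
import numpy as np

def shared_subspace(task_vectors: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (d x k) for the top-k directions spanned by the task vectors.

    task_vectors has shape (T, d): one flattened task vector tau_i = theta_i - theta_0 per row.
    Assumption: "shared knowledge" = top-k left singular vectors of [tau_1, ..., tau_T].
    """
    u, _, _ = np.linalg.svd(task_vectors.T, full_matrices=False)
    return u[:, :k]

def project_out(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of grad in span(basis) so updates leave shared directions untouched."""
    return grad - basis @ (basis.T @ grad)

def merge(theta0, task_vectors, lam=0.3, k=1, lr=0.05, steps=200):
    """Task arithmetic plus a correction delta found by projected gradient descent (toy proxy)."""
    basis = shared_subspace(task_vectors, k)
    experts = theta0 + task_vectors          # hypothetical fine-tuned experts, shape (T, d)
    base_merge = theta0 + lam * task_vectors.sum(axis=0)
    delta = np.zeros_like(theta0)
    for _ in range(steps):
        merged = base_merge + delta
        # Gradient w.r.t. delta of the proxy gap mean_i ||merged - theta_i||^2.
        grad = 2.0 * (merged - experts).mean(axis=0)
        # Projected step: delta never moves along the shared subspace.
        delta -= lr * project_out(grad, basis)
    return base_merge + delta, basis

# Usage on random toy parameters (d = 8, T = 3 experts).
rng = np.random.default_rng(0)
theta0 = rng.normal(size=8)
taus = rng.normal(size=(3, 8))
merged, basis = merge(theta0, taus)
```

Because every step is projected, the correction stays orthogonal to the assumed shared subspace; the merged parameters move toward the experts only along the remaining directions.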
Findings
Outperforms previous methods across multiple architectures and tasks.
Achieves state-of-the-art results in vision and NLP domains.
Effectively balances task-specific and shared knowledge during merging.
Abstract
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. Existing methods attempt to alleviate task conflicts by sparsifying task vectors or promoting orthogonality among them. However, they overlook the fundamental target of model merging: the merged model should perform as closely as possible to the task-specific models on their respective tasks. We find these methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Based on our findings, we frame model merging as a constrained optimization problem (i.e., minimizing the gap between the merged model and individual models, subject to the constraint of retaining shared knowledge) and solve it via adaptive projective gradient descent. Specifically, we align the merged model with individual models by decomposing and…
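One way to write the constrained problem described in the abstract is shown below. This is a hedged reading, not the paper's exact formulation, and the notation is ours. $\theta_0$ is the pre-trained model, $\tau_i = \theta_i - \theta_0$ the task vector of expert $i$, $\lambda$ a merging coefficient, $\Delta$ a correction to the merged weights, $\mathcal{G}_i$ an unspecified data-free measure of the gap to expert $i$, and $U_k$ an assumed orthonormal basis of the "shared knowledge" subspace.

$$
\min_{\Delta}\; \sum_{i=1}^{T} \mathcal{G}_i\!\left(\theta_0 + \lambda \sum_{j=1}^{T} \tau_j + \Delta,\; \theta_i\right)
\quad \text{s.t.} \quad U_k^{\top} \Delta = 0,
$$

$$
\Delta \leftarrow \Delta - \eta \left(I - U_k U_k^{\top}\right) \nabla_{\Delta} \sum_{i=1}^{T} \mathcal{G}_i .
$$

Starting from $\Delta = 0$, each projected step keeps $U_k^{\top}\Delta = 0$, so the constraint holds throughout.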
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics
Methods: ALIGN
