Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent

Yongxian Wei; Anke Tang; Li Shen; Zixuan Hu; Chun Yuan; Xiaochun Cao

arXiv:2501.01230·cs.LG·May 27, 2025

TL;DR

This paper introduces an adaptive projective gradient descent method for multi-task model merging that minimizes task conflicts while preserving shared knowledge, and reports improved performance on vision and NLP tasks.

Contribution

It formulates model merging as a constrained optimization problem and proposes a data-free gradient projection approach with adaptive merging coefficients.

Findings

01. Outperforms previous methods across multiple architectures and tasks.

02. Achieves state-of-the-art results in vision and NLP domains.

03. Effectively balances task-specific and shared knowledge during merging.

Abstract

Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. Existing methods attempt to alleviate task conflicts by sparsifying task vectors or promoting orthogonality among them. However, they overlook the fundamental target of model merging: the merged model performs as closely as possible to task-specific models on respective tasks. We find these methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Based on our findings, we frame model merging as a constrained optimization problem (i.e., minimizing the gap between the merged model and individual models, subject to the constraint of retaining shared knowledge) and solve it via adaptive projective gradient descent. Specifically, we align the merged model with individual models by decomposing and…
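The abstract is truncated here, so the exact objective and constraint used in the paper are not shown. As a rough illustration of the general idea only (minimize a gap to the individual models while a projection keeps a "shared" component fixed), the toy NumPy sketch below may help. Everything in it is an assumption made for illustration: the squared-distance gap, the shared subspace taken from the top singular vector of the stacked task vectors, the fixed coefficient `coeff` (the paper describes adaptive coefficients), and the names `shared_basis`, `gap`, and `projected_descent`. It is not the authors' algorithm.

```python
import numpy as np

def shared_basis(task_vectors, rank):
    """Orthonormal basis (d, rank) for an assumed 'shared' subspace.

    Assumption: the shared directions are the top left singular vectors
    of the task vectors stacked as columns.
    """
    u, _, _ = np.linalg.svd(task_vectors.T, full_matrices=False)
    return u[:, :rank]

def gap(merged, task_vectors):
    """Toy gap: half the summed squared distance to every task vector."""
    return 0.5 * np.sum((merged[None, :] - task_vectors) ** 2)

def projected_descent(task_vectors, basis, coeff=1.0, lr=0.1, steps=100):
    """Gradient descent on the toy gap with the gradient projected onto
    the orthogonal complement of the shared subspace, so the shared
    component of the starting point is never modified."""
    merged = coeff * task_vectors.sum(axis=0)  # fixed-coefficient start
    start = merged.copy()
    for _ in range(steps):
        grad = (merged[None, :] - task_vectors).sum(axis=0)
        grad -= basis @ (basis.T @ grad)  # remove the shared-subspace part
        merged -= lr * grad
    return start, merged

# Hypothetical data: 3 task vectors in 8 dimensions.
rng = np.random.default_rng(0)
task_vectors = rng.normal(size=(3, 8))
basis = shared_basis(task_vectors, rank=1)
start, merged = projected_descent(task_vectors, basis)
```

In this sketch the projection step is what enforces the constraint: because each update is orthogonal to `basis`, the coordinates of `merged` along the shared directions stay at their initial values while the gap shrinks in the remaining directions.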

Code & Models

Repositories

walkerworldpeace/doge (PyTorch, official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics

Methods: ALIGN