Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

Anke Tang; Li Shen; Yong Luo; Nan Yin; Lefei Zhang; Dacheng Tao

arXiv:2402.00433 · cs.LG · June 10, 2024 · 2 cites


TL;DR

This paper introduces a novel method for merging multi-task Transformer models using a weight-ensembling mixture of experts, which dynamically integrates shared and task-specific knowledge to improve performance and mitigate parameter interference.

Contribution

The paper proposes a dynamic MoE-based merging approach that separates shared and task-specific knowledge, enhancing multi-task model integration beyond static methods.
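The static merging this contribution contrasts with can be sketched using task arithmetic, the representative prior method named in the abstract. Each fine-tuned model contributes a task vector tau_i = theta_i - theta_0, which carries its task-specific knowledge. The merged weights theta_0 + lam * sum_i tau_i are then fixed and applied to every input. This is a minimal sketch under stated assumptions: the flattened-dictionary parameter format, the function names, and the single scaling factor `lam` are illustrative and do not come from the paper's repository.

```python
from typing import Dict, List

# Hypothetical parameter format: each named tensor flattened to a list of floats.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Task-specific knowledge: tau_i = theta_i - theta_0, per parameter."""
    return {k: [f - p for f, p in zip(finetuned[k], pretrained[k])] for k in pretrained}


def task_arithmetic_merge(pretrained: Params, task_vectors: List[Params], lam: float) -> Params:
    """Static merge: theta = theta_0 + lam * sum_i tau_i, identical for every input."""
    return {
        k: [b + lam * sum(tv[k][j] for tv in task_vectors) for j, b in enumerate(base)]
        for k, base in pretrained.items()
    }


# Toy example: two "fine-tuned" models that each changed a different coordinate.
theta0 = {"w": [1.0, 2.0]}
taus = [task_vector({"w": [1.5, 2.0]}, theta0), task_vector({"w": [1.0, 3.0]}, theta0)]
merged = task_arithmetic_merge(theta0, taus, lam=0.5)
print(merged)  # {'w': [1.25, 2.5]}
```

Because `lam` is a single constant, every input sees the same merged weights. This is the "static optimal solution within the original model parameter space" that the abstract describes.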

Findings

1. Effective multi-task model merging demonstrated.
2. Improved generalization and robustness shown.
3. Dynamic, input-dependent integration outperforms static merging methods.

Abstract

Merging task-specific Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have proven to be both effective and scalable. Existing methods have primarily focused on seeking a static optimal solution within the original model parameter space. A notable challenge is mitigating the interference between parameters of different models, which can substantially degrade performance. In this paper, we propose to merge most of the parameters while upscaling the MLP of the Transformer layers to a weight-ensembling mixture of experts (MoE) module, which can dynamically integrate shared and task-specific knowledge based on the input, thereby providing a more flexible solution that can adapt to the specific needs of each instance. Our key insight is that by identifying…
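The abstract's central idea is to upscale the Transformer MLP into a weight-ensembling MoE that integrates shared and task-specific knowledge based on the input. It can be pictured as a layer whose weight is rebuilt for each input: W(x) = W_shared + sum_k r_k(x) * T_k. Here T_k are per-task weight differences and r_k(x) are router outputs. This is a minimal pure-Python sketch, and several parts are assumptions. A single linear layer stands in for the full MLP, the router is one unnormalized linear map, and the class and parameter names are hypothetical. The sketch does not reproduce how the paper parameterizes or trains its router.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add_scaled(base: Matrix, delta: Matrix, scale: float) -> Matrix:
    """Return base + scale * delta, element-wise."""
    return [[b + scale * d for b, d in zip(rb, rd)] for rb, rd in zip(base, delta)]


class WeightEnsemblingLinear:
    """Hypothetical sketch of a weight-ensembling expert layer.

    A single linear layer stands in for the Transformer MLP. Its weight is
    rebuilt for every input as W(x) = W_shared + sum_k r_k(x) * T_k.
    """

    def __init__(self, shared: Matrix, task_vectors: List[Matrix],
                 router_w: Matrix, router_b: Vector):
        self.shared = shared              # shared (merged or pre-trained) weight
        self.task_vectors = task_vectors  # T_k: per-task weight differences
        self.router_w = router_w          # assumed linear router, one row per task
        self.router_b = router_b          # one bias per task

    def routing_weights(self, x: Vector) -> Vector:
        # Unnormalized per-task coefficients r_k(x); the paper's router may differ.
        return [r + b for r, b in zip(matvec(self.router_w, x), self.router_b)]

    def __call__(self, x: Vector) -> Vector:
        weight = self.shared
        for coef, tv in zip(self.routing_weights(x), self.task_vectors):
            weight = add_scaled(weight, tv, coef)
        return matvec(weight, x)


# Toy example: task 0 modifies output 0, task 1 modifies output 1,
# and the router sends each input feature to the matching task.
layer = WeightEnsemblingLinear(
    shared=[[1.0, 0.0], [0.0, 1.0]],
    task_vectors=[[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]],
    router_w=[[1.0, 0.0], [0.0, 1.0]],
    router_b=[0.0, 0.0],
)
print(layer([1.0, 0.0]))  # [2.0, 0.0]
print(layer([0.0, 1.0]))  # [0.0, 2.0]
```

In the example, the routing weights depend on x, so each input picks up a different task vector. A static merge would instead apply one fixed averaged weight to both inputs.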


Code & Models

Repositories

tanganke/weight-ensembling_moe (PyTorch, official)

Videos

Merging Multi-Task Models via Weight-Ensembling Mixture of Experts (SlidesLive)

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Human Mobility and Location-Based Analysis · Recommender Systems and Techniques

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Residual Connection · Absolute Position Encodings · Dropout · Layer Normalization