Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
Li Shen, Anke Tang, Enneng Yang, Guibing Guo, Yong Luo, Lefei Zhang, Xiaochun Cao, Bo Du, Dacheng Tao

TL;DR
This paper introduces WEMoE and E-WEMoE, novel methods for multi-task model merging that adaptively combine modules and reduce complexity, leading to improved performance and efficiency.
Contribution
The paper proposes a dynamic, input-aware model merging approach using mixture-of-experts and module analysis, enhancing multi-task learning performance and efficiency.
Findings
WEMoE outperforms existing merging methods in MTL tasks.
E-WEMoE reduces model size and computational overhead significantly.
Both methods improve generalization and robustness in multi-task models.
Abstract
Multi-task learning (MTL) leverages a shared model to accomplish multiple tasks and facilitate knowledge transfer. Recent research on task arithmetic-based MTL demonstrates that merging the parameters of independently fine-tuned models can effectively achieve MTL. However, existing merging methods primarily seek a static optimal solution within the original model parameter space, which often results in performance degradation due to the inherent diversity among tasks and potential interference between them. To address this challenge, we propose a Weight-Ensembling Mixture of Experts (WEMoE) method for multi-task model merging. Specifically, we first identify critical (or sensitive) modules by analyzing parameter variations in core modules of Transformer-based models before and after fine-tuning. Then, our WEMoE statically merges non-critical modules while transforming critical modules…
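The steps the abstract names (measure how much each module changes during fine-tuning, merge the less-affected modules statically, and treat the critical ones in an input-aware way) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the authors' implementation: the module names, the toy weights, the L2-norm variation measure, the threshold, the scaling factor `lam`, and the softmax routing are all hypothetical, and the abstract is cut off before it says how critical modules are actually transformed. The static step uses plain task arithmetic (pretrained weights plus a scaled sum of task vectors).

```python
import math

def task_vector(pretrained, finetuned):
    # Per-module difference between fine-tuned and pretrained weights.
    return {name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
            for name in pretrained}

def module_variation(tau):
    # Assumed variation measure: L2 norm of each module's task vector.
    return {name: math.sqrt(sum(v * v for v in vals)) for name, vals in tau.items()}

def critical_modules(taus, threshold):
    # Modules whose mean variation across tasks exceeds a hypothetical threshold.
    variations = [module_variation(t) for t in taus]
    return {name for name in taus[0]
            if sum(v[name] for v in variations) / len(variations) > threshold}

def merge_static(base, module_taus, lam):
    # Task arithmetic: base weights plus a fixed, scaled sum of task vectors.
    return [p + lam * sum(t[i] for t in module_taus) for i, p in enumerate(base)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def merge_dynamic(base, module_taus, routing_logits):
    # Assumed input-aware variant: per-input routing logits (here passed in
    # directly, standing in for a learned router) weight each task vector.
    weights = softmax(routing_logits)
    return [p + sum(w * t[i] for w, t in zip(weights, module_taus))
            for i, p in enumerate(base)]

# Toy example with two hypothetical modules and two fine-tuned models.
pretrained = {"attn": [1.0, 1.0], "mlp": [0.0, 0.0]}
finetuned_a = {"attn": [1.1, 1.0], "mlp": [2.0, 0.0]}
finetuned_b = {"attn": [1.0, 0.9], "mlp": [0.0, 2.0]}

taus = [task_vector(pretrained, finetuned_a), task_vector(pretrained, finetuned_b)]
critical = critical_modules(taus, threshold=1.0)

merged_attn = merge_static(pretrained["attn"], [t["attn"] for t in taus], lam=0.5)
merged_mlp = merge_dynamic(pretrained["mlp"], [t["mlp"] for t in taus], [0.0, 0.0])
```

In this toy setup only the module with the large weight change is flagged as critical. With equal routing logits the dynamic merge reduces to an even average of the task vectors; unequal logits would shift the merged weights toward one task, which is the sense in which such a merge would depend on the input.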
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Target Tracking and Data Fusion in Sensor Networks · Energy Efficient Wireless Sensor Networks
Methods: Mixture of Experts
