MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic
Yuyan Zhou, Liang Song, Bingning Wang, Weipeng Chen

TL;DR
MetaGPT introduces a novel model-exclusive task arithmetic method for merging large language models, enhancing multi-task performance while ensuring data privacy and computational efficiency.
Contribution
It formalizes model merging as a multi-task learning problem and leverages LLM properties to create a data-agnostic, cost-effective merging approach.
Findings
Achieves state-of-the-art performance on multiple tasks
Improves task arithmetic effectiveness
Bypasses heavy search processes
Abstract
The advent of large language models (LLMs) like GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach for MTL. It enables performance enhancement across multiple tasks by adding their corresponding task vectors to a pre-trained model. However, the current lack of a method that can simultaneously achieve optimal performance, computational efficiency, and data privacy limits their application to LLMs. In this paper, we propose Model Exclusive Task Arithmetic for merging GPT-scale models (MetaGPT), which formalizes the objective of model merging into a multi-task learning framework, aiming to minimize the average loss difference between the merged model and each individual task model. Since data privacy limits the use…
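The abstract describes task arithmetic as adding task vectors to a pre-trained model. A minimal sketch of that generic operation follows, assuming each model is a plain dictionary of parameter lists. The helper names `task_vector` and `merge`, the toy weights, and the uniform coefficients of 0.5 are illustrative assumptions; this is not the coefficient rule proposed by MetaGPT, which the truncated abstract does not state.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Task vector = fine-tuned weights minus pre-trained weights, per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained: Params, task_vectors: List[Params], coeffs: List[float]) -> Params:
    """Add each task vector, scaled by its coefficient, to the pre-trained weights."""
    if len(task_vectors) != len(coeffs):
        raise ValueError("need one coefficient per task vector")
    merged = {name: list(values) for name, values in pretrained.items()}
    for tv, lam in zip(task_vectors, coeffs):
        for name, delta in tv.items():
            merged[name] = [m + lam * d for m, d in zip(merged[name], delta)]
    return merged


# Toy example: two fine-tuned models sharing one pre-trained base.
pretrained = {"w": [1.0, 2.0]}
math_model = {"w": [2.0, 2.0]}  # assumed fine-tuned on task A
code_model = {"w": [1.0, 4.0]}  # assumed fine-tuned on task B

vectors = [task_vector(math_model, pretrained), task_vector(code_model, pretrained)]
merged = merge(pretrained, vectors, coeffs=[0.5, 0.5])  # uniform scaling is an assumption
print(merged)
```

The sketch needs only the model weights, never the training data. Choosing the scaling coefficients is the step usually handled by a search over candidate values.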
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
