MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic

Yuyan Zhou; Liang Song; Bingning Wang; Weipeng Chen

arXiv:2406.11385 · cs.CL · June 28, 2024

TL;DR

MetaGPT introduces a novel model-exclusive task arithmetic method for merging large language models, enhancing multi-task performance while ensuring data privacy and computational efficiency.

Contribution

The paper formalizes model merging as a multi-task learning problem and leverages properties of LLMs to obtain a data-agnostic, cost-effective merging approach.

Findings

01 Achieves state-of-the-art performance on multiple tasks

02 Improves task arithmetic effectiveness

03 Bypasses heavy search processes

Abstract

The advent of large language models (LLMs) like GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach for MTL. It enables performance enhancement across multiple tasks by adding their corresponding task vectors to a pre-trained model. However, the current lack of a method that can simultaneously achieve optimal performance, computational efficiency, and data privacy limits their application to LLMs. In this paper, we propose Model Exclusive Task Arithmetic for merging GPT-scale models (MetaGPT), which formalizes the objective of model merging into a multi-task learning framework, aiming to minimize the average loss difference between the merged model and each individual task model. Since data privacy limits the use…
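The task arithmetic step the abstract refers to (adding task vectors to a pre-trained model) can be sketched roughly as follows. This is a minimal illustration of plain task arithmetic, not the MetaGPT method itself: the helper names `task_vector` and `merge`, the toy weights, and the uniform `scale` of 0.5 are all assumptions made for the example. The abstract above is truncated before it says how MetaGPT chooses its scaling, so no attempt is made to reproduce that here.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Task vector: element-wise (fine-tuned minus pre-trained) weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained: Weights, task_vectors: List[Weights], scale: float) -> Weights:
    """Add every task vector, multiplied by `scale`, to the pre-trained weights.

    `scale` is a hypothetical uniform coefficient chosen by hand for this
    sketch; it is NOT the coefficient proposed in the paper.
    """
    merged = {name: list(values) for name, values in pretrained.items()}
    for vector in task_vectors:
        for name, delta in vector.items():
            merged[name] = [m + scale * d for m, d in zip(merged[name], delta)]
    return merged


# Made-up weights for one pre-trained model and two fine-tuned task models.
pretrained = {"w": [1.0, 2.0], "b": [0.0]}
model_a = {"w": [2.0, 2.0], "b": [1.0]}
model_b = {"w": [1.0, 4.0], "b": [-1.0]}

vectors = [task_vector(pretrained, model) for model in (model_a, model_b)]
merged = merge(pretrained, vectors, scale=0.5)
print(merged)
```

Only model weights are touched in this sketch; no training data is read, which is the sense in which weight-space merging of this kind can be data-agnostic.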

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer