Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning

Aakanksha; Arash Ahmadian; Seraphina Goldfarb-Tarrant; Beyza Ermis; Marzieh Fadaee; Sara Hooker

arXiv:2410.10801·cs.CL·October 15, 2024

TL;DR

This paper compares data mixing and model merging strategies for multilingual multi-task learning, finding that objective-based merging improves general performance and safety more than data mixing, and that language-based merging of monolingually fine-tuned models is also highly effective.

Contribution

It demonstrates that model merging, especially objective-based and language-based merging, outperforms data mixing in enhancing multilingual model safety and general performance.

Findings

01

Objective-based merging outperforms data mixing by up to 8% in general performance and up to 10% in safety.

02

Language-based merging of monolingually fine-tuned models increases general performance by 4%.

03

Merging approaches improve both the safety and the general performance of multilingual models.

Abstract

Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we explore model merging in a diverse multi-task setting, combining safety and general-purpose tasks within a multilingual context. Each language introduces unique and varied learning challenges across tasks. We find that objective-based merging is more effective than mixing data, with improvements of up to 8% and 10% in general performance and safety respectively. We also find that language-based merging is highly effective -- by merging monolingually fine-tuned models, we achieve a 4% increase in general performance and 7%…
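To make the contrast in the abstract concrete: data mixing trains one model on a combined dataset, while merging combines the weights of separately fine-tuned models after training. The sketch below illustrates the second idea with a weighted linear average of parameters. This is an assumption for illustration only: the abstract does not say which merging algorithm is used, and the function name `merge_linear`, the toy checkpoints, and the parameter names are all hypothetical.

```python
from typing import Dict, List

# A checkpoint is modelled as a mapping from parameter name to a flat list
# of floats. Real checkpoints hold tensors; this is a simplification.
Params = Dict[str, List[float]]


def merge_linear(models: List[Params], weights: List[float]) -> Params:
    """Weighted average of checkpoints that share parameter names and shapes.

    Illustrative only: linear averaging is an assumed merging method,
    not necessarily the one used in the paper.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # normalise so weights sum to 1

    merged: Params = {}
    for name in models[0]:
        # Group the i-th value of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(w * v for w, v in zip(norm, col)) for col in columns]
    return merged


# Hypothetical toy checkpoints: one fine-tuned for safety, one for general tasks.
safety_model = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
general_model = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}

# Objective-based merging in this sketch: equal-weight average of the two.
merged = merge_linear([safety_model, general_model], [1.0, 1.0])
print(merged)
```

Under the same assumption, language-based merging would pass one checkpoint per monolingually fine-tuned model instead of one per objective. The weights control how much each source model contributes to the merged parameters.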

Taxonomy

Topics: Machine Learning and Algorithms