Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Aakanksha, Arash Ahmadian, Seraphina Goldfarb-Tarrant, Beyza Ermis, Marzieh Fadaee, Sara Hooker

TL;DR
This paper compares data mixing and model merging for multilingual multi-task learning and finds that objective-based and language-based merging improve general performance and safety more than data mixing does.
Contribution
It demonstrates that model merging, especially objective-based and language-based merging, outperforms data mixing at improving the safety and general performance of multilingual models.
Findings
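Objective-based merging improves general performance by up to 8% and safety by up to 10% compared with data mixing.
Language-based merging of monolingually fine-tuned models increases general performance by 4%.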
Merging approaches enhance multilingual model safety and effectiveness.
Abstract
Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms prevalent in Western-centric datasets, and safety protocols frequently fail to extend to multilingual settings. In this work, we explore model merging in a diverse multi-task setting, combining safety and general-purpose tasks within a multilingual context. Each language introduces unique and varied learning challenges across tasks. We find that objective-based merging is more effective than mixing data, with improvements of up to 8% and 10% in general performance and safety respectively. We also find that language-based merging is highly effective -- by merging monolingually fine-tuned models, we achieve a 4% increase in general performance and 7%…
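The abstract contrasts training one model on mixed data with merging separately fine-tuned models, but the text above does not say which merging algorithm is used. As a minimal sketch only, assuming plain weighted parameter averaging (often called linear merging) and toy checkpoints stored as dictionaries of float lists, merging two models could look like the following. The function name `merge_linear`, the parameter names, and the values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# A toy "checkpoint": parameter name -> flat list of weights (assumed layout).
Weights = Dict[str, List[float]]


def merge_linear(models: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Return the weighted average of checkpoints with identical parameter names and sizes.

    Assumption: simple linear merging; the merging method used in the paper
    is not specified in the abstract.
    """
    if not models or len(models) != len(coeffs):
        raise ValueError("need one coefficient per model and at least one model")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")
    norm = [c / total for c in coeffs]  # normalize so the coefficients sum to 1

    merged: Weights = {}
    for name, ref in models[0].items():
        for m in models:
            if name not in m or len(m[name]) != len(ref):
                raise ValueError(f"parameter mismatch for {name!r}")
        # Average each scalar weight across the models.
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(len(ref))
        ]
    return merged


# Hypothetical checkpoints: one fine-tuned for safety, one for general tasks.
safety_model = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
general_model = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}

merged = merge_linear([safety_model, general_model], [1.0, 1.0])
print(merged)
```

The same function would apply to language-based merging by passing one checkpoint per monolingually fine-tuned model. With data mixing, by contrast, the trade-off between objectives is set by how the training data is combined before a single training run, whereas here it is set by the coefficients after training.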
Code & Models
- 🤗 CohereLabs/aya-expanse-8b (model) · 16k dl · ♡ 423
- 🤗 CohereLabs/aya-expanse-32b (model) · 6.7k dl · ♡ 289
- 🤗 jth01/aya-expanse-8b-5.0bpw-exl2 (model) · 2 dl
- 🤗 lucyknada/CohereForAI_aya-expanse-8b-exl2 (model) · ♡ 2
- 🤗 duyntnet/aya-expanse-8b-imatrix-GGUF (model) · 47 dl
- 🤗 lucyknada/CohereForAI_aya-expanse-32b-exl2 (model) · ♡ 2
- 🤗 Andrewwwwww/aya-expanse-32b (model) · 3 dl
- 🤗 Svngoku/Aya-Expanse-8B-French (model) · 2 dl
- 🤗 QuantFactory/aya-expanse-8b-GGUF (model) · 194 dl · ♡ 5
- 🤗 duyntnet/aya-expanse-32b-imatrix-GGUF (model) · 62 dl
Taxonomy
Topics: Machine Learning and Algorithms
