M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models
Rishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, and Sathwik Tejaswi Madhusudhan

TL;DR
This paper introduces M2Lingual, a synthetic multilingual, multi-turn instruction dataset created using the Evol taxonomy, which improves LLM alignment across diverse languages and tasks, demonstrated by enhanced performance in experiments.
Contribution
The paper presents the first fully synthetic, multilingual, multi-turn instruction dataset with an Evol taxonomy-guided generation process, covering 70 languages and 17 NLP tasks.
Findings
Enhanced LLM performance across multiple languages
Successful creation of a large, diverse synthetic dataset
Demonstrated effectiveness of Evol-guided instruction finetuning
Abstract
Instruction finetuning (IFT) is critical for aligning Large Language Models (LLMs) to follow instructions. While many effective IFT datasets have been introduced recently, they predominantly focus on high-resource languages like English. To better align LLMs across a broad spectrum of languages and tasks, we propose a fully synthetic, novel taxonomy (Evol) guided Multilingual, Multi-turn instruction finetuning dataset, called M2Lingual. It is constructed by first selecting a diverse set of seed examples and then utilizing the proposed Evol taxonomy to convert these seeds into complex and challenging multi-turn instructions. We demonstrate the effectiveness of M2Lingual by training LLMs of varying sizes and showcasing the enhanced performance across a diverse set of languages. We contribute the 2 step Evol taxonomy with the guided generation code: https://github.com/ServiceNow/M2Lingual,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling
MethodsSparse Evolutionary Training · ALIGN · Focus
