RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

John Dang; Arash Ahmadian; Kelly Marchisio; Julia Kreutzer; Ahmet Üstün; Sara Hooker

arXiv:2407.02552·cs.CL·July 4, 2024·1 cites

TL;DR

This paper advances multilingual large language model alignment by introducing a scalable feedback data generation method, achieving state-of-the-art performance across 23 languages and demonstrating the benefits of cross-lingual transfer and larger datasets.

Contribution

The authors develop a novel scalable method for multilingual feedback data generation and demonstrate its effectiveness in improving multilingual LLM alignment and performance.

Findings

1. Achieved 54.4% win-rate against Aya 23 8B, the current multilingual SOTA.
2. Expanded alignment techniques to 23 languages covering half the world's population.
3. Showed benefits of cross-lingual transfer and larger datasets in preference training.

Abstract

Preference optimization techniques have become a standard final stage for training state-of-the-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to date has focused on first-class citizen languages like English and Chinese. This not only captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art research transfer to a multilingual setting. In this work, we perform an exhaustive study to achieve a new state-of-the-art in aligning multilingual LLMs. We introduce a novel, scalable method for generating high-quality multilingual feedback data to balance data coverage. We establish the benefits of cross-lingual transfer and increased dataset size in preference training. Our preference-trained model achieves a 54.4% win-rate against Aya 23 8B, the current state-of-the-art multilingual LLM in…

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Translation Studies and Practices