Cross-lingual Transfer of Reward Models in Multilingual Alignment

Jiwoo Hong; Noah Lee; Rodrigo Martínez-Castaño; César Rodríguez; James Thorne

arXiv:2410.18027 · cs.CL · January 24, 2025

TL;DR

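This paper investigates the cross-lingual transfer of reward models (RMs), primarily those trained in English, and reports that English RMs exceed target-language RMs by 3-4% on average on Multilingual RewardBench. It also analyzes the transfer through representation shifts, shows that it carries over to better multilingual instruction following, and releases the code, models, and data.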

Contribution

The paper provides empirical evidence of strong cross-lingual transfer of reward models, primarily from English, and analyzes the representation shifts behind that transfer, making RLHF more applicable to multilingual alignment.

Findings

1. English RMs outperform target-language RMs by 3-4% on average on Multilingual RewardBench.

2. Cross-lingual transfer in the RM propagates to improved multilingual instruction-following capability after alignment.

3. The authors provide extensive analyses of off-the-shelf RMs and release the code, models, and data.

Abstract

Reinforcement learning with human feedback (RLHF) is shown to largely benefit from precise reward models (RMs). However, recent studies in reward modeling schemes are skewed towards English, limiting the applicability of RLHF in multilingual alignments. In this work, we investigate the cross-lingual transfer of RMs trained in diverse languages, primarily from English. Our experimental results demonstrate the strong cross-lingual transfer of English RMs, exceeding target language RMs by 3~4% average increase in Multilingual RewardBench. Furthermore, we analyze the cross-lingual transfer of RMs through the representation shifts. Finally, we perform multilingual alignment to exemplify how cross-lingual transfer in RM propagates to enhanced multilingual instruction-following capability, along with extensive analyses on off-the-shelf RMs. We release the code, model, and data.
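
To make the headline comparison concrete, the sketch below shows one common way a reward model is scored on a preference benchmark: each item has a chosen and a rejected response, and the RM counts as correct when it assigns the chosen response the higher score. This is an illustration under stated assumptions, not the authors' evaluation code. The function names `pairwise_accuracy` and `macro_average` and the toy scores are hypothetical, and the exact aggregation used for Multilingual RewardBench may differ.

```python
from collections import defaultdict


def pairwise_accuracy(records):
    """Per-language fraction of pairs where the chosen response outscores the rejected one.

    `records` is an iterable of (language, chosen_score, rejected_score) tuples.
    Ties count as errors in this sketch (an assumption, not a benchmark rule).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for lang, chosen_score, rejected_score in records:
        totals[lang] += 1
        if chosen_score > rejected_score:
            hits[lang] += 1
    return {lang: hits[lang] / totals[lang] for lang in totals}


def macro_average(per_language):
    """Unweighted mean of the per-language accuracies."""
    return sum(per_language.values()) / len(per_language)


# Hypothetical RM scores, invented purely for illustration.
toy_scores = [
    ("es", 1.2, 0.3), ("es", 0.1, 0.4), ("es", 0.9, 0.2), ("es", 2.0, 1.0),
    ("ko", 0.5, 0.7), ("ko", 1.1, 0.6),
]

accuracy = pairwise_accuracy(toy_scores)
print(accuracy)                 # {'es': 0.75, 'ko': 0.5}
print(macro_average(accuracy))  # 0.625
```

Running the same function on the scores of an English-trained RM and a target-language RM over the same pairs, then comparing the macro averages, gives the kind of gap the abstract summarizes as a 3~4% average increase.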

Code & Models

Repositories

iq-kaist/rm-lingual-transfer (PyTorch, official)

Videos

Cross-lingual Transfer of Reward Models in Multilingual Alignment (Underline)

Taxonomy

Topics: Employee Welfare and Language Studies