Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution

Nuo Xu; Jun Zhao; Can Zu; Sixian Li; Lu Chen; Zhihao Zhang; Rui Zheng; Shihan Dou; Wenjuan Qin; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2402.11525 · cs.CL · February 28, 2024 · 2 cites



TL;DR

This paper introduces a cost-effective approach to reinforcement learning from human feedback (RLHF) for improving machine translation quality, especially for low-resource languages, by training reward models that better align with human preferences.

Contribution

It proposes a novel preference learning strategy that enhances translation models with RLHF without requiring large, high-quality datasets of human comparisons: the reward model is instead trained to distinguish human translations from machine translations.

Findings

01

RLHF effectively improves translation quality across multiple languages.

02

A reward model with stronger language capabilities better captures human preferences.

03

The approach benefits translation directions not directly trained with RLHF.

Abstract

Faithfulness, expressiveness, and elegance are the constant pursuit of machine translation. However, traditional metrics like BLEU do not strictly align with human preferences for translation quality. In this paper, we explore leveraging reinforcement learning from human feedback (RLHF) to improve translation quality. It is non-trivial to collect a large, high-quality dataset of human comparisons between translations, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy that optimizes reward models by distinguishing between human and machine translations. In this manner, the reward model learns the deficiencies of machine translation compared to human translation and guides subsequent improvements in machine translation. Experimental results demonstrate that RLHF can effectively enhance translation quality and…
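The abstract does not spell out the reward-model objective. As a minimal sketch, assuming the common pairwise (Bradley-Terry) ranking loss used for RLHF reward models, the human translation of a source sentence can be treated as the preferred output and a machine translation of the same sentence as the dispreferred one, so training minimizes -log σ(r(x, y_human) - r(x, y_machine)). The names `RewardFn`, `pairwise_preference_loss`, `batch_preference_loss`, and the toy length-based reward are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical reward signature: (source sentence, candidate translation) -> scalar score.
RewardFn = Callable[[str, str], float]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_preference_loss(r_human: float, r_machine: float) -> float:
    """Assumed Bradley-Terry style loss that prefers the human translation:
    -log sigmoid(r_human - r_machine)."""
    return -log_sigmoid(r_human - r_machine)


def batch_preference_loss(
    reward_fn: RewardFn,
    triples: Sequence[Tuple[str, str, str]],
) -> float:
    """Mean loss over (source, human_translation, machine_translation) triples."""
    if not triples:
        raise ValueError("need at least one triple")
    losses = [
        pairwise_preference_loss(reward_fn(src, human), reward_fn(src, machine))
        for src, human, machine in triples
    ]
    return sum(losses) / len(losses)


def toy_reward(src: str, hyp: str) -> float:
    # Hypothetical stand-in for a neural reward model: scores by word count only.
    return float(len(hyp.split()))


if __name__ == "__main__":
    data = [("src sentence", "a fluent human rendering", "a literal rendering")]
    print(batch_preference_loss(toy_reward, data))
```

Minimizing this loss pushes the reward of each human translation above that of the paired machine translation, which is one plausible way a reward model could learn the deficiencies of machine output without human comparison labels. In practice the scalar reward would come from a neural model and the loss would be computed over minibatches in an autodiff framework.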


Taxonomy

Topics: Natural Language Processing Techniques

Methods: ALIGN