Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization

Kaden Uhlig; Joern Wuebker; Raphael Reinauer; John DeNero

arXiv:2409.17673 · cs.CL · September 30, 2025

TL;DR

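This paper introduces Direct Quality Optimization (DQO), a variant of Direct Preference Optimization that uses a pre-trained translation quality estimation model as a proxy for human preferences. Applying this form of task-alignment to neural machine translation addresses a task-data mismatch and improves translation quality across all languages of a multilingual model, even when alignment is applied to only a subset of them.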

Contribution

The paper presents Direct Quality Optimization (DQO), which applies task-alignment to NMT by substituting a pre-trained quality estimation model for human preference judgments, and shows that this improves the performance of a multilingual translation model.

Findings

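01 Translation quality improves across all languages of a multilingual model, even when task-alignment is applied to only a subset of those languages.

02 A pre-trained quality estimation model serves as an effective proxy for human preferences.

03 The improvements are verified with both automatic metrics and human evaluation.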

Abstract

Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose general, foundational models for specific tasks. We show that applying task-alignment to neural machine translation (NMT) addresses an existing task-data mismatch in NMT, leading to improvements across all languages of a multilingual model, even when task-alignment is only applied to a subset of those languages. We do so by introducing Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences, and verify the improvements with both automatic metrics and human evaluation.
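The abstract describes DQO only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes that a preference pair is formed by scoring several candidate translations with a quality estimation function and taking the highest-scoring candidate as "chosen" and the lowest-scoring as "rejected"; that pairing rule, the function names, and the `qe_score` callable are all hypothetical. The loss itself is the standard per-example DPO objective, computed from sequence log-probabilities under the policy and a frozen reference model.

```python
import math
from typing import Callable, Sequence, Tuple


def select_preference_pair(
    source: str,
    candidates: Sequence[str],
    qe_score: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Pick (chosen, rejected) translations using a QE model as the judge.

    Assumption: best-vs-worst pairing. The paper may select pairs differently.
    `qe_score(source, hypothesis)` is a hypothetical stand-in for any
    reference-free quality estimation model; higher means better.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidate translations")
    ranked = sorted(candidates, key=lambda hyp: qe_score(source, hyp))
    return ranked[-1], ranked[0]


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard per-example DPO loss: -log(sigmoid(margin)).

    The margin is beta times the difference between the policy-vs-reference
    log-ratio of the chosen translation and that of the rejected one.
    """
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) equals log(1 + exp(-m)), i.e. softplus(-m).
    return _softplus(-margin)
```

In this sketch the loss equals log 2 when the policy and the reference model agree on both translations, and it falls as the policy assigns relatively more probability to the translation the QE model preferred. The default `beta` of 0.1 is an illustrative value, not one taken from the paper.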

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Direct Preference Optimization