Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
Kaden Uhlig, Joern Wuebker, Raphael Reinauer, John DeNero

TL;DR
This paper introduces Direct Quality Optimization (DQO), a variant of Direct Preference Optimization that uses a pre-trained translation quality estimation model as a proxy for human preferences. Applying this form of task-alignment to multilingual neural machine translation addresses a task-data mismatch and improves translation quality across all of the model's languages.
Contribution
It presents Direct Quality Optimization (DQO), which applies task-alignment to NMT by using a pre-trained quality estimation model in place of human preference labels, improving multilingual translation performance.
Findings
Improved translation quality across all languages in a multilingual model, even when task-alignment is applied to only a subset of those languages.
Effective use of quality estimation as a proxy for human preferences.
Improvements verified by both automatic metrics and human evaluation.
Abstract
Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose general, foundational models for specific tasks. We show that applying task-alignment to neural machine translation (NMT) addresses an existing task-data mismatch in NMT, leading to improvements across all languages of a multilingual model, even when task-alignment is only applied to a subset of those languages. We do so by introducing Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences, and verify the improvements with both automatic metrics and human evaluation.
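The idea in the abstract can be illustrated with a minimal sketch: rank candidate translations with a quality estimation (QE) score, treat the best and worst as a preference pair, and plug the pair into the standard DPO loss. This is not the authors' implementation. The best-versus-worst pairing rule, the function names `pick_preference_pair` and `dpo_loss`, the toy scores, and the value of `beta` are all assumptions made for illustration; the abstract does not specify how DQO builds its pairs.

```python
import math
from typing import Callable, Sequence, Tuple


def pick_preference_pair(
    candidates: Sequence[str],
    qe_score: Callable[[str], float],
) -> Tuple[str, str]:
    """Return (chosen, rejected) using QE scores as a stand-in for human preference.

    Assumption: the highest-scoring candidate is "chosen" and the
    lowest-scoring one is "rejected". The paper may pair candidates differently.
    """
    ranked = sorted(candidates, key=qe_score)
    return ranked[-1], ranked[0]


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    The margin compares how much the policy has moved away from the frozen
    reference model on the chosen translation versus the rejected one.
    """
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(x)) equals softplus(-x); computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy usage: hypothetical QE scores for three candidate translations.
toy_scores = {"candidate A": 0.2, "candidate B": 0.9, "candidate C": 0.5}
chosen, rejected = pick_preference_pair(list(toy_scores), toy_scores.get)
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In the toy usage, the policy assigns a higher log-probability than the reference to the chosen translation and a lower one to the rejected translation, so the loss falls below log 2, the value it takes when policy and reference agree.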
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Direct Preference Optimization
