Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins

TL;DR
This paper introduces a fine-grained token-level reward optimization method for neural machine translation using error severity mappings, improving translation quality and training stability over traditional sentence-level reward approaches.
Contribution
It presents a novel reinforcement learning approach that leverages token-level quality assessments with error severity levels, enhancing translation performance and training robustness.
Findings
Token-level rewards outperform sentence-level rewards in translation quality.
Training with token-level rewards leads to more stable learning dynamics.
Automatic and human evaluations confirm improved translation quality.
Abstract
Reinforcement learning (RL) has proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, which yields inefficient learning signals because of the reward sparsity problem: the model receives a single score for the entire sentence. To address this, we propose a novel approach that leverages fine-grained, token-level quality assessments along with error severity levels using RL methods. Specifically, we use xCOMET, a state-of-the-art quality estimation system, as our token-level reward model. We conduct experiments on small and large translation datasets with standard encoder-decoder and large language model-based machine translation systems, comparing the impact of…
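To make the contrast between sentence-level and token-level feedback concrete, here is a minimal sketch of how error spans with severity labels could be turned into per-token rewards. It is an illustration under stated assumptions, not the authors' implementation: the severity names follow the minor/major/critical scheme that xCOMET predicts, but the numeric values in `SEVERITY_REWARD`, the span format, and both function names are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical severity-to-reward mapping. The numeric values are
# illustrative assumptions, not the mapping used in the paper.
SEVERITY_REWARD: Dict[str, float] = {
    "correct": 1.0,
    "minor": -1.0,
    "major": -5.0,
    "critical": -10.0,
}

# Assumed span format: (start_token, end_token_exclusive, severity).
ErrorSpan = Tuple[int, int, str]


def token_rewards(
    num_tokens: int,
    error_spans: List[ErrorSpan],
    mapping: Dict[str, float] = SEVERITY_REWARD,
) -> List[float]:
    """Dense signal: one reward per token, based on error severity."""
    rewards = [mapping["correct"]] * num_tokens
    for start, end, severity in error_spans:
        # Clip spans to the valid token range.
        for i in range(max(start, 0), min(end, num_tokens)):
            # When spans overlap, keep the most severe (lowest) reward.
            rewards[i] = min(rewards[i], mapping[severity])
    return rewards


def sentence_reward(num_tokens: int, score: float) -> List[float]:
    """Sparse signal: a single score, placed on the final token only."""
    if num_tokens <= 0:
        return []
    return [0.0] * (num_tokens - 1) + [score]


if __name__ == "__main__":
    spans = [(1, 3, "minor"), (2, 4, "major")]
    print(token_rewards(5, spans))
    print(sentence_reward(5, 0.8))
```

With sentence-level feedback every token but the last receives no signal of its own, whereas the token-level variant tells the policy which tokens were penalized and how heavily.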
Taxonomy
Topics: Natural Language Processing Techniques
