Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings

Miguel Moura Ramos; Tomás Almeida; Daniel Vareta; Filipe Azevedo; Sweta Agrawal; Patrick Fernandes; André F. T. Martins

arXiv:2411.05986·cs.CL·November 24, 2025

Open Access

TL;DR

This paper introduces a fine-grained token-level reward optimization method for neural machine translation using error severity mappings, improving translation quality and training stability over traditional sentence-level reward approaches.

Contribution

It presents a novel reinforcement learning approach that leverages token-level quality assessments with error severity levels, enhancing translation performance and training robustness.

Findings

01. Token-level rewards outperform sentence-level rewards in translation quality.

02. Training with token-level rewards leads to more stable learning dynamics.

03. Automatic and human evaluations confirm improved translation quality.

Abstract

Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, leading to inefficient learning signals due to the reward sparsity problem: the model receives a single score for the entire sentence. To address this, we propose a novel approach that leverages fine-grained, token-level quality assessments along with error severity levels using RL methods. Specifically, we use xCOMET, a state-of-the-art quality estimation system, as our token-level reward model. We conduct experiments on small and large translation datasets with standard encoder-decoder and large language model-based machine translation systems, comparing the impact of…
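To make the contrast between sentence-level and token-level feedback concrete, the following is a minimal sketch of how error spans with severity labels could be turned into per-token rewards. Everything here is an assumption for illustration: the function names, the span format (character offsets plus a severity label), and the numeric severity values are hypothetical and are not taken from the paper or from the xCOMET API.

```python
from typing import Dict, List, Tuple

# Hypothetical severity-to-reward mapping. The numeric values are
# illustrative assumptions, not the mapping used in the paper.
SEVERITY_REWARD: Dict[str, float] = {
    "minor": -1.0,
    "major": -5.0,
    "critical": -10.0,
}

# (start_char, end_char) with an exclusive end offset.
Offset = Tuple[int, int]
# (start_char, end_char, severity) with an exclusive end offset.
ErrorSpan = Tuple[int, int, str]


def token_rewards(
    token_offsets: List[Offset],
    error_spans: List[ErrorSpan],
    mapping: Dict[str, float] = SEVERITY_REWARD,
    default: float = 0.0,
) -> List[float]:
    """Assign each token the reward of the worst error span it overlaps.

    Tokens that overlap no error span receive `default`.
    """
    rewards = []
    for tok_start, tok_end in token_offsets:
        reward = default
        for span_start, span_end, severity in error_spans:
            overlaps = tok_start < span_end and span_start < tok_end
            if overlaps:
                # Keep the most severe (lowest) reward among overlapping spans.
                reward = min(reward, mapping[severity])
        rewards.append(reward)
    return rewards


def sparse_sentence_reward(score: float, n_tokens: int) -> List[float]:
    """Sentence-level baseline: one score, placed on the final token only."""
    if n_tokens == 0:
        return []
    return [0.0] * (n_tokens - 1) + [score]


# Hypothesis "He go to school" with a minor error marked on "go".
offsets = [(0, 2), (3, 5), (6, 8), (9, 15)]
spans = [(3, 5, "minor")]
per_token = token_rewards(offsets, spans)
per_sentence = sparse_sentence_reward(-1.0, len(offsets))
```

In this sketch, `per_token` localizes the penalty on the erroneous token, while `per_sentence` gives the same total signal only at the end of the sequence, which is the reward sparsity problem the abstract describes.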

Taxonomy

Topics: Natural Language Processing Techniques