TL;DR
PEGRL is a two-stage reinforcement learning framework that uses post-editing as an auxiliary task to improve machine translation quality, stabilizing training and improving sample efficiency.
Contribution
The paper proposes PEGRL, a novel RL framework that leverages post-editing to guide training, balancing exploration and local optimization for better translation performance.
Findings
Consistent improvements over RL baselines in multiple language pairs.
Performance on English-Turkish translation comparable to advanced LLM systems.
Effective stabilization of RL training through post-editing auxiliary tasks.
Abstract
Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL remains challenged by noisy learning signals arising from Monte Carlo return estimation, as well as a large trajectory space that favors global exploration over fine-grained local optimization. We introduce PEGRL, a two-stage RL framework that uses post-editing as an auxiliary task to stabilize training and guide overall optimization. At each iteration, translation outputs are sampled to construct post-editing inputs, allowing return estimation in the post-editing stage to benefit from conditioning on the current translation behavior, while jointly supporting both global exploration and fine-grained local optimization. A task-specific weighting scheme further balances the…
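The two-stage loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The group-relative normalization follows the general idea behind GRPO. The prompt template, the function names, and the scalar `post_edit_weight` are hypothetical stand-ins, since the abstract as shown here is truncated before it specifies the weighting scheme.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_post_edit_prompt(source, draft):
    """Hypothetical template: turn a sampled translation into a post-editing input."""
    return (
        f"Source: {source}\n"
        f"Draft translation: {draft}\n"
        "Improved translation:"
    )


def two_stage_advantages(translation_rewards, post_edit_rewards, post_edit_weight=0.5):
    """Compute advantages for both stages.

    Assumption: the post-editing stage is scaled by a single scalar weight.
    The paper's actual task-specific weighting scheme may differ.
    """
    translation_adv = group_relative_advantages(translation_rewards)
    post_edit_adv = [
        post_edit_weight * a for a in group_relative_advantages(post_edit_rewards)
    ]
    return translation_adv, post_edit_adv
```

In this sketch, stage one samples a group of translations and scores them, and stage two feeds each sampled draft through `build_post_edit_prompt` so that the post-editing rewards are conditioned on the model's current translation behavior.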
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
