SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
Ting Xu, Zhichao Huang, Jiankai Sun, Shanbo Cheng, Wai Lam

TL;DR
SeqPO-SiMT introduces a sequential policy optimization framework for simultaneous machine translation (SiMT) that models the task as a multi-step decision process, improving translation quality while reducing latency and outperforming existing methods across six datasets.
Contribution
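The paper proposes a sequential policy optimization approach tailored to SiMT. Unlike RLHF methods such as PPO and DPO, which are typically applied to single-step tasks, it optimizes the multi-step SiMT decision process with a tailored reward that targets both translation quality and latency.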
Findings
Outperforms supervised fine-tuning by 1.13 COMET points.
Reduces Average Lagging by 6.17 on NEWSTEST2021.
Rivals the offline translation performance of high-capacity LLMs.
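Average Lagging (AL), the latency metric cited above, is commonly defined (Ma et al., 2019) as AL = (1/τ) Σ_{t=1..τ} [g(t) − (t−1)/γ], where g(t) is the number of source tokens read before writing target token t, γ = |y|/|x| is the target-to-source length ratio, and τ is the first target position at which the whole source has been read. The sketch below computes AL for a single sentence under those assumptions; the function name is hypothetical, and evaluation toolkits may differ in details such as which length defines γ and how Chinese text is segmented, so this is not the paper's evaluation code.

```python
from typing import Sequence


def average_lagging(delays: Sequence[int], src_len: int) -> float:
    """Sentence-level Average Lagging (illustrative sketch).

    delays[t - 1] = g(t): source tokens read before emitting target token t.
    Assumes gamma uses the hypothesis length, as in the usual definition.
    """
    tgt_len = len(delays)
    if tgt_len == 0 or src_len == 0:
        raise ValueError("source and target must be non-empty")
    gamma = tgt_len / src_len  # target-to-source length ratio
    # tau: first 1-indexed target position whose delay covers the full source
    tau = next((t for t, g in enumerate(delays, start=1) if g >= src_len), tgt_len)
    # Sum the lag of each token relative to an ideal "no-delay" translator
    return sum(delays[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau


print(average_lagging([3, 4, 5, 6, 6, 6], src_len=6))  # wait-3 schedule: 3.0
print(average_lagging([6] * 6, src_len=6))             # read everything first: 6.0
```

For equal source and target lengths, a wait-k schedule yields AL = k, while fully offline translation lags by the entire source length, which is why lower AL means the translator starts writing sooner.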
Abstract
We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision-making problem, incorporating a tailored reward to enhance translation quality while reducing latency. In contrast to popular Reinforcement Learning from Human Feedback (RLHF) methods, such as PPO and DPO, which are typically applied in single-step tasks, SeqPO-SiMT effectively tackles the multi-step SiMT task. This intuitive framework allows SiMT LLMs to simulate and refine the SiMT process using a tailored reward. We conduct experiments on six datasets from diverse domains for En to Zh and Zh to En SiMT tasks, demonstrating that SeqPO-SiMT consistently achieves significantly higher translation quality with lower latency. In particular, SeqPO-SiMT outperforms the…
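The abstract frames SiMT as a sequential decision process rather than single-step generation. A minimal sketch of that framing, under stated assumptions: at each step a policy either READs one more source token or WRITEs a target token, and one reward is assigned to the whole episode. The names (`rollout`, `wait_k_policy`, `sequence_reward`), the toy copy "translator", the unigram-F1 quality proxy standing in for a learned metric such as COMET, the Average Proportion latency term, and the weight `lam` are all hypothetical illustrations, not the paper's tailored reward or its LLM policy; the optimization step that would update a policy from this reward is not shown.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

READ, EOS = "<READ>", "<EOS>"  # hypothetical action symbols

# A policy sees the source prefix read so far, the target prefix written so far,
# and whether the source is exhausted; it returns READ, EOS, or a target token.
Policy = Callable[[Sequence[str], List[str], bool], str]


def rollout(source: Sequence[str], policy: Policy,
            max_steps: int = 1000) -> Tuple[List[str], List[int]]:
    """Run one SiMT episode; return output tokens and their delays g(t)."""
    read, output, delays = 0, [], []
    for _ in range(max_steps):
        finished = read == len(source)
        action = policy(source[:read], output, finished)
        if action == READ:
            if finished:
                break  # nothing left to read: end the episode
            read += 1
        elif action == EOS:
            break
        else:
            output.append(action)
            delays.append(read)  # source tokens read before writing this token
    return output, delays


def wait_k_policy(k: int) -> Policy:
    """Toy wait-k policy whose 'translation' copies the aligned source token."""
    def policy(src_prefix, output, finished):
        if not finished and len(src_prefix) < len(output) + k:
            return READ
        if len(output) < len(src_prefix):
            return src_prefix[len(output)]
        return EOS if finished else READ
    return policy


def unigram_f1(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Toy quality score (hypothetical stand-in for a learned metric)."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_proportion(delays: Sequence[int], src_len: int) -> float:
    """Average Proportion: mean fraction of the source read before each write."""
    return sum(delays) / (src_len * len(delays)) if delays else 1.0


def sequence_reward(hyp, ref, delays, src_len, lam=0.5):
    """Hypothetical episode-level reward: quality minus weighted latency."""
    return unigram_f1(hyp, ref) - lam * average_proportion(delays, src_len)


src = ref = "a b c d".split()
for k in (1, 10):
    hyp, delays = rollout(src, wait_k_policy(k))
    print(k, delays, sequence_reward(hyp, ref, delays, len(src)))
# 1 [1, 2, 3, 4] 0.6875
# 10 [4, 4, 4, 4] 0.5
```

With identical output quality, the wait-1 schedule earns a higher reward than reading the whole source first; a sequence-level reward of this kind scores the entire READ/WRITE trajectory, which is the quality/latency trade-off a multi-step formulation can optimize directly.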
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
