SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation

Ting Xu; Zhichao Huang; Jiankai Sun; Shanbo Cheng; Wai Lam

arXiv:2505.20622 · cs.CL · May 28, 2025

TL;DR

SeqPO-SiMT is a sequential policy optimization framework that models simultaneous machine translation (SiMT) as a multi-step decision process. Across six datasets it improves translation quality while reducing latency relative to existing methods.

Contribution

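The paper proposes a sequential policy optimization approach tailored to SiMT. Unlike RLHF methods such as PPO and DPO, which are typically applied to single-step tasks, it optimizes the multi-step SiMT process directly, using a tailored reward that targets both translation quality and latency.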

Findings

01. Outperforms supervised fine-tuning by 1.13 COMET points.

02. Reduces Average Lagging by 6.17 on NEWSTEST2021.

03. Rivals offline translation performance of high-capacity LLMs.
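The latency finding is reported in Average Lagging (AL). As a rough illustration, the sketch below computes the commonly used token-level form, AL = (1/τ) Σ_{t=1..τ} [g(t) − (t−1)/γ], where g(t) is the number of source tokens read before target token t is written, γ is the target-to-source length ratio, and τ is the first target position at which the whole source has been read. The function name `average_lagging` and its arguments are hypothetical, and this page does not say which AL variant or tokenization the paper uses.

```python
from typing import Sequence


def average_lagging(delays: Sequence[int], src_len: int) -> float:
    """Token-level Average Lagging (illustrative sketch, not the paper's code).

    delays[i] is the number of source tokens that had been read when the
    (i+1)-th target token was written. src_len is the source length.
    """
    tgt_len = len(delays)
    if tgt_len == 0 or src_len <= 0:
        raise ValueError("need a non-empty target and a positive source length")

    gamma = tgt_len / src_len  # target-to-source length ratio
    total = 0.0
    tau = 0
    for i, g in enumerate(delays):
        # i is 0-based, so i / gamma equals (t - 1) / gamma for 1-based t.
        total += g - i / gamma
        tau += 1
        if g >= src_len:
            # Stop at the first target token written after the full source was read.
            break
    return total / tau


# A wait-1 style schedule lags by about one token; a fully offline
# schedule lags by the whole source length.
wait_1 = average_lagging([1, 2, 3, 4], src_len=4)
offline = average_lagging([4, 4, 4, 4], src_len=4)
```

Lower values mean the translation trails the source less, so a reduction in AL corresponds to lower latency.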

Abstract

We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision-making problem, incorporating a tailored reward to enhance translation quality while reducing latency. In contrast to popular Reinforcement Learning from Human Feedback (RLHF) methods, such as PPO and DPO, which are typically applied to single-step tasks, SeqPO-SiMT effectively tackles the multi-step SiMT task. This intuitive framework allows the SiMT LLMs to simulate and refine the SiMT process using a tailored reward. We conduct experiments on six datasets from diverse domains for En to Zh and Zh to En SiMT tasks, demonstrating that SeqPO-SiMT consistently achieves significantly higher translation quality with lower latency. In particular, SeqPO-SiMT outperforms the…
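To make the "multi-step" framing concrete, here is a minimal sketch of one simulated SiMT episode followed by a single scalar reward. It is an illustration under stated assumptions, not the paper's method: the names `rollout`, `episode_reward`, `toy_policy`, and `latency_weight` are hypothetical, the toy policy simply uppercases source tokens in place of a real translation model, and the linear quality-minus-latency reward is an assumed form, since this page does not give the actual reward.

```python
from typing import Callable, List, Sequence, Tuple

# A policy maps (source read so far, target written so far) to new target tokens.
Policy = Callable[[List[str], List[str]], List[str]]


def rollout(source_chunks: Sequence[Sequence[str]],
            policy: Policy) -> Tuple[List[str], List[int]]:
    """Simulate one episode: read a chunk, let the policy write, repeat."""
    read: List[str] = []
    target: List[str] = []
    delays: List[int] = []  # source tokens read when each target token was written
    for chunk in source_chunks:
        read.extend(chunk)
        for token in policy(read, target):
            target.append(token)
            delays.append(len(read))
    return target, delays


def episode_reward(quality: float, latency: float,
                   latency_weight: float = 0.1) -> float:
    """Assumed reward shape: higher quality is better, higher latency is worse."""
    return quality - latency_weight * latency


def toy_policy(read: List[str], target: List[str]) -> List[str]:
    """Stand-in for a model: 'translate' every unread-so-far token by uppercasing."""
    return [tok.upper() for tok in read[len(target):]]


target, delays = rollout([["a", "b"], ["c"]], toy_policy)
mean_delay = sum(delays) / len(delays)
reward = episode_reward(quality=0.8, latency=mean_delay)
```

The point of the sketch is that the reward is assigned to the whole sequence of read and write decisions rather than to a single response, which is what distinguishes this setting from the single-step tasks the abstract mentions.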

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling