SimulPL: Aligning Human Preferences in Simultaneous Machine Translation

Donglei Yu; Yang Zhao; Jie Zhu; Yangyifan Xu; Yu Zhou; Chengqing Zong

arXiv:2502.00634·cs.CL·February 6, 2025

Open Access · 3 Reviews

TL;DR

SimulPL introduces a framework for aligning simultaneous machine translation models with human preferences, improving translation quality, simplicity, and latency across multiple language pairs by leveraging preference data and optimizing read/write policies.

Contribution

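The paper presents SimulPL, a preference learning framework that aligns SiMT models with human preferences. Unlike earlier preference optimization methods, which optimize only the generated responses, it also accounts for latency preferences and optimizes the read/write policy, an aspect that had not been explored before.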

Findings

01

Better alignment with human preferences across all latency levels.

02

Improved translation performance in Zh→En, De→En, and En→Zh tasks.

03

Effective use of GPT-4/4o for preference data generation.

Abstract

Simultaneous Machine Translation (SiMT) generates translations while receiving streaming source inputs. This requires the SiMT model to learn a read/write policy, deciding when to translate and when to wait for more source input. Numerous linguistic studies indicate that audiences in SiMT scenarios have distinct preferences, such as accurate translations, simpler syntax, and no unnecessary latency. Aligning SiMT models with these human preferences is crucial to improving their performance. However, this issue remains unexplored. Additionally, preference optimization for the SiMT task is challenging. Existing methods focus solely on optimizing the generated responses, ignoring human preferences related to latency and the optimization of the read/write policy during the preference optimization phase. To address these challenges, we propose Simultaneous Preference Learning (SimulPL), a…
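The read/write decision described in the abstract can be illustrated with a minimal sketch. This is not the SimulPL policy: it is a generic confidence-threshold loop, and `simultaneous_decode`, `write_confidence`, `next_token`, and the toy model functions are hypothetical names introduced only for illustration. The default threshold of 0.5 is likewise an assumption.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def simultaneous_decode(
    source: Tokens,
    write_confidence: Callable[[Tokens, Tokens], float],  # hypothetical scorer
    next_token: Callable[[Tokens, Tokens], str],          # hypothetical decoder step
    threshold: float = 0.5,                               # assumed value
    max_len: int = 50,
) -> Tuple[Tokens, List[str]]:
    """Generic threshold-based read/write loop (illustrative, not SimulPL)."""
    read = 1                 # number of source tokens read so far
    target: Tokens = []      # translation emitted so far
    actions: List[str] = []  # trace of "R" (read) and "W" (write) actions
    while len(target) < max_len:
        prefix = source[:read]
        source_exhausted = read >= len(source)
        # WRITE if the model is confident enough, or if nothing is left to read.
        if source_exhausted or write_confidence(prefix, target) >= threshold:
            token = next_token(prefix, target)
            if token == "<eos>":
                break
            target.append(token)
            actions.append("W")
        else:
            # READ: wait for one more source token before translating further.
            read += 1
            actions.append("R")
    return target, actions


# Toy stand-ins for a real model: "translate" by upper-casing each source token,
# and be confident only when an untranslated source token is already available.
def toy_confidence(prefix: Tokens, target: Tokens) -> float:
    return 1.0 if len(prefix) > len(target) else 0.0


def toy_next_token(prefix: Tokens, target: Tokens) -> str:
    return prefix[len(target)].upper() if len(target) < len(prefix) else "<eos>"


target, actions = simultaneous_decode(["a", "b", "c"], toy_confidence, toy_next_token)
print(target)   # ['A', 'B', 'C']
print(actions)  # ['W', 'R', 'W', 'R', 'W']
```

In this toy setup, raising the threshold makes the loop read more before writing (higher latency), while lowering it makes the loop write earlier on less context.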

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1. This is an intriguing question and a useful framework for practical SiMT scenarios. While SiMT has been extensively studied in the research community, most papers have focused on automatic metrics rather than real human preferences. This work proposes five human preferences that are closely related to the end users of SiMT.
2. The proposed optimization objective aligns well with the proposed human preferences. For example, the output length constraint is applied for the latency preference. The deri

Weaknesses

1. The data construction relies on GPT-4/4o. What is the cost of constructing the training data? Does it mean GPT-4/4o is the upper bound of human preference alignment? Or is this work really alignment to GPT-4 preferences? Figure 2 only shows the win rate between the GPT-4 translation and the original one. Is there a comparison between GPT-4 and a professional interpreter?
2. In the experiments, you only tune the hyper-parameter $\alpha$, and set $c_t$ to 0.5 to control the read/write. In general, tuning the t

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The paper presents a comprehensive approach to Simultaneous Machine Translation, addressing data construction, model training, and inference.

Weaknesses

1. The paper lacks a clear emphasis on the innovative aspects of the proposed method. Highlighting specific novel contributions would strengthen the paper.
2. It remains unclear how the quality of the automatic labeling by GPT-4 is ensured, and the consistency rate with human labeling is not provided.
3. The distinction between the uppercase X and lowercase x in equations 6 and 7 needs clarification.
4. According to Figure 6, the improvement of SimulPL over only MSFT is not substantial, raisi

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The proposed approach shows performance improvement on the machine translation datasets.
2. The proposed five aspects for SiMT preference alignment could be useful.
3. The presentation is clear and easy to understand.

Weaknesses

1. The method relies heavily on the quality of the GPT-generated preference data, and little effort is made to verify the quality, correctness, or trustworthiness of the generated data.
2. Overall, the framework has limited novelty: the idea of the length penalty term |y| in the proposed SimulDPO loss function comes from [1], and little that is novel is added here.
3. The scope is limited to SiMT only, and the method does not seem applicable to more general LLM preference alignment cases.
4. Some design is arbitra

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling