Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

Siqi Ouyang; Shuoyang Ding; Oleksii Hrinchuk; Vitaly Lavrukhin; Brian Yan; Boris Ginsburg; Lei Li

arXiv:2604.21045·cs.CL·April 24, 2026

TL;DR

This paper introduces Hierarchical Policy Optimization (HPO), a post-training method for simultaneous speech translation (SST). HPO starts from models trained on imperfect supervised fine-tuning data and uses a hierarchical reward to balance translation quality against latency.

Contribution

The paper proposes a post-training method for SST based on a hierarchical reward. The method improves translation quality and reduces latency without relying on high-quality supervised dialogue data.
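This listing does not give the paper's actual reward definition, so the following is only a minimal sketch of one way a reward could rank quality above latency. The gating scheme, the function name `hierarchical_reward`, the `quality_floor` and `latency_budget_s` parameters, and the assumption that quality is a score in [0, 1] where higher is better are all hypothetical and are not taken from the paper.

```python
def hierarchical_reward(
    quality: float,
    latency_s: float,
    quality_floor: float = 0.7,      # hypothetical threshold, not from the paper
    latency_budget_s: float = 2.0,   # hypothetical latency budget in seconds
) -> float:
    """Two-level reward sketch: quality is judged first, latency second.

    Assumes `quality` is in [0, 1] with higher being better.
    """
    if quality < quality_floor:
        # Level 1: the translation is not yet good enough, so latency
        # earns nothing and the reward is the quality score alone.
        return quality
    # Level 2: the quality floor is met, so a latency bonus in [0, 1]
    # is added; it shrinks linearly and reaches 0 at the budget.
    latency_bonus = max(0.0, 1.0 - latency_s / latency_budget_s)
    return quality + latency_bonus
```

Under this sketch, a fast but poor translation can never outscore a translation that clears the quality floor, which is one possible reading of "hierarchical". Other designs, such as weighted sums or lexicographic comparisons, would serve the same illustrative purpose.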

Findings

01 Improved COMET by more than 7 points

02 Improved MetricX by 1.25 points at 1.5 s latency

03 Validated effectiveness through extensive ablation studies

Abstract

Simultaneous speech translation (SST) generates translations while receiving partial speech input. Recent advances show that large language models (LLMs) can substantially improve SST quality, but at the cost of high computational overhead. To reduce this cost, prior work reformulates SST as a multi-turn dialogue task, enabling full reuse of the LLM's key-value (KV) cache and eliminating redundant feature recomputation. However, this approach relies on supervised fine-tuning (SFT) data in dialogue form, for which few human annotations exist, and existing synthesis methods cannot guarantee data quality. In this work, we propose a Hierarchical Policy Optimization (HPO) approach that post-trains models trained on imperfect SFT data. We introduce a hierarchical reward that balances translation quality and latency objectives. Experiments on English to Chinese/German/Japanese demonstrate…
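The multi-turn dialogue formulation mentioned in the abstract can be illustrated with a small sketch. The idea is that each incoming speech chunk is appended as a new turn and each partial translation as a reply, so the history only ever grows at the end and an earlier state is always a prefix of a later one. That append-only property is what makes KV cache reuse possible. The class and function names, the role labels, and the use of an empty reply to stand for "wait for more input" are assumptions made for illustration and are not taken from the paper or its repository.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Append-only history of (role, content) turns. Names are hypothetical."""
    turns: list = field(default_factory=list)

    def add_speech_chunk(self, chunk_id: str) -> None:
        # A new speech segment arrives as a "user" turn.
        self.turns.append(("user", chunk_id))

    def add_translation(self, text: str) -> None:
        # The model's partial translation is an "assistant" turn.
        # An empty string stands for "emit nothing yet" (an assumption).
        self.turns.append(("assistant", text))


def is_prefix(old: list, new: list) -> bool:
    """True if `old` is an exact prefix of `new`."""
    return new[:len(old)] == old


def simulate(chunks, partial_translations):
    """Interleave chunks and translations, snapshotting the history each step."""
    state = DialogueState()
    snapshots = []
    for chunk, text in zip(chunks, partial_translations):
        state.add_speech_chunk(chunk)
        state.add_translation(text)
        snapshots.append(list(state.turns))
    return snapshots


snapshots = simulate(["chunk-1", "chunk-2", "chunk-3"], ["Hallo", "", "Welt"])
# Every earlier history is a prefix of the next one, so cached
# computation for earlier turns would never need to be redone.
prefix_preserved = all(is_prefix(a, b) for a, b in zip(snapshots, snapshots[1:]))
```

A formulation that re-encodes or rewrites earlier context at each step would break the prefix property, which is the redundant recomputation the dialogue format is meant to avoid.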

Code & Models

Repositories

owaski/HPO (GitHub)