Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation

Joseph Liu; Nameer Hirschkind; Xiao Yu; Mahesh Kumar Nandwana

arXiv:2604.09916·cs.LG·April 14, 2026

TL;DR

This paper enhances information gain-based policies for simultaneous speech translation by incorporating temporal context, leading to improved streaming efficiency and robustness.

Contribution

The paper introduces two strategies, REINA-SAN and REINA-TAN, that add temporal awareness to information-based policies and outperform the REINA baseline in both efficiency and stability.

Findings

01

REINA-TAN achieves a slightly better Pareto frontier for streaming efficiency.

02

Both methods improve Normalized Streaming Efficiency (NoSE) scores by up to 7.1%.

03

REINA-SAN offers increased robustness against read loops.

Abstract

Simultaneous Speech Translation (SimulST) requires balancing high translation quality with low latency. Recent work introduced REINA, a method that trains a Read/Write policy based on estimating the information gain of reading more audio. However, we find that information-based policies often lack temporal context, leading the policy to bias itself toward reading most of the audio before starting to write. We improve REINA using two distinct strategies: a supervised alignment network (REINA-SAN) and a timestep-augmented network (REINA-TAN). Our results demonstrate that while both methods significantly outperform the baseline and resolve stability issues, REINA-TAN provides a slightly superior Pareto frontier for streaming efficiency, whereas REINA-SAN offers more robustness against 'read loops'. Applied to Whisper, both methods improve the Pareto frontier of streaming efficiency as…
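The abstract's core observation, that a policy without temporal context tends to read most of the audio before writing, can be illustrated with a toy simulation. This is not REINA, REINA-SAN, or REINA-TAN: it assumes a generic threshold rule (read another chunk while the estimated gain exceeds a threshold, otherwise write a token), and `run_read_write_policy`, `time_blind_gain`, `time_aware_gain`, the 0.5 s chunk size, and the penalty weights are all hypothetical hand-written stand-ins for a trained estimator. The only difference between the two estimators is whether they use elapsed time.

```python
from typing import Callable

# Hypothetical estimator signature (not from the paper):
# (chunks_read, tokens_written, elapsed_seconds) -> estimated gain of reading one more chunk.
GainEstimator = Callable[[int, int, float], float]


def run_read_write_policy(
    num_chunks: int,
    num_tokens: int,
    estimate_gain: GainEstimator,
    threshold: float,
    chunk_seconds: float = 0.5,
) -> str:
    """Simulate a threshold-based Read/Write policy and return its action trace.

    The policy READs ('R') another audio chunk while audio remains and the
    estimated gain exceeds `threshold`; otherwise it WRITEs ('W') one token.
    """
    actions = ["R"]  # assumption: the first chunk is always read
    chunks_read, tokens_written = 1, 0
    while tokens_written < num_tokens:
        elapsed = chunks_read * chunk_seconds  # audio consumed so far, in seconds
        if chunks_read < num_chunks and estimate_gain(chunks_read, tokens_written, elapsed) > threshold:
            actions.append("R")
            chunks_read += 1
        else:
            actions.append("W")
            tokens_written += 1
    return "".join(actions)


def time_blind_gain(chunks_read: int, tokens_written: int, elapsed: float) -> float:
    # Toy stand-in: the gain shrinks with more audio but stays above the
    # threshold for short inputs, and it ignores how long the policy has waited.
    return 1.0 / chunks_read


def time_aware_gain(chunks_read: int, tokens_written: int, elapsed: float) -> float:
    # Toy stand-in: same signal, minus a penalty for how far reading has run
    # ahead of writing, in seconds (assumes 0.5 s of audio per output token).
    lag_seconds = elapsed - 0.5 * tokens_written
    return 1.0 / chunks_read - 0.3 * lag_seconds


print(run_read_write_policy(6, 3, time_blind_gain, threshold=0.1))  # RRRRRRWWW
print(run_read_write_policy(6, 3, time_aware_gain, threshold=0.1))  # RRRWWRW
```

With six chunks and three target tokens, the time-blind estimator reads all six chunks before its first write, while the time-aware one starts writing after three chunks and interleaves reads and writes. REINA trains its policy rather than using a fixed formula, so this only illustrates the failure mode the abstract describes, not the authors' fix.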