Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation
Joseph Liu, Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana

TL;DR
This paper enhances information-gain-based policies for simultaneous speech translation by incorporating temporal context, which improves streaming efficiency and robustness.
Contribution
It introduces two strategies, REINA-SAN and REINA-TAN, that incorporate temporal awareness into information-based policies, outperforming the baseline in efficiency and stability.
Findings
REINA-TAN achieves a slightly better Pareto frontier for streaming efficiency.
Both methods improve Normalized Streaming Efficiency (NoSE) scores by up to 7.1%.
REINA-SAN offers increased robustness against read loops.
Abstract
Simultaneous Speech Translation (SimulST) requires balancing high translation quality with low latency. Recent work introduced REINA, a method that trains a Read/Write policy by estimating the information gain of reading more audio. However, we find that information-based policies often lack temporal context, which biases the policy toward reading most of the audio before it starts to write. We improve REINA with two distinct strategies: a supervised alignment network (REINA-SAN) and a timestep-augmented network (REINA-TAN). Our results show that both methods significantly outperform the baseline and resolve its stability issues; REINA-TAN provides a slightly better Pareto frontier for streaming efficiency, whereas REINA-SAN is more robust against 'read loops'. Applied to Whisper, both methods improve the Pareto frontier of streaming efficiency as…
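The bias described in the abstract can be illustrated with a toy Read/Write rule. This is a hypothetical sketch, not the authors' REINA, REINA-SAN, or REINA-TAN implementation: it assumes a scalar information-gain estimate per audio chunk is already available, and it replaces the learned timestep-augmented network with a hand-written linear time penalty. The names ToyPolicy, threshold, use_timestep, time_weight, and simulate are all illustrative inventions.

```python
from dataclasses import dataclass
from typing import List

READ, WRITE = "READ", "WRITE"


@dataclass
class ToyPolicy:
    # Hypothetical decision threshold on the (possibly adjusted) read score.
    threshold: float = 0.5
    # Hypothetical switch standing in for giving the policy a timestep input.
    use_timestep: bool = False
    # Hypothetical weight on the fraction of audio already consumed.
    time_weight: float = 0.5

    def decide(self, info_gain: float, t: int, total: int) -> str:
        """Return READ if reading more audio looks worthwhile, else WRITE.

        info_gain: assumed estimate of the benefit of reading another chunk.
        t: index of the current chunk; total: number of chunks.
        """
        score = info_gain
        if self.use_timestep:
            # Hand-written stand-in for temporal awareness: the further into
            # the audio we are, the less the same gain estimate favors READ.
            score -= self.time_weight * (t / total)
        return READ if score > self.threshold else WRITE


def simulate(policy: ToyPolicy, gains: List[float]) -> List[str]:
    """Apply the policy to one gain estimate per chunk and collect actions."""
    total = len(gains)
    return [policy.decide(g, t, total) for t, g in enumerate(gains)]


if __name__ == "__main__":
    gains = [0.8, 0.7, 0.6, 0.6, 0.6]
    print(simulate(ToyPolicy(), gains))
    print(simulate(ToyPolicy(use_timestep=True), gains))
```

With gain estimates that stay above the threshold, the information-only rule reads every chunk, while the time-penalized variant switches to writing after the second chunk. This mirrors, in a deliberately simplified form, the claim that temporal context counteracts the tendency to read most of the audio before writing.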
