Overcoming Latency Bottlenecks in On-Device Speech Translation: A Cascaded Approach with Alignment-Based Streaming MT

Zeeshan Ahmed; Frank Seide; Niko Moritz; Ju Lin; Ruiming Xie; Simone Merello; Zhe Liu; Christian Fuegen

arXiv:2508.13358·cs.CL·August 20, 2025

TL;DR

This paper presents a novel cascaded approach for real-time on-device speech translation that balances translation quality and latency, leveraging linguistic cues and efficient decoding techniques.

Contribution

The paper introduces a simultaneous translation method that improves both latency and translation quality in on-device streaming speech translation, and it integrates ASR and MT more efficiently.

Findings

01 Outperforms baseline systems in latency and quality

02 Narrows gap with non-streaming translation systems

03 Demonstrates effectiveness on bilingual conversational speech

Abstract

This paper tackles several challenges that arise when integrating Automatic Speech Recognition (ASR) and Machine Translation (MT) for real-time, on-device streaming speech translation. Although state-of-the-art ASR systems based on Recurrent Neural Network Transducers (RNN-T) can perform real-time transcription, achieving streaming translation in real time remains a significant challenge. To address this issue, we propose a simultaneous translation approach that effectively balances translation quality and latency. We also investigate efficient integration of ASR and MT, leveraging linguistic cues generated by the ASR system to manage context and utilizing efficient beam-search pruning techniques such as time-out and forced finalization to maintain the system's real-time factor. We apply our approach to on-device bilingual conversational speech translation and demonstrate that our…
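The abstract names time-out and forced finalization as beam-search pruning techniques but does not describe how they work. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below runs a toy beam search under a wall-clock budget and, once the budget is exceeded, commits the best hypothesis found so far. All names here (`beam_search_with_timeout`, `toy_step`, `budget_s`, the `</s>` end token) are hypothetical, and the scoring function is a stand-in for a real MT decoder.

```python
import math
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: maps a token prefix to {next_token: log_prob}.
StepFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search_with_timeout(
    step_fn: StepFn,
    beam_size: int,
    max_len: int,
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
    eos: str = "</s>",
) -> Tuple[List[str], bool]:
    """Return (tokens, timed_out). Illustrative sketch, not the paper's decoder."""
    start = clock()
    beam: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    timed_out = False

    for _ in range(max_len):
        # Time-out: stop expanding once the decoding budget is spent.
        if clock() - start > budget_s:
            timed_out = True
            break
        candidates = []
        for prefix, score in beam:
            for tok, logp in step_fn(prefix).items():
                hyp = (prefix + (tok,), score + logp)
                (finished if tok == eos else candidates).append(hyp)
        if not candidates:
            break
        # Standard beam pruning: keep only the top-scoring partial hypotheses.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = candidates[:beam_size]

    # Forced finalization: if nothing ended naturally, commit the best partial.
    pool = finished if finished else beam
    best = max(pool, key=lambda h: h[1])
    return list(best[0]), timed_out


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy model (assumption): two free tokens, then a forced end token."""
    if len(prefix) < 2:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    return {"</s>": 0.0}


if __name__ == "__main__":
    print(beam_search_with_timeout(toy_step, beam_size=2, max_len=5, budget_s=10.0))
```

The `clock` parameter is injected so the time-out path can be exercised deterministically; a real system would more likely tie the budget to the audio stream's real-time factor than to a fixed number of seconds.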
