End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

Christian Huber; Tu Anh Dinh; Carlos Mullov; Ngoc Quan Pham; Thai Binh Nguyen; Fabian Retkowski; Stefan Constantin; Enes Yavuz Ugan; Danni Liu; Zhaolin Li; Sai Koneru; Jan Niehues; Alexander Waibel

arXiv:2308.03415 · cs.CL · July 8, 2025

TL;DR

This paper introduces a comprehensive framework for evaluating low-latency speech translation systems in realistic scenarios, enabling fair comparison of different approaches including end-to-end and cascaded models.

Contribution

It presents the first end-to-end evaluation framework for low-latency speech translation, considering segmentation, runtime, and quality in real-world conditions.

Findings

01. The end-to-end framework enables realistic evaluation of both latency and quality.

02. Models that can revise their output are compared with methods that produce fixed output.

03. State-of-the-art systems are evaluated under consistent conditions.

Abstract

The challenge of low-latency speech translation has recently drawn significant interest in the research community, as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated, and it is often not possible to compare different approaches. In this work, we propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion. This includes the segmentation of the audio as well as the run-time of the different components. Secondly, we compare different approaches to low-latency speech translation using this framework. We evaluate models with the option to revise the output as well as methods with fixed output. Furthermore,…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems