Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems
Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

TL;DR
Simulstream is an open-source toolkit that enables comprehensive evaluation and live demonstration of streaming speech-to-text translation systems, supporting long-form audio, incremental decoding, re-translation, and interactive demos.
Contribution
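The paper introduces simulstream, the first open-source framework dedicated to unified evaluation and demonstration of Streaming Speech-to-Text Translation (StreamST) systems. It supports long-form audio, both incremental decoding and re-translation, and interactive web demos.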
Findings
Supports long-form speech processing and re-translation methods.
Enables comparison of quality and latency across different systems.
Provides an interactive web interface for system demonstration.
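The difference between incremental decoding and re-translation mentioned above can be illustrated with a minimal Python sketch. It is an illustration only and does not use simulstream's API: the function names and the token-list representation of the output are assumptions introduced here. An incremental system only appends to what it has already shown, whereas a re-translation system may overwrite earlier output with a new hypothesis, so an evaluation that supports it has to account for revised tokens.

```python
from typing import List


def apply_incremental(displayed: List[str], new_tokens: List[str]) -> List[str]:
    """Append-only update: tokens already displayed are never changed."""
    return displayed + new_tokens


def apply_retranslation(displayed: List[str], new_hypothesis: List[str]) -> List[str]:
    """Replace update: the displayed output is overwritten by the latest hypothesis."""
    return list(new_hypothesis)


def count_revised_tokens(old: List[str], new: List[str]) -> int:
    """Count displayed tokens that lie after the longest common prefix of old and new.

    These are the tokens a re-translation system had to delete or rewrite.
    (Hypothetical helper, not a metric defined by the paper.)
    """
    common = 0
    for a, b in zip(old, new):
        if a != b:
            break
        common += 1
    return len(old) - common


# Toy example: the second hypothesis revises the last displayed token.
shown = ["I", "am", "go"]
updated = apply_retranslation(shown, ["I", "am", "going", "home"])
revised = count_revised_tokens(shown, updated)
```

In this toy example, `revised` counts the single token ("go") that was replaced, while an incremental system would by construction always yield zero revisions.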
Abstract
Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech, imposing strict latency constraints and demanding models that balance partial-information decision-making with high translation quality. Research efforts on the topic have so far relied on the SimulEval repository, which is no longer maintained and does not support systems that revise their outputs. In addition, it has been designed for simulating the processing of short segments, rather than long-form audio streams, and it does not provide an easy method to showcase systems in a demo. As a solution, we introduce simulstream, the first open-source framework dedicated to unified evaluation and demonstration of StreamST systems. Designed for long-form speech processing, it supports not only incremental decoding approaches, but also re-translation methods, enabling for their…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
