How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Sara Papi; Peter Polak; Ondřej Bojar; Dominik Macháček

arXiv:2412.18495 · cs.CL · December 25, 2024

TL;DR

This paper critically reviews the current state of real-time speech-to-text translation, highlighting gaps between research and real-world applications, and proposes standardized terminology, analysis, and future directions to improve the field.

Contribution

It introduces a standardized taxonomy for SimulST, analyzes community trends, and provides concrete recommendations to enhance research relevance and practical deployment.

Findings

01. Identified inconsistencies and limitations in current SimulST research.
02. Proposed a standardized terminology and taxonomy for the field.
03. Offered actionable recommendations for future research and system development.

Abstract

Simultaneous speech-to-text translation (SimulST) translates source-language speech into target-language text concurrently with the speaker's speech, ensuring low latency for better user comprehension. Despite its intended application to unbounded speech, most research has focused on human pre-segmented speech, simplifying the task and overlooking significant challenges. This narrow focus, coupled with widespread terminological inconsistencies, is limiting the applicability of research outcomes to real-world applications, ultimately hindering progress in the field. Our extensive literature review of 110 papers not only reveals these critical issues in current research but also serves as the foundation for our key contributions. We 1) define the steps and core components of a SimulST system, proposing a standardized terminology and taxonomy; 2) conduct a thorough analysis of community…
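To make "concurrently with the speaker's speech" and "low latency" concrete, here is a toy sketch. It is not the paper's taxonomy or method. It assumes a text-level source stream (real SimulST reads audio), the well-known wait-k read/write policy, and the Average Lagging (AL) latency metric. The helper names (`wait_k_delays`, `wait_k_actions`, `average_lagging`) are hypothetical. Note that the sketch needs the full source length up front, i.e., it assumes a pre-segmented sentence. That is exactly the simplification the abstract warns about for unbounded speech.

```python
from typing import List


def wait_k_delays(src_len: int, tgt_len: int, k: int) -> List[int]:
    """Hypothetical helper: source tokens read before emitting each target token.

    Under wait-k, target token j (1-based) is written after reading
    min(k + j - 1, src_len) source tokens. Here i = j - 1 is 0-based.
    """
    return [min(k + i, src_len) for i in range(tgt_len)]


def wait_k_actions(src_len: int, tgt_len: int, k: int) -> List[str]:
    """Hypothetical helper: expand the delays into an interleaved READ/WRITE trace."""
    actions: List[str] = []
    read = 0
    for g in wait_k_delays(src_len, tgt_len, k):
        while read < g:  # read source tokens until the policy allows a write
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


def average_lagging(delays: List[int], src_len: int, tgt_len: int) -> float:
    """Average Lagging: mean of g(j) - (j - 1) / gamma over j = 1..tau.

    gamma = tgt_len / src_len, and tau is the first target position at
    which the whole source has been read (a pre-segmented assumption).
    """
    if src_len <= 0 or tgt_len <= 0:
        raise ValueError("lengths must be positive")
    gamma = tgt_len / src_len
    tau = next((i + 1 for i, d in enumerate(delays) if d >= src_len), len(delays))
    return sum(delays[i] - i / gamma for i in range(tau)) / tau


if __name__ == "__main__":
    delays = wait_k_delays(src_len=6, tgt_len=6, k=3)
    print(delays)                            # [3, 4, 5, 6, 6, 6]
    print(average_lagging(delays, 6, 6))     # 3.0
    print(wait_k_actions(3, 3, 2))
```

With equal source and target lengths, wait-k lags exactly k tokens on average, so AL comes out as k (3.0 above). On an unbounded stream, `src_len` and `tau` are not known in advance, so both the policy and the metric need redefining. This is one concrete instance of the gap between pre-segmented evaluation and real-time use.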

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems