Visualization: the missing factor in Simultaneous Speech Translation
Sara Papi, Matteo Negri, Marco Turchi

TL;DR
This paper emphasizes the importance of visualization in Simultaneous Speech Translation (SimulST) systems, analyzes the strengths and weaknesses of current systems, and advocates broader evaluation that covers user experience and visualization strategies.
Contribution
It provides a comprehensive analysis of existing SimulST systems and highlights the need for incorporating visualization and user-centered evaluation metrics.
Findings
Current SimulST systems lack comprehensive visualization strategies.
Evaluation frameworks should include user experience and visualization effectiveness.
The community has achieved some of its goals, but visualization and broader, user-oriented evaluation metrics remain unaddressed.
Abstract
Simultaneous speech translation (SimulST) is the task in which output generation has to be performed on partial, incremental speech input. In recent years, SimulST has become popular due to the spread of cross-lingual application scenarios, like international live conferences and streaming lectures, in which on-the-fly speech translation can facilitate users' access to audio-visual content. In this paper, we analyze the characteristics of the SimulST systems developed so far, discussing their strengths and weaknesses. We then concentrate on the evaluation framework required to properly assess systems' effectiveness. To this end, we raise the need for a broader performance analysis, also including the user experience standpoint. SimulST systems, indeed, should be evaluated not only in terms of quality/latency measures, but also via task-oriented metrics accounting, for instance, for the…
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Subtitles and Audiovisual Media
