Multilingual Simultaneous Speech Translation
Shashank Subramanya, Jan Niehues

TL;DR
This paper explores multilingual end-to-end and cascade models for simultaneous speech translation, demonstrating reduced latency and effective adaptation across multiple languages, including zero-shot directions.
Contribution
It investigates adapting offline models for online multilingual speech translation, showing that latency reductions are similar across end-to-end and cascade architectures, while the end-to-end architecture loses less translation quality after adaptation.
Findings
40% relative latency reduction across languages
End-to-end models have smaller quality losses after adaptation
Approach scales to zero-shot translation directions
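To make the "40% relative" figure concrete, the following is a minimal sketch of how a relative latency reduction is usually computed. The function name and the latency values are hypothetical and chosen only to illustrate the arithmetic. The summary does not state which latency metric or baseline the paper uses, so neither is assumed here.

```python
def relative_reduction(baseline_latency: float, adapted_latency: float) -> float:
    """Return the latency reduction as a fraction of the baseline latency."""
    if baseline_latency <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline_latency - adapted_latency) / baseline_latency


# Hypothetical latencies (in seconds), not taken from the paper.
baseline = 5.0
adapted = 3.0
print(f"{relative_reduction(baseline, adapted):.0%}")
```

A relative figure like this says nothing about the absolute lag a user experiences: the same 40% could correspond to very different latencies depending on the baseline.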
Abstract
Applications designed for simultaneous speech translation during events such as conferences or meetings need to balance quality and lag while displaying translated text to deliver a good user experience. One common approach to building online spoken language translation systems is by leveraging models built for offline speech translation. Based on a technique to adapt end-to-end monolingual models, we investigate multilingual models and different architectures (end-to-end and cascade) on the ability to perform online speech translation. On the multilingual TEDx corpus, we show that the approach generalizes to different architectures. We see similar gains in latency reduction (40% relative) across languages and architectures. However, the end-to-end architecture leads to smaller translation quality losses after adapting to the online model. Furthermore, the approach even scales to…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
