Multilingual Simultaneous Speech Translation

Shashank Subramanya; Jan Niehues

arXiv:2203.14835 · cs.CL · March 30, 2022 · 1 citation

TL;DR

This paper explores multilingual end-to-end and cascade models for simultaneous speech translation, demonstrating reduced latency and effective adaptation across multiple languages, including zero-shot directions.

Contribution

It investigates adapting offline models for online multilingual speech translation, showing that end-to-end architectures lose less translation quality than cascade architectures after adaptation while achieving similar latency reductions.

Findings

01. 40% relative latency reduction across languages

02. End-to-end models have smaller quality losses after adaptation

03. Approach scales to zero-shot translation directions
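As a rough illustration of what a "40% relative" latency reduction means, the sketch below computes the reduction as a fraction of the baseline latency. The function name `relative_reduction` and the lag values are hypothetical, chosen only to make the arithmetic concrete; they are not figures or metrics taken from the paper.

```python
def relative_reduction(baseline_latency: float, adapted_latency: float) -> float:
    """Return the latency drop as a fraction of the baseline latency."""
    if baseline_latency <= 0:
        raise ValueError("baseline_latency must be positive")
    return (baseline_latency - adapted_latency) / baseline_latency


# Hypothetical lag values in seconds, used only to illustrate the arithmetic.
baseline_lag_s = 5.0  # assumed lag of the offline model used as-is
adapted_lag_s = 3.0   # assumed lag after adapting the model for online use

reduction = relative_reduction(baseline_lag_s, adapted_lag_s)
print(f"Relative latency reduction: {reduction:.0%}")
```

A relative measure makes reductions comparable across languages and architectures whose baseline latencies differ, which is presumably why the result is reported this way.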

Abstract

Applications designed for simultaneous speech translation during events such as conferences or meetings need to balance quality and lag while displaying translated text to deliver a good user experience. One common approach to building online spoken language translation systems is to leverage models built for offline speech translation. Building on a technique for adapting end-to-end monolingual models, we investigate how well multilingual models and different architectures (end-to-end and cascade) perform online speech translation. On the multilingual TEDx corpus, we show that the approach generalizes to different architectures. We see similar gains in latency reduction (40% relative) across languages and architectures. However, the end-to-end architecture leads to smaller translation quality losses after adapting to the online model. Furthermore, the approach even scales to…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems