Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Junkun Chen; Mingbo Ma; Renjie Zheng; Liang Huang

arXiv:2106.06636 · cs.CL · June 15, 2021

TL;DR

This paper introduces a synchronized dual-decoder approach for simultaneous speech-to-text translation, combining the benefits of cascaded and end-to-end methods to improve translation quality with low latency.

Contribution

It proposes a novel synchronized decoding paradigm with multitask training, enhancing translation accuracy while maintaining low latency in real-time speech translation.

Findings

01 Achieves better translation quality than traditional methods.

02 Maintains similar latency levels to existing approaches.

03 Demonstrates effectiveness on the MuST-C dataset for En-De and En-Es.

Abstract

Simultaneous speech-to-text translation is widely useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to directly translate the source speech into target text simultaneously, but this is much harder due to the combination of two separate tasks. We instead propose a new paradigm with the advantages of both cascaded and end-to-end approaches. The key idea is to use two separate, but synchronized, decoders on streaming ASR and direct speech-to-text translation (ST), respectively, and the intermediate results of ASR guide the decoding policy of (but are not fed as input to) ST. During training time, we use multitask learning to jointly learn these two tasks with a shared encoder. En-to-De and En-to-Es experiments on the…
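To make the phrase "ASR guides the decoding policy of ST" concrete, here is a minimal sketch of one possible reading, not the authors' implementation. It assumes a wait-k style rule (a technique chosen here for illustration; the abstract does not name the policy) in which the ST decoder may write a target token only while it trails the number of source words committed by the streaming ASR decoder by at least k - 1. The function name `write_schedule` and its parameters are hypothetical. Note that the ASR output is used only as a word count and is never passed to the ST decoder as input.

```python
from typing import List, Sequence


def write_schedule(asr_word_counts: Sequence[int], k: int, max_target_len: int) -> List[int]:
    """Return how many target tokens the ST decoder may write after each speech chunk.

    asr_word_counts[i] is the number of source words the streaming ASR decoder
    has committed after chunk i (a hypothetical interface, assumed non-decreasing).
    Only this count drives the policy; the ASR text itself is never used.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    emitted = 0          # target tokens written so far
    writes: List[int] = []
    for n_asr_words in asr_word_counts:
        # Wait-k rule (assumed): target token t needs t + k - 1 source words.
        allowed = max(0, n_asr_words - k + 1)
        allowed = min(allowed, max_target_len)
        to_write = max(0, allowed - emitted)
        writes.append(to_write)
        emitted += to_write
    return writes


# ASR has committed 0, 1, 3, 3, 5 words after five successive chunks.
print(write_schedule([0, 1, 3, 3, 5], k=2, max_target_len=10))  # [0, 0, 2, 0, 2]
```

The sketch does not model what happens once the speech ends, when any remaining target tokens would normally be written without further waiting.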

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling