Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
Ha Nguyen, Yannick Estève, Laurent Besacier

TL;DR
This paper explores how encoding and segmentation strategies affect the quality and efficiency of end-to-end simultaneous speech translation. It shows that unidirectional LSTM (ULSTM) encoding, combined with fixed-size block segmentation, improves translation quality in online mode.
Contribution
It extends the authors' previous online decoding strategy, shows that ULSTM encoding is beneficial in online mode, and identifies fixed-size block segmentation as the best-performing option for English-German speech translation.
Findings
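ULSTM encoding improves online translation performance and efficiency over bidirectional LSTM (BLSTM) encoding, even though it degrades performance in offline mode.
Fixed-size block segmentation yields the best results.
The segmentation method significantly impacts translation quality.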
Abstract
Boosted by the simultaneous translation shared task at IWSLT 2020, promising end-to-end online speech translation approaches were recently proposed. They consist of incrementally encoding a speech input (in a source language) and decoding the corresponding text (in a target language) with the best possible trade-off between latency and translation quality. This paper investigates two key aspects of end-to-end simultaneous speech translation: (a) how to efficiently encode the continuous speech flow, and (b) how to segment the speech flow in order to alternate optimally between reading (R: encoding input) and writing (W: decoding output) operations. We extend our previously proposed end-to-end online decoding strategy and show that while replacing BLSTM with ULSTM encoding degrades performance in offline mode, it actually improves both efficiency and performance in online mode. We also…
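The alternation between reading (R) and writing (W) over fixed-size blocks can be illustrated with a minimal sketch. This is not the authors' decoding strategy: the function names `fixed_size_blocks`, `simulate_read_write`, and `toy_decode_step`, the `writes_per_block` cap, and the rule "one token per two frames" are all assumptions made up for illustration. The sketch only shows the control flow of reading one block of input frames and then writing some output tokens.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def fixed_size_blocks(frames: Sequence, block_size: int) -> Iterator[Sequence]:
    """Split the input into consecutive blocks of at most `block_size` frames."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(frames), block_size):
        yield frames[start:start + block_size]


def simulate_read_write(
    frames: Sequence,
    block_size: int,
    writes_per_block: int,
    decode_step: Callable[[List, List[str]], Optional[str]],
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Alternate R (read one block) and W (write tokens) operations.

    `decode_step` is a hypothetical stand-in for a real decoder: it receives
    the frames read so far and the tokens written so far, and returns the
    next token, or None when it prefers to wait for more input.
    """
    seen: List = []
    output: List[str] = []
    schedule: List[Tuple[str, int]] = []
    for block in fixed_size_blocks(frames, block_size):
        seen.extend(block)                      # R: encode one more block
        schedule.append(("R", len(seen)))
        for _ in range(writes_per_block):       # W: decode a few tokens
            token = decode_step(seen, output)
            if token is None:
                break
            output.append(token)
            schedule.append(("W", len(output)))
    return output, schedule


def toy_decode_step(seen: List, output: List[str]) -> Optional[str]:
    """Toy policy (assumption): allow one output token per two frames read."""
    if len(output) < len(seen) // 2:
        return f"tok{len(output)}"
    return None
```

Calling `simulate_read_write(list(range(10)), 4, 3, toy_decode_step)` reads blocks of 4, 4, and 2 frames and interleaves writes between them. A smaller `block_size` lets writing start earlier (lower latency) but gives the decoder less context per step, which is the latency-quality trade-off the abstract refers to.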
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
