Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation

Ha Nguyen; Yannick Estève; Laurent Besacier

arXiv:2104.14470·cs.CL·June 15, 2021

TL;DR

This paper explores how encoding and segmentation strategies affect the performance and efficiency of end-to-end simultaneous speech translation, demonstrating that fixed-size block segmentation with ULSTM encoding improves online translation quality.

Contribution

It extends the authors' previously proposed end-to-end online decoding strategy, shows that ULSTM encoding is beneficial in online mode, and identifies fixed-size block segmentation as the best-performing segmentation strategy for English-German speech translation.

Findings

1. ULSTM encoding improves online translation performance over BLSTM.
2. Fixed-size block segmentation yields the best results.
3. The segmentation method significantly impacts translation quality.
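A toy cost model may help show why a unidirectional encoder suits online mode. It assumes speech frames arrive in fixed-size blocks and that a BLSTM encoder is naively re-run over the whole prefix after each block (its backward direction depends on the latest frame, and in a stacked BLSTM the upper layers' inputs change too), whereas a ULSTM carries its hidden state forward and processes only the new frames. The function `encoder_steps` and its step counting are hypothetical illustrations, not measurements from the paper.

```python
def encoder_steps(num_frames: int, block_size: int, bidirectional: bool) -> int:
    """Toy count of per-frame recurrent steps needed to keep encoder states
    up to date while speech frames arrive in fixed-size blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    steps = 0
    seen = 0  # frames received so far
    while seen < num_frames:
        new = min(block_size, num_frames - seen)
        seen += new
        if bidirectional:
            # Assumed naive BLSTM: the backward direction depends on the latest
            # frame, so the whole prefix is re-run in both directions per block.
            steps += 2 * seen
        else:
            # ULSTM: carry the hidden state over and run only the new frames.
            steps += new
    return steps


print(encoder_steps(1000, 100, bidirectional=False))  # 1000
print(encoder_steps(1000, 100, bidirectional=True))   # 11000
```

With 1,000 frames in blocks of 100, the ULSTM count is 1,000 steps while the naive BLSTM count is 11,000, because the re-encoding cost grows quadratically with the number of blocks.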

Abstract

Boosted by the simultaneous translation shared task at IWSLT 2020, promising end-to-end online speech translation approaches were recently proposed. They consist of incrementally encoding a speech input (in a source language) and decoding the corresponding text (in a target language) with the best possible trade-off between latency and translation quality. This paper investigates two key aspects of end-to-end simultaneous speech translation: (a) how to efficiently encode the continuous speech flow, and (b) how to segment the speech flow in order to alternate optimally between reading (R: encoding input) and writing (W: decoding output) operations. We extend our previously proposed end-to-end online decoding strategy and show that while replacing BLSTM by ULSTM encoding degrades performance in offline mode, it actually improves both efficiency and performance in online mode. We also…
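The R/W alternation described in the abstract can be sketched as a loop that READs one fixed-size block of speech frames, updates a unidirectional encoder state incrementally, and then asks a decoding policy how many target tokens to WRITE. This is a minimal sketch under stated assumptions: the leaky-sum `state` is a toy stand-in for a ULSTM, and `decide_writes` is a hypothetical policy hook; neither is the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def simultaneous_translate(
    frames: Sequence[float],
    block_size: int,
    decide_writes: Callable[[int, bool], int],
) -> Tuple[List[str], List[float]]:
    """Alternate READ (encode one fixed-size block) and WRITE (emit tokens).

    decide_writes(frames_read, source_done) returns how many tokens to emit
    now; it is a hypothetical stand-in for the decoder's policy.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    actions: List[str] = []
    states: List[float] = []  # one encoder state per frame read so far
    state = 0.0  # toy stand-in for a ULSTM hidden state
    read = 0
    while read < len(frames):
        block = frames[read:read + block_size]  # fixed-size segmentation
        for x in block:  # incremental: only the new frames are encoded
            state = 0.5 * state + x
            states.append(state)
        read += len(block)
        actions.append(f"R{len(block)}")
        source_done = read == len(frames)
        actions.extend("W" for _ in range(decide_writes(read, source_done)))
    return actions, states


# Hypothetical policy: one token after each block, three after the final block.
actions, _ = simultaneous_translate([1.0] * 250, 100, lambda n, done: 3 if done else 1)
print(actions)
```

For example, with 250 frames, `block_size=100`, and the policy above, the action sequence is R100, W, R100, W, R50, W, W, W. Because the encoder state is carried across blocks, earlier frames are never re-encoded.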

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling