An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies

Ha Nguyen; Yannick Estève; Laurent Besacier

arXiv:2103.03233·cs.CL·March 5, 2021·1 cites

TL;DR

This paper empirically investigates decoding strategies for end-to-end simultaneous speech translation, exploring output token granularities and latency trade-offs. Its best settings achieve results comparable to a strong cascade model on the IWSLT 2020 simultaneous translation track.

Contribution

It proposes a decoding strategy that lets end-to-end models trained in offline mode be used for simultaneous speech translation, controlling the trade-off between translation quality (BLEU) and latency (Average Lagging) across two language pairs and two output token granularities.

Findings

01

Decoding strategy controls BLEU/Average Lagging trade-off.

02

Character and BPE tokenizations impact latency and translation quality.

03

Best decoding settings achieve results comparable to a strong cascade model on the IWSLT 2020 simultaneous translation track.

Abstract

This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities, including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach makes it possible to control the BLEU/Average Lagging trade-off across different latency regimes. Our best decoding settings achieve results comparable to a strong cascade model evaluated on the simultaneous translation track of the IWSLT 2020 shared task.
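
For readers unfamiliar with the latency metric named in the abstract, the following is a minimal sketch of Average Lagging as it is commonly defined for text-based simultaneous translation: the average number of source units by which the system lags behind an ideal translator, counted up to the first target token emitted after the whole source has been read. The function name `average_lagging` and the `delays` input are illustrative assumptions, not taken from the paper. Speech translation setups typically measure the delays in elapsed audio time rather than in token counts, and this page does not say which exact variant the authors use.

```python
from typing import Sequence


def average_lagging(delays: Sequence[float], source_length: float) -> float:
    """Average Lagging for one sentence (text-style definition, an assumption here).

    delays[t-1] is g(t): how much source had been read when target token t
    was emitted. source_length is |x| in the same unit as the delays.
    """
    if not delays or source_length <= 0:
        raise ValueError("need at least one target token and a positive source length")

    target_length = len(delays)
    rate = target_length / source_length  # r = |y| / |x|

    # tau: position of the first target token emitted after the full source was read.
    # If the source is never fully read, fall back to the last target token.
    tau = next(
        (t for t, g in enumerate(delays, start=1) if g >= source_length),
        target_length,
    )

    # Sum the lag of each token relative to an ideal, perfectly paced translator.
    total = sum(delays[t - 1] - (t - 1) / rate for t in range(1, tau + 1))
    return total / tau


# A wait-3 style schedule on a 6-token source lags by 3 source tokens on average.
wait3 = average_lagging([3, 4, 5, 6, 6, 6], source_length=6)
# An offline system reads everything first, so its lag equals the source length.
offline = average_lagging([6, 6, 6, 6, 6, 6], source_length=6)
```

A lower value means the system starts translating sooner. Reporting BLEU against this value at several operating points gives the kind of quality/latency trade-off curve the abstract refers to.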

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling