An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies
Ha Nguyen, Yannick Estève, Laurent Besacier

TL;DR
This paper empirically investigates decoding strategies for end-to-end simultaneous speech translation, exploring output token granularities and latency trade-offs, and reports results comparable to a strong cascade model on the IWSLT 2020 simultaneous translation track.
Contribution
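It introduces a decoding approach for end-to-end simultaneous speech translation that lets the trade-off between translation quality and latency be controlled, studied on two language pairs (English-to-German and English-to-Portuguese) and two output token granularities (characters and BPE units).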
Findings
Decoding strategy controls BLEU/Average Lagging trade-off.
Character and BPE tokenizations impact latency and translation quality.
Best decoding settings achieve results comparable to a strong cascade model on the IWSLT 2020 simultaneous translation track.
Abstract
This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities, including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach makes it possible to control the BLEU/Average Lagging trade-off across different latency regimes. Our best decoding settings achieve results comparable to a strong cascade model evaluated on the simultaneous translation track of the IWSLT 2020 shared task.
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
