The Volctrans Neural Speech Translation System for IWSLT 2021
Chengqi Zhao, Zhicheng Liu, Jian Tong, Tao Wang, Mingxuan Wang, Rong Ye, Qianqian Dong, Jun Cao, Lei Li

TL;DR
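This paper presents the Volctrans team's systems for the IWSLT 2021 offline speech translation and text-to-text simultaneous translation tracks. The best end-to-end model gains 8.1 BLEU over the benchmark on the MuST-C test set, the simultaneous systems exceed the benchmark by around 7 BLEU in the same latency regime, and the authors plan to release their code and models.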
Contribution
The paper introduces optimized end-to-end speech translation models and best practices for wait-k simultaneous translation, surpassing benchmarks and approaching cascade system performance.
Findings
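Offline speech translation: the best end-to-end model improves on the benchmark by 8.1 BLEU on the MuST-C test set.
Simultaneous translation: the submitted systems exceed the benchmark by around 7 BLEU in the same latency regime.
The best end-to-end model approaches the performance of a strong cascade solution.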
Abstract
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team. We participate in the offline speech translation and text-to-text simultaneous translation tracks. For offline speech translation, our best end-to-end model achieves 8.1 BLEU improvements over the benchmark on the MuST-C test set and is even approaching the results of a strong cascade solution. For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model. As a result, our final submitted systems exceed the benchmark at around 7 BLEU on the same latency regime. We will publish our code and model to facilitate both future research works and industrial applications.
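For readers unfamiliar with the wait-k model mentioned in the abstract, the following is a minimal sketch of the standard wait-k read/write schedule: the decoder first reads k source tokens, then alternates between writing one target token and reading one more source token until the source is exhausted. This is the generic policy, not the Volctrans implementation; the function name `wait_k_schedule` and its parameters are hypothetical and chosen only for illustration.

```python
def wait_k_schedule(src_len: int, tgt_len: int, k: int) -> list[int]:
    """Return, for each target position t = 1..tgt_len, how many source
    tokens are visible when that target token is written.

    Under the standard wait-k policy this is g(t) = min(k + t - 1, src_len).
    Illustrative helper only; not taken from the paper's code.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if src_len < 0 or tgt_len < 0:
        raise ValueError("lengths must be non-negative")
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


if __name__ == "__main__":
    # With 5 source tokens and k = 3, the first target token is written
    # after 3 source tokens have been read; once the whole source has been
    # read, every later target token sees all of it.
    print(wait_k_schedule(src_len=5, tgt_len=6, k=3))
```

A smaller k lowers latency because translation starts sooner, at the cost of less source context per target token, which is the quality/latency trade-off that the "same latency regime" comparison refers to.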
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
