The USYD-JD Speech Translation System for IWSLT 2021
Liang Ding, Di Wu, Dacheng Tao

TL;DR
This paper presents a speech translation system for Swahili-English that combines ASR and NMT, employing advanced training strategies and novel pre-training methods to achieve state-of-the-art BLEU scores in low-resource settings.
Contribution
The paper introduces two novel pre-training approaches, de-noising and bidirectional training, and demonstrates their effectiveness in improving speech translation performance.
Findings
Achieved the best sacreBLEU score of 25.3 among participants.
Applied recent effective strategies, including back translation, knowledge distillation, multi-feature reranking, and transductive finetuning.
Final system outperformed the baseline by approximately 10.8 BLEU points, setting a new state of the art.
Abstract
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task. We participated in the Swahili-English direction and obtained the best sacreBLEU score (25.3) among all participants. Our constrained system is based on a pipeline framework, i.e. ASR followed by NMT. We trained our models on the officially provided ASR and MT datasets. The ASR system is based on the open-source toolkit Kaldi, and this work mainly explores how to make the most of the NMT models. To reduce the punctuation errors generated by the ASR model, we employ our previous work SlotRefine to train a punctuation correction model. To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning. For model structure, we tried auto-regressive…
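To make the multi-feature reranking step mentioned in the abstract more concrete, here is a minimal sketch of how an n-best list can be reordered by a weighted sum of feature scores. The feature names (`forward`, `backward`, `lm`), the weights, and the example hypotheses are illustrative assumptions; this summary does not specify which features or weights the submitted system used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    text: str
    # Hypothetical feature name -> score, where higher means better
    # (e.g. log-probabilities from different scoring models).
    features: Dict[str, float]


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[Hypothesis]:
    """Return the n-best list sorted by weighted feature sum, best first."""
    def combined(h: Hypothesis) -> float:
        # A feature missing from a hypothesis contributes 0 to its score.
        return sum(w * h.features.get(name, 0.0) for name, w in weights.items())

    return sorted(nbest, key=combined, reverse=True)


# Illustrative values only; not taken from the paper.
nbest = [
    Hypothesis("he go to market", {"forward": -1.0, "backward": -4.0, "lm": -6.0}),
    Hypothesis("he goes to the market", {"forward": -1.5, "backward": -2.0, "lm": -3.0}),
]
weights = {"forward": 1.0, "backward": 0.5, "lm": 0.5}
best = rerank(nbest, weights)[0]
print(best.text)
```

With the forward score alone the first candidate would win; adding the other two features moves the second candidate to the top, which is the motivation for combining several scorers. In practice the weights would be tuned on a development set rather than set by hand.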
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
