The USYD-JD Speech Translation System for IWSLT 2021

Liang Ding; Di Wu; Dacheng Tao

arXiv:2107.11572 · cs.CL · July 27, 2021 · 1 citation

TL;DR

This paper presents a Swahili-English speech translation system built as an ASR-then-NMT pipeline. It combines recent training strategies with two pre-training methods and achieved the best sacreBLEU score (25.3) among participants in the IWSLT 2021 low-resource task.

Contribution

The paper introduces two novel pre-training approaches, de-noising and bidirectional training, and demonstrates their effectiveness in improving speech translation performance.

Findings

01

Achieved the best sacreBLEU score (25.3) among all participants.

02

Explored recent effective strategies, including back translation, knowledge distillation, multi-feature reranking, and transductive finetuning.

03

The final system outperformed the baseline by approximately 10.8 BLEU points, setting a new state of the art.

Abstract

This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task. We participated in the Swahili-English direction and got the best sacreBLEU score (25.3) among all the participants. Our constrained system is based on a pipeline framework, i.e. ASR and NMT. We trained our models with the officially provided ASR and MT datasets. The ASR system is based on the open-source tool Kaldi, and this work mainly explores how to make the most of the NMT models. To reduce the punctuation errors generated by the ASR model, we employ our previous work SlotRefine to train a punctuation correction model. To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning. For model structure, we tried auto-regressive…
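To make the reranking step mentioned in the abstract more concrete, here is a minimal sketch of one common form of multi-feature reranking: each candidate translation carries several feature scores, and the candidates are re-sorted by a weighted sum of those scores. The feature names (`forward`, `backward`, `lm`), the weights, and the helper names `Hypothesis`, `combined_score`, and `rerank` are all hypothetical; the abstract does not say which features or weights the authors used, so this should not be read as their implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """One candidate translation with its per-feature scores (hypothetical)."""
    text: str
    features: Dict[str, float]  # e.g. log-probabilities from different models


def combined_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in hyp.features.items())


def rerank(hyps: List[Hypothesis], weights: Dict[str, float]) -> List[Hypothesis]:
    """Return the candidates sorted from best to worst combined score."""
    return sorted(hyps, key=lambda h: combined_score(h, weights), reverse=True)


# Illustrative n-best list; the scores are made up for the example.
nbest = [
    Hypothesis("candidate A", {"forward": -1.0, "backward": -4.0, "lm": -4.0}),
    Hypothesis("candidate B", {"forward": -1.5, "backward": -2.0, "lm": -2.0}),
]

# Assumed weights; in practice these would be tuned on a development set.
weights = {"forward": 1.0, "backward": 0.5, "lm": 0.5}

best = rerank(nbest, weights)[0]
forward_only_best = rerank(nbest, {"forward": 1.0})[0]
```

With these made-up numbers, the forward model alone prefers candidate A, while the combined score prefers candidate B. That is the point of using several features: a candidate that one model slightly disfavors can still win when the other scorers agree on it.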

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications