IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task
Pavel Denisov, Manuel Mager, Ngoc Thang Vu

TL;DR
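This paper presents the IMS systems for the IWSLT 2021 low-resource speech translation task. A cascaded ASR and MT pipeline built on state-of-the-art models with data augmentation, multi-task learning, and transfer learning ranks first among submitted systems for Congolese Swahili to English and French, and second for Coastal Swahili to English.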
Contribution
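The paper combines state-of-the-art models with several data augmentation, multi-task, and transfer learning approaches in the ASR and MT steps of a cascaded system, and explores whether a fully end-to-end speech translation model is feasible with a very limited amount of labeled data.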
Findings
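Achieved the best BLEU scores among all submitted systems for Congolese Swahili to English (7.7) and to French (13.7), and the second-best result for Coastal Swahili to English (14.9).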
Demonstrated the effectiveness of data augmentation and transfer learning in low-resource settings.
Explored the feasibility of end-to-end speech translation with limited labeled data.
Abstract
This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by the IMS team. We utilize state-of-the-art models combined with several data augmentation, multi-task and transfer learning approaches for the automatic speech recognition (ASR) and machine translation (MT) steps of our cascaded system. Moreover, we also explore the feasibility of a full end-to-end speech translation (ST) model in the case of a very constrained amount of ground truth labeled data. Our best system achieves the best performance among all submitted systems for Congolese Swahili to English and French, with BLEU scores of 7.7 and 13.7 respectively, and the second-best result for Coastal Swahili to English, with a BLEU score of 14.9.
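The cascaded design mentioned in the abstract, an ASR step followed by an MT step, can be illustrated with a minimal sketch. Everything named here (`make_cascade`, `toy_asr`, `toy_mt`, the type aliases) is hypothetical and chosen for illustration only; the toy functions are stand-ins, not the authors' models, and the sketch shows only how the two steps compose into one speech translation function.

```python
from typing import Callable, List

# Hypothetical type aliases for illustration only.
Audio = List[float]              # raw audio samples
Asr = Callable[[Audio], str]     # speech -> source-language transcript
Mt = Callable[[str], str]        # source-language text -> target-language text
St = Callable[[Audio], str]      # speech -> target-language text


def make_cascade(asr: Asr, mt: Mt) -> St:
    """Compose an ASR step and an MT step into one speech translation function."""
    def translate(audio: Audio) -> str:
        transcript = asr(audio)  # step 1: recognize the source-language speech
        return mt(transcript)    # step 2: translate the recognized transcript
    return translate


# Toy stand-ins so the sketch runs; these are NOT real models.
def toy_asr(audio: Audio) -> str:
    # Pretend any non-empty audio is recognized as a fixed placeholder transcript.
    return "source words" if audio else ""


def toy_mt(text: str) -> str:
    # Pretend translation is a lookup table; unknown text passes through unchanged.
    return {"source words": "target words"}.get(text, text)


cascaded_st = make_cascade(toy_asr, toy_mt)
```

In this framing, an end-to-end ST model would be a single function of type `St` trained directly on speech and translation pairs, with no intermediate transcript; in a cascade, any ASR error is passed on to the MT step.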
