Multilingual Speech Translation with Unified Transformer: Huawei Noah's Ark Lab at IWSLT 2021

Xingshan Zeng; Liangyou Li; Qun Liu

arXiv:2106.00197 · cs.CL · June 23, 2021

TL;DR

This paper presents a unified transformer model for multilingual speech translation. It uses multi-task learning and data augmentation to improve performance across multiple languages and across three tasks: speech recognition, machine translation, and speech translation.

Contribution

The paper introduces a unified transformer architecture that processes speech and text inputs jointly for multilingual tasks, enhancing performance through multi-task training and data augmentation techniques.

Findings

01. Outperforms bilingual baselines on supervised language pairs

02. Achieves reasonable results on zero-shot language pairs

03. Effective use of multi-task learning and data augmentation

Abstract

This paper describes the system submitted to the IWSLT 2021 Multilingual Speech Translation (MultiST) task from Huawei Noah's Ark Lab. We use a unified transformer architecture for our MultiST model so that data from different modalities (speech and text) and different tasks (speech recognition, machine translation, and speech translation) can be exploited to strengthen the model. Specifically, speech and text inputs are first fed to separate feature extractors that produce acoustic and textual features, respectively. These features are then processed by a shared encoder-decoder architecture. We apply several training techniques to improve performance, including multi-task learning, task-level curriculum learning, and data augmentation. Our final system achieves significantly better results than bilingual baselines on supervised language pairs and yields…
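The routing described in the abstract (modality-specific feature extractors feeding one shared component) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the feature width `D_MODEL`, the pair-averaging speech front end, the arithmetic stand-in for an embedding lookup, and the mean-adding "encoder" are all hypothetical placeholders for real neural modules. Only the control flow is the point: ASR and ST take speech input, MT takes text input, and all three pass through the same shared function.

```python
from typing import List, Sequence

Vector = List[float]
D_MODEL = 4  # hypothetical shared feature width, chosen only for this toy


def extract_acoustic(frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Toy speech front end: average each pair of frames (2x downsampling)
    and keep the first D_MODEL values. A trailing odd frame is dropped."""
    out = []
    for i in range(0, len(frames) - 1, 2):
        pair = [(a + b) / 2.0 for a, b in zip(frames[i], frames[i + 1])]
        out.append(pair[:D_MODEL])
    return out


def extract_textual(token_ids: Sequence[int]) -> List[Vector]:
    """Toy text front end: deterministic stand-in for an embedding lookup."""
    return [[((t + 1) * (k + 1) % 7) / 7.0 for k in range(D_MODEL)]
            for t in token_ids]


def shared_encoder(features: List[Vector]) -> List[Vector]:
    """Toy shared encoder: add the sequence mean to every position.
    The same function serves both modalities (assumed placeholder)."""
    if not features:
        return []
    n = len(features)
    mean = [sum(v[k] for v in features) / n for k in range(D_MODEL)]
    return [[x + m for x, m in zip(v, mean)] for v in features]


EXTRACTORS = {"speech": extract_acoustic, "text": extract_textual}

# Input modality of each task named in the abstract.
TASK_MODALITY = {"ASR": "speech", "ST": "speech", "MT": "text"}


def encode(task: str, source) -> List[Vector]:
    """Pick the extractor by input modality, then apply the shared encoder."""
    features = EXTRACTORS[TASK_MODALITY[task]](source)
    return shared_encoder(features)


# Six fake acoustic frames of width D_MODEL, and three fake token ids.
speech = [[0.1 * (i + j) for j in range(D_MODEL)] for i in range(6)]
tokens = [5, 9, 2]

st_states = encode("ST", speech)
mt_states = encode("MT", tokens)
print(len(st_states), len(st_states[0]))
print(len(mt_states), len(mt_states[0]))
```

Because both extractors emit vectors of the same width, the shared component never needs to know which modality produced its input. That property is what lets speech and text data from different tasks be mixed during training.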

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing