Multilingual End-to-End Speech Translation

Hirofumi Inaguma; Kevin Duh; Tatsuya Kawahara; Shinji Watanabe

arXiv:1910.00254 · cs.CL · November 1, 2019 · 5 cites

TL;DR

This paper introduces a universal multilingual end-to-end speech translation framework that translates speech in source languages directly into the desired target languages. The multilingual models outperform bilingual ones, and their generalization is evaluated by transfer learning to a very low-resource language pair.

Contribution

The paper presents the first application of multilingual models to end-to-end speech translation and reports significant improvements over bilingual models in both one-to-many and many-to-many translation scenarios.

Findings

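01 Multilingual end-to-end speech translation models significantly outperform bilingual ones in both one-to-many and many-to-many scenarios.

02 Multilingual training is also evaluated for generalization by transfer learning to a very low-resource language pair.

03 Code and data are publicly available for further research.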

Abstract

In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they have been applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our codes and the database are…
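
The abstract does not spell out how a single model is told which language to produce. One common device in multilingual MT is to pool the bilingual corpora and mark each training target with a target-language token; the sketch below illustrates that idea only, as an assumption rather than the authors' exact recipe. The names `STExample`, `lang_tag`, `pool_corpora`, the `<2xx>` tag format, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class STExample:
    audio_path: str           # path to the source-language speech (hypothetical)
    target_tokens: List[str]  # decoder target, led by a target-language tag


def lang_tag(lang: str) -> str:
    """Build a target-language token such as '<2de>' (assumed format)."""
    return f"<2{lang}>"


def pool_corpora(
    corpora: Dict[Tuple[str, str], List[Tuple[str, str]]]
) -> List[STExample]:
    """Merge bilingual (audio, translation) corpora into one training set.

    Keys are (source_lang, target_lang); every target sequence is prefixed
    with its language tag so one decoder can serve all target languages.
    """
    pooled: List[STExample] = []
    for (_src_lang, tgt_lang), pairs in corpora.items():
        for audio_path, translation in pairs:
            tokens = [lang_tag(tgt_lang)] + translation.split() + ["<eos>"]
            pooled.append(STExample(audio_path, tokens))
    return pooled


# One-to-many toy data: the same English utterance with two target languages.
corpora = {
    ("en", "de"): [("utt1.wav", "guten morgen")],
    ("en", "fr"): [("utt1.wav", "bonjour")],
}
examples = pool_corpora(corpora)
for ex in examples:
    print(ex.audio_path, ex.target_tokens)
```

In this toy one-to-many setup the same audio appears twice with different tags, so the tag is the only signal that selects the output language. A many-to-many set would simply add corpora with other source languages to the same pool.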

Code & Models

Repositories

espnet/espnet (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis