Multilingual End-to-End Speech Translation
Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

TL;DR
This paper introduces a universal multilingual end-to-end speech translation framework that directly translates speech in source languages to target languages, outperforming bilingual models and demonstrating transfer learning benefits for low-resource languages.
Contribution
It is the first application of multilingual models to end-to-end speech translation, showing significant improvements over bilingual models in both one-to-many and many-to-many translation scenarios.
Findings
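Multilingual end-to-end ST models significantly outperform bilingual ones in both one-to-many and many-to-many translation.
Multilingual training is also evaluated for generalization by transfer learning to a very low-resource language pair.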
Code and data are publicly available for further research.
Abstract
In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they have been applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our codes and the database are…
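The abstract does not say how a single universal model is told which target language to produce. One common approach in multilingual sequence-to-sequence work is to pool the bilingual corpora and prepend a target-language token to each decoder sequence. The sketch below illustrates that idea only; it is an assumption, not a description confirmed by the text above, and the names lang_tag, make_decoder_target, and build_multilingual_corpus, as well as the "<2xx>" and "<eos>" token formats, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical types: an utterance ID paired with its decoder token sequence.
Example = Tuple[str, List[str]]


def lang_tag(lang: str) -> str:
    """Return an illustrative target-language token, e.g. '<2de>' (assumed format)."""
    return f"<2{lang}>"


def make_decoder_target(tokens: List[str], tgt_lang: str) -> List[str]:
    """Prepend the target-language token and append an end-of-sequence token."""
    return [lang_tag(tgt_lang)] + list(tokens) + ["<eos>"]


def build_multilingual_corpus(
    bilingual_corpora: Dict[Tuple[str, str], List[Example]]
) -> List[Example]:
    """Pool bilingual ST corpora into one training set for a single shared model.

    Keys are (source_lang, target_lang) pairs. A one-to-many setup has one
    source language across all keys; many-to-many has several.
    """
    pooled: List[Example] = []
    for (_src, tgt), examples in sorted(bilingual_corpora.items()):
        for utt_id, tokens in examples:
            pooled.append((utt_id, make_decoder_target(tokens, tgt)))
    return pooled
```

Under this assumed scheme, the same source utterance can appear once per target language, and only the leading token tells the decoder which language to emit.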
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
