Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning

Sathish Indurthi; Houjeung Han; Nikhil Kumar Lakumarapu; Beomseok Lee; Insoo Chung; Sangha Kim; Chanwoo Kim

arXiv:1911.04283 · cs.CL · April 29, 2020 · 28 cites

Open Access

TL;DR

This paper introduces a modality agnostic meta-learning approach for end-to-end speech translation that leverages transfer learning from ASR and MT tasks, significantly improving translation quality especially in low-data scenarios.

Contribution

The paper proposes a novel meta-learning framework that trains a multi-task model to transfer knowledge from ASR and MT to speech translation, outperforming previous transfer learning methods.
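The idea of meta-learning a shared initialization on source tasks and then adapting it to a data-poor target task can be illustrated with a toy sketch. The code below is not the paper's model or training procedure: it assumes a first-order, MAML-style update on a one-parameter linear regressor, and the two source tasks merely stand in for ASR and MT while the small target set stands in for ST. All function names, slopes, and learning rates are hypothetical choices made for illustration.

```python
import random


def grad(w, batch):
    """Gradient of mean squared error for the toy 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def loss(w, batch):
    """Mean squared error of the toy model on a batch of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def make_task(slope, n, rng):
    """Hypothetical task: n noiseless samples of y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x) for x in xs]


def meta_train(source_tasks, w, inner_lr=0.1, meta_lr=0.05, steps=200):
    """First-order MAML-style loop (an assumption, not the paper's exact recipe).

    Each task adapts a copy of the shared parameter on its support set,
    and the shared parameter is then moved along the query-set gradient
    evaluated at the adapted copy.
    """
    for _ in range(steps):
        meta_grad = 0.0
        for support, query in source_tasks:
            w_task = w - inner_lr * grad(w, support)  # inner (task) step
            meta_grad += grad(w_task, query)          # first-order meta-gradient
        w -= meta_lr * meta_grad / len(source_tasks)  # outer (meta) step
    return w


def fine_tune(w, batch, lr=0.1, steps=5):
    """A few plain gradient steps on the low-resource target task."""
    for _ in range(steps):
        w -= lr * grad(w, batch)
    return w


rng = random.Random(0)
# Two source tasks (stand-ins for ASR and MT), each with support and query sets.
source_tasks = [(make_task(s, 20, rng), make_task(s, 20, rng)) for s in (2.0, 3.0)]
target_train = make_task(2.5, 5, rng)   # low-resource target (stand-in for ST)
target_test = make_task(2.5, 50, rng)

w_meta = meta_train(source_tasks, w=0.0)
w_adapted = fine_tune(w_meta, target_train)
w_scratch = fine_tune(0.0, target_train)

print("meta-learned init:", round(w_meta, 3))
print("test loss, meta init + fine-tune:", loss(w_adapted, target_test))
print("test loss, scratch + fine-tune:  ", loss(w_scratch, target_test))
```

In this toy setting the meta-learned initialization lands between the two source slopes, so the same five fine-tuning steps on the small target set leave it much closer to the target than a model trained from scratch. The real system would replace the scalar model with a sequence-to-sequence network and the synthetic tasks with ASR, MT, and ST batches.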

Findings

01

Achieved state-of-the-art BLEU scores on En-De and En-Fr translation tasks.

02

Outperformed previous transfer learning approaches by large margins.

03

Demonstrated effectiveness in low-resource speech translation scenarios.

Abstract

End-to-end Speech Translation (ST) models have several advantages, such as lower latency, smaller model size, and less error compounding, over conventional pipelines that combine Automatic Speech Recognition (ASR) and text Machine Translation (MT) models. However, collecting large amounts of parallel data for the ST task is more difficult than for the ASR and MT tasks. Previous studies have proposed transfer learning approaches to overcome this difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from the source tasks (ASR and MT) to the target task (ST), where…

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling