Zero-shot Speech Translation

Tu Anh Dinh

arXiv:2107.06010·cs.CL·July 14, 2021

TL;DR

This paper investigates zero-shot speech translation, enabling models trained only on ASR and MT tasks to translate speech between unseen language pairs, addressing data scarcity and error propagation issues.

Contribution

It introduces methods including additional training data and an auxiliary loss to improve zero-shot speech translation performance.
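
As a rough illustration of how an auxiliary loss might be combined with the ASR and MT objectives, the sketch below penalizes the distance between mean-pooled audio and text encoder states. The summary does not specify the form of the loss, so everything here is an assumption made for illustration: the names `mean_pool`, `similarity_loss`, `total_loss`, and `aux_weight` are hypothetical, and the squared Euclidean distance is one plausible choice, not necessarily the paper's.

```python
from typing import List

Vector = List[float]


def mean_pool(states: List[Vector]) -> Vector:
    """Average a sequence of encoder states over time into one vector."""
    if not states:
        raise ValueError("cannot pool an empty sequence")
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def similarity_loss(audio_states: List[Vector], text_states: List[Vector]) -> float:
    """Squared Euclidean distance between pooled audio and text encodings.

    Assumption: pooling makes the two sequences comparable even though
    audio is usually much longer than its transcript.
    """
    pooled_audio = mean_pool(audio_states)
    pooled_text = mean_pool(text_states)
    return sum((a - t) ** 2 for a, t in zip(pooled_audio, pooled_text))


def total_loss(asr_loss: float, mt_loss: float, aux_loss: float,
               aux_weight: float = 1.0) -> float:
    """Sum the task losses with a weighted auxiliary term (hypothetical weighting)."""
    return asr_loss + mt_loss + aux_weight * aux_loss
```

For instance, `similarity_loss([[1.0, 2.0], [3.0, 4.0]], [[2.0, 3.0]])` is zero because both inputs pool to the same vector, so the auxiliary term only contributes when the audio and text encodings differ.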

Findings

01

Improved zero-shot translation by up to 11.8 BLEU points.

02

Significant improvements in few-shot settings with limited data.

03

Demonstrated that zero-shot speech translation is feasible without direct ST training data.

Abstract

Speech Translation (ST) is the task of translating speech in one language into text in another language. Traditional cascaded approaches for ST, using Automatic Speech Recognition (ASR) and Machine Translation (MT) systems, are prone to error propagation. End-to-end approaches use only one system to avoid propagating errors, yet are difficult to employ due to data scarcity. We explore zero-shot translation, which enables translating a pair of languages that is unseen during training, thus avoiding the use of end-to-end ST data. Zero-shot translation has been shown to work for multilingual machine translation, yet has not been studied for speech translation. We attempt to build zero-shot ST models that are trained only on ASR and MT tasks but can perform the ST task during inference. The challenge is that the representations of text and audio are significantly different, thus the models learn ASR and…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis