Direct speech-to-speech translation with a sequence-to-sequence model

Ye Jia; Ron J. Weiss; Fadi Biadsy; Wolfgang Macherey; Melvin Johnson; Zhifeng Chen; Yonghui Wu

arXiv:1904.06037 · cs.CL · June 27, 2019 · 22 cites

TL;DR

This paper introduces an end-to-end neural network that translates speech in one language directly into speech in another, without an intermediate text representation. The model can also synthesize the translated speech in the source speaker's voice.

Contribution

It presents a novel sequence-to-sequence model for direct speech-to-speech translation, bypassing traditional text-based pipelines.

Findings

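01 The model translates speech directly from one language into speech in another, with no intermediate text representation.

02 It slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model on two Spanish-to-English datasets.

03 The results demonstrate the feasibility of end-to-end speech-to-speech translation.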

Abstract

We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.
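As a rough illustration of the "attention-based" part of the abstract, the sketch below shows one attention step over a toy sequence of encoded source frames. It is not the authors' implementation: scaled dot-product attention is swapped in for simplicity (the paper's attention mechanism may differ), and the names `attend`, `source_frames`, and `decoder_query`, along with all numeric values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoded_frames):
    """Scaled dot-product attention (an assumption, chosen for brevity).

    Scores each encoded source frame against the decoder query, then
    returns the attention-weighted average of the frames and the weights.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, frame)) / scale
        for frame in encoded_frames
    ]
    weights = softmax(scores)
    dim = len(encoded_frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, encoded_frames))
        for d in range(dim)
    ]
    return context, weights


# Toy stand-in for an encoded source spectrogram:
# 4 frames with 3 features each (made-up values).
source_frames = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

# Hypothetical decoder state while producing one target frame.
decoder_query = [0.0, 2.0, 0.0]

context, weights = attend(decoder_query, source_frames)
print("attention weights:", weights)
print("context vector:", context)
```

In a sequence-to-sequence decoder, a context vector like this would be combined with the decoder state to predict the next target spectrogram frame, and the step would repeat for each output frame.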

Code & Models

Repositories

sam2125/translatotron (PyTorch)

Videos

All Hail The Mighty Translatotron! · YouTube

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling