End-to-End Spoken Language Translation

Michelle Guo; Albert Haque; Prateek Verma

arXiv:1904.10760 · cs.CL · April 25, 2019 · 5 cites

TL;DR

This paper introduces an end-to-end model for spoken language translation that converts speech in one language directly into speech in another. The model is trained from scratch on spectrogram pairs and generalizes to unseen speakers.

Contribution

The proposed model combines a pyramidal-bidirectional recurrent network with a convolutional network to translate speech directly into speech, outputting sentence-level spectrograms in the target language. It can be trained from scratch and generalizes to unseen speakers.

Findings

01. Achieves performance competitive with state-of-the-art methods on multiple languages

02. Can be trained completely from scratch

03. Generalizes to unseen speakers

Abstract

In this paper, we address the task of spoken language understanding. We present a method for translating spoken sentences from one language into spoken sentences in another language. Given spectrogram-spectrogram pairs, our model can be trained completely from scratch to translate unseen sentences. Our method consists of a pyramidal-bidirectional recurrent network combined with a convolutional network to output sentence-level spectrograms in the target language. Empirically, our model achieves competitive performance with state-of-the-art methods on multiple languages and can generalize to unseen speakers.
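The abstract names a pyramidal-bidirectional recurrent network but this page gives no layer details. As a rough illustration only, the sketch below shows the time-reduction step commonly used between layers of a pyramidal RNN: adjacent frames are concatenated, so each layer sees a sequence half as long with features twice as wide. The function names `pyramid_reduce` and `stacked_lengths`, the choice to drop a trailing odd frame, and the toy input are all assumptions made for this example, not details from the paper; the recurrent and convolutional layers themselves are omitted.

```python
from typing import List

Frame = List[float]  # one time step of a spectrogram (frequency bins)


def pyramid_reduce(frames: List[Frame]) -> List[Frame]:
    """Concatenate each pair of adjacent frames, halving the sequence length.

    Assumption: a trailing odd frame is dropped; padding is another option.
    """
    usable = len(frames) - (len(frames) % 2)
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]


def stacked_lengths(num_frames: int, num_layers: int) -> List[int]:
    """Sequence length remaining after each of num_layers reductions."""
    lengths = []
    for _ in range(num_layers):
        num_frames //= 2
        lengths.append(num_frames)
    return lengths


# Toy "spectrogram": 5 time steps with 2 frequency bins each (made-up values).
spec = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
reduced = pyramid_reduce(spec)

print(len(spec), "frames ->", len(reduced), "frames")
print(stacked_lengths(400, 3))
```

In a full pyramidal encoder, a bidirectional recurrent layer would typically run on the output of each reduction, so three stacked layers would cut a 400-frame input to 50 steps before decoding.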

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications