Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

Alexandre Berard, Olivier Pietquin, Christophe Servan, and Laurent Besacier

arXiv:1612.01744·cs.CL·December 7, 2016·117 cites

TL;DR

This paper introduces an end-to-end speech-to-text translation model that bypasses source-language transcription, which could simplify data collection, especially for under-resourced languages. The model is demonstrated on a small synthetic French-English corpus.

Contribution

It presents a first attempt at an end-to-end speech-to-text translation model that does not rely on source-language transcription during training or decoding.

Findings

01. Promising results on a French-English synthetic corpus.

02. Potential to simplify data collection for unwritten languages.

03. Demonstrates feasibility of direct speech-to-text translation.

Abstract

This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding. We propose a model for direct speech-to-text translation, which gives promising results on a small French-English synthetic corpus. Relaxing the need for source language transcription would drastically change the data collection methodology in speech translation, especially in under-resourced scenarios. For instance, in the former project DARPA TRANSTAC (speech translation from spoken Arabic dialects), a large effort was devoted to the collection of speech transcripts (and a prerequisite to obtain transcripts was often a detailed transcription guide for languages with little standardized spelling). Now, if end-to-end approaches for speech-to-text translation are successful, one might consider collecting data by asking…

Code & Models

Repositories

eske/seq2seq (TensorFlow, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems