Fluent Translations from Disfluent Speech in End-to-End Speech Translation

Elizabeth Salesky; Matthias Sperber; Alex Waibel

arXiv:1906.00556 · cs.CL · June 4, 2019 · 1 citation

Open Access

TL;DR

This paper uses a sequence-to-sequence model to translate disfluent conversational speech directly into fluent text, removing disfluencies as part of translation rather than as a separate processing step. The authors frame this joint translation and disfluency removal as a new speech translation task.

Contribution

It presents an end-to-end model for translating disfluent speech into fluent text, folding disfluency removal, previously an intermediate step between speech recognition and machine translation, into the translation model itself.
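For contrast, in a cascaded pipeline disfluency removal runs as its own step on the ASR transcript before machine translation. The minimal sketch below assumes a whitespace-tokenized transcript, a hypothetical filler list (`FILLERS`), and a hypothetical `translate` callable standing in for an MT system. It is a toy rule-based filter for illustration only, not the paper's method, which instead learns to drop disfluencies inside a single sequence-to-sequence model.

```python
from typing import Callable

# Hypothetical filler inventory for illustration only; not taken from the paper.
FILLERS = {"uh", "um", "eh", "mm"}


def remove_disfluencies(transcript: str) -> str:
    """Toy cascade step: drop filler tokens and collapse immediate repetitions."""
    cleaned = []
    for tok in transcript.lower().split():
        if tok in FILLERS:
            continue  # skip filled pauses such as "uh"
        if cleaned and cleaned[-1] == tok:
            continue  # collapse "i i" into "i"
        cleaned.append(tok)
    return " ".join(cleaned)


def cascade(asr_transcript: str, translate: Callable[[str], str]) -> str:
    """Cascaded pipeline: ASR transcript -> disfluency removal -> MT.

    `translate` is a hypothetical stand-in for any text-to-text MT system.
    """
    return translate(remove_disfluencies(asr_transcript))


print(remove_disfluencies("I I want uh to go"))  # i want to go
print(cascade("uh hello hello", str.upper))  # HELLO
```

Hand-written rules like these also delete intentional repetitions ("very very"), and in an end-to-end model there is no intermediate transcript for such a filter to run on, so the removal behavior has to be learned from fluent references instead.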

Findings

1. The model directly generates fluent translations from noisy, disfluent speech.

2. The work raises considerations about how to evaluate success when translating conversational speech with disfluency removal.

3. It establishes a baseline for future research on joint disfluency removal and translation.

Abstract

Spoken language translation applications for speech suffer due to conversational speech phenomena, particularly the presence of disfluencies. With the rise of end-to-end speech translation models, processing steps such as disfluency removal that were previously an intermediate step between speech recognition and machine translation need to be incorporated into model architectures. We use a sequence-to-sequence model to translate from noisy, disfluent speech to fluent text with disfluencies removed using the recently collected "copy-edited" references for the Fisher Spanish-English dataset. We are able to directly generate fluent translations and introduce considerations about how to evaluate success on this task. This work provides a baseline for a new task, the translation of conversational speech with joint removal of disfluencies.
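One reason evaluation needs care: the same system output scores differently depending on whether it is compared with the original disfluent references or the fluent, copy-edited ones. The sketch below illustrates this with clipped unigram precision and recall (the unigram building block of BLEU-style overlap metrics) on made-up English sentences. The function name `unigram_pr` and the example strings are hypothetical, and the metric is chosen for illustration, not taken from the paper.

```python
from collections import Counter
from typing import Tuple


def unigram_pr(hypothesis: str, reference: str) -> Tuple[float, float]:
    """Clipped unigram precision and recall of a hypothesis against one reference."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count per token, i.e. clipped matches.
    overlap = sum((hyp & ref).values())
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Made-up example: a disfluent reference and a fluent (copy-edited) version of it.
disfluent_ref = "i i want uh to go"
fluent_ref = "i want to go"

fluent_hyp = "i want to go"
disfluent_hyp = "i i want uh to go"

print(unigram_pr(fluent_hyp, fluent_ref))  # (1.0, 1.0)
print(unigram_pr(fluent_hyp, disfluent_ref))  # precision 1.0, recall 4/6
print(unigram_pr(disfluent_hyp, fluent_ref))  # precision 4/6, recall 1.0
```

Precision alone does not penalize a fluent output measured against the disfluent reference, and recall alone does not penalize a disfluent output measured against the fluent one, so both the choice of reference and the choice of metric shape what counts as success on this task.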

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques