Towards Unsupervised Speech-to-Text Translation

Yu-An Chung; Wei-Hung Weng; Schrasing Tong; James Glass

arXiv:1811.01307·cs.CL·November 6, 2018·5 cites

Open Access

TL;DR

This paper introduces an unsupervised speech-to-text translation framework that leverages monolingual speech and text data, enabling translation without any labeled or parallel corpora, and achieves results comparable to supervised models.

Contribution

The work presents a novel unsupervised approach for speech-to-text translation that does not require any labeled data, using cross-modal dictionaries and language models to enable translation.

Findings

01 Achieves BLEU scores comparable to supervised models

02 Effective in low-resource language pairs

03 Component ablation shows importance of each module

Abstract

We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language. As opposed to traditional cascaded systems and end-to-end architectures, our system does not require any labeled data (i.e., transcribed source audio or parallel source and target text corpora) during training, making it especially applicable to language pairs with very few or even zero bilingual resources. The framework initializes the ST system with a cross-modal bilingual dictionary inferred from the monolingual corpora, which maps every source speech segment corresponding to a spoken word to its target text translation. For unseen source speech utterances, the system first performs word-by-word translation on each speech segment in the utterance. The translation…
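The word-by-word translation step described in the abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each speech segment has already been embedded as a fixed-dimensional vector, that the cross-modal dictionary takes the form of a linear map into a target text embedding space, and that lookup is by cosine similarity. The function name `translate_word_by_word`, the toy vocabulary, and the identity mapping are all hypothetical and chosen only for illustration.

```python
import numpy as np


def translate_word_by_word(segment_embeddings, mapping, target_embeddings, target_vocab):
    """Translate each speech-segment embedding to its nearest target word.

    Assumption (not from the abstract): the dictionary is a linear map
    `mapping` from the speech embedding space to the text embedding space,
    and the nearest neighbour is chosen by cosine similarity.
    """
    # Project source speech segments into the target text space.
    mapped = segment_embeddings @ mapping
    # Normalise rows so that dot products equal cosine similarities.
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    targets = target_embeddings / np.linalg.norm(target_embeddings, axis=1, keepdims=True)
    # For every segment, pick the index of the most similar target word.
    nearest = np.argmax(mapped @ targets.T, axis=1)
    return [target_vocab[i] for i in nearest]


# Toy data: three target words with hypothetical 3-dimensional embeddings.
target_vocab = ["le", "chat", "dort"]
target_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
mapping = np.eye(3)  # hypothetical mapping; a real one would be learned

# One utterance, already split into three speech-segment embeddings.
utterance = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.9],
])

translation = translate_word_by_word(utterance, mapping, target_embeddings, target_vocab)
print(translation)
```

The output of this step is a word-for-word rendering in source word order; the sketch covers only the dictionary lookup, not how the dictionary is inferred or how the segments are obtained.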

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Denoising Autoencoder