Generating Synthetic Speech from SpokenVocab for Speech Translation

Jinming Zhao; Gholamreza Haffari; Ehsan Shareghi

arXiv:2210.08174·cs.CL·February 9, 2023

TL;DR

This paper introduces SpokenVocab, a scalable data augmentation method that converts machine translation data into speech translation data by stitching audio snippets, improving translation quality without relying on slow TTS systems.

Contribution

The paper presents SpokenVocab, a novel on-the-fly data augmentation technique for speech translation. It outperforms strong baselines, matches TTS-based augmentation, and is especially useful for low-resource languages.

Findings

1. SpokenVocab outperforms strong baselines by 1.83 BLEU on average.

2. It performs as well as TTS-generated speech in experiments.

3. Effective for code-switching speech translation where TTS is unavailable.

Abstract

Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains. One practical solution to the data scarcity issue is to convert machine translation (MT) data to ST data via text-to-speech (TTS) systems. Yet, using TTS systems can be tedious and slow, as the conversion needs to be done for each MT dataset. In this work, we propose a simple, scalable and effective data augmentation technique, i.e., SpokenVocab, to convert MT data to ST data on-the-fly. The idea is to retrieve and stitch audio snippets from a SpokenVocab bank according to words in an MT sequence. Our experiments on multiple language pairs from MuST-C show that this method outperforms strong baselines by an average of 1.83 BLEU points, and it performs as well as TTS-generated speech. We also showcase how SpokenVocab can be applied in…
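The retrieve-and-stitch idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the official repository for that); it only shows the general shape of the technique under stated assumptions. All names here (`build_bank`, `stitch`, `make_placeholder_snippet`, `UNK`) are hypothetical, the 16 kHz sampling rate is assumed, short sine tones stand in for real per-word audio, and the fallback to an unknown-word snippet for out-of-vocabulary words is an assumption rather than something the abstract specifies. Plain Python lists are used to keep the example dependency-free; a real pipeline would more likely use arrays or tensors.

```python
import math
from typing import Dict, List

SAMPLE_RATE = 16_000  # assumed sampling rate, not taken from the paper
UNK = "<unk>"         # hypothetical fallback key for out-of-vocabulary words

Waveform = List[float]


def make_placeholder_snippet(word: str, sample_rate: int = SAMPLE_RATE) -> Waveform:
    """Stand-in for a stored spoken word: a short sine tone.

    Duration scales with word length so stitched outputs vary in length.
    """
    n_samples = int(round(0.05 * max(len(word), 1) * sample_rate))
    freq_hz = 200.0 + 20.0 * (sum(map(ord, word)) % 20)
    return [
        0.1 * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def build_bank(vocab: List[str]) -> Dict[str, Waveform]:
    """Build the word-to-audio bank once, ahead of training."""
    return {word: make_placeholder_snippet(word) for word in vocab}


def stitch(sentence: str, bank: Dict[str, Waveform]) -> Waveform:
    """Retrieve one snippet per word and concatenate them in order.

    Words missing from the bank fall back to the UNK snippet (an assumption).
    """
    audio: Waveform = []
    for word in sentence.lower().split():
        audio.extend(bank.get(word, bank[UNK]))
    return audio


# Usage: convert the source side of one MT pair into synthetic speech on the fly.
bank = build_bank(["the", "cat", "sat", UNK])
audio = stitch("The cat sat", bank)
print(f"{len(audio) / SAMPLE_RATE:.2f} s of stitched audio")
```

Because the bank is built once and lookups are dictionary reads followed by concatenation, conversion can run inside the data loader for every batch, which is what makes the approach usable on-the-fly without running a TTS system per dataset.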

Code & Models

Repositories

mingzi151/spokenvocab (official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and Dialogue Systems