Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation

Tsz Kin Lam; Shigehiko Schamoni; Stefan Riezler

arXiv:2203.08757 · cs.CL · June 12, 2023

TL;DR

This paper introduces a data augmentation method for end-to-end speech translation that recombines aligned audio segments with machine-translated transcripts. Apart from a trained MT system, it relies only on off-the-shelf components without fine-tuning.

Contribution

The proposed approach leverages audio alignments, linguistic properties, and translation for data augmentation, achieving consistent improvements of up to 0.9 BLEU points on CoVoST 2 and 1.1 BLEU points on Europarl-ST across multiple language pairs.

Findings

01 Up to 0.9 BLEU points improvement on CoVoST 2

02 Up to 1.1 BLEU points improvement on Europarl-ST

03 Resource demands are similar to those of knowledge distillation

Abstract

End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language. Such data are notoriously scarce, making synthetic data augmentation by back-translation or knowledge distillation a necessary ingredient of end-to-end training. In this paper, we present a novel approach to data augmentation that leverages audio alignments, linguistic properties, and translation. First, we augment a transcription by sampling from a suffix memory that stores text and audio data. Second, we translate the augmented transcript. Finally, we recombine concatenated audio segments and the generated translation. Besides training an MT-system, we only use basic off-the-shelf components without fine-tuning. While having similar resource demands as knowledge distillation, adding our method delivers consistent improvements of up to 0.9 and 1.1…
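The three steps named in the abstract (sample, translate, recombine) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the suffix memory is assumed to be keyed by a pivot word, each utterance is assumed to arrive as word-level `(word, audio_segment)` pairs from some forced aligner, audio segments are plain lists of samples, and `translate` is a stand-in for a trained MT system. The names `build_suffix_memory`, `sample_augmentation`, `recombine`, and `is_pivot` are hypothetical.

```python
import random
from collections import defaultdict


def build_suffix_memory(utterances, is_pivot):
    """Map each pivot word to the aligned (word, audio) suffixes that follow it."""
    memory = defaultdict(list)
    for utt in utterances:
        for i, (word, _) in enumerate(utt):
            # Store only non-empty suffixes (assumption: a pivot at the very
            # end of an utterance contributes nothing to the memory).
            if is_pivot(word) and i + 1 < len(utt):
                memory[word].append(utt[i + 1:])
    return memory


def sample_augmentation(utt, memory, is_pivot, rng):
    """Step 1 (sample): keep the prefix up to the first pivot, swap in a stored suffix."""
    for i, (word, _) in enumerate(utt):
        candidates = memory.get(word, [])
        if is_pivot(word) and candidates:
            return utt[:i + 1] + rng.choice(candidates)
    return None  # no pivot with a stored suffix was found


def recombine(aligned, translate):
    """Steps 2 and 3: translate the new transcript and concatenate the audio segments."""
    transcript = " ".join(word for word, _ in aligned)
    audio = [sample for _, segment in aligned for sample in segment]
    return audio, transcript, translate(transcript)


# Toy corpus: one-sample "audio" per word so the concatenation is easy to inspect.
corpus = [
    [("cats", [1]), ("sleep", [2]), ("and", [3]), ("dogs", [4]), ("bark", [5])],
    [("birds", [6]), ("sing", [7]), ("and", [8]), ("fish", [9]), ("swim", [10])],
]


def is_pivot(word):
    # Hypothetical pivot rule for illustration only.
    return word == "and"


memory = build_suffix_memory(corpus, is_pivot)
augmented = sample_augmentation(corpus[0], memory, is_pivot, random.Random(0))
# str.upper stands in for an MT system here.
audio, transcript, translation = recombine(augmented, translate=str.upper)
```

Each call yields a new (audio, transcript, translation) triple whose audio and text stay aligned by construction, which is what allows the synthetic pair to be added to the speech translation training data. In this sketch the sampled suffix may coincide with the utterance's own suffix; a real pipeline would likely filter such cases.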

Taxonomy

Methods: Knowledge Distillation