SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations

Ioannis Tsiamas; José A. R. Fonollosa; Marta R. Costa-jussà

arXiv:2212.09699 · cs.CL · November 2, 2023 · 1 citation

TL;DR

SegAugment introduces a segmentation-based data augmentation method for speech translation that generates multiple sentence-level variants, improving translation quality across multiple languages and closing the gap between manual and automatic segmentation.

Contribution

The paper presents SegAugment, a segmentation-based augmentation strategy that re-segments the speech of each document under different length constraints and recovers the target text via alignment, producing multiple sentence-level versions of a speech translation dataset.

Findings

1. Consistent gains across eight language pairs in MuST-C, averaging 2.5 BLEU points.

2. Gains of up to 5 BLEU points in low-resource scenarios in mTEDx.

3. New state-of-the-art results in MuST-C when combined with a strong system.

Abstract

End-to-end Speech Translation is hindered by a lack of available data resources. While most of them are based on documents, a sentence-level version is available, which is however single and static, potentially impeding the usefulness of the data. We propose a new data augmentation strategy, SegAugment, to address this issue by generating multiple alternative sentence-level versions of a dataset. Our method utilizes an Audio Segmentation system, which re-segments the speech of each document with different length constraints, after which we obtain the target text via alignment methods. Experiments demonstrate consistent gains across eight language pairs in MuST-C, with an average increase of 2.5 BLEU points, and up to 5 BLEU for low-resource scenarios in mTEDx. Furthermore, when combined with a strong system, SegAugment establishes new state-of-the-art results in MuST-C. Finally, we show…
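The re-segmentation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes a learned Audio Segmentation system, whereas the sketch below swaps in a simple pause-based heuristic that recursively splits a document at its longest pause. The `Word` class, the `resegment` function, the sample timestamps, and the chosen length constraints are all hypothetical, and the target-side alignment step is not shown.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def resegment(words, min_len, max_len):
    """Split a non-empty word sequence at its longest pause, recursively,
    until every segment lasts at most max_len seconds.

    Pause-based stand-in for a learned segmenter (an assumption of this sketch).
    Splits that would leave a side shorter than min_len seconds are skipped.
    """
    duration = words[-1].end - words[0].start
    if duration <= max_len:
        return [words]

    best_i, best_gap = None, -1.0
    for i in range(1, len(words)):
        left = words[i - 1].end - words[0].start
        right = words[-1].end - words[i].start
        if left < min_len or right < min_len:
            continue
        gap = words[i].start - words[i - 1].end
        if gap > best_gap:
            best_i, best_gap = i, gap

    if best_i is None:
        # No admissible split point: keep the over-long segment as is.
        return [words]
    return (resegment(words[:best_i], min_len, max_len)
            + resegment(words[best_i:], min_len, max_len))


# Hypothetical word-level timestamps for one short "document".
WORDS = [Word(*w) for w in [
    ("so", 0.0, 1.0), ("we", 1.1, 2.1), ("began", 2.2, 3.2),
    ("then", 4.0, 5.0), ("things", 5.1, 6.1), ("changed", 6.2, 7.2),
    ("quite", 7.7, 8.7), ("fast", 8.8, 9.8),
]]

# Each (min_len, max_len) pair yields a different sentence-level version.
for min_len, max_len in [(0.5, 10.0), (0.5, 6.0), (0.5, 4.0)]:
    segments = resegment(WORDS, min_len, max_len)
    print((min_len, max_len), [" ".join(w.text for w in seg) for seg in segments])
```

Running the same document through several length constraints yields several alternative segmentations of identical audio, which is the sense in which the data is augmented; in the paper's setting each new segment would additionally need its target text recovered by alignment.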

Code & Models

Repositories

mt-upc/SegAugment (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Music and Audio Processing