Leveraging Synthetic Audio Data for End-to-End Low-Resource Speech Translation

Yasmin Moslem

arXiv:2406.17363 · cs.CL · June 28, 2024

TL;DR

This paper explores the use of synthetic audio data and augmentation techniques to improve end-to-end Irish-to-English speech translation systems based on Whisper, demonstrating how data diversity enhances translation performance.

Contribution

The paper applies several data augmentation strategies, including speech back-translation and noise augmentation, to a low-resource Irish-to-English speech translation system.

Findings

01 Synthetic data improves translation accuracy
02 Data augmentation enhances signal diversity
03 End-to-end models outperform traditional pipelines

Abstract

This paper describes our system submission to the International Conference on Spoken Language Translation (IWSLT 2024) for Irish-to-English speech translation. We built end-to-end systems based on Whisper, and employed a number of data augmentation techniques, such as speech back-translation and noise augmentation. We investigate the effect of using synthetic audio data and discuss several methods for enriching signal diversity.
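
As an illustration of the noise augmentation mentioned in the abstract, the sketch below mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio. This is a minimal example under stated assumptions, not the paper's implementation: the abstract does not say which noise type or SNR was used, and the function name `add_noise_at_snr`, the 10 dB setting, and the synthetic sine-wave input are all hypothetical. The 16 kHz sample rate matches the audio format Whisper expects.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return `signal` mixed with white Gaussian noise at `snr_db` decibels.

    Hypothetical helper for illustration; the noise type and SNR are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)

    signal_power = np.mean(signal ** 2)
    if signal_power == 0.0:
        # A silent clip has no defined SNR, so return it unchanged.
        return signal.copy()

    noise = rng.standard_normal(signal.shape)
    noise_power = np.mean(noise ** 2)

    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise power.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_noise_power / noise_power)

    return signal + noise


# Usage: a one-second 440 Hz tone stands in for a real utterance.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=np.random.default_rng(0))

# Check the SNR that was actually achieved.
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Applying the function several times with different SNR values or random seeds yields multiple noisy variants of the same utterance, which is one simple way to enrich signal diversity in a small training set.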

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques