A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

Linh The Nguyen; Nguyen Luong Tran; Long Doan; Manh Luong; Dat Quoc Nguyen

arXiv:2208.04243 · cs.CL · August 9, 2022

TL;DR

This paper introduces a large-scale, high-quality English-Vietnamese speech translation dataset and compares traditional and modern translation approaches, finding the traditional method still performs better.

Contribution

It provides the first large-scale English-Vietnamese speech translation dataset and offers empirical insights into approach performance.

Findings

01 Traditional cascaded approach outperforms end-to-end methods

02 The dataset contains 508 hours of speech and 331K triplets

03 First large-scale English-Vietnamese speech translation study
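
The difference between the two approaches named in the findings can be sketched in a few lines. This is only an illustration under stated assumptions: the type aliases, `make_cascaded`, `toy_asr`, and `toy_mt` are hypothetical names, and the toy functions stand in for trained models; none of this is the authors' code. A cascaded system chains speech recognition and text translation, while an end-to-end system would be a single model with the same audio-in, text-out signature.

```python
from typing import Callable

# Hypothetical type aliases: audio is represented as a list of float samples.
Audio = list[float]
ASR = Callable[[Audio], str]          # English audio -> English transcript
MT = Callable[[str], str]             # English text -> Vietnamese text
SpeechTranslator = Callable[[Audio], str]  # English audio -> Vietnamese text


def make_cascaded(asr: ASR, mt: MT) -> SpeechTranslator:
    """Compose an ASR model and an MT model into one speech translator."""
    def translate(audio: Audio) -> str:
        transcript = asr(audio)  # step 1: recognize the English text
        return mt(transcript)    # step 2: translate that text to Vietnamese
    return translate


# Toy stand-ins so the sketch runs; real systems would be trained models.
def toy_asr(audio: Audio) -> str:
    return "hello"


def toy_mt(text: str) -> str:
    return {"hello": "xin chào"}.get(text, text)


cascaded = make_cascaded(toy_asr, toy_mt)
result = cascaded([0.0, 0.1, -0.1])
print(result)
```

An end-to-end model would be passed around as a `SpeechTranslator` directly, with no intermediate English transcript exposed.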

Abstract

In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence). We also conduct empirical experiments using strong baselines and find that the traditional "Cascaded" approach still outperforms the modern "End-to-End" approach. To the best of our knowledge, this is the first large-scale English-Vietnamese speech translation study. We hope both our publicly available dataset and study can serve as a starting point for future research and applications on English-Vietnamese speech translation. Our dataset is available at https://github.com/VinAIResearch/PhoST.
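
As a rough illustration of the triplet structure and the reported scale, the sketch below models one triplet and derives the average audio length implied by the abstract's totals. The `Triplet` class, its field names, and the example values are hypothetical and do not reflect the dataset's actual file layout; 331K is a rounded count, so the result is an approximation.

```python
from dataclasses import dataclass


@dataclass
class Triplet:
    """One dataset item as described in the abstract (field names are hypothetical)."""
    audio_path: str      # sentence-length English audio clip
    en_transcript: str   # English source transcript sentence
    vi_subtitle: str     # Vietnamese target subtitle sentence


# Totals as reported in the abstract; 331K is rounded, so results are approximate.
TOTAL_HOURS = 508
NUM_TRIPLETS = 331_000


def average_seconds_per_triplet(total_hours: float, num_triplets: int) -> float:
    """Average audio duration per triplet, in seconds."""
    return total_hours * 3600 / num_triplets


# Placeholder values for illustration only, not taken from the dataset.
example = Triplet("clip_0001.wav", "Thank you.", "Cảm ơn.")

avg = average_seconds_per_triplet(TOTAL_HOURS, NUM_TRIPLETS)
print(f"about {avg:.1f} seconds of audio per triplet")
```

The figure of roughly 5.5 seconds per clip is consistent with the abstract's description of the audio as sentence-length.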

Code & Models

Repositories

vinairesearch/phost
Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Multimodal Machine Learning Applications