CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang, Anne Wu, Juan Pino

TL;DR
This paper introduces CoVoST 2, a large-scale multilingual speech translation dataset covering 21 languages into English and 15 from English, enabling research in low-resource and multilingual speech translation.
Contribution
The paper releases the largest open multilingual speech translation dataset to date, with quality checks and baseline models for speech recognition and translation tasks.
Findings
CoVoST 2 covers 21 languages into English and 15 from English.
The dataset is the largest in volume and language coverage.
Baseline models demonstrate the dataset's utility for speech translation research.
Abstract
Speech translation has recently become an increasingly popular topic of research, partly due to the development of benchmark datasets. Nevertheless, current datasets cover a limited number of languages. With the aim to foster research in massive multilingual speech translation and speech translation for low resource language pairs, we release CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. This represents the largest open dataset available to date from total volume and language coverage perspective. Data sanity checks provide evidence about the quality of the data, which is released under CC0 license. We also provide extensive speech recognition, bilingual and multilingual machine translation and speech translation baselines with open-source implementation.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗nvidia/parakeet-tdt-0.6b-v3model· 254k dl· ♡ 747254k dl♡ 747
- 🤗nvidia/canary-1b-v2model· 123k dl· ♡ 371123k dl♡ 371
- 🤗SoSolaris/parakeet-tdt-0.6b-v3model· 7 dl7 dl
- 🤗ManuelZnnmc/parakeet-tdt-0.6b-v3model· 1 dl1 dl
- 🤗MadnessOverflow/parakeet-tdt-0.6b-v3-bpe-vocabmodel
- 🤗Endy2001/parakeet-tdt-0.6b-v3model· 3 dl3 dl
- 🤗everyscribe/parakeet-tdt-0.6b-v3model· 9 dl9 dl
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
