CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen

TL;DR
This paper introduces CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus covering 21 source languages. It is derived from the Common Voice and CoVoST 2 datasets and supports S2ST model training and benchmarking.
Contribution
The paper contributes CVSS, a multilingual-to-English S2ST corpus released in canonical-voice (CVSS-C) and voice-transferred (CVSS-T) versions, together with baseline direct and cascade S2ST models that verify its effectiveness.
Findings
The ST model behind the cascade baselines, trained on CoVoST 2, outperforms the previous state of the art trained on that corpus without extra data.
Direct S2ST models approach cascade model performance when trained from scratch.
The corpus facilitates effective multilingual speech translation research.
Abstract
We introduce CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems. Two versions of translation speeches are provided: 1) CVSS-C: All the translation speeches are in a single high-quality canonical voice; 2) CVSS-T: The translation speeches are in voices transferred from the corresponding source speeches. In addition, CVSS provides normalized translation text which matches the pronunciation in the translation speech. On each version of CVSS, we built baseline multilingual direct S2ST models and cascade S2ST models, verifying the effectiveness of the corpus. To build strong…
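The corpus construction described in the abstract can be pictured as a join between source clips and their English translations, with each pair pointing at a synthesized English recording. The following is a minimal sketch of that bookkeeping only, not the authors' pipeline: the `S2STPair` type, the `build_pairs` helper, the clip identifiers, and the file layout are all hypothetical, and TTS synthesis itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class S2STPair:
    """One sentence-level pair: source speech plus synthesized English speech."""
    clip_id: str       # hypothetical key shared by both source corpora
    source_lang: str   # language code of the source speech
    source_audio: str  # path to the source clip (hypothetical layout)
    target_text: str   # normalized English translation text
    target_audio: str  # path to the synthesized English speech (hypothetical)


def build_pairs(
    source_clips: Dict[str, Dict[str, str]],
    translations: Dict[str, str],
    version: str,
) -> List[S2STPair]:
    """Join source clips with translations and point at synthesized audio.

    `version` is "C" (single canonical voice) or "T" (voice transferred
    from the source speech). Only the pairing logic is sketched here.
    """
    if version not in ("C", "T"):
        raise ValueError("version must be 'C' or 'T'")
    pairs = []
    for clip_id, clip in sorted(source_clips.items()):
        text = translations.get(clip_id)
        if text is None:
            continue  # no translation available, so no pair can be formed
        pairs.append(
            S2STPair(
                clip_id=clip_id,
                source_lang=clip["lang"],
                source_audio=clip["path"],
                target_text=text,
                target_audio=f"cvss_{version.lower()}/{clip['lang']}/{clip_id}.wav",
            )
        )
    return pairs


# Toy inputs; identifiers and paths are made up for illustration.
clips = {
    "a1": {"lang": "fr", "path": "cv/fr/a1.mp3"},
    "b2": {"lang": "de", "path": "cv/de/b2.mp3"},
    "c3": {"lang": "es", "path": "cv/es/c3.mp3"},
}
texts = {"a1": "good morning", "b2": "thank you"}

pairs_c = build_pairs(clips, texts, "C")
pairs_t = build_pairs(clips, texts, "T")
```

In this sketch both versions share the same source audio and translation text and differ only in which synthesized recording they reference, mirroring the CVSS-C and CVSS-T split described above.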
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
