Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

Haorui He; Zengqiang Shang; Chaoren Wang; Xuyuan Li; Yicheng Gu; Hua Hua; Liwei Liu; Chen Yang; Jiaqi Li; Peiyang Shi; Yuancheng Wang; Kai Chen; Pengyuan Zhang; Zhizheng Wu

arXiv:2407.05361·eess.AS·September 10, 2024·2 cites

TL;DR

Emilia is a large-scale, multilingual speech dataset with diverse speaking styles, designed to advance natural and spontaneous speech generation models, supported by an open-source preprocessing pipeline called Emilia-Pipe.

Contribution

The paper introduces Emilia, the first extensive multilingual speech dataset with diverse styles, and Emilia-Pipe, an open-source tool for efficient data preprocessing.

Findings

01. Training on Emilia enables more natural and spontaneous speech generation.
02. Emilia-Pipe efficiently transforms raw, in-the-wild speech into annotated training data.
03. Experimental results validate the effectiveness of both the dataset and the pipeline.

Abstract

Recent advancements in speech generation models have been significantly driven by the use of large-scale training data. However, producing highly spontaneous, human-like speech remains a challenge due to the scarcity of large, diverse, and spontaneous speech datasets. In response, we introduce Emilia, the first large-scale, multilingual, and diverse speech generation dataset. Emilia starts with over 101k hours of speech across six languages, covering a wide range of speaking styles to enable more natural and spontaneous speech generation. To facilitate the scale-up of Emilia, we also present Emilia-Pipe, the first open-source preprocessing pipeline designed to efficiently transform raw, in-the-wild speech data into high-quality training data with speech annotations. Experimental results demonstrate the effectiveness of both Emilia and Emilia-Pipe. Demos are available at:…

Code & Models

Repositories

open-mmlab/Amphion/blob/main/preprocessors/Emilia/README.md
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling