MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang, Bin Liu, Yifan Hu, Rui Liu, Feilong Bao, Guanglai Gao

TL;DR
This paper introduces MnTTS2, an open-source multi-speaker Mongolian TTS dataset, along with baseline models, to support research in low-resource language speech synthesis and demonstrate its effectiveness for real-world applications.
Contribution
The creation and release of MnTTS2, a comprehensive multi-speaker Mongolian TTS dataset, with baseline models based on FastSpeech2 and HiFi-GAN for low-resource language synthesis.
Findings
The MnTTS2 dataset enables robust multi-speaker TTS for Mongolian.
Baseline models achieve promising synthesis quality.
Open-source resources facilitate future research in low-resource TTS.
Abstract
Text-to-Speech (TTS) synthesis for low-resource languages is an attractive research topic in academia and industry. Mongolian is the official language of the Inner Mongolia Autonomous Region and a representative low-resource language spoken by over 10 million people worldwide. However, there is a relative lack of open-source datasets for Mongolian TTS. Therefore, we make public an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for the benefit of related researchers. In this work, we prepare transcriptions covering various topics and invite three professional Mongolian announcers to form a three-speaker TTS dataset, in which each announcer records 10 hours of speech in Mongolian, resulting in 30 hours in total. Furthermore, we build a baseline system based on the state-of-the-art FastSpeech2 model and HiFi-GAN vocoder. The experimental results suggest that the…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: HiFi-GAN
