JVS corpus: free Japanese multi-speaker voice corpus
Shinnosuke Takamichi, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, Hiroshi Saruwatari

TL;DR
This paper introduces the JVS corpus, a comprehensive Japanese multi-speaker voice dataset with 30 hours of diverse speech data from 100 speakers, aimed at advancing speech synthesis research.
Contribution
The paper presents the design and specifications of the new JVS corpus, expanding resources for multi-speaker and style-varied speech synthesis research.
Findings
Contains 30 hours of voice data from 100 speakers
Includes three speech styles: normal, whisper, and falsetto
Provides 22 hours of parallel normal voice data
Abstract
Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we are developing Japanese voice corpora reasonably accessible to not only academic institutions but also commercial companies. In 2017, we released the JSUT corpus, which contains 10 hours of reading-style speech uttered by a single speaker, for end-to-end text-to-speech synthesis. For more general use in speech synthesis research, e.g., voice conversion and multi-speaker modeling, in this paper, we construct the JVS corpus, which contains voice data of 100 speakers in three styles (normal, whisper, and falsetto). The corpus contains 30 hours of voice data including 22 hours of parallel normal voices. This paper describes how we designed the corpus and summarizes the specifications. The corpus is available at…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research
