JVS corpus: free Japanese multi-speaker voice corpus

Shinnosuke Takamichi; Kentaro Mitsui; Yuki Saito; Tomoki Koriyama; Naoko Tanji; Hiroshi Saruwatari

arXiv:1908.06248 · cs.SD · August 20, 2019 · 41 cites


TL;DR

This paper introduces the JVS corpus, a free Japanese multi-speaker voice dataset containing 30 hours of speech from 100 speakers in three styles (normal, whisper, and falsetto), intended to support speech synthesis research.

Contribution

The paper presents the design and specifications of the new JVS corpus, expanding resources for multi-speaker and style-varied speech synthesis research.

Findings

01. Contains 30 hours of voice data from 100 speakers

02. Includes three speech styles: normal, whisper, and falsetto

03. Provides 22 hours of parallel normal voice data
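To give a sense of scale, the reported totals can be turned into per-speaker averages. The sketch below is plain arithmetic on the figures stated above (30 hours, 22 hours, 100 speakers). It assumes nothing about how recordings are actually distributed, so individual speakers may have more or less than the average.

```python
# Per-speaker averages implied by the corpus totals reported above.
# These are averages only; actual per-speaker durations may differ.
TOTAL_HOURS = 30.0            # all styles, all speakers
PARALLEL_NORMAL_HOURS = 22.0  # parallel normal-style subset
NUM_SPEAKERS = 100


def minutes_per_speaker(hours: float, speakers: int = NUM_SPEAKERS) -> float:
    """Convert a corpus-level total in hours to average minutes per speaker."""
    return hours * 60.0 / speakers


total_avg = minutes_per_speaker(TOTAL_HOURS)
parallel_avg = minutes_per_speaker(PARALLEL_NORMAL_HOURS)
print(f"{total_avg:.1f} min/speaker overall, "
      f"{parallel_avg:.1f} min/speaker parallel normal")
# prints "18.0 min/speaker overall, 13.2 min/speaker parallel normal"
```

In other words, each speaker contributes roughly 18 minutes on average, of which about 13 minutes is parallel normal speech.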

Abstract

Thanks to improvements in machine learning techniques, including deep learning, speech synthesis is becoming a machine learning task. To accelerate speech synthesis research, we are developing Japanese voice corpora reasonably accessible not only to academic institutions but also to commercial companies. In 2017, we released the JSUT corpus, which contains 10 hours of reading-style speech uttered by a single speaker, for end-to-end text-to-speech synthesis. For more general use in speech synthesis research, e.g., voice conversion and multi-speaker modeling, in this paper we construct the JVS corpus, which contains voice data of 100 speakers in three styles (normal, whisper, and falsetto). The corpus contains 30 hours of voice data, including 22 hours of parallel normal voices. This paper describes how we designed the corpus and summarizes its specifications. The corpus is available at…
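Parallel normal voices matter for voice conversion because the same sentences are read by multiple speakers, so utterances can be matched across speakers. The following is a minimal Python sketch of that matching. The directory layout, the `parallel100/wav24kHz16bit` subdirectory name, the speaker IDs, and the `parallel_pairs` helper are all hypothetical assumptions for illustration, not a documented structure; check the layout of the downloaded corpus before relying on it.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical layout (assumption, not stated in this summary):
#   <root>/<speaker_id>/parallel100/wav24kHz16bit/<utterance_id>.wav
# Parallel utterances are assumed to share the same file name across speakers.
PARALLEL_SUBDIR = Path("parallel100") / "wav24kHz16bit"


def parallel_pairs(root: Path, src_speaker: str,
                   tgt_speaker: str) -> List[Tuple[Path, Path]]:
    """Return (source, target) wav paths for utterance IDs both speakers have."""
    src_dir = root / src_speaker / PARALLEL_SUBDIR
    tgt_dir = root / tgt_speaker / PARALLEL_SUBDIR
    # glob() on a missing directory yields nothing, so absent speakers give [].
    src_ids = {p.stem for p in src_dir.glob("*.wav")}
    tgt_ids = {p.stem for p in tgt_dir.glob("*.wav")}
    # Keep only utterances recorded by both speakers, in a stable sorted order.
    shared = sorted(src_ids & tgt_ids)
    return [(src_dir / f"{u}.wav", tgt_dir / f"{u}.wav") for u in shared]


if __name__ == "__main__":
    import tempfile

    # Build a tiny fake corpus with hypothetical speaker and utterance IDs.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        fake = {"spk_a": ["u001", "u002"], "spk_b": ["u002", "u003"]}
        for spk, utts in fake.items():
            d = root / spk / PARALLEL_SUBDIR
            d.mkdir(parents=True)
            for u in utts:
                (d / f"{u}.wav").touch()
        pairs = parallel_pairs(root, "spk_a", "spk_b")
        print([(s.name, t.name) for s, t in pairs])
        # prints [('u002.wav', 'u002.wav')]
```

Matching on shared file names keeps the sketch simple; a real pipeline would typically also check the transcripts, since a missing or re-recorded file would silently drop or misalign a pair.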


Code & Models

Datasets

sbintuitions/voicebench-ja
dataset · 53 downloads


Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research