JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis

Ryosuke Sonobe; Shinnosuke Takamichi; Hiroshi Saruwatari

arXiv:1711.00354 · cs.CL · November 2, 2017 · 88 cites

TL;DR

The JSUT corpus is a large-scale, freely available Japanese speech dataset designed to facilitate end-to-end speech synthesis research. It consists of 10 hours of reading-style speech with transcriptions and covers all of the main pronunciations of daily-use Japanese characters.

Contribution

This paper introduces a free large-scale Japanese speech corpus designed specifically for end-to-end speech synthesis. According to the authors, no such corpus previously existed for Japanese speech synthesis.

Findings

01. Corpus covers all of the main pronunciations of daily-use Japanese characters

02. Contains 10 hours of reading-style speech data with transcriptions

03. Freely available online

Abstract

Thanks to improvements in machine learning techniques including deep learning, a free large-scale speech corpus that can be shared between academic institutions and commercial companies has an important role. However, such a corpus for Japanese speech synthesis does not exist. In this paper, we designed a novel Japanese speech corpus, named the "JSUT corpus," that is aimed at achieving end-to-end speech synthesis. The corpus consists of 10 hours of reading-style speech data and its transcription and covers all of the main pronunciations of daily-use Japanese characters. In this paper, we describe how we designed and analyzed the corpus. The corpus is freely available online.

Code & Models

Repositories

tarepan/jsut
pytorch

Datasets

enactic/avsr-leaderboard
dataset · 9 dl

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling