JIS: A Speech Corpus of Japanese Idol Speakers with Various Speaking Styles

Yuto Kondo; Hirokazu Kameoka; Kou Tanaka; Takuhiro Kaneko

arXiv:2506.18296 · cs.SD · July 17, 2025

Open Access

TL;DR

The paper introduces the Japanese Idol Speech Corpus (JIS), a speech dataset whose speakers are all young female live idols in Japan. It is intended to support speech generation research, particularly text-to-speech synthesis (TTS) and voice conversion (VC), and to make evaluations of speaker similarity more rigorous.

Contribution

It describes the construction of JIS, a speech corpus of Japanese live idol speakers who are each identified by a stage name. This lets researchers recruit listeners familiar with the speakers when evaluating speaker similarity in speech synthesis and voice conversion systems.

Findings

01

JIS contains recordings of young female idols with diverse speaking styles.

02

JIS supports more rigorous evaluation of speaker similarity and research on generating voices tailored to listener preferences.

03

The corpus will be distributed free of charge, with usage restricted to non-commercial, basic research.

Abstract

We construct the Japanese Idol Speech Corpus (JIS) to advance research in speech generation AI, including text-to-speech synthesis (TTS) and voice conversion (VC). JIS will facilitate more rigorous evaluations of speaker similarity in TTS and VC systems, since all speakers in JIS belong to a highly specific category, "young female live idols" in Japan, and each speaker is identified by a stage name, enabling researchers to recruit listeners familiar with these idols for listening experiments. With its unique speaker attributes, JIS will foster compelling research, including generating voices tailored to listener preferences, an area not yet widely studied. JIS will be distributed free of charge to promote research in speech generation AI, with usage restricted to non-commercial, basic research. We describe the construction of JIS, provide an overview of Japanese live idol culture to support…

Taxonomy

Topics: Speech Recognition and Synthesis