How Similar or Different Is Rakugo Speech Synthesizer to Professional Performers?

Shuhei Kato; Yusuke Yasuda; Xin Wang; Erica Cooper; Junichi Yamagishi

arXiv:2010.11549 · eess.AS · October 23, 2020

TL;DR

This study evaluates how closely a rakugo speech synthesizer matches professional performers. It finds that the synthesizer's naturalness is comparable to theirs, but its entertainment value falls short, and that character distinguishability needs significant improvement before synthesized rakugo can authentically entertain audiences.

Contribution

The paper introduces a novel evaluation methodology for rakugo speech synthesis. It compares synthesized speech with real performances by professional performers of three different ranks and identifies key factors that affect entertainment quality.

Findings

01

Synthesized speech has naturalness similar to that of human speech.

02

The entertainment level of synthesized speech is lower than that of professional performers of any rank.

03

Understanding and character distinguishability are critical for entertainment quality.

Abstract

We have been working on speech synthesis for rakugo (a traditional Japanese form of verbal entertainment similar to one-person stand-up comedy) toward speech synthesis that authentically entertains audiences. In this paper, we propose a novel evaluation methodology using synthesized rakugo speech and real rakugo speech uttered by professional performers of three different ranks. The naturalness of the synthesized speech was comparable to that of the human speech, but the synthesized speech entertained listeners less than the performers of any rank. However, we obtained some interesting insights into challenges to be solved in order to achieve a truly entertaining rakugo synthesizer. For example, naturalness was not the most important factor, even though it has generally been emphasized as the most important point to be evaluated in the conventional speech synthesis field. More important…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing