HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling

Kazuki Fujii; Yuki Saito; Shinnosuke Takamichi; Yukino Baba; Hiroshi Saruwatari

arXiv:1909.11391·cs.SD·September 26, 2019

Open Access

TL;DR

HumanGAN introduces a novel GAN framework that uses human perception as a black-box discriminator, enabling modeling of broader, human-acceptable speech distributions beyond real data.

Contribution

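This paper presents HumanGAN, which integrates human perception into GAN training as the discriminator. This lets the generator capture a wider, human-acceptable speech distribution, which a basic GAN cannot represent because it is limited to the real-data distribution.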

Findings

01

HumanGAN can model a speech distribution broader than the real-data distribution.

02

Crowdsourcing effectively trains the human perception discriminator.

03

HumanGAN outperforms basic GANs in speech naturalness modeling.

Abstract

We propose the HumanGAN, a generative adversarial network (GAN) incorporating human perception as a discriminator. A basic GAN trains a generator to represent a real-data distribution by fooling the discriminator that distinguishes real and generated data. Therefore, the basic GAN cannot represent the outside of a real-data distribution. In the case of speech perception, humans can recognize not only human voices but also processed (i.e., a non-existent human) voices as human voice. Such a human-acceptable distribution is typically wider than a real-data one and cannot be modeled by the basic GAN. To model the human-acceptable distribution, we formulate a backpropagation-based generator training algorithm by regarding human perception as a black-boxed discriminator. The training efficiently iterates generator training by using a computer and discrimination by crowdsourcing. We evaluate…
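The abstract says the generator is trained by backpropagation while human perception acts as a black-box discriminator, but the truncated text above does not say how a gradient is obtained from that black box. The following is a minimal sketch under one assumption: the gradient of the perceptual score is estimated by querying the scorer on randomly perturbed samples. `simulated_human_score`, `estimate_score_gradient`, and `train_linear_generator` are hypothetical names introduced for illustration, the scorer is a synthetic function standing in for crowdsourced ratings, and the linear generator is a toy, not the paper's model.

```python
import numpy as np

def simulated_human_score(x):
    # Hypothetical stand-in for crowdsourced ratings: a smooth score in (0, 1]
    # that is highest near the origin. A real system would query human listeners.
    return np.exp(-0.5 * np.sum(x ** 2, axis=-1))

def estimate_score_gradient(score_fn, x, rng, sigma=0.1, num_queries=50):
    # Assumed technique: antithetic random-perturbation gradient estimate.
    # Only score values are used, so score_fn can be a black box.
    grad = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)
        diff = score_fn(x + sigma * u) - score_fn(x - sigma * u)
        grad += diff[:, None] * u
    return grad / (2.0 * sigma * num_queries)

def train_linear_generator(score_fn, rng, dim=2, steps=200, batch=32, lr=0.5):
    # Toy generator x = z @ weight + bias, started away from the high-score region.
    weight = np.eye(dim)
    bias = np.full(dim, 2.0)
    for _ in range(steps):
        z = rng.standard_normal((batch, dim))
        x = z @ weight + bias
        # Estimated d(score)/dx replaces the gradient a neural discriminator would give.
        grad_x = estimate_score_gradient(score_fn, x, rng)
        # Chain rule through the linear generator, then gradient ascent on the score.
        weight += lr * (z.T @ grad_x) / batch
        bias += lr * grad_x.mean(axis=0)
    return weight, bias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weight, bias = train_linear_generator(simulated_human_score, rng)
    z = rng.standard_normal((256, 2))
    print("mean score before:", simulated_human_score(z + 2.0).mean())
    print("mean score after: ", simulated_human_score(z @ weight + bias).mean())
```

In this sketch each training step costs `2 * num_queries` scorer calls per sample, which illustrates why the number of human judgments per iteration would be a practical concern when the scorer is a crowd rather than a function.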

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies

Methods: Convolution