HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling
Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

TL;DR
HumanGAN introduces a novel GAN framework that uses human perception as a black-box discriminator, enabling modeling of broader, human-acceptable speech distributions beyond real data.
Contribution
This paper presents HumanGAN, which integrates human perception into GAN training so that the generator can capture a human-acceptable speech distribution, one that is wider than the real-data distribution a traditional GAN models.
Findings
HumanGAN can model a speech distribution broader than the real-data distribution.
Crowdsourcing is an effective way to collect the human-perception discriminator's judgments during training.
HumanGAN outperforms basic GANs in speech naturalness modeling.
Abstract
We propose the HumanGAN, a generative adversarial network (GAN) incorporating human perception as a discriminator. A basic GAN trains a generator to represent a real-data distribution by fooling the discriminator that distinguishes real and generated data. Therefore, the basic GAN cannot represent the outside of a real-data distribution. In the case of speech perception, humans can recognize not only human voices but also processed (i.e., a non-existent human) voices as human voice. Such a human-acceptable distribution is typically wider than a real-data one and cannot be modeled by the basic GAN. To model the human-acceptable distribution, we formulate a backpropagation-based generator training algorithm by regarding human perception as a black-boxed discriminator. The training efficiently iterates generator training by using a computer and discrimination by crowdsourcing. We evaluate…
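The abstract does not say how a gradient is obtained from a discriminator that can only be queried, so the following is a minimal sketch under stated assumptions rather than the authors' algorithm. It assumes the gradient of the black-box score is estimated from paired perturbations of a generated sample and then propagated into a toy linear generator. Every name here (`simulated_perception`, `estimate_gradient`, `train_generator`) is hypothetical, and the simulated scoring function merely stands in for crowdsourced human judgments.

```python
import numpy as np


def simulated_perception(x):
    """Hypothetical stand-in for crowdsourced judgments.

    Returns an acceptance score in (0, 1] that peaks at the origin.
    A real system would query human raters instead (assumption).
    """
    return float(np.exp(-0.5 * np.sum(x ** 2)))


def estimate_gradient(black_box, x, rng, sigma=0.5, num_queries=50):
    """Estimate the gradient of black_box at x using only evaluations.

    Uses antithetic perturbations x + sigma*n and x - sigma*n; this
    estimator is an assumption, not taken from the abstract.
    """
    grad = np.zeros_like(x)
    for _ in range(num_queries):
        noise = rng.standard_normal(x.shape)
        diff = black_box(x + sigma * noise) - black_box(x - sigma * noise)
        grad += diff * noise
    return grad / (2.0 * sigma * num_queries)


def train_generator(black_box, steps=200, lr=0.5, seed=0):
    """Train a toy linear generator x = W z + b by gradient ascent on the score."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((2, 2))
    b = np.array([2.0, -2.0])  # start far from the high-score region
    for _ in range(steps):
        z = rng.standard_normal(2)      # latent sample
        x = W @ z + b                   # generated sample
        g = estimate_gradient(black_box, x, rng)  # estimated d(score)/dx
        # Chain rule through the generator: dx/dW = z, dx/db = 1.
        W += lr * np.outer(g, z)
        b += lr * g
    return W, b


if __name__ == "__main__":
    W, b = train_generator(simulated_perception)
    print("score at initial bias:", simulated_perception(np.array([2.0, -2.0])))
    print("score at trained bias:", simulated_perception(b))
```

The generator never sees the inside of the scoring function; it only receives the estimated gradient, which is the sense in which the discriminator is treated as a black box in this sketch.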
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies
Methods: Convolution
