HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception

Yota Ueda; Kazuki Fujii; Yuki Saito; Shinnosuke Takamichi; Yukino Baba; Hiroshi Saruwatari

arXiv:2102.04051 · cs.HC · February 9, 2021

TL;DR

This paper introduces HumanACGAN, a novel conditional GAN model that incorporates human perceptual evaluations as both discriminator and auxiliary classifier, enabling the generation of phonemes aligned with human perception.

Contribution

The paper extends HumanGAN to HumanACGAN, enabling conditional data generation based on human perceptual acceptability, which the original HumanGAN could not represent.

Findings

01. Successfully trained a conditional generator in a phoneme perception task.

02. Demonstrated the effectiveness of a human-based discriminator and a human-based auxiliary classifier.

03. Validated the model's ability to produce human-acceptable phonemes.

Abstract

We propose a conditional generative adversarial network (GAN) incorporating humans' perceptual evaluations. A deep neural network (DNN)-based generator of a GAN can represent a real-data distribution accurately but can never represent a human-acceptable distribution, i.e., the range of data that humans accept as natural regardless of whether the data are real. A HumanGAN was proposed to model the human-acceptable distribution: its DNN-based generator is trained using a human-based discriminator, i.e., humans' perceptual evaluations, instead of the GAN's DNN-based discriminator. However, the HumanGAN cannot represent conditional distributions. This paper proposes the HumanACGAN, a theoretical extension of the HumanGAN, to deal with conditional human-acceptable distributions. Our HumanACGAN trains a DNN-based conditional generator by regarding humans as not only a…
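
The abstract describes training a generator against a human-based discriminator in place of a DNN-based one. A human judgment is not differentiable, so any such training needs a gradient obtained from score queries alone. The excerpt above does not say how this is done, so the following is only a minimal sketch under stated assumptions: it uses an antithetic random-perturbation estimator (a common black-box technique, not confirmed here as the authors' method), it replaces human raters with a hypothetical `simulated_human_score` function, and it updates a single sample directly rather than the parameters of a generator network.

```python
import numpy as np


def simulated_human_score(x):
    """Hypothetical stand-in for human raters (an assumption, not real data).

    Returns an acceptability score in (0, 1] that is highest near the origin.
    In a real setup these scores would come from listeners.
    """
    return np.exp(-0.5 * np.sum(x ** 2, axis=-1))


def estimate_score_gradient(score_fn, x, sigma=0.1, n_perturb=200, rng=None):
    """Estimate the gradient of score_fn at x using score queries only.

    Draws Gaussian perturbations, queries the score at x + sigma * eps and
    x - sigma * eps, and averages the score differences weighted by eps.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n_perturb, x.shape[-1]))
    diff = score_fn(x + sigma * eps) - score_fn(x - sigma * eps)
    return (diff[:, None] * eps).mean(axis=0) / (2.0 * sigma)


rng = np.random.default_rng(0)
x = np.array([1.5, -1.0])  # an initial sample with low simulated acceptability
start_score = simulated_human_score(x)

# Gradient ascent on the black-box score: move the sample toward the region
# that the simulated rater accepts.
for _ in range(200):
    x = x + 0.5 * estimate_score_gradient(simulated_human_score, x, rng=rng)

end_score = simulated_human_score(x)
print(f"score before: {start_score:.3f}, after: {end_score:.3f}")
```

In a conditional setting, the same kind of query-based estimate could in principle be applied to a second, class-dependent score, but how the two scores are combined is not recoverable from the truncated abstract.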

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis