StarGAN-based Emotional Voice Conversion for Japanese Phrases

Asuka Moritani; Ryo Ozaki; Shoki Sakamoto; Hirokazu Kameoka; Tadahiro Taniguchi

arXiv:2104.01807·cs.SD·April 6, 2021·6 cites

Open Access

TL;DR

This paper demonstrates that StarGAN-VC can be applied directly to emotional voice conversion (EVC) for Japanese phrases, with only minimal fundamental frequency and aperiodicity processing, and reports promising subjective evaluation results.

Contribution

It is the first to apply StarGAN-VC directly to Japanese emotional voice conversion, showing its capability for non-parallel multi-emotional VC with subjective evaluation.

Findings

01. StarGAN-EVC achieved high neutrality and similarity scores.

02. Subjective classification confirmed effective emotional conversion.

03. Interdependence between source and target emotions was analyzed.

Abstract

This paper shows that StarGAN-VC, a spectral envelope transformation method for non-parallel many-to-many voice conversion (VC), is capable of emotional VC (EVC). Although StarGAN-VC has been shown to enable speaker identity conversion, its capability for EVC for Japanese phrases has not been clarified. In this paper, we describe the direct application of StarGAN-VC to an EVC task with minimal fundamental frequency and aperiodicity processing. Through subjective evaluation experiments, we evaluated the performance of our StarGAN-EVC system in terms of its ability to achieve EVC for Japanese phrases. The subjective evaluation was conducted in terms of subjective classification and mean opinion score of neutrality and similarity. In addition, the interdependence between the source and target emotional domains was investigated from the perspective of the quality of EVC.
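
The abstract names two subjective measures: a mean opinion score (MOS) and a subjective classification of the converted emotion. The following is a minimal sketch of how such listener responses could be summarized. It is not the authors' evaluation code. The function names, the 1-5 rating scale, the emotion labels, and the toy values are all hypothetical and appear only for illustration; they are not results from the paper.

```python
from collections import Counter, defaultdict
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings (assumed here to be on a 1-5 scale)."""
    return mean(ratings)


def confusion_counts(responses):
    """Count how often each target emotion was perceived as each emotion.

    `responses` is a list of (target_emotion, perceived_emotion) pairs.
    """
    table = defaultdict(Counter)
    for target, perceived in responses:
        table[target][perceived] += 1
    return table


def classification_accuracy(responses):
    """Fraction of responses where the perceived emotion matches the target."""
    correct = sum(1 for target, perceived in responses if target == perceived)
    return correct / len(responses)


# Hypothetical toy data, NOT taken from the paper.
toy_ratings = [4, 3, 5, 4]
toy_responses = [
    ("angry", "angry"),
    ("angry", "neutral"),
    ("happy", "happy"),
    ("sad", "sad"),
]

mos = mean_opinion_score(toy_ratings)
accuracy = classification_accuracy(toy_responses)
table = confusion_counts(toy_responses)
```

Keeping the per-target counts in `table`, rather than only the overall accuracy, is one way to inspect how conversion quality depends on the emotion pair, which is the kind of interdependence the abstract mentions.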

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing