StarGAN-based Emotional Voice Conversion for Japanese Phrases
Asuka Moritani, Ryo Ozaki, Shoki Sakamoto, Hirokazu Kameoka, Tadahiro Taniguchi

TL;DR
This paper demonstrates that StarGAN-VC can be applied effectively to emotional voice conversion (EVC) for Japanese phrases, achieving promising subjective evaluation results with only minimal fundamental frequency and aperiodicity processing.
Contribution
It is the first work to apply StarGAN-VC directly to Japanese emotional voice conversion, showing through subjective evaluation that it can perform non-parallel conversion among multiple emotions.
Findings
StarGAN-EVC achieved high mean opinion scores for neutrality and similarity.
Subjective classification confirmed effective emotional conversion.
Interdependence between source and target emotions was analyzed.
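The findings rest on two kinds of listener judgments: subjective emotion classification and mean opinion score (MOS) ratings. One straightforward way to tabulate them, and to expose how results depend on the source emotion as well as the target, is to compute the fraction of trials perceived as the intended target emotion for each (source, target) pair, alongside each MOS with a normal-approximation 95% confidence interval. This is a hypothetical sketch, not the paper's analysis code; the function names, emotion labels, and ratings are invented for illustration.

```python
import statistics
from collections import defaultdict


def accuracy_by_pair(responses):
    """Fraction of trials perceived as the intended target emotion.

    responses: iterable of (source, target, perceived) emotion labels
    (hypothetical format). Returns {(source, target): accuracy}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for source, target, perceived in responses:
        totals[(source, target)] += 1
        hits[(source, target)] += int(perceived == target)
    return {pair: hits[pair] / totals[pair] for pair in totals}


def mean_opinion_score(ratings):
    """MOS and half-width of a normal-approximation 95% confidence interval."""
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        return mos, 0.0
    half_width = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
    return mos, half_width


# Illustrative (invented) listener responses, not results from the paper.
responses = [
    ("neutral", "angry", "angry"),
    ("neutral", "angry", "sad"),
    ("happy", "angry", "angry"),
]
print(accuracy_by_pair(responses))
print(mean_opinion_score([4, 5, 3]))
```

In this toy input the neutral-to-angry pair scores 0.5 while happy-to-angry scores 1.0; a per-pair table of this kind is one way to see whether conversion quality depends on the source emotion.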
Abstract
This paper shows that StarGAN-VC, a spectral envelope transformation method for non-parallel many-to-many voice conversion (VC), is capable of emotional VC (EVC). Although StarGAN-VC has been shown to enable speaker identity conversion, its capability for EVC on Japanese phrases has not been clarified. In this paper, we describe the direct application of StarGAN-VC to an EVC task with minimal fundamental frequency and aperiodicity processing. Through subjective evaluation experiments, we assessed how well our StarGAN-EVC system performs EVC for Japanese phrases. The subjective evaluation consisted of subjective classification and mean opinion scores for neutrality and similarity. In addition, we investigated the interdependence between the source and target emotional domains in terms of EVC quality.
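StarGAN-VC converts only the spectral envelope, so fundamental frequency (F0) and aperiodicity have to be handled separately. The abstract does not spell out what its "minimal" processing is. A common choice in StarGAN-VC work is a logarithm Gaussian normalized transformation of F0 between domains, with aperiodicities passed through unchanged. The sketch below assumes that approach, using per-emotion log-F0 statistics from training data; the function names and statistics are hypothetical.

```python
import numpy as np


def log_f0_stats(f0):
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0)."""
    f0 = np.asarray(f0, dtype=float)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()


def convert_f0(f0_src, src_stats, tgt_stats):
    """Logarithm Gaussian normalized F0 transformation (assumed method).

    Voiced log F0 values are standardized with the source-emotion statistics
    and rescaled with the target-emotion statistics; unvoiced frames stay 0.
    """
    mu_src, std_src = src_stats
    mu_tgt, std_tgt = tgt_stats
    f0_src = np.asarray(f0_src, dtype=float)
    f0_out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    z = (np.log(f0_src[voiced]) - mu_src) / std_src
    f0_out[voiced] = np.exp(z * std_tgt + mu_tgt)
    return f0_out


# Aperiodicity is passed through unchanged in this sketch; only the spectral
# envelope would go through the StarGAN generator (not shown here).
neutral_stats = (np.log(150.0), 0.2)  # hypothetical (mean, std) of log F0
angry_stats = (np.log(300.0), 0.2)    # hypothetical (mean, std) of log F0
print(convert_f0([0.0, 100.0, 200.0], neutral_stats, angry_stats))
```

With equal standard deviations, the transformation simply scales voiced F0 by the ratio of the geometric means (here 300/150 = 2), so voiced frames at 100 Hz and 200 Hz map to about 200 Hz and 400 Hz, while the unvoiced frame stays at 0.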
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
