Eigenvoice Synthesis based on Model Editing for Speaker Generation
Masato Murata, Koichi Miyazaki, Tomoki Koriyama, Tomoki Toda

TL;DR
This paper introduces a novel DNN-based eigenvoice synthesis method that defines a speaker space within model parameters, enabling diverse speaker generation and attribute control without reference speech.
Contribution
It proposes a new approach to define speaker space in DNN parameters for speaker synthesis, extending traditional eigenvoice methods with model editing techniques.
Findings
Successfully generated diverse speaker voices.
Discovered a gender-dominant axis in the speaker space.
Demonstrated potential for attribute control in speaker synthesis.
Abstract
The speaker generation task aims to create voices of unseen speakers without reference speech. The key to the task is defining a speaker space that represents diverse speakers and determines the traits of the generated speaker. However, an effective way to define this speaker space remains unclear. Eigenvoice synthesis is a promising approach from the traditional parametric synthesis framework, such as HMM-based methods, and defines a low-dimensional speaker space using pre-stored speaker features. This study proposes a novel DNN-based eigenvoice synthesis method via model editing. Unlike prior methods, our method defines the speaker space in the DNN model parameter space. By directly sampling new DNN model parameters in this space, we can create diverse speaker voices. Experimental results showed that our method can generate speech from diverse speakers. Moreover, we discovered a…
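The idea of a speaker space in model parameters can be illustrated with a minimal sketch. The abstract does not give the construction, so everything here is an assumption: each speaker is represented by a flattened parameter vector, the space is spanned by principal axes of the per-speaker differences from a shared base model, and new speakers are drawn from a Gaussian over those axes. The names `build_speaker_space` and `sample_speaker` are hypothetical, and the toy data stands in for real model weights.

```python
import numpy as np


def build_speaker_space(speaker_params, base_params, n_components):
    """Return (mean_delta, axes, scales) for a PCA speaker space.

    Assumption: speaker_params has shape (num_speakers, dim) and holds one
    flattened parameter vector per speaker-adapted model.
    """
    # Per-speaker offsets from the shared base model.
    deltas = speaker_params - base_params
    mean_delta = deltas.mean(axis=0)
    centered = deltas - mean_delta
    # Rows of vt are orthonormal principal axes ("eigenvoices" in this sketch).
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    # Standard deviation of the training speakers along each axis.
    scales = singular_values[:n_components] / np.sqrt(max(len(deltas) - 1, 1))
    return mean_delta, axes, scales


def sample_speaker(base_params, mean_delta, axes, scales, rng):
    """Draw coordinates in the speaker space and map them to parameters."""
    coeffs = rng.standard_normal(len(scales)) * scales
    new_params = base_params + mean_delta + coeffs @ axes
    return new_params, coeffs


# Toy data: 10 hypothetical speakers, 64-dimensional parameter vectors.
rng = np.random.default_rng(0)
base = rng.standard_normal(64)
speakers = base + 0.1 * rng.standard_normal((10, 64))

mean_delta, axes, scales = build_speaker_space(speakers, base, n_components=3)
new_params, coeffs = sample_speaker(base, mean_delta, axes, scales, rng)
```

In this sketch, moving along a single row of `axes` while holding the other coordinates fixed is how one might probe whether an axis corresponds to an attribute such as gender.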
