A comparative study of generative models for child voice conversion
Protima Nomo Sudro, Anton Ragni, Thomas Hain

TL;DR
This paper compares four generative models for adult-to-child voice conversion, introduces a frequency warping technique to improve speaker similarity, and evaluates the models using objective and subjective measures.
Contribution
It provides a comparative analysis of diffusion, flow-based, VAE, and GAN models for child voice conversion and proposes a frequency warping method to enhance speaker similarity.
Findings
All four models produce plausible converted speech, but it is insufficiently similar to the target speaker.
Frequency warping applied to the model outputs significantly reduces the mismatch between adult and child speech.
Objective and subjective evaluations show varying effectiveness of models with the proposed technique.
Abstract
Generative models are a popular choice for adult-to-adult voice conversion (VC) because they model unlabelled data efficiently. Their usefulness for producing children's speech, and in particular for adult-to-child VC, has not yet been investigated. Four generative models are compared for adult-to-child VC: a diffusion model, a flow-based model, a variational autoencoder, and a generative adversarial network. Results show that although the converted speech produced by these models appears plausible, it exhibits insufficient similarity to the target speaker characteristics. We introduce an efficient frequency warping technique that can be applied to the output of the models and that significantly reduces the mismatch between adult and child speech. The outputs of all the models are evaluated using both objective and subjective measures. In particular we compare specific…
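The abstract does not say which warping function the authors use, so the following is only a minimal sketch of the general idea of frequency warping as a post-processing step. It assumes a simple linear warp of the frequency axis of a magnitude spectrogram; the function name `warp_frequency_axis` and the parameter `alpha` are hypothetical and do not come from the paper. With `alpha > 1`, spectral content moves to higher frequency bins, which is the direction one would expect when moving from adult to child speech.

```python
import numpy as np


def warp_frequency_axis(spec, alpha):
    """Linearly warp the frequency axis of a magnitude spectrogram.

    spec  : array of shape (n_bins, n_frames), linear-frequency magnitudes.
    alpha : hypothetical warping factor (an assumption, not from the paper).
            alpha > 1 moves spectral content towards higher bins.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    spec = np.asarray(spec, dtype=float)
    n_bins, n_frames = spec.shape
    bins = np.arange(n_bins, dtype=float)
    # Each output bin k reads the input spectrum at position k / alpha.
    src = bins / alpha
    warped = np.empty_like(spec)
    for t in range(n_frames):
        # np.interp clamps positions beyond the last bin to the edge value.
        warped[:, t] = np.interp(src, bins, spec[:, t])
    return warped


# Toy example: a single spectral peak at bin 8 in every frame.
spec = np.zeros((65, 3))
spec[8, :] = 1.0
warped = warp_frequency_axis(spec, alpha=1.25)
print(int(np.argmax(warped[:, 0])))  # the peak moves from bin 8 to bin 10
```

A linear warp is the simplest choice for illustration. Piecewise-linear or bilinear warps are common alternatives, and the warp could equally be applied to mel-spectrogram outputs before vocoding. Which of these, if any, matches the paper's technique is not stated in the abstract.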
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Language Development and Disorders
