Singing Voice Synthesis with Vibrato Modeling and Latent Energy Representation
Yingjie Song, Wei Song, Wei Zhang, Zhengchen Zhang, Dan Zeng, Zhi Liu, Yang Yu

TL;DR
This paper introduces an expressive singing voice synthesis system that models vibrato explicitly and uses latent energy representations to enhance naturalness and expressiveness, validated by experiments on an open dataset.
Contribution
It presents a novel deep learning vibrato model with automatic labeling and an autoencoder-based latent energy feature for improved singing voice synthesis.
Findings
Vibrato modeling significantly improves naturalness.
Latent energy representation enhances expressiveness.
Experimental results on the NUS48E dataset confirm the system's effectiveness.
Abstract
This paper proposes an expressive singing voice synthesis system that introduces explicit vibrato modeling and a latent energy representation. Because vibrato is an inherent characteristic of human singing, it is essential to the naturalness of synthesized singing. Hence, this paper introduces a deep learning-based vibrato model that controls vibrato likeliness, rate, depth and phase, where the vibrato likeliness represents the probability that vibrato is present and helps improve the naturalness of the singing voice. However, existing singing corpora provide no annotated vibrato likeliness labels, so we adopt a novel labeling method to label vibrato likeliness automatically. Meanwhile, the power spectrogram of the audio contains rich information that can improve the expressiveness of singing. An autoencoder-based latent energy bottleneck feature is…
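To make the four vibrato parameters concrete, the sketch below applies a textbook sinusoidal vibrato to a frame-level F0 contour. It is not the paper's model, which predicts these parameters with a neural network. It only shows how likeliness, rate, depth and phase could act on pitch once predicted. The function name `apply_vibrato`, the 200 Hz frame rate, depth measured in cents, and the hard `threshold` gate on likeliness are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def apply_vibrato(
    f0_hz: Sequence[float],
    likeliness: Sequence[float],
    rate_hz: Sequence[float],
    depth_cents: Sequence[float],
    initial_phase_rad: float = 0.0,
    frame_rate_hz: float = 200.0,  # assumption: 5 ms frame hop
    threshold: float = 0.5,        # hypothetical gate on vibrato likeliness
) -> List[float]:
    """Superimpose a sinusoidal vibrato on a per-frame base F0 contour.

    All per-frame sequences must have the same length. Unvoiced frames are
    encoded as 0 Hz and are passed through unchanged.
    """
    out: List[float] = []
    phase = initial_phase_rad
    for f0, p, rate, depth in zip(f0_hz, likeliness, rate_hz, depth_cents):
        if f0 > 0 and p >= threshold:
            # Pitch deviation in cents (1200 cents = 1 octave).
            cents = depth * math.sin(phase)
            out.append(f0 * 2.0 ** (cents / 1200.0))
        else:
            # Unvoiced frame, or vibrato judged unlikely: keep the base pitch.
            out.append(f0)
        # Advance the oscillator by one frame so the rate can vary smoothly
        # over time without phase jumps.
        phase += 2.0 * math.pi * rate / frame_rate_hz
    return out


# Usage: one second of A4 with a 5.5 Hz, 50-cent vibrato.
n = 200
contour = apply_vibrato([440.0] * n, [1.0] * n, [5.5] * n, [50.0] * n)
```

Accumulating the phase frame by frame, rather than evaluating `sin(2*pi*rate*t + phase)` directly, keeps the oscillation continuous when the predicted rate changes between frames. Scaling the depth by the likeliness instead of gating on a threshold would be an equally plausible alternative.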
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
