Learning the Beauty in Songs: Neural Singing Voice Beautifier
Jinglin Liu, Chengxi Li, Yi Ren, Zhiying Zhu, Zhou Zhao

TL;DR
This paper introduces Neural Singing Voice Beautifier (NSVB), a generative model that improves the intonation and vocal tone of amateur singing while preserving its content and timbre. It relies on novel time-warping and latent-mapping techniques and is validated on Chinese and English songs.
Contribution
The paper presents the first generative model for singing voice beautifying, incorporating a novel time-warping method and a latent-mapping algorithm, along with a new parallel singing dataset.
Findings
Effective in improving vocal tone and intonation
Works on both Chinese and English songs
Outperforms existing methods on objective and subjective metrics
Abstract
We are interested in a novel task, singing voice beautifying (SVB). Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice while keeping its content and vocal timbre. Current automatic pitch correction techniques are immature, and most of them address only intonation while ignoring overall aesthetic quality. Hence, we introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task, which adopts a conditional variational autoencoder as the backbone and learns latent representations of vocal tone. In NSVB, we propose a novel time-warping approach for pitch correction, Shape-Aware Dynamic Time Warping (SADTW), which improves on the robustness of existing time-warping approaches when synchronizing the amateur recording with the template pitch curve. Furthermore, we propose a latent-mapping…
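To make the pitch-correction step concrete, the sketch below aligns an amateur pitch curve with a template pitch curve using dynamic time warping and then reads the template pitch off at the amateur's timing. The abstract does not say how SADTW's shape-aware cost is defined, so this is a minimal sketch under an assumption: it swaps in a simple derivative-DTW-style descriptor (local slopes of the curve) so that a constant pitch offset, such as singing a phrase slightly flat, does not disturb the alignment. The helper names `local_slopes`, `dtw_path`, and `warp_template_to_amateur`, the `radius` parameter, and the use of MIDI-semitone pitch values are all hypothetical and are not taken from the paper.

```python
import math


def local_slopes(curve, radius=2):
    """Hypothetical shape descriptor (not the paper's SADTW feature).

    For each frame, collect pitch differences across windows of growing
    width. Because only differences are kept, a constant offset between
    two curves yields identical descriptors.
    """
    n = len(curve)
    feats = []
    for i in range(n):
        vec = []
        for k in range(1, radius + 1):
            left = curve[max(i - k, 0)]       # clamp at the start
            right = curve[min(i + k, n - 1)]  # clamp at the end
            vec.append(right - left)
        feats.append(vec)
    return feats


def dtw_path(a_feats, b_feats):
    """Classic DTW over per-frame feature vectors; returns (cost, path)."""
    n, m = len(a_feats), len(b_feats)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a_feats[i - 1], b_feats[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the last frame pair to (0, 0); on cost ties the
    # diagonal step wins because its indices compare smallest.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[n][m], path


def warp_template_to_amateur(amateur, template, radius=2):
    """Return the template pitch resampled onto the amateur's time axis."""
    _, path = dtw_path(local_slopes(amateur, radius), local_slopes(template, radius))
    matched = [[] for _ in amateur]
    for i, j in path:
        matched[i].append(template[j])  # every amateur frame appears in the path
    # Average when one amateur frame maps to several template frames.
    return [sum(v) / len(v) for v in matched]
```

For example, an amateur take that is half a semitone flat but otherwise follows the template has the same slope descriptors, so the alignment is the diagonal and the corrected curve equals the template. A slower take that holds notes longer still gets one corrected pitch value per amateur frame. In a full system the warped curve would drive resynthesis; that part is outside this sketch.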
Taxonomy
Topics: Music and Audio Processing · Time Series Analysis and Forecasting
