WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses
Zewang Zhang, Yibin Zheng, Xinhui Li, Li Lu

TL;DR
WeSinger is a multi-singer Chinese neural singing voice synthesis (SVS) system that combines data augmentation with multi-singer pre-training, auxiliary losses in its duration and acoustic models, and a 24 kHz LPCNet vocoder, and reports state-of-the-art naturalness and accuracy.
Contribution
It introduces a complete SVS system that combines a 24 kHz pitch-aware LPCNet vocoder, multi-singer pre-training with data augmentation, and duration and acoustic models trained with auxiliary losses.
Findings
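Achieves state-of-the-art performance on the Opencpop corpus
Quantitative and qualitative evaluations show improved accuracy and naturalness of the synthesized singing
To the authors' knowledge, the first SVS system to combine a 24 kHz LPCNet vocoder with multi-singer pre-training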
Abstract
In this paper, we develop a new multi-singer Chinese neural singing voice synthesis (SVS) system named WeSinger. To improve the accuracy and naturalness of synthesized singing voice, we design several specific modules and techniques: 1) a deep bi-directional LSTM-based duration model with a multi-scale rhythm loss and a post-processing step; 2) a Transformer-like acoustic model with a progressive pitch-weighted decoder loss; 3) a 24 kHz pitch-aware LPCNet neural vocoder to produce high-quality singing waveforms; 4) a novel data augmentation method with multi-singer pre-training for stronger robustness and naturalness. To our knowledge, WeSinger is the first SVS system to adopt 24 kHz LPCNet and multi-singer pre-training simultaneously. Both quantitative and qualitative evaluation results demonstrate the effectiveness of WeSinger in terms of accuracy and naturalness, and WeSinger achieves…
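The abstract does not spell out how the multi-scale rhythm loss in item 1 is defined. As a rough illustration only, the sketch below assumes it compares predicted and reference durations at three scales (per phoneme, summed per note, and summed over the whole sentence) and adds the weighted errors. The function names, the choice of mean squared error, the three scales, and the weights are all assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict


def mse(pred, target):
    """Mean squared error between two equal-length, non-empty sequences."""
    assert len(pred) == len(target) and len(pred) > 0
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def sum_by_group(values, group_ids):
    """Sum values that share a group id; results are ordered by group id."""
    sums = defaultdict(float)
    for value, group in zip(values, group_ids):
        sums[group] += value
    return [sums[group] for group in sorted(sums)]


def multi_scale_duration_loss(pred, target, note_ids, weights=(1.0, 1.0, 1.0)):
    """Hypothetical multi-scale duration loss (an assumption, not the paper's formula).

    pred, target: per-phoneme durations (for example, in frames).
    note_ids:     the note index each phoneme belongs to.
    weights:      illustrative weights for the phoneme, note, and sentence terms.
    """
    w_phoneme, w_note, w_sentence = weights
    phoneme_loss = mse(pred, target)
    note_loss = mse(sum_by_group(pred, note_ids), sum_by_group(target, note_ids))
    sentence_loss = mse([sum(pred)], [sum(target)])
    return w_phoneme * phoneme_loss + w_note * note_loss + w_sentence * sentence_loss


# Two phonemes share note 0; their individual errors cancel at the note level,
# so only the phoneme-level term contributes to the loss.
loss = multi_scale_duration_loss([2.0, 4.0, 3.0], [3.0, 3.0, 3.0], note_ids=[0, 0, 1])
```

The coarser terms penalize errors that accumulate across a note or a sentence, which a purely per-phoneme loss treats no differently from errors that cancel out. That is one plausible motivation for a rhythm-oriented loss in singing, where note lengths are fixed by the score.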
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
