SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

Shuai Guo; Jiatong Shi; Tao Qian; Shinji Watanabe; Qin Jin

arXiv:2203.17001·eess.AS·July 7, 2022

Open Access

TL;DR

This paper introduces SingAug, a set of data augmentation methods combined with a cycle-consistent training strategy for singing voice synthesis (SVS). The approach targets settings with limited training data and reports significant gains in both objective and subjective evaluations on two public singing databases.

Contribution

The paper proposes augmentation strategies customized to singing voice synthesis, based on pitch augmentation and mix-up augmentation, together with a cycle-consistent training strategy intended to stabilize training.

Findings

01

Enhanced synthesis quality with limited data.

02

Significant improvements in objective and subjective evaluations.

03

Effective augmentation strategies for SVS systems.

Abstract

Deep-learning-based singing voice synthesis (SVS) systems have been shown to generate singing flexibly and with better quality than conventional statistical parametric methods. However, neural systems are generally data-hungry and have difficulty reaching reasonable singing quality with the limited publicly available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize the training, we introduce a cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that our proposed augmentation methods and the stabilizing training strategy can significantly improve performance on both objective and subjective evaluations.
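The abstract names pitch augmentation and mix-up augmentation without giving details. As a rough illustration only, the sketch below shows the generic textbook forms of both ideas: transposing score notes and an F0 contour by the same number of semitones, and linearly interpolating two feature sequences with a Beta-distributed weight. The function names, the use of 0 to mark rests and unvoiced frames, and the default `alpha` are assumptions made for this example and are not taken from the paper; the SVS-specific variants the authors propose may differ.

```python
import random


def shift_pitch(midi_notes, f0_hz, semitones):
    """Transpose score notes and an F0 contour by the same interval.

    Assumption for this sketch: a note value of 0 marks a rest and an
    F0 value of 0 marks an unvoiced frame, so both are left unchanged.
    """
    ratio = 2.0 ** (semitones / 12.0)  # equal-temperament frequency ratio
    shifted_notes = [n + semitones if n > 0 else n for n in midi_notes]
    shifted_f0 = [f * ratio if f > 0 else 0.0 for f in f0_hz]
    return shifted_notes, shifted_f0


def mixup(feat_a, feat_b, alpha=0.5, rng=random):
    """Generic mix-up: interpolate two equal-length feature sequences.

    The weight lam is drawn from Beta(alpha, alpha); alpha=0.5 is an
    illustrative default, not a value reported in the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]
    return mixed, lam


if __name__ == "__main__":
    # Shift a toy phrase (note, rest, note) up by one octave.
    notes, f0 = shift_pitch([60, 0, 62], [440.0, 0.0, 220.0], 12)
    print(notes, f0)

    # Mix two toy feature vectors with a seeded generator.
    mixed, lam = mixup([1.0, 2.0], [3.0, 4.0], rng=random.Random(0))
    print(lam, mixed)
```

Shifting the score and the F0 contour together keeps the input and target consistent with each other, which is the usual motivation for pitch-based augmentation; in a real pipeline the audio or acoustic features would need to be shifted to match as well.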

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing