SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

Bingsong Bai; Fengping Wang; Yingming Gao; Ya Li

arXiv:2406.05692·cs.SD·June 12, 2024

TL;DR

This paper introduces SPA-SVC, a self-supervised pitch augmentation technique that improves singing voice conversion quality, especially in cross-domain scenarios with pitch disparities, without extra data or model complexity.

Contribution

The paper proposes a cycle pitch shifting training strategy and integrates a Structural Similarity Index (SSIM) loss into the SVC model, improving performance in challenging cross-domain singing voice conversion tasks.
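The summary above does not say how the cycle is constructed, so the following is only a minimal sketch of the arithmetic that any pitch shifting step relies on: moving an F0 contour by a number of semitones and then shifting it back. The function names `shift_f0` and `cycle_shift` are hypothetical and are not taken from the paper.

```python
import numpy as np

def shift_f0(f0_hz, semitones):
    """Shift an F0 contour (in Hz) by a number of semitones.

    One semitone is a frequency ratio of 2**(1/12). Unvoiced frames,
    encoded here as 0 Hz (an assumption of this sketch), stay at 0.
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    return f0_hz * 2.0 ** (semitones / 12.0)

def cycle_shift(f0_hz, semitones):
    """Shift up by `semitones`, then back down by the same amount.

    The round trip should reproduce the input, which is the property a
    cycle-style training target could exploit (hypothetical illustration).
    """
    shifted = shift_f0(f0_hz, semitones)
    restored = shift_f0(shifted, -semitones)
    return shifted, restored

# Toy contour: one unvoiced frame followed by three voiced frames.
f0 = np.array([0.0, 220.0, 246.94, 261.63])
shifted, restored = cycle_shift(f0, 12)  # 12 semitones = one octave
```

Because the round trip is its own supervision signal, a scheme of this kind needs no extra labelled data, which is consistent with the claim that the method works without additional data.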

Findings

01 Significant improvement in voice quality in SVC tasks.

02 Enhanced performance in cross-domain scenarios.

03 Effective without additional data or increased model parameters.

Abstract

Diffusion-based singing voice conversion (SVC) models have shown better synthesis quality than traditional methods. However, in cross-domain SVC scenarios, where the pitch of the source and target voice domains differs significantly, these models tend to generate hoarse audio, which makes high-quality vocal output difficult to achieve. In this paper, we therefore propose a Self-supervised Pitch Augmentation method for Singing Voice Conversion (SPA-SVC), which enhances voice quality in SVC tasks without requiring additional data or increasing model parameters. We introduce a cycle pitch shifting training strategy and a Structural Similarity Index (SSIM) loss into our SVC model, effectively enhancing its performance. Experimental results on the public singing dataset M4Singer indicate that our proposed method significantly improves model…
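To make the SSIM loss mentioned in the abstract concrete, here is a minimal sketch that compares two mel-spectrogram-like arrays. It is a simplification under stated assumptions: it computes SSIM from global statistics over the whole array, whereas standard SSIM averages over local windows, and the paper's exact formulation is not given in this summary. The names `ssim_global` and `ssim_loss` are hypothetical.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM between two arrays using global (not windowed) statistics.

    k1 and k2 are the usual SSIM stabilising constants; data_range is the
    span of valid values (1.0 assumes inputs normalised to [0, 1]).
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def ssim_loss(pred, target, data_range=1.0):
    """Loss that is 0 for identical inputs and grows as structure diverges."""
    return 1.0 - ssim_global(pred, target, data_range)

# Toy "spectrograms": 80 mel bins x 100 frames, values in [0, 1).
rng = np.random.default_rng(0)
target = rng.random((80, 100))
pred = rng.random((80, 100))
loss_same = ssim_loss(target, target)
loss_diff = ssim_loss(pred, target)
```

Unlike a pointwise L1 or L2 term, this loss compares means, variances, and covariance, so it rewards predictions whose overall structure matches the target rather than only their per-bin values.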

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing