SYKI-SVC: Advancing Singing Voice Conversion with Post-Processing Innovations and an Open-Source Professional Testset

Yiquan Zhou; Wenyu Wang; Hongwu Ding; Jiacheng Xu; Jihua Zhu; Xin Gao; Shihao Li

arXiv:2501.02953 · cs.SD · January 7, 2025

Open Access

TL;DR

This paper introduces a high-fidelity singing voice conversion system that leverages advanced feature extraction, a novel post-processing step, and a new open-source dataset, achieving high naturalness in converted singing voices.

Contribution

The paper presents a novel singing voice conversion system with a post-processing module and provides an open-source professional test set for evaluation.

Findings

01 Achieves high naturalness in singing voice conversion

02 Effective use of ContentVec and Whisper models for feature extraction

03 Open-source dataset facilitates standardized evaluation

Abstract

Singing voice conversion aims to transform a source singing voice into that of a target singer while preserving the original lyrics, melody, and various vocal techniques. In this paper, we propose a high-fidelity singing voice conversion system. Our system builds upon the SVCC T02 framework and consists of three key components: a feature extractor, a voice converter, and a post-processor. The feature extractor utilizes the ContentVec and Whisper models to derive F0 contours and extract speaker-independent linguistic features from the input singing voice. The voice converter then integrates the extracted timbre, F0, and linguistic content to synthesize the target speaker's waveform. The post-processor augments high-frequency information directly from the source through simple and effective signal processing to enhance audio quality. Due to the lack of a standardized professional dataset…
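The abstract describes the post-processor only as "simple and effective signal processing" that takes high-frequency information directly from the source. As a rough illustration of that idea, and not the authors' method, the sketch below keeps the converted signal below a cutoff and copies the source spectrum above it. The function name `augment_high_band`, the `cutoff_hz` default of 8 kHz, and the hard FFT mask are all assumptions made for this example; it also assumes the two signals are time-aligned and share a sample rate.

```python
import numpy as np


def augment_high_band(converted, source, sample_rate, cutoff_hz=8000.0):
    """Hypothetical sketch: converted audio below cutoff_hz, source audio above it.

    Both inputs are 1-D float arrays at the same sample rate and are assumed
    to be time-aligned. The cutoff value is an illustrative assumption.
    """
    # Trim to a common length so the two spectra have matching bins.
    n = min(len(converted), len(source))
    conv_spec = np.fft.rfft(converted[:n])
    src_spec = np.fft.rfft(source[:n])

    # Frequency in Hz of each rfft bin.
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Take source bins at or above the cutoff, converted bins below it.
    mixed = np.where(freqs >= cutoff_hz, src_spec, conv_spec)

    # Back to the time domain at the original (trimmed) length.
    return np.fft.irfft(mixed, n=n)
```

A hard spectral mask over the whole signal is the simplest possible band split; on real recordings a smooth crossover or frame-wise processing would likely be preferable to avoid ringing at the cutoff.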

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research