Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion

Xueyao Zhang; Zihao Fang; Yicheng Gu; Haopeng Chen; Lexiao Zou; Junan Zhang; Liumeng Xue; Zhizheng Wu

arXiv:2310.11160 · cs.SD · September 17, 2024 · 1 citation

TL;DR

This paper explores the use of diverse semantic-based pretrained audio models for singing voice conversion, demonstrating that their complementary knowledge improves conversion quality, especially in real-world scenarios.

Contribution

It introduces a novel framework, DSFF-SVC, that fuses diverse semantic features from pretrained models to enhance singing voice conversion performance.
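As a rough illustration of what fusing features from several pretrained extractors can involve, the sketch below aligns feature sequences with different frame rates to a common length, projects each to a shared dimension, and sums them. This page does not say how DSFF-SVC performs the fusion, so everything here is an assumption for illustration: the nearest-neighbour resampling, the project-then-sum fusion, and the names `model_a`, `model_b`, `resample_frames`, `project`, and `fuse` are all hypothetical.

```python
from typing import Dict, List

Frames = List[List[float]]  # shape: [num_frames][feature_dim]
Matrix = List[List[float]]  # shape: [in_dim][out_dim]


def resample_frames(frames: Frames, target_len: int) -> Frames:
    """Nearest-neighbour resampling along the time axis (an assumed choice)."""
    if not frames or target_len <= 0:
        raise ValueError("need at least one frame and a positive target length")
    src_len = len(frames)
    return [
        list(frames[min(src_len - 1, (i * src_len) // target_len)])
        for i in range(target_len)
    ]


def project(frames: Frames, weight: Matrix) -> Frames:
    """Apply a linear projection to every frame."""
    out_dim = len(weight[0])
    return [
        [sum(x * weight[k][j] for k, x in enumerate(frame)) for j in range(out_dim)]
        for frame in frames
    ]


def fuse(features: Dict[str, Frames], weights: Dict[str, Matrix], target_len: int) -> Frames:
    """Align each feature stream in time, project it, and sum the streams."""
    out_dim = len(next(iter(weights.values()))[0])
    fused = [[0.0] * out_dim for _ in range(target_len)]
    for name, frames in features.items():
        aligned = project(resample_frames(frames, target_len), weights[name])
        for t in range(target_len):
            for j in range(out_dim):
                fused[t][j] += aligned[t][j]
    return fused


# Hypothetical toy inputs: two extractors with different frame rates and dimensions.
features = {
    "model_a": [[1.0, 0.0], [0.0, 1.0]],      # 2 frames, dim 2
    "model_b": [[1.0], [2.0], [3.0], [4.0]],  # 4 frames, dim 1
}
weights = {
    "model_a": [[1.0, 0.0], [0.0, 1.0]],      # identity projection
    "model_b": [[1.0, 1.0]],                  # broadcast the scalar to dim 2
}
fused = fuse(features, weights, target_len=4)
print(fused)
```

In a real system the projections would be learned layers and the inputs would be frame-level outputs of actual pretrained models; the toy matrices above only show the shape bookkeeping.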

Findings

01. Diverse semantic models provide complementary information for SVC.

02. DSFF-SVC improves existing SVC models in real-world tasks.

03. The framework generalizes well across different SVC scenarios.

Abstract

Singing Voice Conversion (SVC) is a technique that enables any singer to perform any song. To achieve this, it is essential to obtain speaker-agnostic representations from the source audio, which poses a significant challenge. A common solution involves utilizing a semantic-based audio pretrained model as a feature extractor. However, the degree to which the extracted features can meet the SVC requirements remains an open question. This includes their capability to accurately model melody and lyrics, the speaker-independency of their underlying acoustic information, and their robustness for in-the-wild acoustic environments. In this study, we investigate the knowledge within classical semantic-based pretrained models in much detail. We discover that the knowledge of different models is diverse and can be complementary for SVC. Based on the above, we design a Singing Voice Conversion…

Code & Models

Repositories

open-mmlab/Amphion/tree/main/egs/svc/MultipleContentsSVC
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing