FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion

Alef Iury Siqueira Ferreira; Lucas Rafael Gris; Augusto Seben da Rosa; Frederico Santos de Oliveira; Edresson Casanova; Rafael Teixeira Sousa; Arnaldo Candido Junior; Anderson da Silva Soares; Arlindo Galvão Filho

arXiv:2501.05586·cs.SD·March 18, 2025

TL;DR

FreeSVC is a zero-shot multilingual singing voice conversion system. It pairs an enhanced VITS model with SPIN content features, the ECAPA2 speaker encoder, and trainable language embeddings, so that singing voices can be converted across languages without extensive language-specific training.

Contribution

The paper presents a zero-shot multilingual singing voice conversion method that uses an enhanced VITS model, Speaker-invariant Clustering (SPIN), and language embeddings to improve cross-lingual performance.

Findings

01. Effective zero-shot cross-lingual conversion demonstrated
02. Multilingual content extractor improves conversion quality
03. Publicly available source code and models

Abstract

This work presents FreeSVC, a promising multilingual singing voice conversion approach that leverages an enhanced VITS model with Speaker-invariant Clustering (SPIN) for better content representation and the State-of-the-Art (SOTA) speaker encoder ECAPA2. FreeSVC incorporates trainable language embeddings to handle multiple languages and employs an advanced speaker encoder to disentangle speaker characteristics from linguistic content. Designed for zero-shot learning, FreeSVC enables cross-lingual singing voice conversion without extensive language-specific training. We demonstrate that a multilingual content extractor is crucial for optimal cross-language conversion. Our source code and models are publicly available.
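The abstract describes conditioning on a speaker embedding and a trainable language embedding alongside the content representation. The following is a minimal sketch of that general idea, not the FreeSVC implementation: the function names, the vector sizes, and the choice to concatenate the conditioning vectors onto every content frame are all assumptions made for illustration. Plain Python lists stand in for tensors, and the language table is only randomly initialised here, whereas in a real model it would be learned during training.

```python
import random

def make_language_table(languages, dim, seed=0):
    """Build a lookup table with one vector per language ID.

    Hypothetical helper: values are random here; a real system would train them.
    """
    rng = random.Random(seed)
    return {lang: [rng.gauss(0.0, 0.1) for _ in range(dim)] for lang in languages}

def condition_frames(content_frames, speaker_embedding, language_vector):
    """Append the same speaker and language vectors to every content frame.

    Concatenation is an assumption; the source does not say how the
    conditioning signals are combined.
    """
    global_cond = list(speaker_embedding) + list(language_vector)
    return [list(frame) + global_cond for frame in content_frames]

# Hypothetical sizes, chosen only to keep the example readable.
LANG_DIM = 4
table = make_language_table(["en", "pt", "zh"], LANG_DIM)

content = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 frames of 3-dim content features
speaker = [1.0, -1.0]                         # stand-in for a target-speaker embedding

conditioned = condition_frames(content, speaker, table["pt"])
print(len(conditioned), len(conditioned[0]))  # 2 frames, each 3 + 2 + 4 = 9 values
```

In this sketch, changing the target speaker or the language only changes the appended vectors, while the content frames stay untouched. That separation is the intuition behind disentangling speaker characteristics from linguistic content.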

Code & Models

Repositories

freds0/free-svc (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing