CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers

Xintong Wang; Chang Zeng; Jun Chen; Chunhui Wang

arXiv:2309.12672·cs.SD·September 25, 2023

TL;DR

CrossSinger is a novel cross-lingual singing voice synthesis system that achieves high fidelity and multi-singer capabilities by unifying language representations and removing singer biases, enabling effective synthesis across multiple languages including code-switch scenarios.

Contribution

The paper introduces CrossSinger, a model that combines an International Phonetic Alphabet (IPA) based text representation, conditional layer normalization, and a gradient reversal layer (GRL) to enable cross-lingual, multi-singer, high-fidelity singing voice synthesis from monolingual training data.
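As a rough illustration of the conditional layer normalization idea named above, the sketch below normalizes a hidden vector and then applies a scale and bias selected by a language ID. It is not the authors' implementation: the function names, the LANGUAGE_AFFINE table, and the hard-coded parameter values are all hypothetical, and a real model would presumably predict the scale and bias from a learned language embedding.

```python
import math

def conditional_layer_norm(x, scale, bias, eps=1e-5):
    """Normalize one feature vector, then apply a condition-specific scale and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    return [s * n + b for s, n, b in zip(scale, normed, bias)]

# Hypothetical per-language affine parameters (assumption for illustration only;
# in a trained model these would come from a learned language embedding).
LANGUAGE_AFFINE = {
    "ja": {"scale": [1.0, 1.0, 1.0, 1.0], "bias": [0.0, 0.0, 0.0, 0.0]},
    "zh": {"scale": [2.0, 2.0, 2.0, 2.0], "bias": [0.5, 0.5, 0.5, 0.5]},
}

def apply_cln(x, language):
    """Look up the language's scale and bias and apply conditional layer norm."""
    params = LANGUAGE_AFFINE[language]
    return conditional_layer_norm(x, params["scale"], params["bias"])

hidden = [1.0, 2.0, 3.0, 4.0]
out_ja = apply_cln(hidden, "ja")  # same hidden state, Japanese conditioning
out_zh = apply_cln(hidden, "zh")  # same hidden state, Mandarin conditioning
```

The same hidden state yields different outputs depending on the language condition, which is how language information can be injected without changing the shared phoneme inventory.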

Findings

01 Successfully synthesizes high-fidelity singing voices across multiple languages.

02 Demonstrates effective handling of code-switch singing scenarios.

03 Reduces singer bias in monolingual training data.

Abstract

It is challenging to build a multi-singer, high-fidelity singing voice synthesis system with cross-lingual ability using only monolingual singers in the training stage. In this paper, we propose CrossSinger, a cross-lingual singing voice synthesizer based on Xiaoicesing2. Specifically, we use the International Phonetic Alphabet (IPA) to unify the representation of all languages in the training data. Moreover, we leverage conditional layer normalization to incorporate language information into the model for better pronunciation when singers meet unseen languages. Additionally, a gradient reversal layer (GRL) is used to remove singer biases contained in the lyrics: since all singers are monolingual, a singer's identity is implicitly associated with the text. The experiment is conducted on a combination of three singing voice datasets containing Japanese Kiritan…
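To make the gradient reversal layer mentioned in the abstract more concrete, here is a minimal framework-free sketch. It assumes the common setup in which a singer classifier reads the text encoder's output through the GRL; that placement, the class name GradientReversal, the lam weight, and the example numbers are assumptions for illustration, not details confirmed by the abstract.

```python
class GradientReversal:
    """Identity in the forward pass; negates and scales gradients in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical weight controlling the strength of the reversal

    def forward(self, x):
        # Features pass through unchanged on the way to the singer classifier.
        return list(x)

    def backward(self, grad_output):
        # The gradient from the singer classifier is flipped, so the encoder is
        # pushed to make singer identity harder to predict from the text features.
        return [-self.lam * g for g in grad_output]

grl = GradientReversal(lam=0.5)
text_features = [0.3, -1.2, 0.7]         # made-up encoder output
passed = grl.forward(text_features)      # identical to text_features
classifier_grad = [0.2, -0.4, 1.0]       # made-up gradient of the singer loss
encoder_grad = grl.backward(classifier_grad)
```

In an autograd framework the same behavior is usually written as a custom function whose backward pass returns the negated gradient, so the classifier still learns to identify singers while the encoder learns features that hide them.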

Code & Models

Repositories

zengchang233/CrossSinger (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing