BiSinger: Bilingual Singing Voice Synthesis

Huali Zhou; Yueqian Lin; Yao Shi; Peng Sun; Ming Li

arXiv:2309.14089 · eess.AS · January 10, 2024

TL;DR

BiSinger is a bilingual singing voice synthesis system that models English and Chinese Mandarin singing voices within a single shared framework, which makes code-switched singing synthesis possible with one model.

Contribution

The paper proposes a shared, language-independent representation for bilingual SVS, built on the CMU dictionary with mapping rules, and fuses monolingual singing datasets using open-source singing voice conversion techniques.

Findings

01 Effective bilingual voice modeling in a single system

02 Improved performance in English and code-switch SVS

03 Maintains Chinese singing quality

Abstract

Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Chinese Mandarin. Current systems require separate models per language and cannot accurately represent both Chinese and English, hindering code-switch SVS. To address this gap, we design a shared representation between Chinese and English singing voices, achieved by using the CMU dictionary with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data. Experiments affirm that our language-independent representation and incorporation of related datasets enable a single model with enhanced performance in…
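
The shared representation described in the abstract can be pictured with a small sketch. The code below is only an illustration under stated assumptions: the English entries follow CMU-dictionary (ARPAbet) style with stress digits dropped, while the pinyin-to-ARPAbet table and the function name `to_shared_phonemes` are hypothetical and are not the mapping rules used in the paper.

```python
# Toy English lexicon in CMU-dictionary (ARPAbet) style, stress digits dropped.
ENGLISH_LEXICON = {
    "love": ["L", "AH", "V"],
    "you": ["Y", "UW"],
}

# Hypothetical pinyin-to-ARPAbet mapping rules (illustrative only,
# NOT the rules from the paper).
PINYIN_TO_ARPABET = {
    "wo": ["W", "AO"],
    "ai": ["AY"],
    "ni": ["N", "IY"],
}


def to_shared_phonemes(tokens):
    """Map a code-switched token list onto one shared phoneme inventory."""
    phonemes = []
    for token in tokens:
        key = token.lower()
        if key in ENGLISH_LEXICON:
            phonemes.extend(ENGLISH_LEXICON[key])
        elif key in PINYIN_TO_ARPABET:
            phonemes.extend(PINYIN_TO_ARPABET[key])
        else:
            raise KeyError(f"no pronunciation for token: {token!r}")
    return phonemes


if __name__ == "__main__":
    # A code-switched lyric: Mandarin "wo", English "love", Mandarin "ni".
    print(to_shared_phonemes(["wo", "love", "ni"]))
```

Because both languages resolve to the same symbol set in this sketch, a single acoustic model could consume the sequence without a language switch; how the actual system handles tones, stress, and phonemes with no close English counterpart is not covered here.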

Code & Models

Repositories

BiSinger-SVS/BiSinger
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques