BiSinger: Bilingual Singing Voice Synthesis
Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li

TL;DR
BiSinger is a bilingual singing voice synthesis (SVS) system that models English and Chinese Mandarin singing voices within a single shared framework, enabling high-quality code-switch singing synthesis.
Contribution
The paper proposes a shared, language-independent representation for bilingual SVS and fuses monolingual singing datasets through singing voice conversion, advancing multilingual singing voice synthesis.
Findings
Effective bilingual voice modeling in a single system
Improved performance in English and code-switch SVS
Maintains Chinese singing quality
Abstract
Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Chinese Mandarin. Current systems require separate models per language and cannot accurately represent both Chinese and English, hindering code-switch SVS. To address this gap, we design a shared representation between Chinese and English singing voices, achieved by using the CMU dictionary with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data. Experiments affirm that our language-independent representation and incorporation of related datasets enable a single model with enhanced performance in…
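The abstract says the shared representation is built from the CMU dictionary plus mapping rules, but the rules themselves are not given here. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes Mandarin lyrics arrive as tone-numbered pinyin, English words are looked up in a tiny CMU-style ARPAbet excerpt, and Mandarin initials/finals are mapped onto ARPAbet symbols by a hypothetical rule table (`PINYIN_INITIALS`, `PINYIN_FINALS`). All function and table names are hypothetical, and the approximations in the rule table are illustrative only.

```python
import re

# Tiny excerpt of CMU Pronouncing Dictionary entries (ARPAbet with stress digits).
CMU_SUBSET = {
    "i": ["AY1"],
    "love": ["L", "AH1", "V"],
    "you": ["Y", "UW1"],
    "baby": ["B", "EY1", "B", "IY0"],
}

# HYPOTHETICAL pinyin-to-ARPAbet rules for illustration; NOT the paper's mapping.
PINYIN_INITIALS = {
    "zh": ["JH"], "ch": ["CH"], "sh": ["SH"],
    "b": ["B"], "p": ["P"], "m": ["M"], "f": ["F"], "d": ["D"], "t": ["T"],
    "n": ["N"], "l": ["L"], "g": ["G"], "k": ["K"], "h": ["HH"],
    "j": ["JH"], "q": ["CH"], "x": ["SH"], "r": ["R"], "z": ["Z"],
    "c": ["T", "S"], "s": ["S"], "y": ["Y"], "w": ["W"],
}
PINYIN_FINALS = {
    "a": ["AA"], "o": ["AO"], "e": ["AH"], "i": ["IY"], "u": ["UW"],
    "ai": ["AY"], "ei": ["EY"], "ao": ["AW"], "ou": ["OW"],
    "an": ["AA", "N"], "en": ["AH", "N"], "ang": ["AA", "NG"],
    "eng": ["AH", "NG"], "ong": ["UH", "NG"], "in": ["IY", "N"],
    "ing": ["IY", "NG"], "ie": ["Y", "EH"], "er": ["ER"],
}


def pinyin_to_phones(syllable: str) -> list[str]:
    """Map one tone-numbered pinyin syllable (e.g. 'ni3') to ARPAbet phones."""
    base = syllable.rstrip("12345")  # assumption: this sketch discards the tone digit
    # Try two-letter initials (zh/ch/sh) before one-letter ones.
    for length in (2, 1):
        initial, final = base[:length], base[length:]
        if initial in PINYIN_INITIALS and final in PINYIN_FINALS:
            return PINYIN_INITIALS[initial] + PINYIN_FINALS[final]
    if base in PINYIN_FINALS:  # zero-initial syllable such as 'ai'
        return list(PINYIN_FINALS[base])
    raise KeyError(f"no rule for pinyin syllable {syllable!r}")


def english_to_phones(word: str) -> list[str]:
    """Look up an English word and drop stress digits so both languages share one inventory."""
    return [re.sub(r"\d", "", p) for p in CMU_SUBSET[word.lower()]]


def lyric_to_shared_phones(tokens: list[str]) -> list[str]:
    """Convert a code-switched lyric (tone-numbered pinyin mixed with English words)."""
    phones: list[str] = []
    for tok in tokens:
        if tok[-1].isdigit():  # assumption: Mandarin tokens always end in a tone digit
            phones.extend(pinyin_to_phones(tok))
        else:
            phones.extend(english_to_phones(tok))
    return phones


if __name__ == "__main__":
    print(lyric_to_shared_phones(["wo3", "ai4", "ni3", "baby"]))
```

Running it on the code-switched line "wo3 ai4 ni3 baby" yields `['W', 'AO', 'AY', 'N', 'IY', 'B', 'EY', 'B', 'IY']`. The vowel in "ni3" and the final vowel in "baby" both become `IY`. Sharing symbols across languages like this is what lets one acoustic model reuse training data from both languages instead of keeping two disjoint phone sets.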
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
