NHSS: A Speech and Singing Parallel Database
Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, Haizhou Li

TL;DR
This paper introduces the NHSS database, a comprehensive collection of parallel speech and singing recordings designed to facilitate research in voice conversion, synthesis, and comparative analysis of speech and singing attributes.
Contribution
The paper presents a new publicly available parallel speech and singing database with detailed annotations and benchmark systems for speech-to-singing conversion tasks.
Findings
Analysis of acoustic similarities and differences between speech and singing voices.
Development of benchmark systems for speech-to-singing alignment and spectral mapping.
Provision of a large, annotated dataset for research in voice conversion and synthesis.
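As an illustration of what aligning a spoken line with its sung counterpart involves, here is a minimal sketch of dynamic time warping (DTW), a common technique for temporally aligning two sequences of different lengths. This summary does not say which alignment method the paper's benchmark systems use, so DTW is an assumption chosen for illustration. The function name dtw_path and the toy one-dimensional "feature" sequences are hypothetical; a real system would compare frame-level acoustic feature vectors.

```python
from math import inf


def dtw_path(a, b):
    """Align two 1-D sequences with classic DTW (illustrative sketch only).

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching a[i] to b[j]. Local distance is the absolute difference.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both advance
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy data (hypothetical): the "sung" sequence holds each value twice as long.
spoken = [1, 2, 3]
sung = [1, 1, 2, 2, 3, 3]
total_cost, path = dtw_path(spoken, sung)
print(total_cost, path)
```

In the toy example each spoken frame is matched to the two sung frames carrying the same value, which mirrors how sung syllables are typically stretched relative to their spoken form.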
Abstract
We present a database of parallel recordings of speech and singing, collected and released by the Human Language Technology (HLT) laboratory at the National University of Singapore (NUS), called the NUS-HLT Speak-Sing (NHSS) database. We release this database to the public to support research activities including, but not limited to, comparative studies of acoustic attributes of speech and singing signals, cooperative synthesis of speech and singing voices, and speech-to-singing conversion. This database consists of recordings of sung vocals of English pop songs, the spoken counterpart of lyrics of the songs read by the singers in their natural reading manner, and manually prepared utterance-level and word-level annotations. The audio recordings in the NHSS database correspond to 100 songs sung and spoken by 10 singers, resulting in a total of 7 hours of audio data. There are 5…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
