JukeBox: A Multilingual Singer Recognition Dataset
Anurag Chowdhury, Austin Cozzo, Arun Ross

TL;DR
JukeBox is a multilingual singing voice dataset for evaluating and improving speaker recognition systems. It shows that singing voice is harder to recognize than spoken voice, and that gender and language affect recognition performance.
Contribution
The paper presents the first large-scale, labeled singing voice dataset for speaker recognition, enabling research beyond traditional spoken voice data.
Findings
Speaker recognition is more challenging on singing voice than spoken voice.
Gender and language significantly affect recognition accuracy.
Models trained on spoken voice perform poorly on singing voice.
Abstract
A text-independent speaker recognition system relies on successfully encoding speech factors such as vocal pitch, intensity, and timbre to achieve good performance. A majority of such systems are trained and evaluated using spoken voice or everyday conversational voice data. Spoken voice, however, exhibits a limited range of possible speaker dynamics, thus constraining the utility of the derived speaker recognition models. Singing voice, on the other hand, covers a broader range of vocal and ambient factors and can, therefore, be used to evaluate the robustness of a speaker recognition system. However, a majority of existing speaker recognition datasets only focus on the spoken voice. In comparison, there is a significant shortage of labeled singing voice data suitable for speaker recognition research. To address this issue, we assemble JukeBox - a speaker recognition dataset…
