SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

Yuxun Tang; Jiatong Shi; Yuning Wu; Qin Jin

arXiv:2406.10911·cs.SD·June 21, 2024·3 cites

TL;DR

SingMOS is a comprehensive open-source dataset of singing voice samples with human ratings, designed to improve MOS prediction models in the singing domain, addressing data scarcity and copyright issues.

Contribution

This paper introduces SingMOS, a high-quality, diverse singing voice dataset with human annotations, filling a critical gap in singing MOS prediction research.

Findings

01. The dataset covers Chinese and Japanese singing samples.

02. Data analysis confirms the dataset's diversity and reliability.

03. The paper provides insights for future singing MOS prediction models.

Abstract

In speech generation tasks, human subjective ratings, usually referred to as the opinion score, are considered the "gold standard" for speech quality evaluation, with the mean opinion score (MOS) serving as the primary evaluation metric. Due to the high cost of human annotation, several MOS prediction systems have emerged in the speech domain, demonstrating good performance. These MOS prediction models are trained using annotations from previous speech-related challenges. However, compared to the speech domain, the singing domain faces data scarcity and stricter copyright protections, leading to a lack of high-quality MOS-annotated datasets for singing. To address this, we propose SingMOS, a high-quality and diverse MOS dataset for singing, covering a range of Chinese and Japanese datasets. These synthesized vocals are generated using state-of-the-art models in singing synthesis,…
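As a rough illustration of the metric the abstract describes, the sketch below averages individual listener ratings into a per-sample MOS. The function name `mean_opinion_scores`, the `(sample_id, listener_id, score)` tuple layout, the clip and listener names, and the 1-5 rating scale are all assumptions made for this example; they are not taken from the SingMOS release.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average listener ratings per sample.

    ratings: iterable of (sample_id, listener_id, score) tuples
             (a hypothetical layout, not the SingMOS file format).
    Returns a dict mapping sample_id -> mean opinion score.
    """
    by_sample = defaultdict(list)
    for sample_id, _listener_id, score in ratings:
        by_sample[sample_id].append(score)
    return {sid: mean(scores) for sid, scores in by_sample.items()}


# Made-up ratings on an assumed 1-5 scale, for illustration only.
example_ratings = [
    ("clip_a", "listener_1", 4),
    ("clip_a", "listener_2", 5),
    ("clip_a", "listener_3", 3),
    ("clip_b", "listener_1", 2),
    ("clip_b", "listener_2", 3),
]

mos = mean_opinion_scores(example_ratings)
print(mos)
```

A MOS prediction model, in this framing, is trained to estimate these per-sample averages directly from audio so that new samples can be scored without collecting fresh listener ratings.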

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing