HSD: A hierarchical singing annotation dataset
Xiao Fu, Xin Yuan, Jinglu Hu

TL;DR
This paper introduces a hierarchical singing annotation dataset for pop songs, capturing detailed musical and lyrical information to better represent the inherent musical structure, validated by high labeling accuracy.
Contribution
The paper presents a new hierarchical singing annotation dataset with detailed annotations, filling a gap in existing datasets that lack structural musical information.
Findings
The dataset achieves labeling accuracy comparable to that of an automatic singing transcription (AST) dataset.
It records the onset, offset, pitch, duration, and lyric of each note within a hierarchical structure.
Annotation follows a two-stage process: initial labels are created from the musical notation and lyrics file, then manually calibrated against the raw audio.
Abstract
Music commonly has an obvious hierarchical structure, especially in the singing parts, which usually act as the main melody in pop songs. However, most current singing annotation datasets record only symbolic information about musical notes and ignore the structure of the music. In this paper, we propose a hierarchical singing annotation dataset that consists of 68 pop songs from YouTube. The dataset records the onset/offset time, pitch, duration, and lyric of each musical note in an enhanced LyRiCs format to present the hierarchical structure of music. We annotate each song in a two-stage process: first, we create initial labels from the corresponding musical notation and lyrics file; second, we manually calibrate these labels against the raw audio. We validate the labeling accuracy of the proposed dataset mainly by comparing it with an automatic singing transcription (AST) dataset. The…
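The abstract does not spell out the enhanced LyRiCs layout, so the following is only a minimal sketch of how a song, phrase, and note hierarchy with the listed per-note fields could be held in memory. The class names (Note, Phrase, Song), the use of MIDI numbers for pitch, the derivation of duration from offset minus onset, and the helper lrc_time_to_seconds are all assumptions made for illustration, not the dataset's actual schema. The only borrowed convention is the plain LRC time tag of the form [mm:ss.xx].

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    """One sung note; field names are illustrative, not the dataset's schema."""
    onset: float   # seconds from the start of the audio
    offset: float  # seconds from the start of the audio
    pitch: int     # assumed here to be a MIDI note number
    lyric: str     # syllable or word sung on this note

    @property
    def duration(self) -> float:
        # Assumption: duration is taken as offset minus onset, in seconds.
        return self.offset - self.onset


@dataclass
class Phrase:
    """A lyric line grouping consecutive notes (the middle level)."""
    notes: List[Note] = field(default_factory=list)

    @property
    def onset(self) -> float:
        return self.notes[0].onset

    @property
    def offset(self) -> float:
        return self.notes[-1].offset

    @property
    def text(self) -> str:
        return " ".join(n.lyric for n in self.notes)


@dataclass
class Song:
    """Top level: a song is an ordered list of phrases."""
    title: str
    phrases: List[Phrase] = field(default_factory=list)

    def all_notes(self) -> List[Note]:
        # Flatten the hierarchy back into a plain note sequence.
        return [n for p in self.phrases for n in p.notes]


_TIME_TAG = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\]")


def lrc_time_to_seconds(tag: str) -> float:
    """Convert a plain LRC time tag such as "[01:02.50]" to seconds."""
    m = _TIME_TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not an LRC time tag: {tag!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Hypothetical example data, not taken from the dataset.
phrase = Phrase([
    Note(lrc_time_to_seconds("[00:01.00]"), 1.5, 60, "hel"),
    Note(1.5, 2.0, 62, "lo"),
])
song = Song("example", [phrase])
```

Keeping phrases as an explicit level, rather than tagging each note with a phrase index, means phrase boundaries and phrase text fall out of the structure, while all_notes still yields the flat note list that note-level transcription datasets provide.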
Taxonomy
Topics: Music and Audio Processing · Music History and Culture · Music Technology and Sound Studies
