LargeSHS: A large-scale dataset of music adaptation

Chih-Pin Tan; Hsuan-Kai Kao; Li Su; Yi-Hsuan Yang

arXiv:2511.15270 · cs.SD · November 25, 2025

TL;DR

LargeSHS is a large-scale dataset of music adaptations derived from SecondHandSongs. Its structured adaptation relationships and metadata support research on cover song generation and other reference-based music tasks.

Contribution

The paper introduces LargeSHS, a dataset of over 1.7 million metadata entries with structured adaptation relationships between musical works, intended to support research in reference-based music generation and MIR.

Findings

01

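LargeSHS contains over 1.7 million metadata entries and approximately 900k publicly accessible audio links.

02

It includes structured adaptation relationships between musical works, from which adaptation trees and performance clusters (cover song families) can be built.

03

Statistics and comparisons with existing datasets show the scale and richness of LargeSHS.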

Abstract

Recent advances in AI-based music generation have focused heavily on text-conditioned models, with less attention given to reference-based generation such as song adaptation. To support this line of research, we introduce LargeSHS, a large-scale dataset derived from SecondHandSongs, containing over 1.7 million metadata entries and approximately 900k publicly accessible audio links. Unlike existing datasets, LargeSHS includes structured adaptation relationships between musical works, enabling the construction of adaptation trees and performance clusters that represent cover song families. We provide comprehensive statistics and comparisons with existing datasets, highlighting the unique scale and richness of LargeSHS. This dataset paves the way for new research in cover song generation, reference-based music generation, and adaptation-aware MIR tasks.
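
The idea of adaptation trees and performance clusters can be sketched on toy data. The example below is a minimal illustration under stated assumptions, not the authors' pipeline: the field names (`work_id`, `adapted_from`, `perf_id`) and the records are hypothetical and do not reflect the actual LargeSHS schema. It assumes each work points to at most one source work, and it reads a "performance cluster" as all performances whose work traces back to the same original, which is one plausible interpretation of a cover song family.

```python
from collections import defaultdict

# Hypothetical toy records; field names are illustrative, not the LargeSHS schema.
works = [
    {"work_id": "w1", "adapted_from": None},   # original work
    {"work_id": "w2", "adapted_from": "w1"},   # adaptation of w1
    {"work_id": "w3", "adapted_from": "w2"},   # adaptation of an adaptation
    {"work_id": "w4", "adapted_from": None},   # unrelated original
]
performances = [
    {"perf_id": "p1", "work_id": "w1"},
    {"perf_id": "p2", "work_id": "w1"},
    {"perf_id": "p3", "work_id": "w2"},
    {"perf_id": "p4", "work_id": "w3"},
    {"perf_id": "p5", "work_id": "w4"},
]


def build_adaptation_trees(works):
    """Return parent links, child lists, and root works (those with no source)."""
    parent = {w["work_id"]: w["adapted_from"] for w in works}
    children = defaultdict(list)
    for work_id, source in parent.items():
        if source is not None:
            children[source].append(work_id)
    roots = [work_id for work_id, source in parent.items() if source is None]
    return parent, dict(children), roots


def find_root(work_id, parent):
    """Follow adapted_from links upward; the seen set guards against cycles."""
    seen = set()
    while parent.get(work_id) is not None and work_id not in seen:
        seen.add(work_id)
        work_id = parent[work_id]
    return work_id


def cluster_performances(performances, parent):
    """Group performance ids by the root work of their adaptation tree."""
    clusters = defaultdict(list)
    for perf in performances:
        clusters[find_root(perf["work_id"], parent)].append(perf["perf_id"])
    return dict(clusters)


parent, children, roots = build_adaptation_trees(works)
clusters = cluster_performances(performances, parent)
```

Here `roots` lists the original works, `children` encodes the tree edges, and `clusters` maps each original to every performance in its family. If the real data allows a work to adapt several sources, the parent map would need to become a graph and the clustering a connected-components pass instead.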

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Artificial Intelligence in Games