LargeSHS: A large-scale dataset of music adaptation
Chih-Pin Tan, Hsuan-Kai Kao, Li Su, Yi-Hsuan Yang

TL;DR
LargeSHS is a comprehensive large-scale dataset of music adaptations, enabling advanced research in cover song generation and reference-based music tasks by providing structured relationships and extensive metadata.
Contribution
The paper introduces LargeSHS, a dataset derived from SecondHandSongs with over 1.7 million metadata entries and structured adaptation relationships between musical works, supporting research in reference-based music generation and music information retrieval (MIR).
Findings
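LargeSHS contains over 1.7 million metadata entries and approximately 900k publicly accessible audio links.
It includes structured adaptation relationships between musical works, from which adaptation trees and performance clusters (cover song families) can be built.
Statistics and comparisons with existing datasets highlight its scale and richness, and its potential for new research directions.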
Abstract
Recent advances in AI-based music generation have focused heavily on text-conditioned models, with less attention given to reference-based generation such as song adaptation. To support this line of research, we introduce LargeSHS, a large-scale dataset derived from SecondHandSongs, containing over 1.7 million metadata entries and approximately 900k publicly accessible audio links. Unlike existing datasets, LargeSHS includes structured adaptation relationships between musical works, enabling the construction of adaptation trees and performance clusters that represent cover song families. We provide comprehensive statistics and comparisons with existing datasets, highlighting the unique scale and richness of LargeSHS. This dataset paves the way for new research in cover song generation, reference-based music generation, and adaptation-aware MIR tasks.
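To make the idea of adaptation trees and performance clusters concrete, here is a minimal sketch of how pairwise "adapted from" links could be grouped into cover song families. The record layout (an identifier paired with the identifier of the work it adapts, or None for an original) and the function names are assumptions made for illustration; they are not the actual LargeSHS schema or tooling.

```python
from collections import defaultdict

# Hypothetical records: (performance_id, adapted_from_id or None).
# These identifiers are made up for illustration, not taken from LargeSHS.
records = [
    ("A", None),   # an original work
    ("B", "A"),    # a cover of A
    ("C", "A"),    # another cover of A
    ("D", "B"),    # an adaptation of the cover B
    ("X", None),   # an unrelated original
    ("Y", "X"),    # a cover of X
]


def build_tree(records):
    """Return (parent, children) maps describing the adaptation forest."""
    parent = dict(records)
    children = defaultdict(list)
    for node, source in records:
        if source is not None:
            children[source].append(node)
    return parent, dict(children)


def find_root(node, parent):
    """Follow 'adapted from' links upward until an original is reached."""
    seen = set()
    while parent.get(node) is not None:
        if node in seen:
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        node = parent[node]
    return node


def cluster_by_root(records):
    """Group every performance under the original work it descends from."""
    parent, _ = build_tree(records)
    clusters = defaultdict(list)
    for node, _source in records:
        clusters[find_root(node, parent)].append(node)
    return dict(clusters)


parent, children = build_tree(records)
clusters = cluster_by_root(records)
```

In this sketch each cluster corresponds to one cover song family, and the `children` map gives the tree structure within it. A source identifier that does not appear as a record of its own is treated as a root, which is one possible way to handle incomplete links.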
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Artificial Intelligence in Games
