VisionScores -- A system-segmented image score dataset for deep learning tasks
Alejandro Romero Amezcua, Mariano José Juan Rivera Meraz

TL;DR
VisionScores introduces a unique, system-segmented image dataset of piano scores, emphasizing structure and composition patterns to enhance deep learning applications in music and image analysis.
Contribution
It provides the first system-segmented image score dataset with detailed metadata, organized into two scenarios defined by composer and composition type.
Findings
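The dataset contains 24.8k samples of two-handed piano scores: 14k from Sonatinas by various composers and 10.8k from works of various types by Franz Liszt.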
Includes both segmented images and full-page scores.
Supports analysis of graphic similarity and composition patterns.
Abstract
VisionScores is the first system-segmented image score dataset, designed to offer structure-rich, high information-density images for machine learning and deep learning tasks. Restricted to two-handed piano pieces, it was built to account not only for graphic similarity but also for composition patterns, since the compositional process is highly instrument-dependent. It provides two scenarios defined by composer and composition type. The first, comprising 14k samples, covers works by different composers within the same composition type, namely Sonatinas. The second, comprising 10.8k samples, presents the opposite case: various composition types by a single composer, Franz Liszt. All 24.8k samples are formatted as grayscale jpg images of pixels. VisionScores supplies users not only the formatted samples but the…
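The two scenarios described in the abstract can be summarized in a small data structure. The following is a minimal sketch in Python and is not part of the VisionScores release: the `Scenario` class, its field names, and the scenario labels are hypothetical, and the sample counts are the rounded figures quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One of the two dataset scenarios (hypothetical representation)."""
    label: str              # hypothetical identifier, not an official name
    composers: str          # "various" or a single composer
    composition_types: str  # "various" or a single composition type
    n_samples: int          # rounded count as quoted in the abstract


# Rounded figures from the abstract: 14k and 10.8k samples.
SCENARIOS = [
    Scenario("same_type", "various", "Sonatinas", 14_000),
    Scenario("same_composer", "Franz Liszt", "various", 10_800),
]


def total_samples(scenarios):
    """Sum the sample counts over all scenarios."""
    return sum(s.n_samples for s in scenarios)


def fixed_attribute(scenario):
    """Return which attribute a scenario holds constant."""
    if scenario.composers != "various":
        return "composer"
    return "composition type"


if __name__ == "__main__":
    for s in SCENARIOS:
        print(f"{s.label}: {s.n_samples} samples, fixed {fixed_attribute(s)}")
    print("total:", total_samples(SCENARIOS))
```

The total of the two rounded counts matches the 24.8k samples reported for the dataset as a whole.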
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
