VisionScores -- A system-segmented image score dataset for deep learning tasks

Alejandro Romero Amezcua; Mariano José Juan Rivera Meraz

arXiv:2506.23030 · cs.CV · July 1, 2025

TL;DR

VisionScores introduces a system-segmented image dataset of two-handed piano scores, built around graphic structure and composition patterns, for deep learning tasks in music and image analysis.

Contribution

It provides the first system-segmented image score dataset with detailed metadata, organized into two scenarios defined by composer and composition type.

Findings

01. The dataset contains 24.8k samples of piano scores.

02. It includes both segmented images and full-page scores.

03. It supports analysis of graphic similarity and composition patterns.

Abstract

VisionScores is presented as the first system-segmented image score dataset, aiming to offer structure-rich, high information-density images for machine and deep learning tasks. Limited to two-handed piano pieces, it was built to account not only for graphic similarity but also for composition patterns, since the compositional process is highly instrument-dependent. It provides two scenarios defined by composer and composition type. The first, formed by 14k samples, contains works by different composers but of the same composition type, specifically Sonatinas. The second, consisting of 10.8k samples, presents the opposite case: various composition types by the same composer, Franz Liszt. All of the 24.8k samples are formatted as grayscale JPG images of $128 \times 512$ pixels. VisionScores supplies the users not only the formatted samples but the…
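The split and image format quoted in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not code from the dataset itself: the scenario labels, the `Scenario` class, and the helper names are hypothetical, the sample counts are the rounded figures from the abstract, and reading $128 \times 512$ as height by width is an assumption.

```python
from dataclasses import dataclass

# Sample dimensions quoted in the abstract. Treating 128 as the height and
# 512 as the width is an assumption; the abstract does not state the order.
HEIGHT, WIDTH = 128, 512


@dataclass(frozen=True)
class Scenario:
    name: str            # hypothetical label, not taken from the paper
    description: str
    approx_samples: int  # rounded count as quoted in the abstract


SCENARIOS = (
    Scenario("sonatinas", "different composers, same composition type", 14_000),
    Scenario("liszt", "same composer (Franz Liszt), various composition types", 10_800),
)


def total_samples(scenarios=SCENARIOS) -> int:
    """Sum the rounded per-scenario sample counts."""
    return sum(s.approx_samples for s in scenarios)


def raw_bytes(n_samples: int, height: int = HEIGHT, width: int = WIDTH) -> int:
    """Uncompressed size at one byte per grayscale pixel."""
    return n_samples * height * width


if __name__ == "__main__":
    total = total_samples()
    for s in SCENARIOS:
        share = s.approx_samples / total
        print(f"{s.name}: {s.approx_samples} samples ({share:.1%})")
    print(f"total: {total}")
    print(f"raw size: {raw_bytes(total) / 2**30:.2f} GiB")
```

Under these assumptions, each sample holds 65,536 pixels, so decoding every image to 8-bit grayscale would take roughly 1.5 GiB of memory. The JPG files on disk will be smaller.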

Code & Models

Datasets

alromz/VisionScores
dataset · 24 dl

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis