Direct content-based retrieval from music scores images

Noelia Luna-Barahona; Antonio Ríos-Vila; David Rizo; Jorge Calvo-Zaragoza

arXiv:2605.22255·cs.CV·May 22, 2026

TL;DR

This paper explores content-based retrieval methods for music score images, comparing transcription-based, transcription-free, and language model approaches across diverse datasets.

Contribution

It systematically evaluates multiple retrieval techniques, including a novel transcription-free Transformer model, and defines a systematic method for building query datasets from any annotated corpus.

Findings

01. OMR-based methods excel in in-domain retrieval

02. Transcription-free models better handle domain variability

03. Different methods perform best under different dataset conditions

Abstract

The digitization of musical scores plays a crucial role in their preservation and accessibility, yet information retrieval still depends mainly on metadata searches, such as by title or composer. Content-based search in music score images remains underexplored compared to text documents, despite its potential value for musicians, musicologists, and educators. This work contributes to the field by first studying which characteristics of a score are most relevant for search and by defining a systematic method to build query datasets from any annotated corpus. We also consider diverse methods for content-based search on music score images, ranging from transcription-based approaches relying on Optical Music Recognition (OMR), to a transcription-free Transformer model trained to recognize queries directly from score images, and a text-prompted Large Language Model. Our experiments evaluate…
