Direct content-based retrieval from music scores images
Noelia Luna-Barahona, Antonio Ríos-Vila, David Rizo, Jorge Calvo-Zaragoza

TL;DR
This paper explores content-based retrieval methods for music score images, comparing transcription-based, transcription-free, and language model approaches across diverse datasets.
Contribution
It systematically evaluates multiple retrieval techniques, including a novel transcription-free Transformer model, and defines a method for building query datasets from annotated corpora.
Findings
OMR-based methods excel in in-domain retrieval
Transcription-free models better handle domain variability
Different methods perform best under different dataset conditions
Abstract
The digitization of musical scores plays a crucial role in their preservation and accessibility, yet information retrieval still depends mainly on metadata searches, such as by title or composer. Content-based search in music score images remains underexplored compared to text documents, despite its potential value for musicians, musicologists, and educators. This work contributes to the field by first studying which characteristics of a score are most relevant for search and by defining a systematic method to build query datasets from any annotated corpus. We also consider diverse methods for content-based search on music score images, ranging from transcription-based approaches relying on Optical Music Recognition (OMR), to a transcription-free Transformer model trained to recognize queries directly from score images, and a text-prompted Large Language Model. Our experiments evaluate…
