Towards Robust and Truly Large-Scale Audio-Sheet Music Retrieval
Luis Carvalho, Gerhard Widmer

TL;DR
This paper reviews the progress and challenges in developing deep learning-based methods for large-scale, robust audio-sheet music retrieval, aiming to improve cross-modal music identification in real-world scenarios.
Contribution
It provides an insightful analysis of current deep learning approaches, identifies key challenges, and proposes future directions for scalable and reliable cross-modal music retrieval.
Findings
Identified main challenges in large-scale cross-modal music retrieval.
Documented step-by-step improvements in existing methods.
Highlighted remaining issues and potential solutions for robustness.
Abstract
A range of applications of multi-modal music information retrieval is centred around the problem of connecting large collections of sheet music (images) to corresponding audio recordings, that is, identifying pairs of audio and score excerpts that refer to the same musical content. One of the most typical and recent approaches to this task employs cross-modal deep learning architectures to learn joint embedding spaces that link the two distinct modalities: audio and sheet music images. While there has been steady improvement on this front over the past years, a number of open problems still prevent large-scale employment of this methodology. In this article we attempt to provide an insightful examination of the current developments on audio-sheet music retrieval via deep learning methods. We first identify a set of main challenges on the road towards robust and large-scale cross-modal…
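To make the "joint embedding space" idea concrete, the sketch below shows how retrieval works once both modalities have been mapped into a shared vector space. It is a minimal illustration under stated assumptions, not the authors' implementation: the two encoders (one for audio excerpts, one for sheet music image snippets) are assumed to exist and are not shown; cosine similarity is assumed as the matching score; the margin-based hinge ranking loss, the margin of 0.7, and all function names and toy vectors are hypothetical choices made for illustration. Queries and candidates are assumed to be aligned by index, so `queries[i]` should retrieve `candidates[i]`.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed matching score between an audio and a sheet music embedding."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_candidates(query: Vector, candidates: Sequence[Vector]) -> List[int]:
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def hinge_ranking_loss(anchor: Vector, positive: Vector,
                       negatives: Sequence[Vector], margin: float = 0.7) -> float:
    """Hypothetical training objective: push the matching pair above every
    non-matching pair by at least `margin` in cosine similarity."""
    pos = cosine_similarity(anchor, positive)
    return sum(max(0.0, margin - pos + cosine_similarity(anchor, n))
               for n in negatives)


def recall_at_k(queries: Sequence[Vector], candidates: Sequence[Vector], k: int) -> float:
    """Fraction of queries whose true match (same index) appears in the top k."""
    hits = sum(1 for i, q in enumerate(queries) if i in rank_candidates(q, candidates)[:k])
    return hits / len(queries)


def mean_reciprocal_rank(queries: Sequence[Vector], candidates: Sequence[Vector]) -> float:
    """Average of 1 / (rank of the true match) over all queries."""
    total = 0.0
    for i, q in enumerate(queries):
        total += 1.0 / (rank_candidates(q, candidates).index(i) + 1)
    return total / len(queries)


# Toy, invented embeddings: three audio excerpts and their three sheet snippets.
audio = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sheet = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.8]]

print(rank_candidates(audio[1], sheet))      # audio-to-sheet ranking for query 1
print(recall_at_k(audio, sheet, k=1))
print(mean_reciprocal_rank(audio, sheet))
```

Swapping the roles of `audio` and `sheet` gives the reverse direction (sheet-to-audio search). In a truly large-scale setting, the exhaustive ranking loop would typically be replaced by an approximate nearest-neighbour index over the precomputed embeddings.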
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
