Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems

Luis Carvalho; Tobias Washüttl; Gerhard Widmer

arXiv:2309.12134·cs.SD·September 22, 2023

TL;DR

This paper explores self-supervised contrastive learning to improve cross-modal music retrieval by pre-training on real music data, significantly enhancing snippet retrieval and piece identification accuracy in audio-sheet music systems.

Contribution

The paper demonstrates that self-supervised contrastive pre-training on real music data improves cross-modal retrieval performance and helps models generalize better to real-world scenarios.

Findings

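01. Pre-trained models retrieve snippets with better precision across scenarios.

02. In cross-modal piece identification, retrieval quality improves from 30% up to 100% when real music data is present.

03. Self-supervised contrastive learning helps alleviate the scarcity of annotated data in cross-modal music retrieval.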

Abstract

Linking sheet music images to audio recordings remains a key problem for the development of efficient cross-modal music retrieval systems. One of the fundamental approaches toward this task is to learn a cross-modal embedding space via deep neural networks that is able to connect short snippets of audio and sheet music. However, the scarcity of annotated data from real musical content affects the capability of such methods to generalize to real retrieval scenarios. In this work, we investigate whether we can mitigate this limitation with self-supervised contrastive learning, by exposing a network to a large amount of real music data as a pre-training step, by contrasting randomly augmented views of snippets of both modalities, namely audio and sheet images. Through a number of experiments on synthetic and real piano data, we show that pre-trained models are able to retrieve snippets…
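The abstract describes contrasting randomly augmented views of snippets but does not name the loss function. As a minimal sketch only, the following assumes an NT-Xent (normalized temperature-scaled cross-entropy) objective, a common choice for this kind of pre-training; it is not taken from the paper. The function names, the temperature value, and the toy vectors are all hypothetical, and in practice the embeddings would come from an audio or sheet-image encoder rather than hand-written lists.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """Assumed NT-Xent loss over a batch of paired embeddings.

    views_a[i] and views_b[i] are embeddings of two augmented views of the
    same snippet (a positive pair); every other embedding in the batch is
    treated as a negative. The temperature default is an arbitrary choice.
    """
    z = [l2_normalize(v) for v in views_a + views_b]
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same snippet
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(2 * n)]
        # Log-sum-exp over all other embeddings (self-similarity excluded).
        others = [sims[j] for j in range(2 * n) if j != i]
        m = max(others)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denominator - sims[pos]
    return total / (2 * n)


# Toy stand-ins for encoder outputs: three snippets and slightly perturbed views.
snippets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
augmented = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
mismatched = augmented[1:] + augmented[:1]  # pair each snippet with the wrong view

aligned_loss = nt_xent_loss(snippets, augmented)
misaligned_loss = nt_xent_loss(snippets, mismatched)
print(aligned_loss < misaligned_loss)
```

The loss is lower when each snippet is paired with its own augmented view than when the pairs are shuffled. Minimizing it therefore pulls views of the same snippet together and pushes other snippets apart.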

Taxonomy

Methods: Contrastive Learning