Exploiting Temporal Dependencies for Cross-Modal Music Piece Identification

Luis Carvalho; Gerhard Widmer

arXiv:2105.12536·eess.AS·May 27, 2021

TL;DR

This paper enhances cross-modal music identification by incorporating temporal alignment and attention mechanisms into deep learning models, significantly improving retrieval accuracy between audio recordings and sheet music images.

Contribution

It introduces temporal sequence alignment and an attention mechanism to improve cross-modal music retrieval, addressing limitations of previous embedding-based methods.
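The sequence-alignment idea can be illustrated with dynamic time warping (DTW) over two sequences of embedding vectors. This is only a minimal sketch: the summary above does not name the alignment algorithm, so DTW is an assumption, as are the cosine distance, the function names (`cosine_distance`, `dtw_cost`, `rank_pieces`), and the toy two-dimensional vectors standing in for learned embeddings. Vectors are assumed to be non-zero.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw_cost(seq_a, seq_b):
    """Accumulated cost of the cheapest monotonic alignment of two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] = best cost of aligning the first i items of seq_a
    # with the first j items of seq_b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def rank_pieces(query_seq, database):
    """Sort piece ids by ascending DTW cost against the query sequence."""
    return sorted(database, key=lambda pid: dtw_cost(query_seq, database[pid]))


# Hypothetical toy data: piece "a" plays the query's content at a slower
# tempo (first vector repeated), piece "b" has the content in reverse order.
query = [[1.0, 0.0], [0.0, 1.0]]
database = {
    "a": [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "b": [[0.0, 1.0], [1.0, 0.0]],
}
ranking = rank_pieces(query, database)
```

Because the alignment may stretch one sequence against the other, piece "a" matches the query at zero cost despite its repeated first vector, while the reversed piece "b" is penalised for having the same content in the wrong order.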

Findings

01. Significant improvement in piece and fragment retrieval accuracy

02. Effective handling of tempo variations with attention mechanisms

03. Scalability considerations for large-scale applications

Abstract

This paper addresses the problem of cross-modal musical piece identification and retrieval: finding the appropriate recording(s) from a database given a sheet music query, and vice versa, working directly with audio and scanned sheet music images. The fundamental approach to this is to learn a cross-modal embedding space with a suitable similarity structure for audio and sheet image snippets, using a deep neural network, and identifying candidate pieces by cross-modal near neighbour search in this space. However, this method is oblivious of temporal aspects of music. In this paper, we introduce two strategies that address this shortcoming. First, we present a strategy that aligns sequences of embeddings learned from sheet music scans and audio snippets. A series of experiments on whole piece and fragment-level retrieval on 24 hours worth of classical piano recordings demonstrates…
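The baseline the abstract describes, snippet-level near neighbour search in a shared embedding space, might look roughly like the sketch below. The majority vote over snippet matches, the cosine similarity, the function names, and the toy vectors are all assumptions made for illustration; the abstract only states that candidates are found by cross-modal near neighbour search.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identify_piece(query_snippets, database_snippets):
    """Vote for the piece that owns each query snippet's nearest neighbour.

    database_snippets is a list of (piece_id, embedding) pairs from the
    other modality. Returns (piece_id, votes) pairs, most votes first.
    """
    votes = Counter()
    for q in query_snippets:
        best_id, _ = max(
            database_snippets,
            key=lambda item: cosine_similarity(q, item[1]),
        )
        votes[best_id] += 1
    return votes.most_common()


# Hypothetical toy embeddings: two snippets from piece "p1", one from "p2".
database_snippets = [
    ("p1", [1.0, 0.0]),
    ("p1", [0.9, 0.1]),
    ("p2", [0.0, 1.0]),
]
query_snippets = [[1.0, 0.1], [0.8, 0.2], [0.1, 1.0]]
result = identify_piece(query_snippets, database_snippets)
```

Each query snippet is matched on its own, so shuffling `query_snippets` would leave the vote counts unchanged. That order-blindness is the limitation the abstract refers to when it calls the method oblivious of temporal aspects.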

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing