Towards End-to-End Audio-Sheet-Music Retrieval
Matthias Dorfer, Andreas Arzt, Gerhard Widmer

TL;DR
This paper explores a novel end-to-end method for cross-modal retrieval between audio snippets and sheet music images using deep learning, enabling music content search without symbolic representations.
Contribution
It introduces a DCCA-based approach for learning correlated latent spaces for audio and sheet music retrieval without relying on symbolic music data.
Findings
Initial experiments show promising retrieval accuracy.
Results so far are reported for relatively simple monophonic music.
Cross-modality retrieval is feasible without symbolic scores.
Abstract
This paper demonstrates the feasibility of learning to retrieve short snippets of sheet music (images) when given a short query excerpt of music (audio), and vice versa, without any symbolic representation of music or scores. This would be highly useful in many content-based musical retrieval scenarios. Our approach is based on Deep Canonical Correlation Analysis (DCCA) and learns correlated latent spaces allowing for cross-modality retrieval in both directions. Initial experiments with relatively simple monophonic music show promising results.
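The idea of retrieving across modalities through a correlated latent space can be illustrated with a minimal sketch. It is not the paper's model: plain linear CCA stands in for DCCA (which would replace the linear projections with neural networks), and the "audio" and "sheet" feature vectors are synthetic data generated from a shared latent variable. All names (`fit_linear_cca`, `embed`, `retrieve`), dimensions, and the regularisation value are assumptions made for illustration.

```python
import numpy as np


def _inv_sqrt(mat):
    # Inverse matrix square root of a symmetric positive-definite matrix.
    w, v = np.linalg.eigh(mat)
    return v @ np.diag(w ** -0.5) @ v.T


def fit_linear_cca(X, Y, k, reg=1e-3):
    # Linear CCA (a stand-in for DCCA): find projections A, B such that
    # the k-dimensional views X @ A and Y @ B are maximally correlated.
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return (mx, Kx @ U[:, :k]), (my, Ky @ Vt.T[:, :k])


def embed(data, model):
    # Project into the shared space and L2-normalise, so that a dot
    # product between two embeddings equals their cosine similarity.
    mean, proj = model
    Z = (data - mean) @ proj
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


def retrieve(query_emb, candidate_emb):
    # For each query, candidate indices sorted by decreasing cosine similarity.
    return np.argsort(-(query_emb @ candidate_emb.T), axis=1)


# Synthetic paired data (hypothetical): both modalities are noisy linear
# views of the same 8-dimensional latent "musical content".
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 8))
audio = latent @ rng.normal(size=(8, 20)) + 0.05 * rng.normal(size=(400, 20))
sheet = latent @ rng.normal(size=(8, 30)) + 0.05 * rng.normal(size=(400, 30))

# Fit on 300 pairs, evaluate retrieval on 100 held-out pairs.
audio_model, sheet_model = fit_linear_cca(audio[:300], sheet[:300], k=8)
audio_emb = embed(audio[300:], audio_model)
sheet_emb = embed(sheet[300:], sheet_model)

targets = np.arange(100)
ranks_a2s = retrieve(audio_emb, sheet_emb)  # audio query -> sheet snippets
ranks_s2a = retrieve(sheet_emb, audio_emb)  # sheet query -> audio snippets
top1_a2s = float(np.mean(ranks_a2s[:, 0] == targets))
top1_s2a = float(np.mean(ranks_s2a[:, 0] == targets))
print(f"top-1 audio->sheet: {top1_a2s:.2f}, sheet->audio: {top1_s2a:.2f}")
```

Because both modalities are mapped into the same space, the same similarity ranking serves both retrieval directions; only the roles of query and candidate set are swapped.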
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
