Multi-modal Conditional Bounding Box Regression for Music Score Following

Florian Henkel; Gerhard Widmer

arXiv:2105.04309·cs.SD·May 11, 2021

TL;DR

This paper introduces a neural network approach, inspired by object detection, for real-time score following in sheet music images. It reaches state-of-the-art accuracy on a synthetic piano benchmark and improves alignment on real piano recordings.

Contribution

A new conditional neural network architecture for on-line audio-to-score alignment that directly predicts score positions from sheet images, improving accuracy over existing methods.

Findings

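01 Achieves state-of-the-art results on a synthetic polyphonic piano benchmark dataset.

02 Significantly improves alignment on real-world piano recordings by using impulse responses as data augmentation.

03 Outperforms existing sheet-image-based score following approaches and an Optical Music Recognition (OMR) baseline.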

Abstract

This paper addresses the problem of sheet-image-based on-line audio-to-score alignment, also known as score following. Drawing inspiration from object detection, a conditional neural network architecture is proposed that directly predicts the (x, y) coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance. Experiments are conducted on a synthetic polyphonic piano benchmark dataset, and the new method is compared to several existing approaches from the literature for sheet-image-based score following as well as an Optical Music Recognition baseline. The proposed approach achieves new state-of-the-art results and furthermore significantly improves the alignment performance on a set of real-world piano recordings by applying impulse responses as a data augmentation technique.
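
The impulse-response augmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the augmentation amounts to convolving a dry recording with a room impulse response. The function name `apply_impulse_response`, the peak renormalization, and the synthetic decaying-noise impulse response are illustrative assumptions, not details taken from the paper or its repository.

```python
import numpy as np


def apply_impulse_response(audio, impulse_response):
    """Convolve a mono signal with an impulse response (hypothetical helper).

    The output is truncated to the input length and rescaled so that its
    peak amplitude matches the peak amplitude of the dry input.
    """
    audio = np.asarray(audio, dtype=np.float64)
    ir = np.asarray(impulse_response, dtype=np.float64)

    # Full linear convolution, then keep only the first len(audio) samples.
    wet = np.convolve(audio, ir, mode="full")[: len(audio)]

    # Restore the original peak level so the augmentation changes the
    # acoustics of the signal but not its overall loudness range.
    peak_in = np.max(np.abs(audio))
    peak_out = np.max(np.abs(wet))
    if peak_out > 0:
        wet = wet * (peak_in / peak_out)
    return wet


# Toy data: one second of noise stands in for a dry piano recording, and
# exponentially decaying noise stands in for a measured room response.
rng = np.random.default_rng(0)
sample_rate = 22050  # assumed sample rate, for illustration only
dry = rng.standard_normal(sample_rate)
t = np.arange(int(0.3 * sample_rate)) / sample_rate
synthetic_ir = rng.standard_normal(t.size) * np.exp(-t / 0.05)

augmented = apply_impulse_response(dry, synthetic_ir)
```

In a training pipeline, one would presumably draw a different impulse response for each training example, so that the model sees the same performance under varied room acoustics.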

Code & Models

Repositories

CPJKU/cyolo_score_following (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing