A Convolutional-Attentional Neural Framework for Structure-Aware Performance-Score Synchronization
Ruchit Agrawal, Daniel Wolff, Simon Dixon

TL;DR
This paper introduces a novel data-driven, structure-aware neural framework that improves performance-score synchronization across various modalities and acoustic conditions, outperforming traditional methods.
Contribution
The paper presents a convolutional-attentional neural architecture with a custom loss for robust, structure-aware synchronization across multiple score modalities and conditions.
Findings
Outperforms state-of-the-art synchronization methods
Robust to structural differences between performance and score
Effective across different score modalities and acoustic environments
Abstract
Performance-score synchronization is an integral task in signal processing, which entails generating an accurate mapping between an audio recording of a performance and the corresponding musical score. Traditional synchronization methods compute alignment using knowledge-driven and stochastic approaches, and are typically unable to generalize well to different domains and modalities. We present a novel data-driven method for structure-aware performance-score synchronization. We propose a convolutional-attentional architecture trained with a custom loss based on time-series divergence. We conduct experiments on the audio-to-MIDI and audio-to-image alignment tasks, which pertain to different score modalities. We validate the effectiveness of our method via ablation studies and comparisons with state-of-the-art alignment approaches. We demonstrate that our approach outperforms previous…
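To make the idea of a performance-to-score mapping concrete, here is a minimal sketch of classical dynamic time warping (DTW), a standard alignment technique. It is not the paper's convolutional-attentional model or its custom loss. The function name `dtw_path` and the use of scalar features (for example, one pitch value per frame) are assumptions made purely for illustration.

```python
def dtw_path(perf, score):
    """Align two sequences of scalar features with classical DTW.

    Illustrative baseline only (hypothetical helper, not the paper's method).
    Returns (path, total_cost), where path is a list of
    (performance_index, score_index) pairs.
    """
    n, m = len(perf), len(score)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # acc[i][j] = minimal cost of aligning perf[:i] with score[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(perf[i - 1] - score[j - 1])
            acc[i][j] = local + min(acc[i - 1][j - 1],  # advance both
                                    acc[i - 1][j],      # advance performance only
                                    acc[i][j - 1])      # advance score only

    # Backtrack from the end to recover the frame-to-frame mapping.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return path, acc[n][m]
```

Plain DTW of this kind produces a monotonic path, so it cannot follow repeats or jumps that appear in a performance but not in the score. That limitation is the sort of structural difference the abstract says the proposed method is designed to handle.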
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
