Audio-to-Score Alignment Using Deep Automatic Music Transcription

Federico Simonetta; Stavros Ntalampiras; Federico Avanzini

arXiv:2107.12854 · cs.SD · January 3, 2022

TL;DR

This paper presents a note-level audio-to-score alignment method that combines deep learning-based automatic music transcription (AMT) with HMM-based score-to-score alignment. Extensive tests on multiple datasets show that it advances the state of the art.

Contribution

The paper combines deep AMT models with HMM-based score-to-score alignment to improve note-level accuracy in audio-to-score alignment.

Findings

01 Achieved significant improvement over previous methods

02 Demonstrated robustness across multiple datasets

03 Provided a systematic procedure for large unaligned datasets

Abstract

Audio-to-score alignment (A2SA) is a multimodal task that consists of aligning audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame level. In this work, we aim to elaborate on the exploitation of AMT Deep Learning (DL) models for achieving alignment at the note level. We propose a method which benefits from HMM-based score-to-score alignment and AMT, showing a remarkable advancement beyond the state of the art. We design a systematic procedure to take advantage of large datasets which do not offer an aligned score. Finally, we perform a thorough comparison and extensive tests on multiple datasets.
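
To make the idea of note-level alignment concrete, here is a minimal sketch. It is not the authors' method: the paper uses deep AMT models and HMM-based score-to-score alignment, while this example substitutes plain dynamic time warping (DTW) over pitch sequences as a simpler stand-in. It assumes the audio has already been transcribed into a list of MIDI pitches, and the function name `dtw_align_notes` and the 0/1 pitch-mismatch cost are hypothetical choices made for illustration only.

```python
def dtw_align_notes(score_pitches, perf_pitches):
    """Align two pitch sequences with DTW (illustrative stand-in, not the paper's HMM).

    Returns a list of (score_index, perf_index) pairs, in order.
    """
    n, m = len(score_pitches), len(perf_pitches)
    inf = float("inf")

    # acc[i][j] = minimal cost of aligning the first i score notes
    # with the first j transcribed notes.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: 0 for equal pitches, 1 otherwise.
            local = 0.0 if score_pitches[i - 1] == perf_pitches[j - 1] else 1.0
            acc[i][j] = local + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]]
        best = steps.index(min(steps))
        if best == 0:
            i, j = i - 1, j - 1
        elif best == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


# Score with four notes; the "transcription" contains one extra note (63).
score = [60, 62, 64, 65]
performance = [60, 62, 63, 64, 65]
path = dtw_align_notes(score, performance)

# Keep only the pairs whose pitches agree: these are the matched notes.
matches = [(i, j) for i, j in path if score[i] == performance[j]]
```

In a full pipeline, each matched pair would let the onset and offset times of the transcribed note be transferred to the corresponding score note. A real system would also need to handle chords, missing notes, and transcription errors, which this sketch ignores.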

Code & Models

Repositories

LIMUNIMI/MMSP2021-Audio2ScoreAlignment
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing