Deep-Learning Architectures for Multi-Pitch Estimation: Towards Reliable Evaluation

Christof Weiß; Geoffroy Peeters

arXiv:2202.09198·cs.SD·February 21, 2022

TL;DR

This paper compares various deep-learning architectures for multi-pitch estimation in polyphonic music, highlighting the importance of dataset splits and evaluation strategies for reliable performance assessment.

Contribution

It introduces modifications to CNN, U-net, and self-attention architectures, and proposes a best-practice dataset splitting strategy for more reliable evaluation.

Findings

01 Larger models tend to perform better.

02 Results are highly sensitive to dataset splits and randomization.

03 Dataset choice significantly affects perceived architecture performance.
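
The sensitivity to dataset splits noted above can be illustrated with a small sketch. It assumes a hypothetical collection in which each musical work exists in several recorded performances, and it keeps all performances of a work on the same side of the split so that no work leaks from training into testing. The function name, the `(work_id, performance_id)` layout, and the 20% test fraction are illustrative assumptions, not the splitting strategy proposed in the paper.

```python
import random
from collections import defaultdict


def split_by_group(items, group_of, test_fraction=0.2, seed=0):
    """Split items so that every group lands entirely in train or in test."""
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    # Shuffle group ids (not individual items) with a fixed seed.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    n_test = max(1, round(test_fraction * len(group_ids)))
    test_ids = set(group_ids[:n_test])

    train = [x for g in group_ids if g not in test_ids for x in groups[g]]
    test = [x for g in group_ids if g in test_ids for x in groups[g]]
    return train, test


# Hypothetical data: 10 works, each recorded in 3 performances.
recordings = [(work, perf) for work in range(10) for perf in range(3)]

# Different seeds give different test works, which is one reason
# reported scores can vary from run to run.
for seed in range(3):
    train, test = split_by_group(recordings, group_of=lambda r: r[0], seed=seed)
    test_works = sorted({work for work, _ in test})
    print(f"seed={seed}: test works {test_works}, {len(train)} train / {len(test)} test")
```

Evaluating over several seeds and reporting the spread, rather than a single run, is one simple way to make such comparisons less dependent on a lucky or unlucky split.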

Abstract

Extracting pitch information from music recordings is a challenging but important problem in music signal processing. Frame-wise transcription or multi-pitch estimation aims for detecting the simultaneous activity of pitches in polyphonic music recordings and has recently seen major improvements thanks to deep-learning techniques, with a variety of proposed network architectures. In this paper, we realize different architectures based on CNNs, the U-net structure, and self-attention components. We propose several modifications to these architectures including self-attention modules for skip connections, recurrent layers to replace the self-attention, and a multi-task strategy with simultaneous prediction of the degree of polyphony. We compare variants of these architectures in different sizes for multi-pitch estimation, focusing on Western classical music beyond the piano-solo scenario…
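
To make the task in the abstract concrete, the following minimal sketch shows what a frame-wise multi-pitch output might look like: a matrix of per-frame pitch activations that is thresholded into a binary piano roll. The pitch set, the probability values, and the 0.5 threshold are made up for illustration. The degree of polyphony is taken here to be the number of simultaneously active pitches per frame, which is an assumption about how such a target could be derived and not a description of the paper's multi-task setup.

```python
# Hypothetical MIDI pitch numbers covered by the (illustrative) model output.
PITCHES = [60, 64, 67, 72]

# Hypothetical network output: one row per time frame, one column per pitch.
probabilities = [
    [0.9, 0.1, 0.8, 0.0],
    [0.7, 0.6, 0.9, 0.2],
    [0.1, 0.2, 0.3, 0.1],
]


def binarize(frame_probs, threshold=0.5):
    """Turn per-pitch probabilities of one frame into 0/1 activity flags."""
    return [1 if p >= threshold else 0 for p in frame_probs]


# Binary piano roll: which pitches are active in which frame.
piano_roll = [binarize(frame) for frame in probabilities]

# Active pitches per frame, as MIDI numbers.
active_pitches = [
    [pitch for pitch, active in zip(PITCHES, frame) if active]
    for frame in piano_roll
]

# Degree of polyphony per frame: count of simultaneously active pitches.
polyphony = [sum(frame) for frame in piano_roll]

for idx, (pitches, degree) in enumerate(zip(active_pitches, polyphony)):
    print(f"frame {idx}: pitches {pitches}, polyphony {degree}")
```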

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net