Deep-Learning Architectures for Multi-Pitch Estimation: Towards Reliable Evaluation
Christof Weiß, Geoffroy Peeters

TL;DR
This paper compares various deep-learning architectures for multi-pitch estimation in polyphonic music, highlighting the importance of dataset splits and evaluation strategies for reliable performance assessment.
Contribution
It introduces modifications to CNN, U-net, and self-attention architectures, and proposes a best-practice dataset splitting strategy for more reliable evaluation.
Findings
Larger models tend to perform better.
Results are highly sensitive to dataset splits and randomization.
Dataset choice significantly affects perceived architecture performance.
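The sensitivity to dataset splits noted above can be illustrated with a small sketch. The summary does not spell out the paper's actual splitting strategy, so the following is only one common precaution, assumed here for illustration: group recordings by musical work so that no work appears in both training and test sets. The function name `split_by_work`, the `(recording_id, work_id)` layout, and the `test_fraction` and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict


def split_by_work(recordings, test_fraction=0.25, seed=0):
    """Split recordings so that each work lands entirely in train or test.

    recordings: list of (recording_id, work_id) pairs (hypothetical layout).
    Returns (train_ids, test_ids, test_works).
    """
    # Collect all recordings (e.g. different versions) of the same work.
    by_work = defaultdict(list)
    for rec_id, work_id in recordings:
        by_work[work_id].append(rec_id)

    # Shuffle works, not recordings, with a fixed seed for reproducibility.
    works = sorted(by_work)
    random.Random(seed).shuffle(works)

    n_test = max(1, round(len(works) * test_fraction))
    test_works = set(works[:n_test])

    train = [r for w in works if w not in test_works for r in by_work[w]]
    test = [r for w in works if w in test_works for r in by_work[w]]
    return sorted(train), sorted(test), test_works


# Toy example: 4 works with 2 recorded versions each.
recs = [(f"w{w}_v{v}", f"w{w}") for w in range(4) for v in range(2)]
train, test, held_out = split_by_work(recs, test_fraction=0.25, seed=1)
print(len(train), len(test), held_out)
```

Repeating such a split with several seeds and reporting the spread of scores is one way to make the effect of randomization visible, rather than relying on a single split.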
Abstract
Extracting pitch information from music recordings is a challenging but important problem in music signal processing. Frame-wise transcription or multi-pitch estimation aims for detecting the simultaneous activity of pitches in polyphonic music recordings and has recently seen major improvements thanks to deep-learning techniques, with a variety of proposed network architectures. In this paper, we realize different architectures based on CNNs, the U-net structure, and self-attention components. We propose several modifications to these architectures including self-attention modules for skip connections, recurrent layers to replace the self-attention, and a multi-task strategy with simultaneous prediction of the degree of polyphony. We compare variants of these architectures in different sizes for multi-pitch estimation, focusing on Western classical music beyond the piano-solo scenario…
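To make the task in the abstract concrete, here is a minimal sketch of frame-wise multi-pitch targets. It assumes a binary "piano roll" (one row per frame, one column per pitch), derives the degree of polyphony as the number of active pitches per frame, and scores an estimate with a frame-wise precision, recall, and F-measure. The function names and the toy data are hypothetical, and the F-measure is a commonly used metric assumed for illustration; the summary does not state which metrics the paper reports.

```python
def degree_of_polyphony(piano_roll):
    """Number of simultaneously active pitches in each frame."""
    return [sum(frame) for frame in piano_roll]


def framewise_f_measure(reference, estimate):
    """Precision, recall, and F-measure over all (frame, pitch) bins."""
    tp = fp = fn = 0
    for ref_frame, est_frame in zip(reference, estimate):
        for ref_bin, est_bin in zip(ref_frame, est_frame):
            if ref_bin and est_bin:
                tp += 1  # pitch active and detected
            elif est_bin:
                fp += 1  # pitch detected but not active
            elif ref_bin:
                fn += 1  # pitch active but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data: 3 frames, 4 pitch bins (hypothetical values).
reference = [[1, 0, 1, 0], [0, 0, 0, 0], [1, 1, 1, 0]]
estimate = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0]]

print(degree_of_polyphony(reference))  # [2, 0, 3]
print(framewise_f_measure(reference, estimate))
```

In a multi-task setup like the one the abstract mentions, the per-frame polyphony values would serve as an auxiliary prediction target alongside the pitch activities.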
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net
