Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation
Frank Cwitkowitz, Zhiyao Duan

TL;DR
This paper explores the use of self-supervised learning in multi-pitch estimation, revealing a phenomenon where models overfit supervised data while degenerating on self-supervised data, and offers insights into this issue.
Contribution
It introduces self-supervised objectives into the MPE paradigm and investigates the overfitting and degeneration phenomena encountered during joint training.
Findings
Self-supervised objectives substantially improve performance under closed training conditions.
Overfitting to supervised data causes degeneration on self-supervised data.
Insights into the overfitting and degeneration phenomena are provided.
Abstract
Multi-Pitch Estimation (MPE) continues to be a sought-after capability of Music Information Retrieval (MIR) systems, and is critical for many applications and downstream tasks involving pitch, including music transcription. However, existing methods are largely based on supervised learning, and there are significant challenges in collecting annotated data for the task. Recently, self-supervised techniques exploiting intrinsic properties of pitch and harmonic signals have shown promise for both monophonic and polyphonic pitch estimation, but these remain inferior to supervised methods. In this work, we extend the classic supervised MPE paradigm by incorporating several self-supervised objectives based on pitch-invariant and pitch-equivariant properties. This joint training results in a substantial improvement under closed training conditions, which naturally suggests that applying…
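To make "pitch-invariant" and "pitch-equivariant" concrete, here is a minimal sketch. It assumes a log-frequency input frame, where transposing pitch amounts to translating the frame along the bin axis, and it uses a gain change as one example of a transformation that leaves pitch untouched. The names `shift_bins`, `toy_model`, `equivariance_loss`, and `invariance_loss` are hypothetical illustrations, not the authors' implementation or their exact objectives.

```python
def shift_bins(frame, k):
    """Translate a frame by k bins along the log-frequency axis (zero-padded)."""
    n = len(frame)
    out = [0.0] * n
    for i, value in enumerate(frame):
        j = i + k
        if 0 <= j < n:
            out[j] = value
    return out


def toy_model(frame):
    """Hypothetical stand-in for an MPE network: peak-normalised salience."""
    peak = max(frame)
    if peak <= 0:
        return [0.0] * len(frame)
    return [value / peak for value in frame]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def equivariance_loss(model, frame, k):
    """Shifting the input by k bins should shift the output by k bins."""
    return mse(model(shift_bins(frame, k)), shift_bins(model(frame), k))


def invariance_loss(model, frame, gain):
    """Scaling the input level (assumed pitch-preserving) should not change the output."""
    return mse(model([gain * value for value in frame]), model(frame))


# Illustrative input: two active bins in an 8-bin frame.
frame = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]

# A model whose response depends on absolute bin position breaks equivariance.
def position_biased_model(frame):
    return [value * i for i, value in enumerate(frame)]

loss_good = equivariance_loss(toy_model, frame, 2)
loss_bad = equivariance_loss(position_biased_model, frame, 2)
loss_gain = invariance_loss(toy_model, frame, 0.5)
```

Neither loss needs pitch annotations, which is why objectives of this kind can be computed on unlabeled audio. In a joint setup they would be added to the supervised loss, with weighting left as a design choice.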
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Music Technology and Sound Studies
