Sound and Music Biases in Deep Music Transcription Models: A Systematic Analysis
Lukáš Samuel Marták, Patricia Hu, Gerhard Widmer

TL;DR
This paper systematically analyzes how deep music transcription models perform across different musical genres, dynamics, and polyphony levels, revealing significant performance drops and highlighting the impact of dataset biases.
Contribution
It introduces the MDS corpus for evaluating distribution shifts in music transcription and provides a comprehensive analysis of model robustness across musical variations.
Findings
Performance drops of up to 20 percentage points due to sound variations.
Dynamics estimation is more vulnerable than onset prediction.
Musically informed metrics reveal factors affecting model performance.
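To make the "percentage points" in the findings concrete, here is a minimal sketch of how a drop in note-level onset F1 between two test conditions could be computed. It is not the paper's evaluation code. The greedy matching, the 50 ms tolerance default, and the names `onset_f1` and `drop_in_points` are assumptions made for illustration, and the note lists are made-up toy data.

```python
from typing import List, Tuple

Note = Tuple[float, int]  # (onset time in seconds, MIDI pitch)


def onset_f1(reference: List[Note], estimated: List[Note],
             tolerance: float = 0.05) -> float:
    """Note-level onset F1 using simplified greedy one-to-one matching.

    An estimated note counts as correct if an unmatched reference note
    has the same pitch and an onset within `tolerance` seconds.
    (Assumption: standard toolkits use optimal bipartite matching;
    greedy matching is a simplification for this sketch.)
    """
    if not reference or not estimated:
        return 0.0
    unmatched = sorted(reference)
    true_positives = 0
    for onset, pitch in sorted(estimated):
        for i, (ref_onset, ref_pitch) in enumerate(unmatched):
            if ref_pitch == pitch and abs(ref_onset - onset) <= tolerance:
                true_positives += 1
                del unmatched[i]  # each reference note is matched once
                break
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def drop_in_points(f1_in_distribution: float, f1_shifted: float) -> float:
    """Absolute F1 difference expressed in percentage points."""
    return 100.0 * (f1_in_distribution - f1_shifted)


# Toy data (hypothetical): one reference and two sets of predictions.
reference = [(0.0, 60), (0.5, 64), (1.0, 67), (1.5, 72)]
predicted_in_distribution = [(0.0, 60), (0.5, 64), (1.0, 67), (1.5, 72)]
predicted_shifted = [(0.01, 60), (0.52, 64), (1.2, 67)]

f1_clean = onset_f1(reference, predicted_in_distribution)
f1_shifted = onset_f1(reference, predicted_shifted)
print(f"drop: {drop_in_points(f1_clean, f1_shifted):.1f} percentage points")
```

A drop in percentage points is an absolute difference between two scores, not a relative change, which is why the second function simply subtracts and rescales.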
Abstract
Automatic Music Transcription (AMT) -- the task of converting music audio into note representations -- has seen rapid progress, driven largely by deep learning systems. Due to the limited availability of richly annotated music datasets, much of the progress in AMT has been concentrated on classical piano music, and even a few very specific datasets. Whether these systems can generalize effectively to other musical contexts remains an open question. Complementing recent studies on distribution shifts in sound (e.g., recording conditions), in this work we investigate the musical dimension -- specifically, variations in genre, dynamics, and polyphony levels. To this end, we introduce the MDS corpus, comprising three distinct subsets -- (1) Genre, (2) Random, and (3) MAEtest -- to emulate different axes of distribution shift. We evaluate the performance of several state-of-the-art AMT…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
