Evaluation of CNN-based Automatic Music Tagging Models
Minz Won, Andres Ferraro, Dmitry Bogdanov, Xavier Serra

TL;DR
This paper provides a consistent evaluation of CNN-based music tagging models across multiple datasets, analyzing their robustness to input perturbations and offering reproducible implementations for future research.
Contribution
It offers a standardized comparison framework for CNN music tagging models and assesses their generalization under various input perturbations.
Findings
The evaluated models achieve comparable performance on the standard metrics (ROC-AUC and PR-AUC).
Input perturbations reduce model performance, indicating that the models are sensitive to changes in the input.
Reproducible pre-trained models are provided for future research.
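The sensitivity finding can be probed by perturbing the input audio and re-scoring a model on the perturbed clips. This summary does not list the perturbations that were used, so the sketch below takes additive white noise at a chosen signal-to-noise ratio purely as an illustrative assumption. The function names `add_white_noise` and `measured_snr_db` are hypothetical and are not taken from the paper's code.

```python
import math
import random
from typing import List


def add_white_noise(signal: List[float], snr_db: float, seed: int = 0) -> List[float]:
    """Return the signal plus Gaussian noise scaled to the requested SNR (in dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    if signal_power == 0.0:
        raise ValueError("cannot set an SNR for a silent signal")
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean: List[float], noisy: List[float]) -> float:
    """Measure the SNR of a noisy signal against its clean reference."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Toy input: 1000 samples of a 440 Hz sine at an assumed 16 kHz sample rate.
clean = [math.sin(2.0 * math.pi * 440.0 * t / 16000.0) for t in range(1000)]
noisy = add_white_noise(clean, snr_db=20.0)
```

Comparing a model's scores on `clean` and `noisy` versions of the same test clips, with the labels held fixed, gives a simple measure of how much a given perturbation degrades performance.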
Abstract
Recent advances in deep learning accelerated the development of content-based automatic music tagging systems. Music information retrieval (MIR) researchers proposed various architecture designs, mainly based on convolutional neural networks (CNNs), that achieve state-of-the-art results in this multi-label binary classification task. However, due to the differences in experimental setups followed by researchers, such as using different dataset splits and software versions for evaluation, it is difficult to compare the proposed architectures directly with each other. To facilitate further research, in this paper we conduct a consistent evaluation of different music tagging models on three datasets (MagnaTagATune, Million Song Dataset, and MTG-Jamendo) and provide reference results using common evaluation metrics (ROC-AUC and PR-AUC). Furthermore, all the models are evaluated with…
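Because tagging is a multi-label binary classification task, the two metrics named in the abstract are computed per tag and then combined. The following is a minimal dependency-free sketch of that computation. Macro-averaging over tags, estimating PR-AUC as average precision, and the helper names `roc_auc`, `average_precision`, and `macro_average` are assumptions made for illustration, not details confirmed by this summary.

```python
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC estimated as average precision; assumes no tied scores."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


def macro_average(metric: Callable[[List[int], List[float]], float],
                  y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Apply a per-tag metric to each tag column and take the unweighted mean."""
    n_tags = len(y_true[0])
    per_tag = []
    for t in range(n_tags):
        tag_true = [row[t] for row in y_true]
        tag_score = [row[t] for row in y_score]
        per_tag.append(metric(tag_true, tag_score))
    return sum(per_tag) / n_tags


# Toy data: 4 clips (rows) x 2 tags (columns).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.4, 0.7], [0.3, 0.8], [0.1, 0.6]]

macro_roc_auc = macro_average(roc_auc, y_true, y_score)           # 0.875
macro_pr_auc = macro_average(average_precision, y_true, y_score)  # 11/12
```

ROC-AUC is insensitive to class imbalance, while PR-AUC depends on how rare each tag is, which is one reason for reporting both on sparsely tagged datasets.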
Taxonomy
Topics: Music and Audio Processing · Video Analysis and Summarization · Speech Recognition and Synthesis
