Automatic tagging using deep convolutional neural networks
Keunwoo Choi, George Fazekas, Mark Sandler

TL;DR
This paper introduces a deep convolutional neural network approach for automatic music tagging, showing that mel-spectrogram inputs are effective and that deeper models improve tagging accuracy when more training data is available.
Contribution
It evaluates fully convolutional neural networks (FCNs) for music tagging across several depths and input representations, highlighting the effectiveness of deeper architectures and mel-spectrograms.
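To make "fully convolutional" concrete, here is a minimal NumPy sketch of an untrained FCN forward pass. It uses only 3×3 convolutions and max-pooling, with pool sizes chosen so the final feature map is 1×1 and each output channel acts as one tag score. Everything specific here is a hypothetical choice for illustration, not the authors' model: the function names, the 96×1366 input shape, the channel counts, the pool sizes, the random weights, and the decision to let the last convolution emit one channel per tag.

```python
import numpy as np

def conv2d_same(x, w, b):
    """3x3 convolution with zero padding so the output keeps the input's H x W.

    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,)
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad height and width by 1
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Add the contribution of kernel tap (i, j) for all channel pairs at once.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def max_pool(x, ph, pw):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    c, h, wd = x.shape
    h2, w2 = h // ph, wd // pw
    x = x[:, :h2 * ph, :w2 * pw].reshape(c, h2, ph, w2, pw)
    return x.max(axis=(2, 4))

def fcn_forward(mel, params, pools):
    """Toy FCN: conv + (ReLU) + max-pool per layer, with no fully connected layers."""
    x = mel[None]  # add a channel axis: (1, n_mels, n_frames)
    for k, ((w, b), (ph, pw)) in enumerate(zip(params, pools)):
        x = conv2d_same(x, w, b)
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
        x = max_pool(x, ph, pw)
    # The pooling schedule must shrink the map to 1 x 1 so each channel is one tag score.
    assert x.shape[1:] == (1, 1), "pool sizes must reduce the feature map to 1x1"
    return 1.0 / (1.0 + np.exp(-x.reshape(-1)))  # independent sigmoid per tag
```

With a 96×1366 input and the hypothetical pools (2, 4), (4, 5), (3, 8), (4, 8), the feature map shrinks to 48×341, 12×68, 4×8 and finally 1×1. Adding layers means spreading the same total subsampling over more, smaller pooling steps.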
Findings
Deeper models outperform shallower ones on a larger dataset (Million Song Dataset).
Mel-spectrograms are effective for music tagging.
A 4-layer architecture with mel-spectrogram input achieves state-of-the-art performance on MagnaTagATune.
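Since the findings single out mel-spectrogram input, here is a minimal NumPy sketch of a log-amplitude mel-spectrogram: framed power spectrum, triangular mel filterbank, then conversion to dB. The default parameters (12 kHz sample rate, 512-point FFT, hop of 256, 96 mel bands) are illustrative assumptions, not values stated above. The mel formula is the common HTK-style variant, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # HTK-style mel scale

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # centre frequency of each rfft bin
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, center, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr=12000, n_fft=512, hop=256, n_mels=96, eps=1e-10):
    """Return an (n_mels, n_frames) log-mel spectrogram in dB; needs len(y) >= n_fft."""
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)                   # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T     # (n_mels, n_frames)
    return 10.0 * np.log10(mel + eps)                     # log amplitude; eps avoids log(0)
```

The log compression matters for a CNN input: spectral energy spans several orders of magnitude, and dB values put quiet and loud components on a comparable numeric scale.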
Abstract
We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting of 2D convolutional layers and subsampling layers only. In the experiments, we measure the AUC-ROC scores of the architectures with different complexities and input types using the MagnaTagATune dataset, where a 4-layer architecture shows state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluate the performance of the architectures while varying the number of layers on a larger dataset (Million Song Dataset), and find that deeper models outperform the 4-layer architecture. The experiments show that the mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
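The abstract reports AUC-ROC scores. For multi-label tagging, a common summary is to compute the ROC AUC separately for each tag and average over tags. That averaging scheme is assumed here rather than stated above, and the function names are hypothetical. This NumPy sketch computes each per-tag AUC from the Mann-Whitney U statistic, giving tied scores their average rank.

```python
import numpy as np

def binary_auc(y_true, scores):
    """ROC AUC for one tag via the Mann-Whitney U statistic (ties get average ranks)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = y_true.sum()
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return np.nan  # undefined when a tag has only positives or only negatives
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size)
    i = 0
    while i < scores.size:
        # Find the run of tied scores starting at position i and give it the mean 1-based rank.
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    # U counts (positive, negative) pairs ranked correctly, with ties counted as 0.5.
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def mean_tag_auc(Y_true, Y_score):
    """Average per-tag AUC over the columns of (n_clips, n_tags) arrays, skipping undefined tags."""
    aucs = [binary_auc(Y_true[:, t], Y_score[:, t]) for t in range(Y_true.shape[1])]
    return float(np.nanmean(aucs))
```

For example, `binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ordered correctly.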
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization
