Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features
Sergio Oramas, Oriol Nieto, Francesco Barbieri, and Xavier Serra

TL;DR
This paper introduces MuMu, a large multi-modal dataset for multi-label music genre classification, and proposes a deep learning approach that combines audio, text, and image features to improve classification accuracy.
Contribution
The paper presents MuMu, a new multi-modal dataset of more than 31k albums classified into 250 genre classes, and a deep learning method that combines feature embeddings from audio, text, and images for multi-label genre classification.
Findings
Combining modalities improves classification accuracy.
Significant differences are observed between data modalities.
New baselines established for multi-label genre classification.
Abstract
Music genres allow us to categorize musical items that share common characteristics. Although these categories are not mutually exclusive, most related research has traditionally focused on classifying tracks into a single class. Furthermore, these categories (e.g., Pop, Rock) tend to be too broad for certain applications. In this work we aim to expand this task by categorizing musical items into multiple and fine-grained labels, using three different data modalities: audio, text, and images. To this end we present MuMu, a new dataset of more than 31k albums classified into 250 genre classes. For every album we have collected the cover image, text reviews, and audio tracks. Additionally, we propose an approach for multi-label genre classification based on the combination of feature embeddings learned with state-of-the-art deep learning methodologies. Experiments show major differences…
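To make the idea of combining per-modality feature embeddings for multi-label prediction concrete, here is a minimal sketch. It is not the authors' implementation: the embedding sizes, the L2 normalization, the concatenation-based fusion, the random weights, and the 0.5 threshold are all assumptions chosen for illustration. The only details taken from the abstract are the three modalities and the 250 genre classes.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so no modality dominates by magnitude."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def fuse_embeddings(audio, text, image):
    """Hypothetical fusion step: normalize each modality, then concatenate."""
    parts = [l2_normalize(audio), l2_normalize(text), l2_normalize(image)]
    return np.concatenate(parts, axis=1)

def predict_genres(fused, weights, bias, threshold=0.5):
    """Multi-label head: one independent sigmoid per genre, not a softmax."""
    logits = fused @ weights + bias
    probs = 1.0 / (1.0 + np.exp(-logits))
    labels = (probs >= threshold).astype(int)
    return probs, labels

# Toy data: 4 albums with made-up embedding sizes (assumptions).
rng = np.random.default_rng(0)
n_albums, n_genres = 4, 250
audio = rng.normal(size=(n_albums, 8))
text = rng.normal(size=(n_albums, 6))
image = rng.normal(size=(n_albums, 5))

fused = fuse_embeddings(audio, text, image)

# Untrained random weights, used only to exercise the shapes.
weights = rng.normal(size=(fused.shape[1], n_genres))
bias = np.zeros(n_genres)

probs, labels = predict_genres(fused, weights, bias)
print(fused.shape, probs.shape, labels.sum(axis=1))
```

The per-genre sigmoid is what makes the sketch multi-label: each album can receive any number of genres, whereas a softmax would force the classes to compete for a single label.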
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
