Multi-task Learning with Metadata for Music Mood Classification
Rajnish Kumar, Manjeet Dahiya

TL;DR
This paper explores using metadata such as artist and year in a multi-task learning framework to improve music mood classification, reporting gains of up to 8.7 points in average precision across multiple datasets.
Contribution
It introduces a multi-task learning approach that leverages readily available metadata to improve music mood classification accuracy.
Findings
Up to an 8.7-point improvement in average precision
Consistent performance gains across multiple datasets
Effective integration of metadata with CNN models
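For readers unfamiliar with the metric, the following is a minimal sketch of how average precision can be computed for a single mood tag. It is illustrative only: the function name `average_precision` and the toy scores are assumptions, and this summary does not state which variant or averaging scheme the paper uses.

```python
def average_precision(scores, labels):
    """Average precision for one tag: mean of precision@k at each relevant item.

    scores: model confidence per track (higher means more likely to carry the tag)
    labels: 1 if the track truly carries the tag, else 0
    """
    # Rank tracks from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


# Toy example (made-up numbers, not results from the paper).
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

If the metric is reported on a 0 to 100 scale, a gain "in points" would be a difference between two such values after multiplying by 100; that reading is an assumption here.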
Abstract
Mood recognition is an important problem in music informatics and has key applications in music discovery and recommendation. These applications have become even more relevant with the rise of music streaming. Our work investigates whether readily available audio metadata, such as artist and year, can be leveraged to improve the performance of mood classification models. To this end, we propose a multi-task learning approach in which a shared model is trained simultaneously on mood and metadata prediction tasks, with the goal of learning richer representations. Experimentally, we demonstrate that applying our technique to existing state-of-the-art convolutional neural networks for mood classification consistently improves their performance. We conduct experiments on multiple datasets and report that our approach can lead to improvements in the average…
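The multi-task idea described in the abstract can be sketched as a combined training objective: one loss for the mood tags plus weighted losses for the metadata tasks, all computed from the outputs of a shared model. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The choice of binary cross-entropy for mood, softmax cross-entropy for metadata, treating artist and year as classification targets, and the `meta_weight` parameter are all assumptions; every function name is hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(logits, targets):
    """Mean multi-label loss over mood tags (each tag is an independent yes/no)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(logits)


def softmax_cross_entropy(logits, target_index):
    """Single-label loss, e.g. for an artist class or a year bucket (assumed setup)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(mood_logits, mood_targets, meta_logits, meta_targets, meta_weight=0.5):
    """Mood loss plus a weighted sum of metadata losses.

    meta_logits / meta_targets hold one entry per metadata task (e.g. artist, year).
    meta_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    mood = binary_cross_entropy(mood_logits, mood_targets)
    meta = sum(softmax_cross_entropy(l, t) for l, t in zip(meta_logits, meta_targets))
    return mood + meta_weight * meta


# Toy call: two mood tags and one 4-way metadata task, all logits at zero.
loss = multitask_loss([0.0, 0.0], [1, 0], [[0.0, 0.0, 0.0, 0.0]], [2])
```

Because the heads share one model, gradients from the metadata losses also update the shared layers, which is the mechanism the abstract credits for richer representations. At inference time only the mood head would be needed.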
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
