CNN based music emotion classification
Xin Liu, Qingcai Chen, Xiangping Wu, Yan Liu, Yang Liu

TL;DR
This paper introduces a CNN-based approach for music emotion recognition that operates directly on spectrograms, eliminating the need for manual feature extraction, and reports superior performance on the standard CAL500 and CAL500exp datasets.
Contribution
The paper presents a novel CNN method that automatically learns relevant features from spectrograms for music emotion classification, outperforming existing techniques.
Findings
Outperforms state-of-the-art methods on the CAL500 and CAL500exp datasets
Uses spectrograms to capture both time and frequency domain information
Eliminates manual feature extraction by training CNN directly on spectrograms
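The summary does not specify how the spectrograms are computed, so the following is only a minimal sketch of what a time-frequency input can look like: a log-magnitude short-time Fourier transform (STFT) with a Hann window. The function name `log_spectrogram`, the frame length, the hop size and the log compression are illustrative assumptions, not the authors' preprocessing.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256, eps=1e-10):
    """Log-magnitude STFT spectrogram with shape (freq_bins, frames).

    frame_len, hop and the log compression are illustrative assumptions,
    not parameters taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds samples [i * hop, i * hop + frame_len), tapered by the window
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the frame_len // 2 + 1 non-negative frequency bins
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so rows index frequency and columns index time
    return np.log(magnitude + eps).T

# One second of a 1 kHz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
spec = log_spectrogram(tone)
print(spec.shape)                 # (257, 61)
peak_bin = spec.mean(axis=1).argmax()
print(peak_bin * sr / 512)        # 1000.0
```

Each column is one time frame and each row one frequency bin, which is why a single image-like array can carry both time- and frequency-domain information into a CNN.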
Abstract
Music emotion recognition (MER) is usually regarded as a multi-label tagging task, in which each segment of music can inspire specific emotion tags. Most researchers extract acoustic features from music and explore the relations between these features and their corresponding emotion tags. Because the same music segment can inspire inconsistent emotions in different listeners, identifying the key acoustic features that actually affect emotions is a challenging task. In this paper, we propose a novel MER method that applies a deep convolutional neural network (CNN) to music spectrograms, which contain both the original time- and frequency-domain information. With the proposed method, no additional effort is required to extract specific features; this is left to the training procedure of the CNN model. Experiments are conducted on the standard CAL500 and CAL500exp datasets. Results show…
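The abstract frames MER as multi-label tagging with a CNN applied to spectrograms but does not give the network's layers. The sketch below only illustrates that general idea: one convolutional layer, ReLU, global max pooling, a dense layer and an independent sigmoid per tag, trained against binary cross-entropy. The layer sizes, the pooling choice, the random weights, the 0.5 threshold, the three-tag output and the names `forward` and `binary_cross_entropy` are all hypothetical, not the authors' architecture; training is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(spec, kernels, bias, w_out, b_out):
    """Hypothetical tiny CNN: conv -> ReLU -> global max pool -> dense -> sigmoid."""
    kh, kw = kernels.shape[1:]
    # windows[i, j] is the (kh, kw) patch whose top-left corner is (i, j)
    windows = sliding_window_view(spec, (kh, kw))
    # Valid 2-D cross-correlation with every kernel c: shape (C, H-kh+1, W-kw+1)
    fmap = np.einsum("ijkl,ckl->cij", windows, kernels) + bias[:, None, None]
    fmap = np.maximum(fmap, 0.0)             # ReLU
    pooled = fmap.max(axis=(1, 2))           # global max pooling -> (C,)
    return sigmoid(w_out @ pooled + b_out)   # one independent probability per tag

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean per-tag binary cross-entropy, the usual multi-label loss."""
    probs = np.clip(probs, eps, 1 - eps)
    return -np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))

rng = np.random.default_rng(0)
n_kernels, n_tags = 4, 3
spec = rng.standard_normal((64, 40))         # stand-in for a log spectrogram
kernels = rng.standard_normal((n_kernels, 3, 3)) * 0.1
bias = np.zeros(n_kernels)
w_out = rng.standard_normal((n_tags, n_kernels))
b_out = np.zeros(n_tags)

probs = forward(spec, kernels, bias, w_out, b_out)
predicted = probs >= 0.5                     # a segment may receive several tags
targets = np.array([1.0, 0.0, 1.0])          # hypothetical ground-truth tags
loss = binary_cross_entropy(probs, targets)
```

A sigmoid per tag, rather than a softmax over all tags, is what lets one segment receive several emotion tags at once, matching the multi-label framing in the abstract.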
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
