ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition
Zi Huang, Shulei Ji, Zhilan Hu, Chuangjian Cai, Jing Luo, Xinyu Yang

TL;DR
This paper introduces ADFF, an attention-based deep feature fusion method for music emotion recognition that leverages log Mel-spectrograms and a multi-level feature extraction process to improve emotion classification accuracy.
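The TL;DR names the log Mel-spectrogram as the model's only input. The minimal sketch below shows what that representation involves. It builds a triangular mel filterbank using the HTK-style mel formula, applies it to one power-spectrum frame, and takes the log. The function names, the 8-band/512-point settings in the example, and the `eps` floor are illustrative assumptions. The summary does not state the paper's STFT or mel parameters, and in practice an audio library would normally handle this step.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins to n_mels mel bands."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_mels + 2 equally spaced points on the mel scale give the band edges
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    mel_points = [mel_min + i * (mel_max - mel_min) / (n_mels + 1)
                  for i in range(n_mels + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    # Centre frequency (Hz) of each FFT bin
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:      # rising edge of the triangle
                w = (f - left) / (center - left)
            elif center < f <= right:    # falling edge of the triangle
                w = (right - f) / (right - center)
            else:                        # outside this band
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank

def log_mel_frame(power_spectrum, bank, eps=1e-10):
    """Apply the filterbank to one power-spectrum frame, then take the natural log."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in bank]
```

Stacking `log_mel_frame` outputs over successive STFT frames yields the T × F log Mel-spectrogram that a CNN such as the SFLM would consume.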
Contribution
It proposes an end-to-end deep learning framework combining spatial and temporal feature learning with attention mechanisms for MER, along with a novel multi-channel data processing technique.
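The contribution pairs spatial and temporal feature learning with attention, and the abstract specifies squeeze-and-excitation (SE) attention in the TFLM. The sketch below shows a generic SE block in plain Python, in three steps:

1. Global average pooling squeezes each channel to a scalar.
2. A two-layer bottleneck (ReLU, then sigmoid) turns those scalars into per-channel weights.
3. Each channel is rescaled by its weight.

This summary does not say where ADFF inserts SE, what reduction ratio it uses, or what its learned weights are. `se_block` and its weight arguments are hypothetical.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_block(feature_maps, w1, b1, w2, b2):
    """Squeeze-and-excitation recalibration of C channels (hypothetical helper).

    feature_maps: list of C channels, each a 2-D list (H x W).
    w1: (C/r) x C weights, b1: C/r biases  (reduction layer)
    w2: C x (C/r) weights, b2: C biases    (expansion layer)
    Returns (rescaled feature maps, per-channel weights).
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1)
    hidden = [relu(sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(w1, b1)]
    s = [sigmoid(sum(w * hi for w, hi in zip(row, hidden)) + b) for row, b in zip(w2, b2)]
    # Scale: multiply every element of channel c by its weight s[c]
    scaled = [[[v * sc for v in row] for row in ch] for ch, sc in zip(feature_maps, s)]
    return scaled, s
```

Because the sigmoid gates lie in (0, 1), SE can only emphasize or suppress channels relative to one another. It never changes the tensor's shape, so it can be dropped between existing layers.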
Findings
Achieves relative improvements of 10.43% and 4.82% in valence and arousal prediction, respectively.
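These findings are stated as relative improvements: the gain divided by the baseline score, not an absolute percentage-point difference. The summary names neither the metric nor the baseline, so the helper below and the numbers in its usage note are hypothetical. It only illustrates the arithmetic.

```python
def relative_improvement(baseline, proposed, lower_is_better=False):
    """Relative improvement of `proposed` over `baseline`, in percent (hypothetical helper)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # For error metrics (e.g. RMSE) a decrease counts as an improvement
    delta = baseline - proposed if lower_is_better else proposed - baseline
    return 100.0 * delta / abs(baseline)
```

For example, a hypothetical baseline score of 0.50 rising to 0.55 is a 10% relative improvement but only a 0.05 absolute gain. For error metrics, pass `lower_is_better=True`.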
Performs better on datasets of different scales and in multi-task learning.
Outperforms state-of-the-art models in MER accuracy.
Abstract
Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, learning affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, this method uses an adapted VGGNet as the spatial feature learning module (SFLM) to obtain spatial features at different levels. These features are then fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to obtain multi-level emotion-related spatial-temporal features (ESTFs), which discriminate emotions well in the final emotion space. In addition, a novel data processing scheme is devised to cut the single-channel input into multiple channels, improving computational efficiency while preserving MER quality. Experiments show…
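The abstract says a novel processing step cuts the single-channel input into multiple channels, but it does not say how. One plausible reading, sketched below as an assumption, splits the T × F log Mel-spectrogram along time into equal segments and treats each segment as a channel. `split_to_channels`, the equal-length non-overlapping segmentation, and dropping leftover frames are all illustrative choices, not the authors' documented procedure.

```python
def split_to_channels(spectrogram, n_channels):
    """Cut a single-channel (T x F) spectrogram into n_channels time segments.

    Hypothetical helper: returns n_channels segments, each (T // n_channels) x F.
    Trailing frames that do not fill a whole segment are dropped.
    """
    seg_len = len(spectrogram) // n_channels
    if seg_len == 0:
        raise ValueError("not enough frames for the requested number of channels")
    # Channel c holds frames [c * seg_len, (c + 1) * seg_len)
    return [spectrogram[c * seg_len:(c + 1) * seg_len] for c in range(n_channels)]
```

With this layout, a CNN sees a C × (T/C) × F input instead of 1 × T × F, so each convolution spans a shorter time axis. Whether this is the actual source of ADFF's efficiency gain is our interpretation only.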
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing
