ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition

Zi Huang; Shulei Ji; Zhilan Hu; Chuangjian Cai; Jing Luo; Xinyu Yang

arXiv:2204.05649·cs.SD·July 1, 2022·1 cites

Open Access

TL;DR

This paper introduces ADFF, an attention-based deep feature fusion method for music emotion recognition that takes log Mel-spectrograms as input and extracts features at multiple levels to improve emotion recognition accuracy.

Contribution

It proposes an end-to-end deep learning framework combining spatial and temporal feature learning with attention mechanisms for MER, along with a novel multi-channel data processing technique.
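The multi-channel data processing idea can be illustrated with a small sketch. This summary does not say along which axis the input is cut or how segment lengths are chosen, so the version below assumes equal-length cuts along the time axis and drops any leftover frames; the function name `split_into_channels` and the toy values are hypothetical and not taken from the paper.

```python
def split_into_channels(spec, num_channels):
    """Cut a single-channel spectrogram into equal time segments.

    spec: nested list of shape [n_mels][n_frames].
    Returns a nested list of shape [num_channels][n_mels][seg_len].
    Assumption of this sketch: the cut is along the time axis and trailing
    frames that do not fill a whole segment are dropped.
    """
    n_frames = len(spec[0])
    seg_len = n_frames // num_channels
    if seg_len == 0:
        raise ValueError("spectrogram has fewer frames than channels")
    return [
        [row[c * seg_len:(c + 1) * seg_len] for row in spec]
        for c in range(num_channels)
    ]


# Toy input: 2 Mel bands x 7 frames, with values that make frames easy to track.
spec = [[float(t) for t in range(7)],
        [float(10 + t) for t in range(7)]]
channels = split_into_channels(spec, num_channels=3)

for index, channel in enumerate(channels):
    print(index, channel)
```

Stacking segments as channels shortens the time axis that later layers have to process, which is one plausible reading of the efficiency claim; the paper itself should be consulted for the actual scheme.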

Findings

01. Achieves relative improvements of 10.43% and 4.82% in valence and arousal prediction, respectively.

02. Performs better on datasets of different scales and in multi-task learning.

03. Outperforms state-of-the-art models in MER accuracy.

Abstract

Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, learning affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, the method uses an adapted VGGNet as a spatial feature learning module (SFLM) to obtain spatial features at different levels. These features are then fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to obtain multi-level emotion-related spatial-temporal features (ESTFs), which discriminate emotions well in the final emotion space. In addition, a novel data processing method is devised that cuts the single-channel input into multiple channels to improve computational efficiency while preserving the quality of MER. Experiments show…
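As a rough illustration of the squeeze-and-excitation (SE) attention named in the abstract, the sketch below applies the generic SE recipe (global average pooling, a two-layer bottleneck, sigmoid gates, channel-wise rescaling) to a tiny feature map. It is written in plain Python for readability and is not the authors' TFLM: the function `se_attention`, the weight matrices `w1` and `w2`, and the input values are all made up for the example, and biases are left out.

```python
import math


def se_attention(features, w1, w2):
    """Reweight the channels of `features` with SE-style gates.

    features: nested list of shape [C][T] (channels x time steps).
    w1: [C_reduced][C] weights of the bottleneck layer (hypothetical values).
    w2: [C][C_reduced] weights of the expansion layer (hypothetical values).
    """
    # Squeeze: global average pooling over time, one scalar per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expansion linear layer followed by a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Toy feature map: 4 channels x 2 time steps, reduced to 2 hidden units.
features = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [5.0, 1.0]]
w1 = [[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 0.25, 0.25]]
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
scaled, gates = se_attention(features, w1, w2)
print(gates)
```

The gates depend only on per-channel averages, so the block adds few parameters while letting the network emphasise some channels and suppress others. In a trained model `w1` and `w2` would be learned rather than fixed by hand.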

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing