Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition
Haiying Xia, Zhongyi Huang, Yumei Tan, Shuxiang Song

TL;DR
This paper enhances symbolic music emotion recognition by integrating musical mode information into pre-trained models, significantly improving their ability to capture emotion-mode relationships and boosting accuracy on benchmark datasets.
Contribution
It introduces a mode-guided feature injection framework that explicitly incorporates mode features into MIDIBERT, addressing its limitations in encoding emotion-mode correlations.
Findings
Mode injection improves emotion recognition accuracy
Enhanced model captures emotion-mode associations more effectively
Significant performance gains on EMOPIA and VGMIDI datasets
Abstract
Symbolic music emotion recognition (SMER) is a key task in symbolic music understanding. Recent approaches have shown promising results by fine-tuning large-scale pre-trained models (e.g., MIDIBERT, a benchmark in symbolic music understanding) to map musical semantics to emotional labels. While these models effectively capture distributional musical semantics, they often overlook tonal structures, particularly musical modes, which play a critical role in emotional perception according to music psychology. In this paper, we investigate the representational capacity of MIDIBERT and identify its limitations in capturing mode-emotion associations. To address this issue, we propose a Mode-Guided Enhancement (MoGE) strategy that incorporates psychological insights on mode into the model. Specifically, we first conduct a mode augmentation analysis, which reveals that MIDIBERT fails to effectively…
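To make the notion of a "mode" feature concrete, here is a minimal sketch of how a major/minor label could be estimated from the MIDI pitches of a symbolic piece. It is not the paper's MoGE method, whose details are not given in the truncated abstract above; it uses the classic Krumhansl-Schmuckler key-finding heuristic with the Krumhansl-Kessler profiles, and the function names `pearson` and `estimate_mode` are hypothetical.

```python
from math import sqrt

# Krumhansl-Kessler key profiles, indexed by semitones above the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def estimate_mode(pitches):
    """Return (tonic_pitch_class, 'major' | 'minor') for a list of MIDI pitch numbers.

    Assumption: every note counts equally; a fuller version would weight by duration.
    """
    hist = [0.0] * 12
    for p in pitches:
        hist[p % 12] += 1.0

    best_r, best_tonic, best_mode = -2.0, 0, "major"
    for tonic in range(12):
        # Rotate so the candidate tonic sits at index 0, matching the profiles.
        rotated = hist[tonic:] + hist[:tonic]
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            r = pearson(rotated, profile)
            if r > best_r:
                best_r, best_tonic, best_mode = r, tonic, mode
    return best_tonic, best_mode


# A C major scale (tonic doubled) and an A natural minor scale (tonic doubled).
print(estimate_mode([60, 62, 64, 65, 67, 69, 71, 72]))  # (0, 'major')
print(estimate_mode([57, 59, 60, 62, 64, 65, 67, 69]))  # (9, 'minor')
```

A label of this kind is one plausible input for a mode-aware model; how MoGE actually derives and injects mode information into MIDIBERT should be taken from the paper itself.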
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Emotion and Mood Recognition
