Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition

Haiying Xia; Zhongyi Huang; Yumei Tan; Shuxiang Song

arXiv:2512.17946·cs.SD·December 23, 2025

TL;DR

This paper enhances symbolic music emotion recognition by integrating musical mode information into pre-trained models, significantly improving their ability to capture emotion-mode relationships and boosting accuracy on benchmark datasets.

Contribution

It introduces a mode-guided feature injection framework that explicitly incorporates mode features into MIDIBERT, addressing its limitations in encoding emotion-mode correlations.
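As a rough illustration of what "feature injection" can mean in general, the sketch below adds a per-mode vector to every time step of a sequence of hidden states. This is only one simple, additive form of injection, assumed here for illustration; the summary above does not specify how the paper's framework combines mode features with MIDIBERT, so the function `inject_mode`, the `mode_table` values, the `scale` parameter, and the two-mode (major/minor) setup are all hypothetical.

```python
from typing import Dict, List, Sequence


def inject_mode(
    hidden: Sequence[Sequence[float]],
    mode_vec: Sequence[float],
    scale: float = 1.0,
) -> List[List[float]]:
    """Add a mode vector to every time step of a hidden-state sequence.

    hidden:   list of time steps, each a list of d_model floats
              (a stand-in for encoder outputs; not actual MIDIBERT states)
    mode_vec: list of d_model floats representing one mode (hypothetical)
    scale:    how strongly the mode vector is mixed in (hypothetical knob)
    """
    if any(len(step) != len(mode_vec) for step in hidden):
        raise ValueError("mode_vec must match the hidden dimension")
    return [[h + scale * m for h, m in zip(step, mode_vec)] for step in hidden]


# Toy values chosen for illustration only; a trained model would learn these.
mode_table: Dict[str, List[float]] = {
    "major": [0.5, -0.5],
    "minor": [-1.0, 1.0],
}

hidden_states = [[0.0, 1.0], [2.0, 3.0]]  # 2 time steps, d_model = 2
injected = inject_mode(hidden_states, mode_table["minor"])
print(injected)
```

The additive form keeps the hidden dimension unchanged, so a downstream classifier head would not need to be resized; concatenation or gating are equally plausible alternatives that this sketch does not cover.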

Findings

01. Mode injection improves emotion recognition accuracy

02. Enhanced model captures emotion-mode associations more effectively

03. Significant performance gains on EMOPIA and VGMIDI datasets

Abstract

Symbolic music emotion recognition (SMER) is a key task in symbolic music understanding. Recent approaches have shown promising results by fine-tuning large-scale pre-trained models (e.g., MIDIBERT, a benchmark in symbolic music understanding) to map musical semantics to emotional labels. While these models effectively capture distributional musical semantics, they often overlook tonal structures, particularly musical modes, which play a critical role in emotional perception according to music psychology. In this paper, we investigate the representational capacity of MIDIBERT and identify its limitations in capturing mode-emotion associations. To address this issue, we propose a Mode-Guided Enhancement (MoGE) strategy that incorporates psychological insights on mode into the model. Specifically, we first conduct a mode augmentation analysis, which reveals that MIDIBERT fails to effectively…

Taxonomy

Topics: Music and Audio Processing · Neuroscience and Music Perception · Emotion and Mood Recognition