Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset
Serkan Sulun, Pedro Oliveira, Paula Viana

TL;DR
Emotion4MIDI is a large-scale dataset of 12,000 MIDI songs labeled with fine-grained emotions. The labels come from emotion classification models that were trained on the GoEmotions dataset and then applied to the songs' lyrics. The dataset is intended to support research on emotion-driven music generation.
Contribution
This paper introduces a large-scale symbolic music dataset whose emotion labels are derived from song lyrics, intended as a resource for developing emotion-aware music generation models.
Findings
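Achieved state-of-the-art emotion classification results on GoEmotions with a model half the size of the baseline.
Labeled 12k MIDI songs, drawn from two large-scale MIDI datasets, with a wide range of fine-grained emotions.
Released the inference code, trained models, and datasets online.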
Abstract
We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. To create this dataset, we first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. We then applied these models to lyrics from two large-scale MIDI datasets. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions and, especially, to develop models that can generate music based on specific emotions. Our code for inference, trained models, and datasets are available online.
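The labeling step described in the abstract (apply a text emotion classifier to each song's lyrics, then keep the emotions it predicts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `toy_classifier` is a hypothetical keyword-based stand-in for a model trained on GoEmotions, `LABELS` is a small subset of the GoEmotions label set, and the 0.5 threshold and the file names are made up for the example.

```python
from typing import Callable, Dict, List

# A small subset of GoEmotions label names, used only for illustration.
LABELS = ["joy", "sadness", "anger", "love", "neutral"]

# Hypothetical keyword lists. A real pipeline would call a trained model instead.
_KEYWORDS = {
    "joy": {"happy", "smile", "dance"},
    "sadness": {"cry", "tears", "lonely"},
    "anger": {"hate", "rage", "fight"},
    "love": {"love", "heart", "kiss"},
}


def toy_classifier(text: str) -> Dict[str, float]:
    """Stand-in for a trained classifier: one score in [0, 1] per label."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    scores: Dict[str, float] = {}
    for label, keywords in _KEYWORDS.items():
        hits = sum(1 for w in words if w in keywords)
        scores[label] = min(1.0, hits / 2)
    # Fall back to "neutral" only when no emotion keyword matched.
    scores["neutral"] = 1.0 if all(s == 0.0 for s in scores.values()) else 0.0
    return scores


def label_songs(
    lyrics_by_song: Dict[str, str],
    classifier: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, List[str]]:
    """Attach every label whose score reaches the threshold to each song."""
    labeled: Dict[str, List[str]] = {}
    for song_id, lyrics in lyrics_by_song.items():
        scores = classifier(lyrics)
        labeled[song_id] = sorted(
            label for label, score in scores.items() if score >= threshold
        )
    return labeled


if __name__ == "__main__":
    songs = {
        "a.mid": "I love you with all my heart",
        "b.mid": "The train leaves at noon",
    }
    print(label_songs(songs, toy_classifier))
```

Because the classifier is passed in as a function, the keyword stand-in can be swapped for any model that returns per-label scores without changing `label_songs`. Keeping every label above the threshold, rather than only the top one, reflects that a text can carry several emotions at once.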
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
