Story2MIDI: Emotionally Aligned Music Generation from Text
Mohammad Shokri, Alexandra C. Salem, Gabriel Levine, Johanna Devaney, Sarah Ita Levitan

TL;DR
Story2MIDI is a Transformer-based model that generates music aligned with emotions described in text, using a new dataset linking text and music emotions, and is validated through objective metrics and human studies.
Contribution
We introduce the first dataset linking text and music emotions and develop a Transformer model that generates emotion-aligned music from text descriptions.
Findings
The model effectively captures emotion cues in music.
Generated samples evoke a diverse range of emotional responses.
Human and objective evaluations confirm emotional alignment.
Abstract
In this paper, we introduce Story2MIDI, a sequence-to-sequence Transformer-based model for generating emotion-aligned music from a given piece of text. To develop this model, we construct the Story2MIDI dataset by merging existing datasets for sentiment analysis from text and emotion classification in music. The resulting dataset contains pairs of text blurbs and music pieces that evoke the same emotions in the reader or listener. Despite the small scale of our dataset and limited computational resources, our results indicate that our model effectively learns emotion-relevant features in music and incorporates them into its generation process, producing samples with diverse emotional responses. We evaluate the generated outputs using objective musical metrics and a human listening study, confirming the model's ability to capture intended emotional cues.
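The dataset construction described above, pairing text blurbs with music pieces that share an emotion, can be illustrated with a minimal sketch. This is not the authors' pipeline: the emotion labels, file names, the `pair_by_emotion` helper, and the random choice among matching pieces are all assumptions made for illustration, since the abstract does not specify how the source datasets' labels are aligned or how pairs are selected.

```python
import random
from collections import defaultdict


def pair_by_emotion(texts, pieces, seed=0):
    """Pair each text with a music piece that carries the same emotion label.

    texts:  list of (text, emotion) tuples     (hypothetical format)
    pieces: list of (midi_path, emotion) tuples (hypothetical format)
    Returns a list of (text, midi_path, emotion) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the pairing is reproducible

    # Index the music pieces by their emotion label.
    by_emotion = defaultdict(list)
    for path, emotion in pieces:
        by_emotion[emotion].append(path)

    pairs = []
    for text, emotion in texts:
        candidates = by_emotion.get(emotion)
        if candidates:  # texts with no matching music emotion are skipped
            pairs.append((text, rng.choice(candidates), emotion))
    return pairs


# Toy data; the labels and file names are made up for this example.
texts = [
    ("A joyful reunion at the station.", "happy"),
    ("The house stood empty and silent.", "sad"),
    ("He slammed the door and shouted.", "angry"),
]
pieces = [
    ("piece_001.mid", "happy"),
    ("piece_002.mid", "sad"),
    ("piece_003.mid", "happy"),
]

pairs = pair_by_emotion(texts, pieces)
for text, path, emotion in pairs:
    print(f"{emotion}: {text!r} -> {path}")
```

In this toy run the "angry" text is dropped because no piece carries that label. A real merge would first need a mapping between the label schemes of the text and music datasets, which this sketch assumes has already been applied.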
Taxonomy
Topics: Music and Audio Processing · Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition
