TL;DR
This paper introduces a transformer-based model that generates multi-instrument symbolic music conditioned on continuous valence and arousal labels. It is supported by a new large-scale dataset, and quantitative evaluations show that it outperforms conditioning with control tokens.
Contribution
It presents a novel method for emotion-conditioned music generation using continuous valence-arousal labels and provides a large-scale paired dataset.
Findings
Outperforms control token conditioning in note prediction accuracy
Matches the target emotion more closely in generated music, as measured by a regression task in the valence-arousal plane
Demonstrates effective continuous emotion conditioning
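The two evaluation axes named in these findings can be illustrated with a minimal sketch. The function names `token_accuracy` and `mean_va_error` are hypothetical, and the exact metric definitions are assumptions: the summary does not say how accuracy or regression error is computed, so top-1 token accuracy and mean Euclidean distance in the valence-arousal plane are used here as stand-ins.

```python
import math


def token_accuracy(predicted, target):
    """Fraction of positions where the predicted token equals the target token.

    Assumption: top-1 accuracy over a flat token sequence; the paper may
    compute note prediction accuracy differently.
    """
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    if not target:
        raise ValueError("sequences must not be empty")
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


def mean_va_error(predicted, target):
    """Mean Euclidean distance between predicted and target (valence, arousal) pairs.

    Assumption: a distance-based error in the valence-arousal plane; the
    paper's regression metric is not specified in this summary.
    """
    if len(predicted) != len(target):
        raise ValueError("inputs must have the same length")
    if not target:
        raise ValueError("inputs must not be empty")
    total = sum(
        math.hypot(pv - tv, pa - ta)
        for (pv, pa), (tv, ta) in zip(predicted, target)
    )
    return total / len(target)
```

Higher `token_accuracy` and lower `mean_va_error` would both count as improvements under these assumed definitions.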
Abstract
In this paper we present a new approach for the generation of multi-instrument symbolic music driven by musical emotion. The principal novelty of our approach centres on conditioning a state-of-the-art transformer based on continuous-valued valence and arousal labels. In addition, we provide a new large-scale dataset of symbolic music paired with emotion labels in terms of valence and arousal. We evaluate our approach in a quantitative manner in two ways, first by measuring its note prediction accuracy, and second via a regression task in the valence-arousal plane. Our results demonstrate that our proposed approaches outperform conditioning using control tokens, which is representative of the current state of the art.
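To make the contrast between control-token conditioning and continuous conditioning concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the label range of [-1, 1], the number of bins, the token spelling such as `<V4>`, and the use of a single linear projection are all assumptions made for illustration.

```python
def control_token_prefix(valence, arousal, n_bins=5):
    """Control-token style: quantise each label into a discrete token.

    Assumption: labels lie in [-1, 1] and are split into n_bins equal bins.
    The resulting tokens would be prepended to the music token sequence.
    """
    def to_bin(x):
        x = min(max(x, -1.0), 1.0)              # clamp to the assumed range
        index = int((x + 1.0) / 2.0 * n_bins)   # map [-1, 1] onto [0, n_bins]
        return min(index, n_bins - 1)           # keep x == 1.0 in the last bin

    return [f"<V{to_bin(valence)}>", f"<A{to_bin(arousal)}>"]


def continuous_condition_vector(valence, arousal, weights, bias):
    """Continuous style: project (valence, arousal) to a model-sized vector.

    Assumption: one linear layer, with `weights` given as a list of
    (w_valence, w_arousal) pairs and `bias` as a list of the same length.
    In a real model these parameters would be learned.
    """
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same length")
    return [
        w_v * valence + w_a * arousal + b
        for (w_v, w_a), b in zip(weights, bias)
    ]
```

Under these assumptions, valence values of 0.80 and 0.95 fall into the same bin and produce identical control tokens, whereas the continuous projection yields a different vector for each, which is the kind of fine-grained control the abstract attributes to continuous-valued labels.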
