Language Models for Music Medicine Generation
Emmanouil Nikolakakis, Joann Ching, Emmanouil Karystinaios, Gabrielle Sipin, Gerhard Widmer, Razvan Marinescu

TL;DR
This paper introduces a method that fine-tunes MusicGen transformer models to generate emotion-guided musical clips for music therapy, with the aim of helping medical patients regulate their emotions.
Contribution
The paper is the first to use language models to create music medicine tailored to emotional transitions in therapeutic contexts.
Findings
Generated clips follow the iso principle for emotional guidance.
Music emotion recognition confirms alignment with target emotions.
Concatenated clips form a 15-minute therapeutic session.
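The findings combine two ideas: a session assembled from fixed-length clips, and an iso-principle path that starts at the patient's current mood and moves step by step toward the desired one. The sketch below assumes the 30-second clip length and 15-minute session given in the abstract, valence and arousal axes scaled to [-1, 1], and straight-line interpolation between the two states. The names `VA`, `num_clips`, and `iso_trajectory` are hypothetical, and the paper may instead step through discrete emotion tags rather than a continuous line.

```python
from dataclasses import dataclass

CLIP_SECONDS = 30     # clip length stated in the abstract
SESSION_MINUTES = 15  # session length stated in the abstract


@dataclass(frozen=True)
class VA:
    """A point in the valence-arousal circumplex (axes assumed to lie in [-1, 1])."""
    valence: float
    arousal: float


def num_clips(session_minutes: int = SESSION_MINUTES,
              clip_seconds: int = CLIP_SECONDS) -> int:
    """Number of clips needed to fill the session exactly."""
    total_seconds = session_minutes * 60
    if total_seconds % clip_seconds:
        raise ValueError("session length must be a whole number of clips")
    return total_seconds // clip_seconds


def iso_trajectory(start: VA, target: VA, steps: int) -> list[VA]:
    """Hypothetical iso-principle plan: the first clip matches the current
    state, the last matches the target, and intermediate clips are spaced
    evenly along a straight line (an assumption, not the paper's method)."""
    if steps < 2:
        raise ValueError("need at least a start clip and a target clip")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        # (1 - t) * a + t * b keeps both endpoints exact at t = 0 and t = 1
        path.append(VA((1 - t) * start.valence + t * target.valence,
                       (1 - t) * start.arousal + t * target.arousal))
    return path


if __name__ == "__main__":
    n = num_clips()
    print(n)  # 30
    # Illustrative values: an anxious state (negative valence, high arousal)
    # guided toward a calm, content state (positive valence, low arousal).
    plan = iso_trajectory(VA(-0.8, 0.7), VA(0.6, -0.4), n)
    for k, point in enumerate(plan[:3]):
        print(k, round(point.valence, 3), round(point.arousal, 3))
```

Each planned point would then condition one generated clip. Linear interpolation is used here only because it is the simplest path that begins at the listener's current mood, which is the core requirement of the iso principle.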
Abstract
Music therapy has been shown in recent years to provide multiple health benefits related to emotional wellness. In turn, maintaining a healthy emotional state has proven to be effective for patients undergoing treatment, such as Parkinson's patients or patients suffering from stress and anxiety. We propose fine-tuning MusicGen, a music-generating transformer model, to create short musical clips that assist patients in transitioning from negative to desired emotional states. Using low-rank decomposition fine-tuning on the MTG-Jamendo Dataset with emotion tags, we generate 30-second clips that adhere to the iso principle, guiding patients through intermediate states in the valence-arousal circumplex. The generated music is evaluated using a music emotion recognition model to ensure alignment with intended emotions. By concatenating these clips, we produce a 15-minute "music medicine"…
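The abstract names "low-rank decomposition fine-tuning" without further detail. Assuming this refers to LoRA-style adapters, the idea is to freeze a pretrained weight matrix W and learn only a low-rank update, so the adapted weight becomes W + (alpha / r) · B A with B of shape d_out × r and A of shape r × d_in. The minimal NumPy sketch below illustrates that standard formulation; the class `LoRALinear`, the width 1024, and the rank 8 are illustrative choices, not MusicGen's or the authors' code.

```python
import numpy as np


class LoRALinear:
    """Minimal LoRA-style linear layer (a sketch, not the paper's implementation).

    The pretrained weight W (d_out x d_in) stays frozen; only the low-rank
    factors B (d_out x r) and A (r x d_in) would be updated during fine-tuning.
    """

    def __init__(self, weight: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                               # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # small random init
        self.B = np.zeros((d_out, rank))                   # zero init: no change at start
        self.scale = alpha / rank

    def delta(self) -> np.ndarray:
        """Low-rank weight update (alpha / r) * B @ A."""
        return self.scale * (self.B @ self.A)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (batch, d_in); the result has shape (batch, d_out)
        return x @ (self.weight + self.delta()).T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size


if __name__ == "__main__":
    d = 1024  # illustrative layer width
    rng = np.random.default_rng(1)
    layer = LoRALinear(rng.normal(size=(d, d)), rank=8)
    x = rng.normal(size=(2, d))
    # With B = 0 the adapted layer reproduces the frozen layer exactly.
    print(np.allclose(layer(x), x @ layer.weight.T))  # True
    # Trainable parameters versus a full update of the same matrix.
    print(layer.trainable_params(), d * d)  # 16384 1048576
```

The zero initialization of B means fine-tuning starts from the unmodified pretrained model, and with rank 8 the adapter trains about 1.6% of the parameters a full update of this matrix would need.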
Taxonomy
Topics: Music and Audio Processing
