Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings
Tanisha Hisariya, Huan Zhang, Jinhua Liang

TL;DR
This paper presents a novel AI model that generates emotionally resonant music from paintings by converting visual art into descriptive text and then into musical compositions, supported by a new paired dataset.
Contribution
The study introduces a dual-stage framework and the Emotion Painting Music Dataset, enabling effective music generation from visual art with minimal data.
Findings
High alignment between generated music and emotional descriptions confirmed by CLAP
Effective music generation demonstrated using FAD, THD, IS, and KL metrics
Enhanced accessibility and multi-sensory experiences for visually impaired users
Abstract
Rapid advancements in artificial intelligence have significantly enhanced generative tasks involving music and images, employing both unimodal and multimodal approaches. This research develops a model capable of generating music that resonates with the emotions depicted in visual arts, integrating emotion labeling, image captioning, and language models to transform visual inputs into musical compositions. Addressing the scarcity of aligned art and music data, we curated the Emotion Painting Music Dataset, pairing paintings with corresponding music for effective training and evaluation. Our dual-stage framework converts images to text descriptions of emotional content and then transforms these descriptions into music, facilitating efficient learning with minimal data. Performance is evaluated using metrics such as Fréchet Audio Distance (FAD), Total Harmonic Distortion (THD), Inception…
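Two of the evaluation metrics named on this page, THD and KL, have compact textbook definitions, so a minimal sketch may help make them concrete. The summary does not say how the authors compute either metric, so the function names, the input formats (a list of harmonic amplitudes, two discrete probability lists), and the `eps` floor below are all illustrative assumptions rather than the paper's implementation.

```python
import math


def total_harmonic_distortion(harmonic_amplitudes):
    """Textbook THD ratio: RMS sum of harmonics 2..N over the fundamental.

    Assumption: the caller supplies amplitudes [fundamental, 2nd, 3rd, ...]
    already extracted from a spectrum; this page does not describe how the
    paper obtains them.
    """
    fundamental, *overtones = harmonic_amplitudes
    if fundamental <= 0:
        raise ValueError("fundamental amplitude must be positive")
    return math.sqrt(sum(a * a for a in overtones)) / fundamental


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Both inputs are renormalised to sum to 1. `eps` is a hypothetical floor
    on q that avoids division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p_total, q_total = sum(p), sum(q)
    divergence = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / p_total, q_i / q_total
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            divergence += p_i * math.log(p_i / max(q_i, eps))
    return divergence


if __name__ == "__main__":
    # Fundamental of 1.0 with two weaker harmonics.
    print(total_harmonic_distortion([1.0, 0.3, 0.4]))
    # Hypothetical class probabilities for reference vs. generated audio.
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Lower values are better for both quantities in this form: a THD of zero means no harmonic energy beyond the fundamental, and a KL of zero means the two distributions are identical.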
Taxonomy
Topics: Art Education and Development
