Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation
Ivan Rinaldi, Nicola Fanelli, Giovanna Castellano, Gennaro Vessio

TL;DR
Art2Mus introduces a novel AI model that generates music from digitized artworks or text, extending cross-modal creativity beyond simple images and enabling new multimedia artistic applications.
Contribution
Art2Mus extends the AudioLDM 2 text-to-audio architecture to generate music from complex digitized artworks, using newly curated datasets, built with ImageBind, that pair digitized artworks with music.
Findings
Successfully generates music aligned with input artworks
Demonstrates potential for multimedia art and interactive installations
Shows promising results in cross-modal creative applications
Abstract
Artificial Intelligence and generative models have revolutionized music creation, with many models leveraging textual or visual prompts for guidance. However, existing image-to-music models are limited to simple images, lacking the capability to generate music from complex digitized artworks. To address this gap, we introduce Art2Mus, a novel model designed to create music from digitized artworks or text inputs. Art2Mus extends the AudioLDM 2 architecture, a text-to-audio model, and employs our newly curated datasets, created via ImageBind, which pair digitized artworks with music. Experimental results demonstrate that Art2Mus can generate music that resonates with the input stimuli. These findings suggest promising applications in multimedia art, interactive…
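The abstract says the datasets pair digitized artworks with music via ImageBind, but this summary does not say how the pairs are chosen. One plausible reading, offered here only as an assumption, is nearest-neighbor matching in a shared embedding space: each artwork is paired with the music clip whose embedding is most similar. The sketch below illustrates that idea with toy vectors. It does not call ImageBind, and the function names and data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pair_artworks_with_music(artwork_embeddings, music_embeddings):
    """Map each artwork id to (closest music id, similarity).

    Hypothetical helper: assumes both dictionaries hold vectors that
    already live in the same embedding space.
    """
    pairs = {}
    for art_id, art_vec in artwork_embeddings.items():
        scored = (
            (music_id, cosine_similarity(art_vec, music_vec))
            for music_id, music_vec in music_embeddings.items()
        )
        pairs[art_id] = max(scored, key=lambda item: item[1])
    return pairs


# Toy 3-dimensional vectors stand in for real embeddings (assumption).
artworks = {
    "painting_a": [0.9, 0.1, 0.0],
    "painting_b": [0.0, 0.2, 0.9],
}
tracks = {
    "track_1": [1.0, 0.0, 0.1],
    "track_2": [0.1, 0.1, 1.0],
}

pairs = pair_artworks_with_music(artworks, tracks)
for art_id, (music_id, similarity) in pairs.items():
    print(f"{art_id} -> {music_id} (cosine similarity {similarity:.3f})")
```

With real data, the toy vectors would be replaced by embeddings computed for each artwork image and each audio clip. Whether the authors match pairs this way, or apply thresholds or filtering, is not stated in this summary.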
Taxonomy
Topics: Music Technology and Sound Studies · Human Motion and Animation · Computer Graphics and Visualization Techniques
