TL;DR
This paper introduces a deep learning approach for translating music into sentiment-aware visual stories. Using a cross-modal translation technique, it aims to evoke in viewers the same feelings that the original songs do.
Contribution
It presents what the authors believe to be the first deep learning method for sentiment-aware music-to-image translation, using a trainable cross-modal approach to handle the unstable mapping between the two modalities.
Findings
Effective synthesis of sentiment-aligned visual stories from music
Demonstrated robustness across different songs and genres
Enhanced emotional communication through visual storytelling
Abstract
In this paper we propose a deep learning method for performing attribute-based music-to-image translation. The proposed method is applied to synthesize visual stories according to the sentiment expressed by songs. The generated images aim to induce in viewers the same feelings as the original song, reinforcing the primary aim of music, i.e., communicating feelings. The process of music-to-image translation poses unique challenges, mainly due to the unstable mapping between the different modalities involved. We employ a trainable cross-modal translation method to overcome this limitation, leading to what is, to the best of our knowledge, the first deep learning method for generating sentiment-aware visual stories. Various aspects of the proposed method are extensively evaluated and discussed using different songs.
