Translating Visual Art into Music

Maximilian Müller-Eberstein; Nanne van Noord

arXiv:1909.01218·cs.CV·September 15, 2019

TL;DR

This paper presents SynVAE, a model that translates visual art into music by learning a consistent cross-modal latent space without paired datasets. Human evaluators matched the generated music to its source image with accuracies of up to 73%.

Contribution

The research introduces SynVAE, a novel variational autoencoder that enables cross-modal translation between images and music without requiring paired training data.

Findings

01

SynVAE retains sufficient information content during translation, as measured on MNIST and the Behance Artistic Media dataset (BAM).

02

Human evaluators matched musical samples to the images that generated them with accuracies of up to 73%.

03

The model maintains cross-modal latent space consistency during translation.

Abstract

The Synesthetic Variational Autoencoder (SynVAE) introduced in this research is able to learn a consistent mapping between visual and auditive sensory modalities in the absence of paired datasets. A quantitative evaluation on MNIST as well as the Behance Artistic Media dataset (BAM) shows that SynVAE is capable of retaining sufficient information content during the translation while maintaining cross-modal latent space consistency. In a qualitative evaluation trial, human evaluators were furthermore able to match musical samples with the images which generated them with accuracies of up to 73%.
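
One way to make "cross-modal latent space consistency" concrete is to compare the latent vector obtained from an image with the latent vector obtained by re-encoding the music generated from it. The sketch below is only illustrative: the function name `latent_consistency`, the use of cosine similarity, and the random stand-in latents are all assumptions made here, not the metric or the models reported in the paper.

```python
import numpy as np


def latent_consistency(z_image, z_audio):
    """Mean cosine similarity between paired latent vectors (one pair per row).

    Hypothetical helper: values near 1.0 mean the audio-side latents point in
    nearly the same direction as the image-side latents they came from.
    """
    z_image = np.asarray(z_image, dtype=float)
    z_audio = np.asarray(z_audio, dtype=float)
    dot = np.sum(z_image * z_audio, axis=1)
    norms = np.linalg.norm(z_image, axis=1) * np.linalg.norm(z_audio, axis=1)
    return float(np.mean(dot / norms))


# Stand-in data (assumption): in a real setup these would come from an image
# encoder and from re-encoding the generated music, respectively.
rng = np.random.default_rng(0)
z_image = rng.normal(size=(8, 4))
z_audio = z_image + 0.01 * rng.normal(size=(8, 4))

score = latent_consistency(z_image, z_audio)
print(f"latent consistency: {score:.4f}")
```

A score close to 1.0 would suggest that little is lost between the visual and the musical side of the latent space; a different distance (for example Euclidean) could be substituted without changing the idea.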
