The artificial synesthete: Image-melody translations with variational autoencoders
Karl Wienand, Wolfgang M. Heckl

TL;DR
This paper introduces a neural network system that translates images into melodies and vice versa, creating an artificial synesthete that interprets visual and musical data through learned correspondences.
Contribution
It presents a novel neural network architecture combining autoencoders and translation networks to generate cross-modal representations between images and melodies.
Findings
Generates melodies inspired by images and images from music.
Demonstrates learned correspondences between visual and musical concepts.
Provides a new perspective on machine perception and interpretation.
Abstract
This project presents a system of neural networks to translate between images and melodies. Autoencoders compress the information in samples into abstract representations. A translation network learns a set of correspondences between musical and visual concepts from repeated joint exposure. The resulting "artificial synesthete" generates simple melodies inspired by images, and images from music. These are novel interpretations (not transposed data), expressing the machine's perception and understanding. Observing the work, one explores the machine's perception and thus, by contrast, one's own.
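The pipeline the abstract describes (encode a sample to a latent representation, translate that representation to the other modality's latent space, then decode) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the single linear layers, the tanh translation step, the untrained random weights, and the names `ToyVAE`, `translate`, and `image_to_melody` are all hypothetical. Only the overall encode, translate, decode structure comes from the abstract.

```python
import math
import random

# Hypothetical sizes; the abstract does not give the real dimensions.
IMAGE_DIM, MELODY_DIM, LATENT_DIM = 64, 16, 4


def random_matrix(rows, cols, rng):
    """Untrained random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


class ToyVAE:
    """One-layer variational autoencoder stand-in (assumed structure)."""

    def __init__(self, data_dim, latent_dim, rng):
        self.rng = rng
        self.w_mu = random_matrix(latent_dim, data_dim, rng)
        self.w_logvar = random_matrix(latent_dim, data_dim, rng)
        self.w_dec = random_matrix(data_dim, latent_dim, rng)

    def encode(self, x):
        # Predict a mean and log-variance, then sample with the
        # reparameterization trick: z = mu + sigma * eps.
        mu = linear(self.w_mu, x)
        logvar = linear(self.w_logvar, x)
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z):
        # Sigmoid keeps every output value strictly between 0 and 1.
        return [1.0 / (1.0 + math.exp(-v)) for v in linear(self.w_dec, z)]


def translate(weights, z):
    """Map a latent code of one modality to the other (assumed form)."""
    return [math.tanh(v) for v in linear(weights, z)]


rng = random.Random(0)
image_vae = ToyVAE(IMAGE_DIM, LATENT_DIM, rng)
melody_vae = ToyVAE(MELODY_DIM, LATENT_DIM, rng)
image_to_melody = random_matrix(LATENT_DIM, LATENT_DIM, rng)

image = [rng.random() for _ in range(IMAGE_DIM)]  # flattened toy "image"
z_image = image_vae.encode(image)                 # image -> latent code
z_melody = translate(image_to_melody, z_image)    # image latent -> melody latent
melody = melody_vae.decode(z_melody)              # latent -> melody vector
```

In a trained system, the autoencoders would first be fit to each modality and the translation weights would then be learned from paired image and melody examples. The reverse direction (melody to image) would use a second translation mapping and the image decoder.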
Taxonomy
Topics: Music Technology and Sound Studies
