Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation

Runbang Zhang; Yixiao Zhang; Kai Shao; Ying Shan; Gus Xia

arXiv:2211.05543 · cs.SD · November 11, 2022 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces Vis2Mus, a system that maps visual-art representations to music representations so that interpretable image transformations can control music generation. The mapping is found with an analysis-by-synthesis approach that combines deep music representation learning with user studies.

Contribution

The paper presents an interpretable visual-to-music representation mapping that does not require a huge amount of paired data and that makes music generation controllable through visual inputs.

Findings

01 The visual-to-music mapping has a property similar to equivariance.

02 Transformations applied to an image control the corresponding transformations in the music.

03 The Vis2Mus system provides a controllable interface for symbolic music generation.
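
The equivariance-like property in the first finding can be written down compactly. The notation below is assumed for illustration and is not taken from the paper: f is the visual-to-music mapping, x is an image, T_v is an image transformation (for example a brightness change), and T_m is the corresponding transformation in the music domain. The approximation sign reflects the abstract's wording that the property is "similar to" equivariance, not exact.

```latex
% Assumed notation (not from the paper):
%   f   : visual-to-music representation mapping
%   x   : an input image
%   T_v : a transformation in the visual domain
%   T_m : the corresponding transformation in the music domain
f\bigl(T_v(x)\bigr) \;\approx\; T_m\bigl(f(x)\bigr)
```

Read left to right: transforming the image and then mapping it to music gives roughly the same result as mapping the original image to music and then applying the matching musical transformation.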

Abstract

In this study, we explore the representation mapping from the domain of visual arts to the domain of music, with which we can use visual arts as an effective handle to control music generation. Unlike most studies in multimodal representation learning, which are purely data-driven, we adopt an analysis-by-synthesis approach that combines deep music representation learning with user studies. Such an approach enables us to discover an interpretable representation mapping without a huge amount of paired data. In particular, we discover that the visual-to-music mapping has a nice property similar to equivariance. In other words, we can use various image transformations, such as changing brightness, changing contrast, or style transfer, to control the corresponding transformations in the music domain. In addition, we released the Vis2Mus system as a controllable interface for symbolic music…
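
As a rough illustration of the image-side transformations the abstract names (brightness and contrast), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, and `toy_visual_to_music` is a hand-written linear stand-in, not the mapping learned in the paper or the code in the Vis2Mus repository. It only shows what an equivariance-like relationship looks like numerically.

```python
import numpy as np


def adjust_brightness(img, delta):
    """Shift every pixel by `delta`, clipping the result to [0, 1]."""
    return np.clip(img + delta, 0.0, 1.0)


def adjust_contrast(img, factor):
    """Scale each pixel's deviation from the image mean by `factor`, clipping to [0, 1]."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)


def toy_visual_to_music(img):
    """Hypothetical stand-in for a visual-to-music mapping (NOT the paper's model).

    Mean pixel value is mapped linearly to a mean MIDI pitch, and pixel
    standard deviation is mapped linearly to a velocity spread.
    """
    return {
        "mean_pitch": 48.0 + 24.0 * img.mean(),
        "velocity_spread": 127.0 * img.std(),
    }


if __name__ == "__main__":
    # A small synthetic grayscale image with values in [0.2, 0.6].
    img = np.linspace(0.2, 0.6, 16).reshape(4, 4)

    base = toy_visual_to_music(img)
    brighter = toy_visual_to_music(adjust_brightness(img, 0.2))
    punchier = toy_visual_to_music(adjust_contrast(img, 1.5))

    # Under this toy mapping, a brightness shift becomes a pitch shift,
    # and a contrast scaling becomes a scaling of the velocity spread.
    print("pitch shift:", brighter["mean_pitch"] - base["mean_pitch"])
    print("spread ratio:", punchier["velocity_spread"] / base["velocity_spread"])
```

With this toy mapping, an image transformation has a predictable musical counterpart as long as no pixel values are clipped. Which musical attributes actually correspond to which visual ones in Vis2Mus is determined by the paper's user studies, not by this sketch.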

Code & Models

Repositories

ldzhangyx/vis2mus (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis