Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models

Jiajun Li; Tianze Xu; Xuesong Chen; Xinrui Yao; Shuchang Liu

arXiv:2405.02801 · cs.SD · November 26, 2024

TL;DR

Mozart's Touch is a multi-modal music generation framework that leverages large language models to interpret visual and textual inputs, enabling efficient and transparent creation of emotionally aligned music without extensive model training.

Contribution

The paper introduces Mozart's Touch, a multi-modal music generation framework that uses LLMs to interpret visual and textual inputs. Because the music generation model needs no training or fine-tuning, the approach gains efficiency and transparency.

Findings

01. Outperforms state-of-the-art models in evaluations

02. Effectively interprets cross-modal inputs with LLM-Bridge

03. Provides a transparent, efficient music generation process

Abstract

In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the creation of music, images, and other artistic forms across a wide range of industries. However, current models for image- and video-to-music synthesis struggle to capture the nuanced emotions and atmosphere conveyed by visual content. To fill this gap, we propose Mozart's Touch, a multi-modal music generation framework capable of generating music aligned with cross-modal inputs such as images, videos, and text. The framework consists of three key components: a Multi-modal Captioning Module, a Large Language Model (LLM) Understanding & Bridging Module, and a Music Generation Module. Unlike traditional end-to-end methods, Mozart's Touch uses LLMs to accurately interpret visual elements without requiring the training or fine-tuning of music generation models, providing efficiency and transparency…
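
The three-module design described in the abstract can be pictured as a chain of text-producing stages. The following is only a minimal sketch under stated assumptions, not the authors' implementation: `run_pipeline`, `PipelineResult`, and the three stub functions are hypothetical names, and the stubs stand in for the pre-trained captioning model, LLM, and music generator that a real system would call. The point it illustrates is that each stage hands plain text to the next, so the intermediate caption and music prompt remain inspectable and no stage needs to be trained.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration only).
Captioner = Callable[[str], str]         # media path -> caption text
Bridge = Callable[[str], str]            # caption -> music-oriented prompt
MusicGenerator = Callable[[str], bytes]  # music prompt -> audio bytes


@dataclass
class PipelineResult:
    caption: str       # output of the captioning stage
    music_prompt: str  # output of the LLM bridging stage
    audio: bytes       # output of the music generation stage


def run_pipeline(
    media_path: str,
    captioner: Captioner,
    bridge: Bridge,
    generator: MusicGenerator,
) -> PipelineResult:
    """Chain the three stages and keep every intermediate text."""
    caption = captioner(media_path)
    music_prompt = bridge(caption)
    audio = generator(music_prompt)
    return PipelineResult(caption, music_prompt, audio)


# Stub stages: placeholders for pre-trained models, no real inference.
def stub_captioner(path: str) -> str:
    return f"a caption describing {path}"


def stub_bridge(caption: str) -> str:
    return f"music that matches the mood of: {caption}"


def stub_generator(prompt: str) -> bytes:
    # A real generator would return audio; here we just echo the prompt.
    return prompt.encode("utf-8")


result = run_pipeline("beach.jpg", stub_captioner, stub_bridge, stub_generator)
print(result.caption)
print(result.music_prompt)
```

Swapping any stub for a call to an actual pre-trained model leaves `run_pipeline` unchanged, which is one way to read the abstract's claim that the framework avoids training or fine-tuning the music generation model.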

Code & Models

Repositories

tiffanyblews/mozartstouch
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Human Motion and Animation