Improving Controllability and Editability for Pretrained Text-to-Music Generation Models

Yixiao Zhang

arXiv:2411.12641·cs.SD·November 22, 2024

Open Access

TL;DR

This paper introduces systems to improve control and editing capabilities in pretrained text-to-music models, enabling iterative refinement and attribute-specific edits while maintaining musical coherence.

Contribution

It presents Loop Copilot for interactive music creation and MusicMagus for zero-shot attribute editing, advancing controllability and editability in text-to-music generation.

Findings

01

Loop Copilot enables iterative music refinement with attribute coherence.

02

MusicMagus allows style-preserving edits without retraining.

03

The systems improve user control and flexibility in AI-generated music.

Abstract

The field of AI-assisted music creation has made significant strides, yet existing systems often struggle to meet the demands of iterative and nuanced music production. These challenges include providing sufficient control over the generated content and allowing for flexible, precise edits. This thesis tackles these issues by introducing a series of advancements that progressively build upon each other, enhancing the controllability and editability of text-to-music generation models. First, we introduce Loop Copilot, a system that tries to address the need for iterative refinement in music creation. Loop Copilot leverages a large language model (LLM) to coordinate multiple specialised AI models, enabling users to generate and refine music interactively through a conversational interface. Central to this system is the Global Attribute Table, which records and maintains key musical…
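The coordination pattern described above (a controller that routes each conversational request to a specialised model while a shared table carries musical attributes from turn to turn) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract is cut off before it lists what the Global Attribute Table stores, so the attribute names (`tempo`, `key`), the class and function names, the keyword-based router standing in for the LLM, and the string-returning backends are all hypothetical and not taken from Loop Copilot itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class GlobalAttributeTable:
    """Hypothetical stand-in for a table of attributes shared across turns."""
    attributes: Dict[str, str] = field(default_factory=dict)

    def update(self, **changes: str) -> None:
        # Record new or changed attributes so later turns can reuse them.
        self.attributes.update(changes)

    def as_prompt(self) -> str:
        # Serialise attributes in sorted order so the output is deterministic.
        return ", ".join(f"{k}: {v}" for k, v in sorted(self.attributes.items()))


def choose_backend(request: str, backends: Dict[str, Callable[[str], str]]) -> str:
    """Toy keyword router; an assumption that replaces the LLM coordinator."""
    lowered = request.lower()
    for name in backends:
        if name in lowered:
            return name
    return "generate"  # assumed default task; must be a key of `backends`


def run_turn(request: str, table: GlobalAttributeTable,
             backends: Dict[str, Callable[[str], str]]) -> str:
    """One conversational turn: pick a backend, then pass the shared attributes along."""
    task = choose_backend(request, backends)
    prompt = f"{request} [{table.as_prompt()}]"
    return backends[task](prompt)


# Hypothetical backends that return strings instead of audio.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "generate": lambda prompt: "gen:" + prompt,
    "edit": lambda prompt: "edit:" + prompt,
}
```

In this sketch every turn sees the same attribute string, which is one simple way a later edit could stay consistent with an earlier generation. A real system would replace the keyword router with an LLM call and the lambdas with music models.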

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies