Improving Controllability and Editability for Pretrained Text-to-Music Generation Models
Yixiao Zhang

TL;DR
This thesis introduces systems that improve control and editing capabilities in pretrained text-to-music models, enabling iterative refinement and attribute-specific edits while maintaining musical coherence.
Contribution
It presents Loop Copilot for interactive music creation and MusicMagus for zero-shot attribute editing, advancing controllability and editability in text-to-music generation.
Findings
Loop Copilot enables iterative music refinement with attribute coherence.
MusicMagus allows style-preserving edits without retraining.
The systems improve user control and flexibility in AI-generated music.
Abstract
The field of AI-assisted music creation has made significant strides, yet existing systems often struggle to meet the demands of iterative and nuanced music production. These challenges include providing sufficient control over the generated content and allowing for flexible, precise edits. This thesis tackles these issues by introducing a series of advancements that progressively build upon each other, enhancing the controllability and editability of text-to-music generation models. First, we introduce Loop Copilot, a system that tries to address the need for iterative refinement in music creation. Loop Copilot leverages a large language model (LLM) to coordinate multiple specialised AI models, enabling users to generate and refine music interactively through a conversational interface. Central to this system is the Global Attribute Table, which records and maintains key musical…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
