Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces

Lilac Atassi

arXiv:2410.00344·cs.SD·October 8, 2024

Open Access

TL;DR

This paper combines a text-to-music model with a large language model to generate music pieces about 2.5 minutes long that remain structured and cohesive beyond the roughly one-minute context window of recent transformer-based methods.

Contribution

The paper integrates a text-to-music model with a large language model so that generated music keeps its form at time scales longer than the roughly one-minute context window of existing transformer-based methods.

Findings

01. Generated 2.5-minute-long music pieces

02. Music exhibits high structure and cohesion

03. Method outperforms previous short-context models

Abstract

Recent music generation methods based on transformers have a context window of up to a minute. The music generated by these methods is largely unstructured beyond the context window. Even with a longer context window, learning long-scale structures from musical data is a prohibitively challenging problem. This paper proposes integrating a text-to-music model with a large language model to generate music with form. The paper discusses solutions to the challenges of such integration. The experimental results show that the proposed method can generate 2.5-minute-long music that is highly structured, strongly organized, and cohesive.

Taxonomy

Topics: Music and Audio Processing · Topic Modeling · Music Technology and Sound Studies