Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation

Or Tal; Alon Ziv; Itai Gat; Felix Kreuk; Yossi Adi

arXiv:2406.10970 · cs.SD · June 18, 2024

TL;DR

JASCO is a novel text-to-music generation model that combines symbolic and audio conditions for fine-grained, controllable music synthesis, leveraging flow matching and information bottleneck techniques.

Contribution

The paper introduces JASCO, a new model that integrates symbolic and audio-based controls for temporally precise music generation using a flow matching framework.
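
As a rough illustration of what a flow matching framework involves, the sketch below uses a linear interpolation path between a noise sample and a data sample, with a constant-velocity regression target and Euler integration at sampling time. This is an assumption made for illustration: the summary above does not specify which probability path, loss, or solver JASCO uses, and every function name here is hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Velocity of the linear path; constant in t (assumed path, not from the paper)."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(
    predict: Callable[[Vector, float], Vector], x0: Vector, x1: Vector, t: float
) -> float:
    """Mean squared error between the predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    target = target_velocity(x0, x1)
    pred = predict(xt, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(
    velocity: Callable[[Vector, float], Vector], x0: Vector, steps: int
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In a trained model, `predict` would be a neural network that also receives the text and local conditioning signals; here it is any callable, so the sketch can be exercised with a hand-written velocity field.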

Findings

01 JASCO achieves comparable quality to baseline models.

02 JASCO provides significantly better control over generated music.

03 Human evaluations favor JASCO's controllability.

Abstract

We present JASCO, a temporally controlled text-to-music generation model utilizing both symbolic and audio-based conditions. JASCO can generate high-quality music samples conditioned on global text descriptions along with fine-grained local controls. JASCO is based on the Flow Matching modeling paradigm together with a novel conditioning method. This allows music generation controlled both locally (e.g., chords) and globally (text description). Specifically, we apply information bottleneck layers in conjunction with temporal blurring to extract relevant information with respect to specific controls. This allows the incorporation of both symbolic and audio-based conditions in the same text-to-music model. We experiment with various symbolic control signals (e.g., chords, melody), as well as with audio representations (e.g., separated drum tracks, full-mix). We evaluate JASCO considering…
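
To make the idea of temporal blurring concrete, here is a minimal sketch that assumes blurring means averaging a frame-level conditioning sequence over non-overlapping time windows and holding each average for the length of its window. The abstract does not define the operation, so the windowing scheme, the `window` parameter, and the function name `temporal_blur` are illustrative assumptions rather than the paper's implementation.

```python
from typing import List


def temporal_blur(frames: List[List[float]], window: int) -> List[List[float]]:
    """Replace each frame with the mean of its non-overlapping time window.

    The output has the same number of frames as the input, but variation
    inside a window is removed, so only coarse temporal structure remains.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    blurred: List[List[float]] = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]  # last chunk may be shorter
        dim = len(chunk[0])
        mean = [sum(frame[d] for frame in chunk) / len(chunk) for d in range(dim)]
        # Hold the window mean for every frame in the window.
        blurred.extend(list(mean) for _ in chunk)
    return blurred
```

Under this assumption, a larger `window` passes less fine-grained timing information from the audio condition to the generator, which is one plausible way to restrict a condition to the information relevant for a given control.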

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Human Motion and Animation