TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure

Qi He; Gus Xia; Ziyu Wang

arXiv:2506.23094·cs.SD·July 1, 2025

TL;DR

This paper introduces TOMI, a novel deep learning approach that uses instruction-tuned foundation LLMs to generate, transform, and organize multi-track electronic music with full-song structure, enabling interactive co-creation.

Contribution

The paper presents a new concept hierarchy framework for music generation and a TOMI-based model that integrates with digital audio workstations for improved music composition.

Findings

01  Produces higher-quality electronic music than baselines.
02  Generates music with stronger structural coherence.
03  Enables interactive human-AI co-creation.

Abstract

Hierarchical planning is a powerful approach to model long sequences structurally. Aside from considering hierarchies in the temporal structure of music, this paper explores an even more important aspect: concept hierarchy, which involves generating music ideas, transforming them, and ultimately organizing them--across musical time and space--into a complete composition. To this end, we introduce TOMI (Transforming and Organizing Music Ideas) as a novel approach in deep music generation and develop a TOMI-based model via an instruction-tuned foundation LLM. Formally, we represent a multi-track composition process via a sparse, four-dimensional space characterized by clips (short audio or MIDI segments), sections (temporal positions), tracks (instrument layers), and transformations (elaboration methods). Our model is capable of generating multi-track electronic music with full-song…
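To make the four-dimensional representation described in the abstract more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class names (`Clip`, `Section`, `Track`, `Transformation`, `Composition`), the methods, and the example values such as "loop" and "fill" are all hypothetical. The only idea taken from the abstract is that a composition occupies a sparse set of cells in a space indexed by clips, sections, tracks, and transformations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clip:
    name: str   # a short audio or MIDI segment (a "music idea")
    kind: str   # "audio" or "midi"


@dataclass(frozen=True)
class Section:
    name: str      # e.g. "intro", "chorus"
    position: int  # temporal order within the song


@dataclass(frozen=True)
class Track:
    name: str  # an instrument layer


@dataclass(frozen=True)
class Transformation:
    name: str  # hypothetical label for an elaboration method


@dataclass
class Composition:
    # Sparse storage: only occupied (clip, section, track, transformation)
    # cells are kept, rather than a dense 4-D array.
    links: set = field(default_factory=set)

    def place(self, clip, section, track, transformation):
        """Occupy one cell of the four-dimensional space."""
        self.links.add((clip, section, track, transformation))

    def arrangement(self):
        """Return {section: {track: [(clip, transformation), ...]}} in song order."""
        out = {}
        ordered = sorted(
            self.links,
            key=lambda l: (l[1].position, l[2].name, l[0].name),
        )
        for clip, section, track, tf in ordered:
            cell = out.setdefault(section.name, {}).setdefault(track.name, [])
            cell.append((clip.name, tf.name))
        return out

    def density(self, n_clips, n_sections, n_tracks, n_transformations):
        """Fraction of the full 4-D space that is actually occupied."""
        total = n_clips * n_sections * n_tracks * n_transformations
        return len(self.links) / total


def build_example():
    """Assemble a toy two-section song from two clips (all values assumed)."""
    drum_loop = Clip("drum_loop", "audio")
    bass_riff = Clip("bass_riff", "midi")
    intro, chorus = Section("intro", 0), Section("chorus", 1)
    drums, bass = Track("drums"), Track("bass")
    loop, fill = Transformation("loop"), Transformation("fill")

    song = Composition()
    song.place(drum_loop, intro, drums, loop)   # same clip reused ...
    song.place(drum_loop, chorus, drums, loop)  # ... in a later section
    song.place(bass_riff, chorus, bass, fill)
    return song


if __name__ == "__main__":
    print(build_example().arrangement())
```

Storing links as a set of tuples reflects the sparsity: one clip can be reused across several sections and tracks, while most combinations of the four dimensions stay empty.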

Code & Models

Repositories

heqi201255/tomi (Official)

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Artificial Intelligence in Games