TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure
Qi He, Gus Xia, Ziyu Wang

TL;DR
This paper introduces TOMI, a novel deep learning approach that uses an instruction-tuned foundation LLM to generate, transform, and organize music ideas into multi-track electronic music with full-song structure, enabling interactive human-AI co-creation.
Contribution
The paper presents a new concept-hierarchy framework for music generation, along with a TOMI-based model that integrates with digital audio workstations (DAWs) to support music composition.
Findings
Produces higher-quality electronic music than baselines.
Generates music with stronger structural coherence.
Enables interactive human-AI co-creation.
Abstract
Hierarchical planning is a powerful approach to modeling long sequences structurally. Aside from considering hierarchies in the temporal structure of music, this paper explores an even more important aspect: concept hierarchy, which involves generating music ideas, transforming them, and ultimately organizing them (across musical time and space) into a complete composition. To this end, we introduce TOMI (Transforming and Organizing Music Ideas) as a novel approach in deep music generation and develop a TOMI-based model via an instruction-tuned foundation LLM. Formally, we represent a multi-track composition process via a sparse, four-dimensional space characterized by clips (short audio or MIDI segments), sections (temporal positions), tracks (instrument layers), and transformations (elaboration methods). Our model is capable of generating multi-track electronic music with full-song…
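To make the four-dimensional representation in the abstract more concrete, here is a minimal Python sketch of one way such a sparse space could be stored, assuming a composition is simply the set of occupied (clip, section, track, transformation) cells and every other cell is empty. The class names `Placement` and `Composition`, their methods, and the example clip, track, and transformation labels are all hypothetical illustrations, not the authors' data format or code.

```python
from dataclasses import dataclass, field


# Hypothetical sketch: NOT the TOMI authors' actual representation.
@dataclass(frozen=True)
class Placement:
    """One occupied cell in a sparse 4D (clip, section, track, transformation) space."""
    clip: str            # ID of a short audio or MIDI segment (hypothetical label)
    section: int         # temporal position, here an index of a song section
    track: str           # instrument layer
    transformation: str  # elaboration method applied to the clip (hypothetical label)


@dataclass
class Composition:
    # Only occupied cells are stored; any absent 4-tuple is treated as empty,
    # which is what keeps the representation sparse.
    entries: set[Placement] = field(default_factory=set)

    def place(self, clip: str, section: int, track: str,
              transformation: str = "identity") -> None:
        """Mark one cell as occupied (idempotent, since entries is a set)."""
        self.entries.add(Placement(clip, section, track, transformation))

    def at_section(self, section: int) -> list[Placement]:
        """Return everything that sounds in one section, ordered by track name."""
        return sorted((p for p in self.entries if p.section == section),
                      key=lambda p: p.track)

    def density(self, n_clips: int, n_sections: int,
                n_tracks: int, n_transforms: int) -> float:
        """Fraction of the full 4D grid that is actually occupied."""
        return len(self.entries) / (n_clips * n_sections * n_tracks * n_transforms)


# Usage: two music ideas reused and transformed across three sections.
song = Composition()
song.place("drum_loop", 0, "drums")
song.place("drum_loop", 1, "drums")
song.place("chord_idea", 1, "keys", "arpeggiate")
song.place("chord_idea", 2, "keys")

for p in song.at_section(1):
    print(p.track, p.clip, p.transformation)
print(song.density(n_clips=2, n_sections=3, n_tracks=2, n_transforms=2))
```

Storing only occupied cells mirrors the "sparse" framing: a full song touches a small fraction of all possible clip, section, track, and transformation combinations, so a set of tuples is cheaper to build and query than a dense 4D array.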
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Artificial Intelligence in Games
