ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation

Junmin Gong; Yulin Song; Wenxiao Zhao; Sen Wang; Shengyuan Xu; Jing Guo; Xuerui Yang

arXiv:2602.00744·cs.SD·February 9, 2026

Open Access · 10 Models

TL;DR

ACE-Step 1.5 is an efficient open-source music generation model that produces high-quality, customizable songs quickly on consumer hardware, integrating advanced planning, synthesis, and editing features.

Contribution

It introduces a hybrid architecture with a language model as a planner and a diffusion transformer for music synthesis, enabling fast, high-quality, and customizable music generation.
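
The planner-then-synthesizer split described above can be pictured with a toy sketch. This is not the ACE-Step implementation or its API: `SongBlueprint`, `plan_song`, `synthesize`, and `generate` are hypothetical names, the planner returns placeholder text instead of calling a language model, and the synthesizer returns silence instead of running a diffusion transformer. It only illustrates the data flow, in which a short query becomes a structured blueprint that then conditions audio synthesis.

```python
from dataclasses import dataclass, field


@dataclass
class SongBlueprint:
    """Hypothetical plan handed from the planner to the synthesizer."""
    caption: str
    lyrics: str
    metadata: dict = field(default_factory=dict)


def plan_song(query: str, duration_s: float = 30.0) -> SongBlueprint:
    """Stand-in for the LM planner: expands a short query into a blueprint.

    A real planner would generate these fields with a language model;
    here they are fixed placeholders (assumption for illustration).
    """
    return SongBlueprint(
        caption=f"A track matching the request: {query}",
        lyrics="[verse]\n(placeholder lyrics)",
        metadata={"query": query, "duration_s": duration_s},
    )


def synthesize(blueprint: SongBlueprint, sample_rate: int = 8000) -> list[float]:
    """Stand-in for the diffusion transformer: returns silence of the planned length."""
    n_samples = int(blueprint.metadata["duration_s"] * sample_rate)
    return [0.0] * n_samples


def generate(query: str, duration_s: float = 30.0, sample_rate: int = 8000):
    """Run the two stages in order: plan first, then synthesize from the plan."""
    blueprint = plan_song(query, duration_s)
    audio = synthesize(blueprint, sample_rate)
    return blueprint, audio
```

Keeping the blueprint as an explicit intermediate object is one way to make the plan inspectable or editable before synthesis; whether the actual system exposes it this way is not stated on this page.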

Findings

01. Achieves quality beyond most commercial music models on commonly used evaluation metrics

02. Generates a full song in under 2 seconds on an A100 and under 10 seconds on an RTX 3090

03. Supports lightweight personalization: a LoRA can be trained from just a few songs

Abstract

We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast -- under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM, and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style. At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints -- scaling from short loops to 10-minute compositions -- while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). Uniquely, this…

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Artificial Intelligence in Games