ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation
Junmin Gong, Yulin Song, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo, Xuerui Yang

TL;DR
ACE-Step 1.5 is an efficient open-source music generation model that produces high-quality, customizable songs quickly on consumer hardware, integrating advanced planning, synthesis, and editing features.
Contribution
It introduces a hybrid architecture with a language model as a planner and a diffusion transformer for music synthesis, enabling fast, high-quality, and customizable music generation.
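To make the planner/synthesizer split concrete, here is a toy sketch of the data that might flow between the two stages. Everything in it (`SongBlueprint`, `plan`, `to_conditioning`, the field names, the fixed BPM and key) is hypothetical and is not ACE-Step's API. In the paper the planner is a language model that produces metadata, lyrics, and captions via Chain-of-Thought. The fixed rules below only stand in for it so that the shape of a "song blueprint" and its hand-off to the Diffusion Transformer can be shown.

```python
from dataclasses import dataclass, field


@dataclass
class SongBlueprint:
    """Hypothetical container for what the LM planner hands to the DiT."""
    caption: str          # style / instrumentation description
    lyrics: str           # structured lyrics; empty for instrumentals
    bpm: int
    key: str
    duration_s: float     # target length, from short loops up to ~10 minutes
    sections: list[str] = field(default_factory=list)


def plan(query: str, duration_s: float = 180.0) -> SongBlueprint:
    """Toy stand-in for the LM planner: expands a short query into a blueprint.

    The real planner is a language model; these fixed rules only
    illustrate the kind of output it produces.
    """
    if not 0 < duration_s <= 600:
        raise ValueError("duration_s must be in (0, 600] seconds")
    # Short targets are treated as loops; longer ones get a verse/chorus form.
    if duration_s < 30:
        sections = ["loop"]
    else:
        sections = ["intro", "verse", "chorus", "verse", "chorus", "outro"]
    return SongBlueprint(
        caption=f"{query}, full mix",
        lyrics="" if "instrumental" in query.lower() else "[verse]\n...\n[chorus]\n...",
        bpm=120,
        key="C major",
        duration_s=duration_s,
        sections=sections,
    )


def to_conditioning(bp: SongBlueprint) -> dict:
    """Flatten a blueprint into conditioning a synthesis model could consume."""
    return {
        "text": f"{bp.caption} | {bp.bpm} BPM | {bp.key}",
        "lyrics": bp.lyrics,
        "duration_s": bp.duration_s,
        "structure": " > ".join(bp.sections),
    }


cond = to_conditioning(plan("lo-fi hip hop, instrumental", duration_s=15))
print(cond["structure"])  # loop
print(cond["text"])       # lo-fi hip hop, instrumental, full mix | 120 BPM | C major
```

The point of the split is that the synthesis stage never has to interpret a terse user query itself; it always receives an explicit, fully specified plan.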
Findings
Achieves commercial-grade quality on standard metrics
Generates full songs in under 2 seconds on an A100 (under 10 seconds on an RTX 3090)
Supports lightweight personalization: a LoRA can be trained from just a few songs
Abstract
We present ACE-Step v1.5, a highly efficient open-source music foundation model that brings commercial-grade generation to consumer hardware. On commonly used evaluation metrics, ACE-Step v1.5 achieves quality beyond most commercial music models while remaining extremely fast -- under 2 seconds per full song on an A100 and under 10 seconds on an RTX 3090. The model runs locally with less than 4GB of VRAM, and supports lightweight personalization: users can train a LoRA from just a few songs to capture their own style. At its core lies a novel hybrid architecture where the Language Model (LM) functions as an omni-capable planner: it transforms simple user queries into comprehensive song blueprints -- scaling from short loops to 10-minute compositions -- while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer (DiT). Uniquely, this…
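The abstract notes that users can train a LoRA from just a few songs. The sketch below shows the generic low-rank adaptation update on a single linear layer to illustrate why this is lightweight. Which ACE-Step layers receive adapters, and the rank and scaling used, are not stated here. The `LoRALinear` class, rank 8, alpha 16, and the 1024-wide layer are illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                          # frozen base weights, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, (rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))              # trainable, zero init: no change at start
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (..., d_in): frozen base path plus scaled low-rank path
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

    def merged_weight(self) -> np.ndarray:
        # Fold the adapter into W so inference costs the same as the base layer
        return self.weight + self.scale * self.B @ self.A


W = np.random.default_rng(1).normal(size=(1024, 1024))  # hypothetical layer size
layer = LoRALinear(W, rank=8)
x = np.ones((2, 1024))
print(np.allclose(layer(x), x @ W.T))    # True: B starts at zero
print(layer.trainable_params(), W.size)  # 16384 1048576
```

Only about 1.6% of this layer's parameters are trained, and the adapter can be merged back into the base weights afterward. This is the general reason LoRA fine-tuning suits small personal datasets and modest hardware.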
Code & Models
- 🤗 ACE-Step/Ace-Step1.5 (model) · 37k downloads · ♡ 672
- 🤗 ACE-Step/acestep-v15-base (model) · 3.7k downloads · ♡ 55
- 🤗 ACE-Step/acestep-5Hz-lm-0.6B (model) · 4.6k downloads · ♡ 11
- 🤗 ACE-Step/acestep-5Hz-lm-4B (model) · 4.1k downloads · ♡ 41
- 🤗 ACE-Step/acestep-v15-sft (model) · 3.0k downloads · ♡ 43
- 🤗 ACE-Step/acestep-captioner (model) · 8.4k downloads · ♡ 42
- 🤗 ACE-Step/acestep-transcriber (model) · 2.7k downloads · ♡ 46
- 🤗 ACE-Step/acestep-v15-turbo-shift3 (model) · 1.0k downloads · ♡ 11
- 🤗 ACE-Step/acestep-v15-turbo-shift1 (model) · 465 downloads · ♡ 12
- 🤗 ACE-Step/acestep-v15-turbo-continuous (model) · 614 downloads · ♡ 13
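Two of the checkpoints are planner LMs labeled "5Hz". Assuming that label means the LM emits one audio-code token per 5 Hz frame (an interpretation of the checkpoint names, not something stated above), a quick calculation shows how long the planner's output gets across the duration range the abstract mentions. The helper `planner_tokens` is hypothetical.

```python
import math


def planner_tokens(duration_s: float, rate_hz: float = 5.0) -> int:
    """Planner tokens for a song of duration_s seconds.

    ASSUMPTION: one token per frame at rate_hz, read off the "5Hz" in the
    checkpoint names; the actual tokenization is not described here.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return math.ceil(duration_s * rate_hz)


for label, secs in [("8 s loop", 8), ("3 min song", 180), ("10 min composition", 600)]:
    print(f"{label}: {planner_tokens(secs)} tokens")
# 8 s loop: 40 tokens
# 3 min song: 900 tokens
# 10 min composition: 3000 tokens
```

Under this assumption, even a 10-minute composition is a few thousand tokens, well within typical LM context lengths. A low frame rate like this would help keep the planning stage cheap.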
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Artificial Intelligence in Games
