SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton

Xuzheng He; Nan Nan; Zhilin Wang; Ziyue Kang; Zhuoru Mo; Ao Li; Yu Pan; Xiaobing Li; Feng Yu; Xiaohong Guan

arXiv:2604.25498 · cs.SD · April 29, 2026

TL;DR

SymphonyGen is a hierarchical model for symbolic orchestral music generation. It combines structural control, harmonic refinement, and perceptual alignment to produce symphonic outputs that listeners prefer over baseline systems.

Contribution

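It introduces a 3D hierarchical framework (cascading decoders over the Bar, Track, and Event axes) conditioned on a beat-quantized harmony skeleton, and refines the model with reinforcement learning (GRPO with an audio-perceptual reward) for orchestral music generation.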
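The listing describes the harmony skeleton only as beat-quantized and multi-voice. As a rough illustration, the sketch below reduces a multi-track score to one chord per beat. The `Note` record (onsets and durations in beats), the `harmony_skeleton` helper, and the voice-reduction rule (keep the bass plus the highest remaining pitches) are all hypothetical assumptions, not the authors' extraction procedure.

```python
from typing import List, NamedTuple, Tuple


class Note(NamedTuple):
    onset: float     # start time in beats
    duration: float  # length in beats
    pitch: int       # MIDI pitch number (0-127)
    track: int       # instrument/track index (ignored by the skeleton)


def harmony_skeleton(notes: List[Note], num_beats: int,
                     num_voices: int = 4) -> List[Tuple[int, ...]]:
    """Hypothetical sketch: one sorted tuple of at most `num_voices` pitches per beat."""
    if num_voices < 2:
        raise ValueError("num_voices must be at least 2 (bass + one upper voice)")
    skeleton = []
    for beat in range(num_beats):
        # A note sounds on this beat if its span overlaps the interval [beat, beat + 1).
        sounding = sorted({n.pitch for n in notes
                           if n.onset < beat + 1 and n.onset + n.duration > beat})
        if len(sounding) > num_voices:
            # Keep the bass (lowest pitch) plus the highest num_voices - 1 pitches.
            sounding = [sounding[0]] + sounding[-(num_voices - 1):]
        skeleton.append(tuple(sounding))
    return skeleton


score = [
    Note(0, 2, 48, 0),                     # track 0: C3 held for two beats
    Note(0, 1, 64, 1), Note(0, 1, 67, 1),  # track 1: E4, G4 on beat 1
    Note(0, 1, 72, 2), Note(0, 1, 76, 2),  # track 2: C5, E5 on beat 1
    Note(1, 1, 65, 1),                     # track 1: F4 on beat 2
]
print(harmony_skeleton(score, num_beats=3))
# → [(48, 67, 72, 76), (48, 65), ()]
```

Because a skeleton of this kind records only which pitches sound on each beat, it drops instrumentation and within-beat rhythm. That is one way a "short score" could constrain harmony while leaving orchestral texture to the model.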

Findings

1. Objective evaluations show improved harmonic cleanliness.
2. Subjective evaluations favor SymphonyGen over baselines.
3. Dissonance-averse sampling reduces tonal clashes.
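The listing does not say how dissonance-averse sampling works. One plausible reading, sketched below purely as an assumption, is to lower the logits of candidate pitches that form harsh intervals with pitches already sounding, then sample from the renormalized distribution. The interval set (minor seconds, major sevenths, and tritones, modulo the octave), the `penalty` value, and all function names are hypothetical.

```python
import math
import random
from typing import Iterable, List, Optional

# Hypothetical "harsh" interval classes in semitones, taken modulo the octave:
# 1 = minor second / minor ninth, 11 = major seventh, 6 = tritone.
DISSONANT_INTERVALS = {1, 6, 11}


def dissonance_penalized_logits(logits: List[float], sounding: Iterable[int],
                                penalty: float = 4.0) -> List[float]:
    """Subtract `penalty` once per harsh interval a candidate pitch forms with a sounding pitch."""
    sounding = list(sounding)
    adjusted = list(logits)
    for pitch in range(len(adjusted)):
        clashes = sum(1 for s in sounding if abs(pitch - s) % 12 in DISSONANT_INTERVALS)
        adjusted[pitch] -= penalty * clashes
    return adjusted


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_pitch(logits: List[float], sounding: Iterable[int], penalty: float = 4.0,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a MIDI pitch index after applying the dissonance penalty."""
    rng = rng or random.Random()
    probs = softmax(dissonance_penalized_logits(logits, sounding, penalty))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


uniform = [0.0] * 128   # a model that is indifferent among all 128 pitches
held = {60, 64}         # C4 and E4 are currently sounding
probs = softmax(dissonance_penalized_logits(uniform, held))
print(probs[67] > probs[61])  # G4 (consonant) vs C#4 (clashes with C4)
# → True
```

A soft penalty keeps clashing notes possible but less likely, which leaves room for deliberate dissonance in a way a hard mask would not.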

Abstract

Generating symphonic music requires simultaneously managing high-level structural form and dense, multi-track orchestration. Existing symbolic models often struggle with a "complexity-control imbalance", in which scaling bottlenecks limit long-term granular steerability. We present SymphonyGen, a 3D hierarchical framework for contemporary cinematic orchestration. SymphonyGen employs a cascading decoder architecture that decomposes the Bar, Track, and Event axes, improving computational efficiency and scalability over conventional 1D or 2D models. We introduce "short-score" conditioning via a beat-quantized multi-voice harmony skeleton, enabling outline control while preserving textural diversity. The model is further refined using Group Relative Policy Optimization (GRPO) with a cross-modal audio-perceptual reward, aligning symbolic output with modern acoustic expectations.…
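The abstract names GRPO but not its exact update. The sketch below shows the group-relative advantage as GRPO is generally described (each sample's reward normalized by the mean and standard deviation of its sampling group), paired with a PPO-style clipped surrogate. It assumes one scalar reward per generated piece from a hypothetical audio-perceptual scorer, treats each piece as a single action, and omits the KL penalty and token-level averaging; the function names are illustrative only.

```python
import math
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: each reward minus the group mean, divided by the group std.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one sequence (to be maximized).

    Simplification: treats the whole generated piece as one action with a summed
    log-probability, and omits the KL penalty toward a reference policy.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


# Hypothetical rewards from an audio-perceptual scorer for four pieces
# sampled from the same prompt (e.g. the same harmony skeleton).
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
# → [-1.414, 0.0, 0.0, 1.414]
```

Normalizing within the group means no separate value network is needed: a piece is pushed up or down only relative to its siblings generated from the same conditioning.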

Code & Models

Repositories

https://symphonygen.github.io
