TL;DR
SymphonyGen is a hierarchical model for orchestral music generation that combines structural control, harmonic refinement, and perceptual alignment to produce symphonic outputs that are more musical and more often preferred by listeners.
Contribution
It introduces a hierarchical framework that conditions generation on a beat-quantized harmony skeleton and refines the model with reinforcement learning to improve orchestral music synthesis.
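To make the idea of a beat-quantized harmony skeleton concrete, here is a minimal sketch. It assumes notes are given as (onset, duration, pitch) in beats and MIDI numbers, that a note belongs to every beat interval it overlaps, and that excess voices are thinned by keeping the bass plus the highest pitches. `Note` and `harmony_skeleton` are hypothetical names, and this reduction rule is an illustrative assumption, not SymphonyGen's actual encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    onset: float     # start time, in beats
    duration: float  # length, in beats
    pitch: int       # MIDI pitch number


def harmony_skeleton(notes, num_beats, max_voices=4):
    """Quantize notes to a beat grid and return one chord (tuple of pitches) per beat.

    Hypothetical rule: a note belongs to beat b if it overlaps [b, b + 1).
    When more than max_voices pitches sound, keep the bass and the top voices.
    """
    if max_voices < 2:
        raise ValueError("max_voices must be at least 2 (bass plus one upper voice)")
    skeleton = []
    for b in range(num_beats):
        pitches = sorted({n.pitch for n in notes
                          if n.onset < b + 1 and n.onset + n.duration > b})
        if len(pitches) > max_voices:
            # Keep the lowest pitch (bass) plus the highest (max_voices - 1) pitches.
            pitches = [pitches[0]] + pitches[-(max_voices - 1):]
        skeleton.append(tuple(pitches))
    return skeleton


notes = [Note(0, 2, 48), Note(0, 1, 64), Note(0.5, 1, 67),
         Note(1, 1, 72), Note(2, 1, 50)]
print(harmony_skeleton(notes, num_beats=3))
# → [(48, 64, 67), (48, 67, 72), (50,)]
```

Because the skeleton fixes only pitch content per beat, a model conditioned on it remains free to choose rhythm, register doubling, and instrumentation, which is one plausible reading of how outline control can coexist with textural diversity.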
Findings
Objective evaluations show improved harmonic cleanliness.
Subjective evaluations favor SymphonyGen over baselines.
Dissonance-averse sampling reduces tonal clashes.
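This summary does not explain how dissonance-averse sampling works. The sketch below shows one hypothetical way to implement it: before softmax sampling, lower the logit of each candidate pitch once for every semitone clash (minor second or major seventh, in any octave) it forms with the current skeleton chord. The clash set, the `penalty` value, and the function names are all assumptions made for illustration.

```python
import math
import random

# Hypothetical choice: treat semitone clashes (minor 2nd / major 7th, any octave) as dissonant.
CLASH_INTERVALS = {1, 11}


def penalize_dissonance(logits, skeleton_pitches, penalty=4.0):
    """Lower each candidate pitch's logit by `penalty` per clash with the skeleton chord."""
    adjusted = {}
    for pitch, logit in logits.items():
        clashes = sum(1 for q in skeleton_pitches
                      if (pitch - q) % 12 in CLASH_INTERVALS)
        adjusted[pitch] = logit - penalty * clashes
    return adjusted


def sample_pitch(logits, skeleton_pitches, penalty=4.0, rng=random):
    """Softmax-sample one pitch from the dissonance-penalized logits."""
    adjusted = penalize_dissonance(logits, skeleton_pitches, penalty)
    pitches = list(adjusted)
    top = max(adjusted.values())  # subtract the max for numerical stability
    weights = [math.exp(adjusted[p] - top) for p in pitches]
    return rng.choices(pitches, weights=weights, k=1)[0]


logits = {60: 1.0, 61: 1.0, 64: 1.0}  # C4, C#4, E4
skeleton = (48, 64, 67)              # C3, E4, G4
print(penalize_dissonance(logits, skeleton))
# → {60: 1.0, 61: -3.0, 64: 1.0}
print(sample_pitch(logits, skeleton, rng=random.Random(0)))
```

A soft penalty, rather than masking clashing pitches outright, keeps passing dissonances possible while making sustained clashes unlikely.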
Abstract
Generating symphonic music requires simultaneously managing high-level structural form and dense, multi-track orchestration. Existing symbolic models often struggle with a "complexity-control imbalance", in which scaling bottlenecks limit fine-grained, long-range steerability. We present SymphonyGen, a 3D hierarchical framework for contemporary cinematic orchestration. SymphonyGen employs a cascading decoder architecture that decomposes the Bar, Track, and Event axes, improving computational efficiency and scalability over conventional 1D or 2D models. We introduce "short-score" conditioning via a beat-quantized multi-voice harmony skeleton, enabling control over the musical outline while preserving textural diversity. The model is further refined using Group Relative Policy Optimization (GRPO) with a cross-modal audio-perceptual reward, aligning symbolic output with modern acoustic expectations.…
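GRPO, as introduced in DeepSeekMath, samples a group of outputs for the same prompt, scores each one, and uses the reward normalized within that group as the advantage in a PPO-style clipped objective, so no learned value network is needed. The sketch below shows only those two pieces. The reward values stand in for SymphonyGen's cross-modal audio-perceptual reward and are hypothetical; the use of population standard deviation is a choice made here; and the KL penalty term of full GRPO is omitted.

```python
import math
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sample group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one sample; ratio = pi_new / pi_old."""
    clipped_ratio = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical perceptual rewards for a group of 4 pieces generated from one prompt.
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
print(clipped_surrogate(1.5, advantages[3]))  # large ratio is clipped at 1 + clip_eps
```

Normalizing within the group means a piece is rewarded only for sounding better than its siblings from the same prompt, which keeps the learning signal comparable across prompts of different difficulty.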