AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State
Huimin Wang, Leilei Ouyang, Chang Xia, Yongqi Kang, Yu Fu, Yuqi Ouyang

TL;DR
AllocMV is a hierarchical framework that optimally allocates resources for long-horizon music video generation, balancing quality and computational costs through structured persistent state and dynamic programming.
Contribution
AllocMV formulates resource allocation for music video synthesis as a Multiple-Choice Knapsack Problem (MCKP) and solves it with dynamic programming, enabling efficient and consistent long-horizon generation.
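For orientation, the textbook form of the MCKP is written out below. The symbols are assumptions introduced here for illustration and are not taken from the paper: $G$ segment groups, a branch set $\mathcal{B}$ = {High-Gen, Mid-Gen, Reuse}, a value $v_{gj}$ (for example, a saliency-weighted quality estimate) and a cost $c_{gj}$ for rendering group $g$ with branch $j$, and a total budget $C$.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{g=1}^{G} \sum_{j \in \mathcal{B}} v_{gj}\, x_{gj} \\
\text{s.t.}\quad & \sum_{g=1}^{G} \sum_{j \in \mathcal{B}} c_{gj}\, x_{gj} \le C, \\
& \sum_{j \in \mathcal{B}} x_{gj} = 1 \quad \forall g \in \{1, \dots, G\}, \\
& x_{gj} \in \{0, 1\}.
\end{aligned}
```

The equality constraint is what separates the MCKP from the ordinary 0/1 knapsack: every group must receive exactly one branch, so no segment can be skipped to save budget.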
Findings
Measured by the Cost-Quality Ratio (CQR), AllocMV achieves an optimal trade-off between quality and resource use.
The framework effectively maintains cross-shot consistency and motif continuity.
Reusing visual prefixes for repetitive musical motifs reduces computational cost while preserving visual quality.
Abstract
Generating long-horizon music videos (MVs) is frequently constrained by prohibitive computational costs and difficulty maintaining cross-shot consistency. We propose AllocMV, a hierarchical framework formulating music video synthesis as a Multiple-Choice Knapsack Problem (MCKP). AllocMV represents the video's persistent state as a compact, structured object comprising character entities, scene priors, and sharing graphs, produced by a global planner prior to realization. By estimating segment saliency from multimodal cues, a group-level MCKP solver based on dynamic programming optimally allocates resources across High-Gen, Mid-Gen, and Reuse branches. For repetitive musical motifs, we implement a divergence-based forking strategy that reuses visual prefixes to reduce costs while ensuring motif-level continuity. Evaluated via the Cost-Quality Ratio (CQR), AllocMV achieves an optimal…
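The group-level allocation described in the abstract can be illustrated with a generic dynamic-programming MCKP solver. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `solve_mckp`, the integer cost units, and the per-branch value numbers are all hypothetical, and values stand in for whatever saliency-weighted quality estimate the paper actually uses.

```python
from typing import List, Tuple

# Hypothetical option format: (branch name, integer cost, estimated value).
Option = Tuple[str, int, float]


def solve_mckp(groups: List[List[Option]], budget: int) -> Tuple[float, List[str]]:
    """Choose exactly one option per group, maximizing total value within budget."""
    neg_inf = float("-inf")
    # best[b] = best value over the groups processed so far at total cost exactly b.
    best = [neg_inf] * (budget + 1)
    best[0] = 0.0
    # choices[g][b] = (option index, previous cost) that produced state b after group g.
    choices = []
    for options in groups:
        nxt = [neg_inf] * (budget + 1)
        pick = [None] * (budget + 1)
        for b in range(budget + 1):
            if best[b] == neg_inf:
                continue  # state b is unreachable
            for i, (_, cost, value) in enumerate(options):
                nb = b + cost
                if nb <= budget and best[b] + value > nxt[nb]:
                    nxt[nb] = best[b] + value
                    pick[nb] = (i, b)
        best = nxt
        choices.append(pick)

    end = max(range(budget + 1), key=lambda b: best[b])
    if best[end] == neg_inf:
        raise ValueError("budget too small for any feasible assignment")

    # Walk the stored choices backwards to recover the branch per group.
    total, plan, b = best[end], [], end
    for g in range(len(groups) - 1, -1, -1):
        i, prev = choices[g][b]
        plan.append(groups[g][i][0])
        b = prev
    plan.reverse()
    return total, plan


# Illustrative numbers only (not from the paper): three segment groups,
# each with the three branches named in the abstract.
segments = [
    [("High-Gen", 5, 9.0), ("Mid-Gen", 3, 6.0), ("Reuse", 1, 2.0)],  # salient segment
    [("High-Gen", 5, 4.0), ("Mid-Gen", 3, 3.5), ("Reuse", 1, 3.0)],  # repeated motif
    [("High-Gen", 5, 7.0), ("Mid-Gen", 3, 5.0), ("Reuse", 1, 1.0)],
]
total, plan = solve_mckp(segments, budget=9)
print(total, plan)  # 17.0 ['High-Gen', 'Reuse', 'Mid-Gen']
```

With these made-up numbers the solver spends most of the budget on the salient first segment and falls back to Reuse for the repeated motif, where regeneration adds little value. The table-based approach runs in time proportional to the budget times the total number of options, which is why costs are assumed to be small integers here.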
