RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling

Bingjie Gao; Qianli Ma; Xiaoxue Wu; Shuai Yang; Guanzhou Lan; Haonan Zhao; Jiaxuan Chen; Qingyang Liu; Yu Qiao; Xinyuan Chen; Yaohui Wang; Li Niu

arXiv:2510.20206·cs.CV·May 15, 2026

TL;DR

RAPO++ is a prompt optimization framework for text-to-video (T2V) generation. It improves prompts through training-data alignment, iterative test-time refinement, and LLM fine-tuning, raising output quality across five T2V models and five benchmarks without modifying the generative backbone.

Contribution

It introduces a three-stage prompt optimization approach that unifies data-aligned refinement, test-time scaling, and LLM fine-tuning without altering the generative backbone.

Findings

01. Achieves significant improvements in semantic alignment and video quality.

02. Outperforms existing methods on five benchmarks and five T2V models.

03. Demonstrates the effectiveness of prompt optimization in T2V tasks.

Abstract

Prompt design plays a crucial role in text-to-video (T2V) generation, yet user-provided prompts are often short, unstructured, and misaligned with training data, limiting the generative potential of diffusion-based T2V models. We present RAPO++, a cross-stage prompt optimization framework that unifies training-data-aligned refinement, test-time iterative scaling, and large language model (LLM) fine-tuning to substantially improve T2V generation without modifying the underlying generative backbone. In Stage 1, Retrieval-Augmented Prompt Optimization (RAPO) enriches user prompts with semantically relevant modifiers retrieved from a relation graph and refactors them to match training distributions, enhancing compositionality and multi-object fidelity. Stage 2 introduces Sample-Specific Prompt Optimization (SSPO), a closed-loop mechanism that iteratively refines…

Code & Models

Repositories

Vchitect/RAPO (GitHub)
