TL;DR
SCMAPR is a multi-agent framework that refines prompts for text-to-video generation in complex scenarios, improving alignment and quality through structured, scenario-aware prompt revision.
Contribution
SCMAPR introduces a stage-wise multi-agent prompt refinement process and a benchmark for evaluating T2V generation in complex scenarios, with the aim of making text-to-video models more robust to such prompts.
Findings
SCMAPR improves text-video alignment scores on multiple benchmarks.
The framework achieves up to 3.28% gains in overall quality metrics.
A new complex-scenario T2V benchmark (T2V-Complexity) is proposed.
Abstract
Text-to-Video (T2V) generation has benefited from recent advances in diffusion models, yet current systems still struggle under complex scenarios, which are generally exacerbated by the ambiguity and underspecification of text prompts. In this work, we formulate complex-scenario prompt refinement as a stage-wise multi-agent refinement process and propose SCMAPR, i.e., a scenario-aware and Self-Correcting Multi-Agent Prompt Refinement framework for T2V prompting. SCMAPR coordinates specialized agents to (i) route each prompt to a taxonomy-grounded scenario for strategy selection, (ii) synthesize scenario-aware rewriting policies and perform policy-conditioned refinement, and (iii) conduct structured semantic verification that triggers conditional revision when violations are detected. To clarify what constitutes complex scenarios in T2V prompting, provide representative examples, and…
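The three stages listed in the abstract (routing, policy-conditioned refinement, verification with conditional revision) can be pictured as a small control loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: every function name, the `RefinementResult` type, the `max_revisions` bound, and the keyword-based toy agents are hypothetical stand-ins for what would be LLM-backed agents in the actual framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RefinementResult:
    """Hypothetical container for the outcome of one refinement run."""
    scenario: str
    policy: str
    prompt: str
    revisions: int
    violations: List[str] = field(default_factory=list)


def refine_prompt(
    prompt: str,
    route: Callable[[str], str],
    make_policy: Callable[[str, str], str],
    rewrite: Callable[[str, str], str],
    verify: Callable[[str, str], List[str]],
    revise: Callable[[str, str, List[str]], str],
    max_revisions: int = 2,  # assumption: the abstract does not state a bound
) -> RefinementResult:
    scenario = route(prompt)                 # (i) route prompt to a scenario
    policy = make_policy(prompt, scenario)   # (ii) synthesize a rewriting policy
    refined = rewrite(prompt, policy)        # (ii) policy-conditioned refinement
    violations = verify(prompt, refined)     # (iii) semantic verification
    revisions = 0
    # Revision is conditional: it runs only while the verifier reports violations.
    while violations and revisions < max_revisions:
        refined = revise(prompt, refined, violations)
        revisions += 1
        violations = verify(prompt, refined)
    return RefinementResult(scenario, policy, refined, revisions, violations)


# Toy stand-in agents (purely illustrative, rule-based instead of LLM-based).
def toy_route(prompt: str) -> str:
    return "multi-subject" if " and " in prompt else "single-subject"


def toy_policy(prompt: str, scenario: str) -> str:
    return f"describe each subject explicitly ({scenario})"


def toy_rewrite(prompt: str, policy: str) -> str:
    # Deliberately drops everything after "and" so the verifier has work to do.
    return prompt.split(" and ")[0] + ", cinematic lighting"


def toy_verify(original: str, refined: str) -> List[str]:
    # Flags words of the original prompt that are missing from the refined one.
    refined_words = set(refined.lower().replace(",", " ").split())
    return [w for w in original.lower().split() if w not in refined_words]


def toy_revise(original: str, refined: str, violations: List[str]) -> str:
    # Restores the missing words reported by the verifier.
    return refined + ", " + " ".join(violations)


result = refine_prompt(
    "a cat and a dog",
    toy_route, toy_policy, toy_rewrite, toy_verify, toy_revise,
)
print(result)
```

In this toy run the first rewrite loses the second subject, the verifier flags the missing words, and a single revision pass restores them. Passing the agents in as callables keeps the control flow separate from whatever model backs each stage.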
