Planning with Sketch-Guided Verification for Physics-Aware Video Generation
Yidong Huang, Zun Wang, Han Lin, Dong-Ki Kim, Shayegan Omidshafiei, Jaehong Yoon, Yue Zhang, Mohit Bansal

TL;DR
This paper introduces SketchVerify, a planning framework for physics-aware video generation that uses sketch-based verification to produce more physically plausible and instruction-consistent motions efficiently before full video synthesis.
Contribution
The paper presents a training-free, test-time sketch-verification method that improves motion planning quality in video generation by ranking and refining candidate trajectories with a lightweight verification process.
Findings
Significantly improves motion quality and physical realism in generated videos.
Achieves better long-term consistency compared to baseline methods.
Enhances efficiency by avoiding expensive diffusion-based synthesis during planning.
Abstract
Recent video generation approaches increasingly rely on planning intermediate control signals such as object trajectories to improve temporal coherence and motion fidelity. However, these methods mostly employ single-shot plans that are typically limited to simple motions, or iterative refinement that requires multiple calls to the video generator, incurring high computational cost. To overcome these limitations, we propose SketchVerify, a training-free, sketch-verification-based planning framework that improves motion planning quality with more dynamically coherent trajectories (i.e., physically plausible and instruction-consistent motions) prior to full video generation by introducing a test-time sampling and verification loop. Given a prompt and a reference image, our method predicts multiple candidate motion plans and ranks them using a vision-language verifier that jointly…
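The test-time sampling and verification loop described in the abstract can be illustrated with a minimal best-of-N sketch. This is not the authors' implementation: `sample_trajectory`, `smoothness_score`, and `select_best_plan` are hypothetical names, the random-walk sampler stands in for the motion planner, and the smoothness penalty is only a toy substitute for the vision-language verifier. The point is the control flow: draw several candidate plans, score each cheaply, and keep the best one before any video is synthesized.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def sample_trajectory(rng: random.Random, steps: int = 8) -> Trajectory:
    """Hypothetical planner stand-in: a random walk starting at the origin."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for _ in range(steps):
        x += rng.uniform(0.0, 1.0)
        y += rng.uniform(-0.5, 0.5)
        points.append((x, y))
    return points


def smoothness_score(traj: Trajectory) -> float:
    """Toy verifier stand-in (assumption): penalize jerky motion.

    Sums squared second differences along the path, so a straight,
    constant-velocity path scores 0 and erratic paths score lower.
    """
    penalty = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(traj, traj[1:], traj[2:]):
        ax = x2 - 2 * x1 + x0
        ay = y2 - 2 * y1 + y0
        penalty += ax * ax + ay * ay
    return -penalty


def select_best_plan(
    sampler: Callable[[random.Random], Trajectory],
    verifier: Callable[[Trajectory], float],
    num_candidates: int,
    seed: int = 0,
) -> Tuple[Trajectory, float]:
    """Sample candidate plans, score each one, and return the top-ranked plan."""
    rng = random.Random(seed)
    candidates = [sampler(rng) for _ in range(num_candidates)]
    scored = [(verifier(c), c) for c in candidates]
    best_score, best_plan = max(scored, key=lambda pair: pair[0])
    return best_plan, best_score


if __name__ == "__main__":
    plan, score = select_best_plan(sample_trajectory, smoothness_score, num_candidates=16)
    print(f"selected plan with {len(plan)} points, score {score:.3f}")
```

Because scoring operates on the plan itself rather than on a rendered video, increasing `num_candidates` only adds cheap verifier calls, which mirrors the efficiency argument made in the summary.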
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning
