Grounding Generated Videos in Feasible Plans via World Models

Christos Ziakas; Amir Bar; Alessandra Russo

arXiv:2602.01960 · cs.LG · March 17, 2026

Open Access

TL;DR

This paper introduces GVP-WM, a planning method that uses a learned world model to ground video-generated plans into feasible action sequences, improving the physical and temporal consistency of plans derived from generated videos.

Contribution

The paper proposes grounding video plans into feasible actions via a learned action-conditioned world model, enabling physically consistent long-horizon planning from zero-shot video generation.

Findings

1. Recovers feasible plans from zero-shot generated videos.
2. Improves temporal and physical consistency in video plans.
3. Effective in navigation and manipulation tasks.

Abstract

Large-scale video generative models have shown emerging capabilities as zero-shot visual planners, yet video-generated plans often violate temporal consistency and physical constraints, leading to failures when mapped to executable actions. To address this, we propose Grounding Video Plans with World Models (GVP-WM), a planning method that grounds video-generated plans into feasible action sequences using a learned action-conditioned world model. At test-time, GVP-WM first generates a video plan from initial and goal observations, then projects the video guidance onto the manifold of dynamically feasible latent trajectories via video-guided latent collocation. In particular, we formulate grounding as a goal-conditioned latent-space trajectory optimization problem that jointly optimizes latent states and actions under world-model dynamics, while preserving semantic alignment with the…
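
The grounding step described above (jointly optimizing latent states and actions under world-model dynamics while staying close to the video plan) can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the paper's method: the abstract is truncated and gives no exact objective, so the toy dynamics `world_model(z, u) = z + u`, the action bound `U_MAX`, the squared-distance alignment term, the weights `lam_dyn`, `lam_vid`, `lam_goal`, and the helper names are all hypothetical. Dynamics are imposed as a soft penalty (a simple form of collocation) and solved with projected gradient descent, a stand-in for whatever solver GVP-WM actually uses.

```python
import numpy as np

U_MAX = 0.3  # hypothetical per-dimension action bound: |u| <= U_MAX counts as feasible


def world_model(z, u):
    """Hypothetical toy latent dynamics f(z, u) = z + u, standing in for a learned model."""
    return z + u


def objective(z0, Z, U, V, goal, lam_dyn, lam_vid, lam_goal):
    """Penalty objective: dynamics residuals + video alignment + goal reaching."""
    prev = np.vstack([z0, Z[:-1]])   # z_0 .. z_{T-1}
    r = Z - world_model(prev, U)     # r_t = z_{t+1} - f(z_t, u_t)
    return (lam_dyn * np.sum(r ** 2)
            + lam_vid * np.sum((Z - V) ** 2)
            + lam_goal * np.sum((Z[-1] - goal) ** 2))


def ground_video_plan(z0, V, goal, lam_dyn=100.0, lam_vid=1.0, lam_goal=10.0, iters=50000):
    """Jointly optimize latents Z (steps 1..T) and actions U (steps 0..T-1).

    z0: (d,) initial latent; V: (T, d) video-plan latents; goal: (d,) goal latent.
    """
    Z = V.copy()                                             # warm-start latents from the video
    U = np.clip(V - np.vstack([z0, V[:-1]]), -U_MAX, U_MAX)  # clipped inverse dynamics
    # Step 1/L, with L a conservative gradient-Lipschitz bound valid only for f(z, u) = z + u;
    # this keeps projected gradient descent monotone on the convex quadratic objective.
    lr = 1.0 / (12.0 * lam_dyn + 2.0 * (lam_vid + lam_goal))
    for _ in range(iters):
        prev = np.vstack([z0, Z[:-1]])
        r = Z - world_model(prev, U)
        gZ = 2.0 * lam_dyn * r + 2.0 * lam_vid * (Z - V)
        gZ[:-1] -= 2.0 * lam_dyn * r[1:]   # row k (z_{k+1}) also feeds f in residual r_{k+1}; df/dz = I
        gZ[-1] += 2.0 * lam_goal * (Z[-1] - goal)
        gU = -2.0 * lam_dyn * r            # df/du = I
        Z = Z - lr * gZ
        U = np.clip(U - lr * gU, -U_MAX, U_MAX)  # project actions back onto the feasible box
    return Z, U


def rollout(z0, U):
    """Execute the actions open-loop through the world model; return the final latent."""
    z = np.asarray(z0, dtype=float)
    for u in U:
        z = world_model(z, u)
    return z


# Toy usage: a "teleporting" video plan that reaches the goal in one step,
# which no action with |u| <= U_MAX can reproduce.
z0, goal, T = np.zeros(2), np.ones(2), 5
V = np.tile(goal, (T, 1))
naive_U = np.clip(V - np.vstack([z0, V[:-1]]), -U_MAX, U_MAX)
Z, U = ground_video_plan(z0, V, goal)
print(np.linalg.norm(rollout(z0, naive_U) - goal))  # ≈ 0.99
print(np.linalg.norm(rollout(z0, U) - goal))
```

Treating dynamics as a penalty on free latent variables (collocation) rather than rolling actions forward (shooting) lets the latents start on the video plan while the dynamics term pulls them toward a trajectory the model can execute. With a learned, nonlinear world model, the hand-written gradients would be replaced by automatic differentiation.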

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · AI-based Problem Solving and Planning