DreamToNav: Generalizable Navigation for Robots via Generative Video Planning

Valerii Serpiva, Jeffrin Sam, Chidera Simon, Hajira Amjad, Iana Zhura, Artem Lykov, and Dzmitry Tsetserukou

arXiv:2603.06190 · cs.RO · March 9, 2026


Open Access

TL;DR

DreamToNav introduces a robot navigation framework that uses generative video models to translate natural language prompts into executable paths, enabling flexible, human-in-the-loop control without task-specific engineering.

Contribution

DreamToNav uses generative video models for both planning and control: the robot "dreams" a behavior from a natural language instruction, and that generated video is then converted into motion. The authors present this as a novel approach to robot navigation.

Findings

01. Achieved a 76.7% success rate in indoor navigation tasks.

02. Generated trajectories typically had errors below 0.15 meters.

03. Demonstrated cross-platform applicability on wheeled and quadruped robots.
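
The summary above does not say how trajectory error or success rate were measured. As a rough illustration only, the sketch below assumes the error is the Euclidean distance between matched points on a planned and an executed path, and that the success rate is the percentage of successful trials. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def pointwise_error(planned, executed):
    """Euclidean distance (metres) between matched points of two paths.

    Assumes both paths are sampled at the same N instants, shape (N, 2).
    This is one possible error definition, not necessarily the paper's.
    """
    planned = np.asarray(planned, dtype=float)
    executed = np.asarray(executed, dtype=float)
    return np.linalg.norm(planned - executed, axis=1)


def success_rate(outcomes):
    """Percentage of trials marked successful in a list of booleans."""
    return 100.0 * sum(bool(o) for o in outcomes) / len(outcomes)


# Illustrative data only (not results from the paper).
planned = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
executed = [[0.0, 0.1], [1.0, 0.05], [2.03, 0.04]]
errors = pointwise_error(planned, executed)
within_tolerance = bool(np.all(errors < 0.15))
```

Reporting the maximum or mean of `errors` against a threshold such as 0.15 m is one way a claim like "typically below 0.15 meters" could be checked; the actual protocol should be taken from the paper itself.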

Abstract

We present DreamToNav, a novel autonomous robot framework that uses generative video models to enable intuitive, human-in-the-loop control. Instead of relying on rigid waypoint navigation, users provide natural language prompts (e.g., "Follow the person carefully"), which the system translates into executable motion. Our pipeline first employs Qwen 2.5-VL-7B-Instruct to refine vague user instructions into precise visual descriptions. These descriptions condition NVIDIA Cosmos 2.5, a state-of-the-art video foundation model, to synthesize a physically consistent video sequence of the robot performing the task. From this synthetic video, we extract a valid kinematic path using visual pose estimation, robot detection, and trajectory recovery. By treating video generation as a planning engine, DreamToNav allows robots to visually "dream" complex behaviors before executing them, providing a…
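
The last stage of the pipeline, recovering a path from the generated video, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a detector has already produced one robot pixel position per frame, and that a 3x3 homography `H` mapping image pixels to ground-plane metres is known from calibration. The helper names `pixels_to_ground`, `smooth_path`, and `headings` are hypothetical.

```python
import numpy as np


def pixels_to_ground(pixels, H):
    """Map (N, 2) pixel coordinates to ground-plane metres via homography H."""
    pixels = np.asarray(pixels, dtype=float)
    homogeneous = np.hstack([pixels, np.ones((len(pixels), 1))])
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the projective scale


def smooth_path(path, window=3):
    """Moving-average smoothing with edge padding (window must be odd)."""
    pad = window // 2
    padded = np.pad(path, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    return np.column_stack(
        [np.convolve(padded[:, i], kernel, mode="valid") for i in range(path.shape[1])]
    )


def headings(path):
    """Heading angle (radians) of each segment between consecutive points."""
    deltas = np.diff(path, axis=0)
    return np.arctan2(deltas[:, 1], deltas[:, 0])


# Hypothetical per-frame detections and a toy calibration (1 pixel = 1 cm).
pixels = np.array([[0.0, 0.0], [100.0, 0.0], [200.0, 0.0], [300.0, 100.0]])
H = np.diag([0.01, 0.01, 1.0])

ground = pixels_to_ground(pixels, H)  # path in metres
smoothed = smooth_path(ground)        # reduces frame-to-frame detection jitter
segment_headings = headings(ground)   # orientation along the path
```

A planar homography is only valid when the robot moves on a flat floor seen by a fixed camera; a full pose estimator would be needed for anything else.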


Taxonomy

Topics: Robotic Path Planning Algorithms · Human Motion and Animation · Social Robot Interaction and HRI