DreamToNav: Generalizable Navigation for Robots via Generative Video Planning
Valerii Serpiva, Jeffrin Sam, Chidera Simon, Hajira Amjad, Iana Zhura, Artem Lykov, and Dzmitry Tsetserukou

TL;DR
DreamToNav introduces a robot navigation framework that uses generative video models to translate natural language prompts into executable paths, enabling flexible, human-in-the-loop control without task-specific engineering.
Contribution
DreamToNav uses generative video models for planning and control, letting robots 'dream' behaviors from natural language instructions. This is a novel approach to robot navigation.
Findings
Achieved 76.7% success rate in indoor navigation tasks.
Generated trajectories had errors typically below 0.15 meters.
Demonstrated cross-platform applicability on wheeled and quadruped robots.
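The summary does not say how the sub-0.15 m trajectory error is measured. One common way to compare a planned ("dreamed") path with an executed one is to resample both polylines at the same number of equally spaced arc-length points and take pointwise Euclidean distances. The sketch below does that. The metric, the function names `resample_by_arclength` and `trajectory_error`, and the default `n=100` are all illustrative assumptions, not the paper's definition.

```python
import numpy as np

def resample_by_arclength(path: np.ndarray, n: int) -> np.ndarray:
    """Resample an (N, 2) polyline to n points equally spaced along its length.
    Hypothetical helper; assumes 2D ground-plane coordinates in metres."""
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    # Drop repeated points so cumulative arc length is strictly increasing,
    # which np.interp needs for well-defined interpolation.
    keep = np.concatenate([[True], seg > 0])
    path = path[keep]
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, s[-1], n)
    x = np.interp(targets, s, path[:, 0])
    y = np.interp(targets, s, path[:, 1])
    return np.stack([x, y], axis=1)

def trajectory_error(planned, executed, n: int = 100) -> tuple[float, float]:
    """Return (mean, max) pointwise distance in metres between two paths
    after arc-length resampling. An assumed metric, for illustration only."""
    a = resample_by_arclength(np.asarray(planned, dtype=float), n)
    b = resample_by_arclength(np.asarray(executed, dtype=float), n)
    d = np.linalg.norm(a - b, axis=1)
    return float(d.mean()), float(d.max())

# Usage: an executed path offset 0.1 m sideways from a 2 m straight plan.
mean_err, max_err = trajectory_error([(0, 0), (2, 0)], [(0, 0.1), (1, 0.1), (2, 0.1)])
print(round(mean_err, 3), round(max_err, 3))
```

Because arc-length resampling ignores timing, this metric measures only geometric deviation. A time-aligned comparison would also penalize speed differences.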
Abstract
We present DreamToNav, a novel autonomous robot framework that uses generative video models to enable intuitive, human-in-the-loop control. Instead of relying on rigid waypoint navigation, users provide natural language prompts (e.g., "Follow the person carefully"), which the system translates into executable motion. Our pipeline first employs Qwen 2.5-VL-7B-Instruct to refine vague user instructions into precise visual descriptions. These descriptions condition NVIDIA Cosmos 2.5, a state-of-the-art video foundation model, to synthesize a physically consistent video sequence of the robot performing the task. From this synthetic video, we extract a valid kinematic path using visual pose estimation, robot detection, and trajectory recovery. By treating video generation as a planning engine, DreamToNav allows robots to visually "dream" complex behaviors before executing them, providing a…
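The abstract names trajectory recovery as the last step but does not describe it here. The sketch below shows one simple way a per-frame robot track in a generated video could become metric waypoints. It assumes a flat floor and a known 3x3 image-to-ground homography `H`, then applies symmetric moving-average smoothing and distance-based waypoint thinning. The functions `pixels_to_ground`, `smooth`, `to_waypoints`, and `recover_path`, the homography assumption, and the default parameters are all hypothetical. This is not the paper's method, which also uses visual pose estimation.

```python
import numpy as np

def pixels_to_ground(pixels: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map (N, 2) pixel coordinates to (N, 2) ground-plane metres via an
    assumed 3x3 homography H (flat-floor assumption)."""
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones]) @ H.T          # (N, 3) homogeneous points
    return homog[:, :2] / homog[:, 2:3]              # perspective divide

def smooth(points: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered moving average over time. Near the ends the window shrinks
    symmetrically, so endpoints are kept as-is and straight lines are preserved."""
    n, half = len(points), window // 2
    out = np.empty_like(points, dtype=float)
    for i in range(n):
        h = min(half, i, n - 1 - i)                  # symmetric half-width
        out[i] = points[i - h:i + h + 1].mean(axis=0)
    return out

def to_waypoints(points: np.ndarray, spacing: float) -> np.ndarray:
    """Keep a point once it is at least `spacing` metres from the last kept
    point, and always keep the final point."""
    kept = [points[0]]
    for p in points[1:]:
        if np.linalg.norm(p - kept[-1]) >= spacing:
            kept.append(p)
    if not np.allclose(kept[-1], points[-1]):
        kept.append(points[-1])
    return np.array(kept)

def recover_path(pixel_track, H, window: int = 5, spacing: float = 0.25) -> np.ndarray:
    """Hypothetical pipeline: pixels -> ground plane -> smoothing -> waypoints."""
    ground = pixels_to_ground(np.asarray(pixel_track, dtype=float), H)
    return to_waypoints(smooth(ground, window), spacing)

# Usage: a robot detected moving 10 px per frame, at an assumed 1 cm per pixel.
H = np.diag([0.01, 0.01, 1.0])
track = [(10.0 * k, 0.0) for k in range(31)]
print(recover_path(track, H))
```

In this toy example the detections advance 0.1 m per frame, so thinning at 0.25 m keeps every third point, from 0 m to 3 m. A real system would also need to estimate `H` from the camera setup and handle missed detections.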
Taxonomy
Topics: Robotic Path Planning Algorithms · Human Motion and Animation · Social Robot Interaction and HRI
