PathPainter: Transferring the Generalization Ability of Image Generation Models to Embodied Navigation

Yijin Wang; Yuru Tian; Xijie Huang; Weiqi Gai; Mo Zhu; Xin Zhou; Yuze Wu; Fei Gao

arXiv:2605.07496 · cs.RO · May 11, 2026

TL;DR

This paper introduces a navigation system for ground and near-ground robots that uses bird's-eye-view (BEV) images as global priors and an image generation model to interpret the goal. It demonstrates long-range outdoor UAV navigation and uses cross-view localization to limit odometry drift.

Contribution

It presents a novel approach that transfers the generalization capabilities of image generation models to embodied navigation using BEV images and cross-view localization.

Findings

01

A UAV successfully completed a 160-meter outdoor navigation run.

02

The system effectively interprets natural language to identify destinations.

03

Cross-view localization reduces odometry drift during navigation.
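
To illustrate how a map-frame position fix can cancel accumulated drift, the sketch below fits a 2D rigid transform (rotation plus translation) between positions expressed in the odometry frame and the same positions located in the BEV map, then applies it to a new odometry reading. This is not the paper's cross-view localization method, which this summary does not describe. The function names `estimate_rigid_2d` and `apply_rigid_2d`, and the assumption that matched point pairs are already available, are hypothetical.

```python
import math

def estimate_rigid_2d(odom_pts, map_pts):
    """Least-squares rotation and translation mapping odom_pts onto map_pts.

    Assumes (hypothetically) that each odometry position has a known
    counterpart in the BEV map frame. Returns (theta, tx, ty).
    """
    n = len(odom_pts)
    if n < 2 or n != len(map_pts):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    ocx = sum(p[0] for p in odom_pts) / n
    ocy = sum(p[1] for p in odom_pts) / n
    mcx = sum(q[0] for q in map_pts) / n
    mcy = sum(q[1] for q in map_pts) / n

    # Accumulate dot and cross products of the centered pairs; their ratio
    # gives the optimal rotation angle in closed form.
    s_cos = 0.0
    s_sin = 0.0
    for (ox, oy), (mx, my) in zip(odom_pts, map_pts):
        ax, ay = ox - ocx, oy - ocy
        bx, by = mx - mcx, my - mcy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)

    # The translation moves the rotated odometry centroid onto the map centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = mcx - (c * ocx - s * ocy)
    ty = mcy - (s * ocx + c * ocy)
    return theta, tx, ty

def apply_rigid_2d(theta, tx, ty, pt):
    """Express an odometry-frame point in the map frame."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = pt
    return (c * x - s * y + tx, s * x + c * y + ty)
```

Re-estimating the transform whenever new matches arrive keeps the correction current, so drift accumulated since the last fix does not carry over into later map-frame positions.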

Abstract

Bird's-eye-view (BEV) images have been widely demonstrated to provide valuable prior information for navigation. Given the global information provided by such views, two key challenges remain: how to fully exploit this information and how to reliably use it during execution. In this paper, we propose a navigation system that uses BEV images as global priors and is designed for ground and near-ground robotic platforms. The system employs an image generation model to interpret human intent from natural language, identify the target destination, and generate traversability masks. During execution, we introduce cross-view localization to align the robot's odometry with the BEV map and mitigate long-term drift in conventional odometry. We conduct extensive benchmark experiments to evaluate the proposed method and further validate it on a UAV platform. Using only a conventional local motion…
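
One way a traversability mask could feed a planner is sketched below: a breadth-first search over a binary grid in which `1` marks traversable cells. The abstract does not say how the system plans over its masks, so this is an assumption-laden illustration only. The function name `plan_on_mask`, the grid encoding, and the 4-connected neighborhood are all hypothetical.

```python
from collections import deque

def plan_on_mask(mask, start, goal):
    """Shortest 4-connected path over a binary traversability mask.

    mask  : list of rows, 1 = traversable, 0 = blocked (assumed encoding)
    start : (row, col) cell
    goal  : (row, col) cell
    Returns a list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(mask), len(mask[0])

    def is_free(cell):
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    if not (is_free(start) and is_free(goal)):
        return None

    # parent doubles as the visited set and the back-pointer table.
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the back-pointers to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if is_free(nxt) and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

Breadth-first search returns a path with the fewest grid steps. A real system would more likely weight cells by traversal cost and hand the result to a local motion planner.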
