PathPainter: Transferring the Generalization Ability of Image Generation Models to Embodied Navigation
Yijin Wang, Yuru Tian, Xijie Huang, Weiqi Gai, Mo Zhu, Xin Zhou, Yuze Wu, Fei Gao

TL;DR
This paper introduces a navigation system leveraging bird's-eye-view images and foundation models to enhance robot navigation, demonstrating successful long-range outdoor UAV navigation and improved localization.
Contribution
It presents a novel approach that transfers the generalization capabilities of image generation models to embodied navigation using BEV images and cross-view localization.
Findings
A UAV successfully completed a 160-meter outdoor navigation task.
The system effectively interprets natural language to identify destinations.
Cross-view localization reduces odometry drift during navigation.
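To make the drift-correction finding concrete, here is a minimal sketch of how matched positions could be used to realign odometry with a BEV map. It is an illustration under assumptions, not the paper's method: this summary does not describe how the cross-view matches are obtained, so the sketch assumes a set of robot positions is already known in both the odometry frame and the BEV map frame, and that drift can be approximated by a single 2D rotation plus translation. The function names `estimate_rigid_2d` and `apply_correction` are hypothetical.

```python
import math


def estimate_rigid_2d(odom_pts, bev_pts):
    """Least-squares 2D rotation + translation mapping odom_pts onto bev_pts.

    Assumption: odom_pts[i] and bev_pts[i] are the same physical position,
    expressed in the odometry frame and the BEV map frame respectively.
    """
    n = len(odom_pts)
    if n < 2 or n != len(bev_pts):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    ox = sum(p[0] for p in odom_pts) / n
    oy = sum(p[1] for p in odom_pts) / n
    bx = sum(p[0] for p in bev_pts) / n
    by = sum(p[1] for p in bev_pts) / n

    # Accumulate dot and cross products of the centered pairs; the optimal
    # rotation angle in 2D is atan2(sum of crosses, sum of dots).
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(odom_pts, bev_pts):
        ax, ay = px - ox, py - oy
        cx, cy = qx - bx, qy - by
        dot += ax * cx + ay * cy
        cross += ax * cy - ay * cx

    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation that maps the rotated odometry centroid onto the BEV centroid.
    tx = bx - (c * ox - s * oy)
    ty = by - (s * ox + c * oy)
    return theta, tx, ty


def apply_correction(point, theta, tx, ty):
    """Map a point from the odometry frame into the BEV map frame."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)
```

In use, the estimated transform would be refreshed whenever new matches arrive and applied to each incoming odometry position, so that accumulated drift is bounded by the quality of the matches rather than growing with distance travelled.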
Abstract
Bird's-eye-view (BEV) images have been widely demonstrated to provide valuable prior information for navigation. Given the global information provided by such views, two key challenges remain: how to fully exploit this information and how to reliably use it during execution. In this paper, we propose a navigation system that uses BEV images as global priors and is designed for ground and near-ground robotic platforms. The system employs an image generation model to interpret human intent from natural language, identify the target destination, and generate traversability masks. During execution, we introduce cross-view localization to align the robot's odometry with the BEV map and mitigate long-term drift in conventional odometry. We conduct extensive benchmark experiments to evaluate the proposed method and further validate it on a UAV platform. Using only a conventional local motion…
