Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance
Tzu-Hsien Lee, Fidan Mahmudova, Karthik Desingh

TL;DR
This paper presents an RGB-based imitation learning approach for precise last-meter navigation of a mobile manipulator. It positions the robot's base accurately relative to a target object without depth or map priors, and it generalizes to unseen objects and environments.
Contribution
It introduces an object-centric imitation learning framework conditioned on goal images, multi-view RGB observations, and text prompts, with explicit object grounding and pose reasoning, for category-level last-meter navigation.
Findings
Achieves a 73.47% success rate in edge alignment
Achieves a 96.94% success rate in object alignment
Generalizes to unseen objects across diverse environments
Abstract
Achieving precise positioning of a mobile manipulator's base is essential for the manipulation actions that follow. Most RGB-based navigation systems guarantee only coarse, meter-level accuracy, making them less suitable for the precise positioning phase of mobile manipulation. This gap prevents manipulation policies from operating within the distribution of their training demonstrations, resulting in frequent execution failures. We address this gap by introducing an object-centric imitation learning framework for last-meter navigation, enabling a quadruped mobile manipulator robot to achieve manipulation-ready positioning using only RGB observations from its onboard cameras. Our method conditions the navigation policy on three inputs: goal images, multi-view RGB observations from the onboard cameras, and a text prompt specifying the target object. A language-driven…
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Multimodal Machine Learning Applications
