VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps
Senthil Hariharan Arul, Dhruva Kumar, Vivek Sugirtharaj, Richard Kim, Xuewei (Tony) Qi, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

TL;DR
VLPG-Nav is a novel robot navigation method that uses visual language pose graphs and object localization probability maps to effectively locate and center objects within household scenes, handling occlusions and localization errors.
Contribution
The paper introduces VLPG-Nav, a new approach combining visual language pose graphs and probability maps for improved object navigation and centering in household environments.
Findings
Outperforms baseline methods in object localization accuracy
Successfully navigates around occlusions and displacement
Effectively centers objects within the camera view in real-world tests
Abstract
We present VLPG-Nav, a visual language navigation method for guiding robots to specified objects within household scenes. Unlike existing methods primarily focused on navigating the robot toward objects, our approach considers the additional challenge of centering the object within the robot's camera view. Our method builds a visual language pose graph (VLPG) that functions as a spatial map of VL embeddings. Given an open vocabulary object query, we plan a viewpoint for object navigation using the VLPG. Despite navigating to the viewpoint, real-world challenges like object occlusion, displacement, and the robot's localization error can prevent visibility. We build an object localization probability map that leverages the robot's current observations and prior VLPG. When the object isn't visible, the probability map is updated and an alternate viewpoint is computed. In addition, we…
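The two ideas in the abstract, picking a viewpoint from a map of VL embeddings and revising an object probability map after a failed observation, can be illustrated with a small sketch. This is not the authors' implementation: the `PoseNode` layout, the cosine-similarity ranking, the grid-cell dictionary, and the `miss_rate` parameter are all assumptions made for illustration, and a real system would obtain embeddings from a vision-language model rather than hand-written vectors.

```python
import math
from dataclasses import dataclass


@dataclass
class PoseNode:
    """Hypothetical pose-graph node: a robot pose plus the VL embedding seen from it."""
    x: float
    y: float
    yaw: float
    embedding: list


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_viewpoints(nodes, query_embedding):
    """Return node indices ordered by similarity to the query embedding, best first."""
    scores = [cosine(n.embedding, query_embedding) for n in nodes]
    return sorted(range(len(nodes)), key=lambda i: scores[i], reverse=True)


def update_after_miss(prob, observed, miss_rate=0.1):
    """Bayes update of per-cell object probabilities after a view with no detection.

    prob:      dict mapping cell id -> prior probability of containing the object
    observed:  set of cell ids covered by the current view
    miss_rate: assumed P(no detection | object is in an observed cell)
    """
    posterior = {c: p * (miss_rate if c in observed else 1.0) for c, p in prob.items()}
    total = sum(posterior.values())
    if total == 0:
        return dict(prob)  # nothing to renormalize; keep the prior
    return {c: p / total for c, p in posterior.items()}


# Usage: two stored viewpoints, a query embedding, then one failed observation.
nodes = [PoseNode(0.0, 0.0, 0.0, [1.0, 0.0]), PoseNode(2.0, 1.0, 1.57, [0.0, 1.0])]
order = rank_viewpoints(nodes, [0.1, 0.9])
belief = update_after_miss({"a": 0.5, "b": 0.5}, observed={"a"})
```

In this sketch a failed observation shifts probability mass away from the cells that were in view, so the next viewpoint can be chosen to cover the cells that remain likely.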
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Advanced Image and Video Retrieval Techniques
