VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps

Senthil Hariharan Arul; Dhruva Kumar; Vivek Sugirtharaj; Richard Kim; Xuewei (Tony) Qi; Rajasimman Madhivanan; Arnie Sen; Dinesh Manocha

arXiv:2408.08301 · cs.RO · August 16, 2024

TL;DR

VLPG-Nav is a novel robot navigation method that uses visual language pose graphs and object localization probability maps to effectively locate and center objects within household scenes, handling occlusions and localization errors.

Contribution

The paper introduces VLPG-Nav, a new approach combining visual language pose graphs and probability maps for improved object navigation and centering in household environments.

Findings

01 Outperforms baseline methods in object localization accuracy

02 Successfully navigates around occlusions and displacement

03 Effectively centers objects within the camera view in real-world tests

Abstract

We present VLPG-Nav, a visual language navigation method for guiding robots to specified objects within household scenes. Unlike existing methods primarily focused on navigating the robot toward objects, our approach considers the additional challenge of centering the object within the robot's camera view. Our method builds a visual language pose graph (VLPG) that functions as a spatial map of VL embeddings. Given an open vocabulary object query, we plan a viewpoint for object navigation using the VLPG. Despite navigating to the viewpoint, real-world challenges like object occlusion, displacement, and the robot's localization error can prevent visibility. We build an object localization probability map that leverages the robot's current observations and prior VLPG. When the object isn't visible, the probability map is updated and an alternate viewpoint is computed. In addition, we…
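
The update-and-replan step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a discrete grid of cells, a simple Bayesian update on a missed detection, and a greedy choice of the next viewpoint. The names `update_on_miss`, `pick_viewpoint`, and `p_detect`, and the example values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]


def update_on_miss(prior: Dict[Cell, float], visible: Set[Cell],
                   p_detect: float = 0.9) -> Dict[Cell, float]:
    """Bayesian update after the object was NOT detected from the current view.

    Cells inside the field of view are down-weighted by the miss probability
    (1 - p_detect); cells outside it keep their prior weight. The result is
    renormalized so that it sums to 1.
    """
    unnorm = {c: p * ((1.0 - p_detect) if c in visible else 1.0)
              for c, p in prior.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        # Degenerate case (all mass was visible and p_detect == 1): keep the prior.
        return dict(prior)
    return {c: w / total for c, w in unnorm.items()}


def pick_viewpoint(belief: Dict[Cell, float],
                   candidates: Dict[str, Set[Cell]]) -> str:
    """Greedy choice: the candidate viewpoint covering the most probability mass."""
    return max(candidates,
               key=lambda name: sum(belief.get(c, 0.0) for c in candidates[name]))


# Hypothetical prior over three cells (in the paper this would come from the VLPG).
prior = {(0, 0): 0.6, (0, 1): 0.3, (1, 1): 0.1}
# The robot looked at cell (0, 0) and did not see the object.
posterior = update_on_miss(prior, visible={(0, 0)})
# Hypothetical candidate viewpoints and the cells each one can observe.
candidates = {"A": {(0, 0)}, "B": {(0, 1)}, "C": {(1, 1)}}
best = pick_viewpoint(posterior, candidates)
```

After the miss, probability mass shifts away from the observed cell toward the unobserved ones, so the greedy rule proposes a viewpoint that covers a different region. The paper's actual map construction and viewpoint planner may differ substantially from this simplification.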

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Advanced Image and Video Retrieval Techniques