Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments

Mohamed Elnoor; Kasun Weerakoon; Gershom Seneviratne; Ruiqi Xian; Tianrui Guan; Mohamed Khalid M Jaffar; Vignesh Rajagopal; Dinesh Manocha

arXiv:2409.20445 · cs.RO · October 1, 2024 · 2 cites

TL;DR

This paper introduces VLM-GroNav, a novel outdoor robot navigation method that combines vision-language models with physical terrain grounding to improve traversability assessment and navigation success.

Contribution

It presents a new approach integrating VLMs with proprioceptive data for real-time terrain understanding and dynamic path planning in outdoor environments.

Findings

01. Up to 50% increase in navigation success rate.

02. Effective handling of deformable and slippery terrains.

03. Validated on both legged and wheeled robots.

Abstract

We present a novel autonomous robot navigation algorithm for outdoor environments that is capable of handling diverse terrain traversability conditions. Our approach, VLM-GroNav, uses vision-language models (VLMs) and integrates them with physical grounding to assess intrinsic terrain properties such as deformability and slipperiness. We use proprioception-based sensing, which provides direct measurements of these physical properties and enhances the overall semantic understanding of the terrains. Our formulation uses in-context learning to ground the VLM's semantic understanding with proprioceptive data, allowing traversability estimates to be updated dynamically based on the robot's real-time physical interactions with the environment. We use the updated traversability estimates to inform both the local and global planners for real-time trajectory replanning. We validate our…
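As a rough illustration of the idea of updating a semantic traversability estimate from physical interaction, the sketch below blends a per-terrain prior with a score derived from proprioceptive readings. It is not the paper's method: the abstract describes in-context learning with a VLM, whereas this example swaps in a plain weighted average so that it stays self-contained. The `ProprioceptiveReading` fields, the prior values, the `weight` parameter, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProprioceptiveReading:
    """Hypothetical physical measurements, each normalised to [0, 1]."""
    slip: float     # 0 = no slip, 1 = full slip
    sinkage: float  # 0 = rigid ground, 1 = highly deformable ground


def measured_traversability(reading: ProprioceptiveReading) -> float:
    """Map a reading to a traversability score in [0, 1] (1 = easy)."""
    # Assumption: the worse of the two properties dominates the score.
    return 1.0 - max(reading.slip, reading.sinkage)


def update_estimates(estimates, terrain, reading, weight=0.5):
    """Return a copy of `estimates` with `terrain` blended toward the measurement.

    `weight` (assumed) sets how much the physical measurement overrides
    the semantic prior: 0 keeps the prior, 1 uses the measurement alone.
    """
    updated = dict(estimates)
    measured = measured_traversability(reading)
    updated[terrain] = (1.0 - weight) * estimates[terrain] + weight * measured
    return updated


# Hypothetical semantic priors standing in for a VLM's output.
priors = {"grass": 0.8, "sand": 0.6, "pavement": 0.95}

# The robot walks onto grass that turns out to be slippery.
reading = ProprioceptiveReading(slip=0.7, sinkage=0.2)
estimates = update_estimates(priors, "grass", reading)

# A planner could then prefer the terrain with the highest updated score.
best = max(estimates, key=estimates.get)
```

In this toy run the score for grass drops below the score for sand after the slippery reading, so a planner that ranks terrains by score would now route over pavement or sand before grass. A real system would feed such updated scores into both the local and global planners, as the abstract describes.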

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Robotics and Automated Systems