Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models

Annie S. Chen; Alec M. Lessing; Andy Tang; Govind Chada; Laura Smith; Sergey Levine; Chelsea Finn

arXiv:2407.02666 · cs.RO · July 4, 2024 · 2 cites

Open Access

TL;DR

This paper introduces VLM-PC, a system leveraging vision-language models for adaptive, commonsense reasoning to improve legged robot navigation in complex, unpredictable environments without heavy human supervision.

Contribution

The paper presents VLM-PC, a novel approach combining in-context adaptation and multi-skill planning using VLMs to enhance robot generalization and autonomous decision-making in challenging scenarios.

Findings

01. VLM-PC enables robots to navigate complex obstacle courses autonomously.

02. The system improves handling of unexpected and ambiguous situations.

03. Experiments demonstrate successful real-world deployment on a quadruped robot.

Abstract

Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. For example, in a search and rescue mission, a legged robot could climb over debris, crawl through gaps, and navigate out of dead ends. However, the robot's controller needs to respond intelligently to such varied obstacles, and this requires handling unexpected and unusual scenarios successfully. This presents an open challenge to current learning methods, which often struggle with generalization to the long tail of unexpected situations without heavy human supervision. To address this issue, we investigate how to leverage the broad knowledge about the structure of the world and commonsense reasoning capabilities of vision-language models (VLMs) to aid legged robots in handling difficult, ambiguous situations. We propose a system, VLM-Predictive Control…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Advanced Image and Video Retrieval Techniques