LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

Dhruv Shah; Blazej Osinski; Brian Ichter; Sergey Levine

arXiv:2207.04429 · cs.RO · July 27, 2022 · 80 cites

Open Access · 1 Repo

TL;DR

LM-Nav enables natural language-guided robotic navigation in complex outdoor environments by leveraging large pre-trained models for language, vision, and action without requiring task-specific fine-tuning or annotated datasets.

Contribution

The paper introduces LM-Nav, a system that combines three pre-trained models, one for navigation (ViNG), one for image-language association (CLIP), and one for language modeling, to enable natural language-guided robot navigation without fine-tuning.

Findings

01 Successfully navigates complex outdoor environments from natural language instructions.

02 Operates on a real-world mobile robot without fine-tuning or annotated data.

03 Demonstrates long-horizon navigation capabilities.

Abstract

Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language…
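
The image-language association step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that landmark phrases and robot observations have already been embedded into a shared vector space (in a real system, by an image-language model such as CLIP), and it simply picks the most similar image for each phrase by cosine similarity. The function names, the landmark phrases, the node ids, and the three-dimensional vectors are all hypothetical placeholders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_landmarks(text_embeddings, image_embeddings):
    """For each landmark phrase, return the id of the most similar image.

    Both arguments map a name to an embedding vector. The embeddings are
    assumed to come from a joint image-language model; here they are toy values.
    """
    matches = {}
    for phrase, text_vec in text_embeddings.items():
        matches[phrase] = max(
            image_embeddings,
            key=lambda node: cosine(text_vec, image_embeddings[node]),
        )
    return matches


# Hypothetical toy data: two landmark phrases and three observation images.
text_embeddings = {
    "stop sign": [1.0, 0.0, 0.0],
    "picnic table": [0.0, 1.0, 0.2],
}
image_embeddings = {
    "node_0": [0.9, 0.1, 0.0],
    "node_1": [0.1, 0.8, 0.3],
    "node_2": [0.0, 0.1, 1.0],
}

matches = match_landmarks(text_embeddings, image_embeddings)
print(matches)
```

In this toy setup each phrase is matched independently. A full system would presumably also have to account for the order of landmarks in the instruction and for how the matched locations connect to one another, which this sketch leaves out.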

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

blazejosinski/lm_nav
Official

Videos

No videos yet.

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning