LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine

TL;DR
LM-Nav enables natural language-guided robotic navigation in complex outdoor environments by leveraging large pre-trained models for language, vision, and action without requiring task-specific fine-tuning or annotated datasets.
Contribution
The paper introduces LM-Nav, a system that composes three pre-trained models to follow natural language navigation instructions without any fine-tuning: a visual navigation model (ViNG), an image-language association model (CLIP), and a large language model (GPT-3).
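To make the composition concrete, here is a minimal, hypothetical Python sketch of how landmark grounding and route search could fit together. It makes three assumptions. First, the language model has already turned the instruction into an ordered list of landmark phrases. Second, CLIP-style similarity scores between each graph node's image and each phrase are precomputed. Third, edge weights stand in for the navigation model's traversal estimates. The function names (`landmark_probs`, `plan`), the softmax normalization, the temperature, and the trade-off weight `alpha` are illustrative choices, not the authors' implementation. The search runs Dijkstra over (node, landmarks matched) states, trading path length against the negative log-probability of each landmark match.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins, not LM-Nav's real interfaces:
#   Graph: node -> {neighbor: traversal cost}; the costs play the role of a
#          navigation model's (e.g. ViNG's) distance estimates.
#   sims:  node -> one image-text similarity per landmark (CLIP-style scores).
Graph = Dict[str, Dict[str, float]]
State = Tuple[str, int]  # (graph node, number of landmarks matched so far)


def landmark_probs(sims: Dict[str, List[float]], temperature: float = 0.1) -> Dict[str, List[float]]:
    """For each landmark k, softmax the similarities over all nodes (assumed normalization)."""
    nodes = list(sims)
    num_landmarks = len(sims[nodes[0]])
    probs = {n: [0.0] * num_landmarks for n in nodes}
    for k in range(num_landmarks):
        logits = [sims[n][k] / temperature for n in nodes]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        for n, e in zip(nodes, exps):
            probs[n][k] = e / total
    return probs


def plan(graph: Graph, probs: Dict[str, List[float]], start: str,
         num_landmarks: int, alpha: float = 1.0) -> Tuple[List[str], List[str], float]:
    """Dijkstra over (node, landmarks matched) states.

    Traversing an edge costs alpha * weight; matching landmark k at node v costs
    -log P(v, k). Both are non-negative, so the first completed state popped is optimal.
    Returns (node route, node chosen for each landmark, total cost).
    """
    dist: Dict[State, float] = {(start, 0): 0.0}
    parent: Dict[State, Optional[State]] = {(start, 0): None}
    heap = [(0.0, start, 0)]
    while heap:
        d, v, k = heapq.heappop(heap)
        if d > dist[(v, k)]:
            continue  # stale heap entry
        if k == num_landmarks:
            # Walk parent pointers back to the start state.
            states: List[State] = []
            s: Optional[State] = (v, k)
            while s is not None:
                states.append(s)
                s = parent[s]
            states.reverse()
            route, matches = [start], []
            for (_, pk), (nv, nk) in zip(states, states[1:]):
                if nk > pk:
                    matches.append(nv)  # landmark matched at this node
                else:
                    route.append(nv)  # moved to a neighboring node
            return route, matches, d
        moves = [(u, k, alpha * w) for u, w in graph[v].items()]
        if probs[v][k] > 0.0:
            moves.append((v, k + 1, -math.log(probs[v][k])))
        for u, nk, cost in moves:
            nd = d + cost
            if nd < dist.get((u, nk), math.inf):
                dist[(u, nk)] = nd
                parent[(u, nk)] = (v, k)
                heapq.heappush(heap, (nd, u, nk))
    raise ValueError("no route matches every landmark in order")


# Toy example (hypothetical data): a corridor A-B-C-D with two landmarks.
LANDMARKS = ["a stop sign", "a blue truck"]  # as if parsed by the language model
TOY_GRAPH: Graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
TOY_SIMS = {"A": [0.20, 0.20], "B": [0.35, 0.20], "C": [0.20, 0.20], "D": [0.20, 0.35]}

route, matches, cost = plan(TOY_GRAPH, landmark_probs(TOY_SIMS), "A", len(LANDMARKS), alpha=0.5)
print(route, matches, round(cost, 3))
```

With `alpha=0.5` on this toy data, walking is cheap relative to a weak landmark match, so the planner travels through B (stop sign) to D (blue truck). With a large `alpha` such as 5.0 on the same data, it instead matches both landmarks at A and never moves. The weight therefore controls how strongly route length is traded against grounding confidence.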
Findings
Successfully navigates complex outdoor environments from natural language instructions.
Operates on a real-world mobile robot without fine-tuning or annotated data.
Demonstrates long-horizon navigation capabilities.
Abstract
Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
