Language-Conditioned Offline RL for Multi-Robot Navigation

Steven Morad; Ajay Shankar; Jan Blumenkamp; Amanda Prorok

arXiv:2407.20164 · cs.RO · July 30, 2024 · 1 citation

TL;DR

This paper introduces a method for multi-robot navigation in which language-conditioned policies are trained via offline reinforcement learning on as little as 20 minutes of randomly collected data, then deployed on real robots without a simulator.

Contribution

The method conditions navigation policies on embeddings from pretrained Large Language Models (LLMs) so that the policies can follow natural language instructions, and it trains those policies offline, with no need for simulators or environment models.

Findings

01 Policies generalize to unseen commands

02 Effective with only 20 minutes of data

03 No fine-tuning required for deployment

Abstract

We present a method for developing navigation policies for multi-robot teams that interpret and follow natural language instructions. We condition these policies on embeddings from pretrained Large Language Models (LLMs), and train them via offline reinforcement learning with as little as 20 minutes of randomly-collected data. Experiments on a team of five real robots show that these policies generalize well to unseen commands, indicating an understanding of the LLM latent space. Our method requires no simulators or environment models, and produces low-latency control policies that can be deployed directly to real robots without finetuning. We provide videos of our experiments at https://sites.google.com/view/llm-marl.
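
As a rough illustration of the idea in the abstract, the following Python sketch trains a command-conditioned policy purely from a fixed log of random transitions, with no further environment interaction during training. It is a toy under stated assumptions, not the authors' implementation: the 1-D corridor, the two commands, the hand-written `EMBEDDINGS` (standing in for LLM embeddings), the `GOALS` reward labels, and the tabular Q-function (standing in for a neural network) are all hypothetical.

```python
import random

# Hypothetical stand-ins for LLM embeddings. A real system would obtain these
# from a pretrained language model; here they are fixed 2-D tuples.
EMBEDDINGS = {
    "go to the left end": (1.0, 0.0),
    "go to the right end": (0.0, 1.0),
}
# Hypothetical goal cell per command, used only to label rewards offline.
GOALS = {"go to the left end": 0, "go to the right end": 4}

N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Deterministic 1-D corridor dynamics, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def collect_random_data(n_steps, seed=0):
    """Log (state, action, next_state) under a uniformly random policy."""
    rng = random.Random(seed)
    data, state = [], rng.randrange(N_STATES)
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)
        nxt = step(state, action)
        data.append((state, action, nxt))
        state = nxt
    return data


def train_offline(data, gamma=0.9, sweeps=100):
    """Fitted Q-iteration over the fixed log, relabeling rewards per command."""
    q = {}
    for _ in range(sweeps):
        for command, z in EMBEDDINGS.items():
            goal = GOALS[command]
            for s, a, s2 in data:
                reward = 1.0 if s2 == goal else 0.0
                best_next = max(q.get((z, s2, b), 0.0) for b in ACTIONS)
                # Dynamics are deterministic, so a full Bellman backup is safe.
                q[(z, s, a)] = reward + gamma * best_next
    return q


def act(q, command, state):
    """Greedy action for the given command embedding and state."""
    z = EMBEDDINGS[command]
    return max(ACTIONS, key=lambda a: q.get((z, state, a), 0.0))


def rollout(q, command, state, n_steps=6):
    """Follow the greedy policy for a few steps and return the final state."""
    for _ in range(n_steps):
        state = step(state, act(q, command, state))
    return state


if __name__ == "__main__":
    log = collect_random_data(1000)
    q_table = train_offline(log)
    for cmd in EMBEDDINGS:
        print(cmd, "->", rollout(q_table, cmd, state=2))
```

Because this Q-table is keyed by exact embedding tuples, the toy cannot generalize to unseen commands. Generalization of the kind reported in the abstract would need a function approximator that takes the embedding as a continuous input.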

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Robotic Path Planning Algorithms