Language Models as Zero-Shot Trajectory Generators

Teyun Kwon; Norman Di Palo; Edward Johns

arXiv:2310.11604 · cs.RO · June 19, 2024 · 1 citation

TL;DR

This paper demonstrates that GPT-4 can directly generate dense, low-level end-effector trajectories for robot manipulation tasks, given only object detection and segmentation vision models and a single task-agnostic prompt. This challenges the common assumption that LLMs lack the knowledge needed for low-level control.

Contribution

The paper shows that an LLM (GPT-4) can produce low-level trajectories for robot manipulation zero-shot, using a single task-agnostic prompt with no in-context examples, motion primitives, or external trajectory optimisers.

Findings

01. GPT-4 successfully predicts trajectories for 30 real-world tasks

02. LLMs can detect failures and re-plan trajectories

03. A simple prompt suffices without in-context examples

Abstract

Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate if an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation tasks, when given access to only object detection and segmentation vision models. We designed a single, task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers. Then we studied how well it can perform across 30 real-world language-based tasks, such as "open the bottle cap" and "wipe the plate with the sponge", and we investigated which design choices in this prompt are the most important. Our conclusions…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Robot Manipulation and Learning