Language Models as Zero-Shot Trajectory Generators
Teyun Kwon, Norman Di Palo, Edward Johns

TL;DR
This paper demonstrates that GPT-4 can directly generate dense low-level robot trajectories for manipulation tasks, given only object detection and segmentation vision models and a single task-agnostic prompt, challenging the assumption that LLMs are limited to high-level planning in robotics.
Contribution
The paper shows that an LLM (GPT-4) can produce low-level end-effector trajectories for robot manipulation from a single task-agnostic prompt, without in-context examples, motion primitives, or external trajectory optimisers.
Findings
GPT-4 predicts trajectories across a set of 30 real-world language-based tasks
LLMs can detect failures and re-plan trajectories
A simple prompt suffices without in-context examples
Abstract
Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate if an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation tasks, when given access to only object detection and segmentation vision models. We designed a single, task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers. Then we studied how well it can perform across 30 real-world language-based tasks, such as "open the bottle cap" and "wipe the plate with the sponge", and we investigated which design choices in this prompt are the most important. Our conclusions…
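To make "a dense sequence of end-effector poses" concrete, here is a minimal Python sketch of what such a sequence might look like for a simple approach motion. It is an illustration only, not the authors' prompt, pipeline, or output format. The `Pose` type, the `interpolate` and `approach_trajectory` helpers, the hover height, and the assumption that a vision model has already supplied an object position in metres are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pose:
    """Hypothetical end-effector pose: position in metres, yaw in radians."""
    x: float
    y: float
    z: float
    yaw: float


def interpolate(start: Pose, end: Pose, steps: int) -> List[Pose]:
    """Return steps + 1 poses spaced linearly from start to end, inclusive."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    poses = []
    for i in range(steps + 1):
        t = i / steps
        poses.append(Pose(
            start.x + t * (end.x - start.x),
            start.y + t * (end.y - start.y),
            start.z + t * (end.z - start.z),
            start.yaw + t * (end.yaw - start.yaw),
        ))
    return poses


def approach_trajectory(
    current: Pose,
    object_xyz: Tuple[float, float, float],  # assumed output of a detection model
    hover_height: float = 0.10,              # assumed clearance above the object
    steps_per_segment: int = 20,
) -> List[Pose]:
    """Move above the object, then descend straight down onto it."""
    ox, oy, oz = object_xyz
    above = Pose(ox, oy, oz + hover_height, current.yaw)
    at_object = Pose(ox, oy, oz, current.yaw)
    first = interpolate(current, above, steps_per_segment)
    second = interpolate(above, at_object, steps_per_segment)
    # Drop the duplicated waypoint where the two segments meet.
    return first + second[1:]


if __name__ == "__main__":
    start = Pose(0.0, 0.0, 0.30, 0.0)
    trajectory = approach_trajectory(start, (0.20, 0.10, 0.05))
    print(len(trajectory), "waypoints; final pose:", trajectory[-1])
```

The point of the sketch is the contrast with high-level planning: instead of choosing among named skills, the output is a long list of closely spaced poses that a controller could follow directly.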
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Robot Manipulation and Learning
