Language to Rewards for Robotic Skill Synthesis

Wenhao Yu; Nimrod Gileadi; Chuyuan Fu; Sean Kirmani; Kuang-Huei Lee; Montse Gonzalez Arenas; Hao-Tien Lewis Chiang; Tom Erez; Leonard Hasenclever; Jan Humplik; Brian Ichter; Ted Xiao; Peng Xu; Andy Zeng; Tingnan Zhang; Nicolas Heess; Dorsa Sadigh; Jie Tan; Yuval Tassa; Fei Xia

arXiv:2306.08647 · cs.RO · June 21, 2023 · 38 cites

Open Access

TL;DR

This paper introduces an approach in which large language models define reward functions, enabling flexible, interactive robotic skill synthesis and bridging high-level instructions with low-level control, both in simulation and on real robots.

Contribution

The work presents a new paradigm using LLMs to generate reward parameters for robotic control, improving task success rates and enabling real-time interactive behavior creation.
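
The idea of reward parameters as the interface can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names (`target_height`, `height_weight`, `effort_weight`), the 1-D toy model, and the random-shooting planner are all assumptions chosen for brevity. A hand-written dictionary stands in for what an LLM might emit for an instruction such as "raise the torso to 0.6 m", and a simple optimizer turns that reward into actions.

```python
import random

# Hypothetical reward parameters, standing in for LLM output.
# In this sketch they are hard-coded; no language model is called.
reward_params = {"target_height": 0.6, "height_weight": 1.0, "effort_weight": 0.01}

DT = 0.1  # seconds per control step in the toy model


def reward(height, action, params):
    """Negative weighted cost: track the target height, penalize effort."""
    height_err = height - params["target_height"]
    return -(params["height_weight"] * height_err ** 2
             + params["effort_weight"] * action ** 2)


def rollout(height, actions, params):
    """Toy 1-D model: each action is a velocity command applied for DT seconds."""
    total = 0.0
    for a in actions:
        height += a * DT
        total += reward(height, a, params)
    return total


def plan(height, params, rng, horizon=10, samples=256):
    """Random-shooting planner: sample action sequences, keep the best one."""
    best_actions, best_return = None, float("-inf")
    for _ in range(samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        ret = rollout(height, actions, params)
        if ret > best_return:
            best_actions, best_return = actions, ret
    return best_actions


def run(start_height=0.3, steps=30, params=reward_params, seed=0):
    """Receding-horizon loop: replan every step, apply only the first action."""
    rng = random.Random(seed)
    height = start_height
    for _ in range(steps):
        actions = plan(height, params, rng)
        height += actions[0] * DT
    return height


if __name__ == "__main__":
    print(f"final height: {run():.3f} (target {reward_params['target_height']})")
```

Changing the behavior in this sketch only requires editing `reward_params`; the planner and the model stay untouched, which is the property that makes a reward-level interface convenient for language-driven, interactive use.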

Findings

01. Solved 90% of the 17 simulated robotic tasks.

02. Outperformed a baseline that uses primitive skills as the interface, which solved 50% of the tasks.

03. Validated on a real robot arm, where it produced complex manipulation skills.

Abstract

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions are shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by utilizing LLMs to define reward parameters…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications