Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning

Jingkai Sun, Qiang Zhang, Yiqun Duan, Xiaoyang Jiang, Chong Cheng, and Renjing Xu

arXiv:2309.11359 · cs.RO · August 1, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a novel humanoid robot control framework combining adversarial imitation learning, large language models as planners, and vector quantization to enable zero-shot task execution with a single policy.
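The summary names adversarial imitation learning but does not give its equations. The sketch below is an assumed stand-in that uses the standard GAIL formulation. A discriminator D(s, s') = sigmoid(logit) is trained to label reference motion transitions 1 and policy transitions 0, and the policy receives the reward r = -log(1 - D). The paper's actual discriminator inputs and reward shaping may differ. AMP-style methods, for example, use a least-squares objective instead. All function names are hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logits: Sequence[float], fake_logits: Sequence[float]) -> float:
    """Binary cross-entropy for a discriminator D = sigmoid(logit).

    Reference-motion transitions are labelled 1 and policy transitions 0.
    Uses -log(sigmoid(x)) = softplus(-x) and -log(1 - sigmoid(x)) = softplus(x)
    so that large logits do not overflow.
    """
    terms = [softplus(-x) for x in real_logits] + [softplus(x) for x in fake_logits]
    if not terms:
        raise ValueError("need at least one logit")
    return sum(terms) / len(terms)


def imitation_reward(logit: float) -> float:
    """GAIL-style reward r = -log(1 - D(s, s')) for one policy transition.

    The reward is larger when the discriminator mistakes the transition for reference motion.
    """
    return softplus(logit)


if __name__ == "__main__":
    # Hypothetical discriminator logits for a reference clip and a policy rollout.
    print(discriminator_loss([2.0, 1.5], [-1.0, 0.5]))
    print(imitation_reward(0.0))  # log(2), about 0.693
```

The policy and discriminator are updated in alternation. The discriminator minimizes `discriminator_loss`, and the policy maximizes `imitation_reward` with an ordinary RL algorithm.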

Contribution

It presents what the authors describe as the first framework that uses an LLM as a strategic planner on top of a single unified policy for humanoid control, improving the adaptability and reusability of learned skills.

Findings

01. Experiments demonstrate efficient zero-shot task execution.

02. A single policy network effectively controls complex humanoid motions.

03. Incorporating vector quantization improves action generation for unseen commands.
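The text here does not say where the quantizer sits in the pipeline. The sketch below is a generic illustration of the operation itself. Vector quantization replaces a continuous vector z with its nearest entry e_k in a finite codebook, where k = argmin_j ||z - e_j||^2. One plausible reading of the finding above, which the text here does not confirm, is that snapping the encoding of an unfamiliar command onto a learned code keeps the policy's input inside the distribution it was trained on. The function name and toy codebook are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return (k, e_k), where e_k is the codebook entry nearest to z in squared L2 distance."""
    if not codebook:
        raise ValueError("codebook must be non-empty")
    best_k, best_dist = -1, float("inf")
    for k, e in enumerate(codebook):
        if len(e) != len(z):
            raise ValueError("codebook entries must match the dimension of z")
        dist = sum((zi - ei) ** 2 for zi, ei in zip(z, e))
        if dist < best_dist:  # strict '<' keeps the lowest index on ties
            best_k, best_dist = k, dist
    return best_k, [float(v) for v in codebook[best_k]]


if __name__ == "__main__":
    # Toy 2-D codebook with three hypothetical "skill codes".
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(quantize([0.9, 0.2], codebook))  # (1, [1.0, 0.0])
```

In VQ-VAE-style training the argmin is not differentiable. Gradients are usually passed through with a straight-through estimator, and a commitment loss is added. This sketch covers only the lookup.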

Abstract

In recent years, reinforcement learning and imitation learning have shown great potential for controlling the motion of humanoid robots. However, these methods typically build a simulation environment and a reward for each specific task, which requires multiple policies and limits their ability to handle complex and unknown tasks. To overcome these issues, we present a novel approach that combines adversarial imitation learning with large language models (LLMs). This method enables the agent to learn reusable skills with a single policy and to solve zero-shot tasks under the guidance of LLMs. In particular, we use the LLM as a strategic planner that applies previously learned skills to novel tasks by interpreting task-specific prompts. This enables the robot to perform the specified actions in sequence. To improve our model, we incorporate…
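The planning loop described in the abstract above can be sketched under several stated assumptions. The skill names, the prompt template, and the one-skill-per-line reply format are all hypothetical. No real LLM API is called, so the reply is passed in as a string. `policy` is a placeholder for the single skill-conditioned controller. The sketch illustrates one idea: the LLM only chooses and orders skills from a fixed library, and the learned policy produces the actual motion.

```python
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical skill library: skill name -> ID used to condition one shared policy.
SKILLS: Dict[str, int] = {"walk": 0, "run": 1, "turn_left": 2, "turn_right": 3, "kick": 4}


def build_prompt(task: str, skills: Dict[str, int]) -> str:
    """Hypothetical prompt template listing the skills the planner may use."""
    return (
        f"Available skills: {', '.join(skills)}.\n"
        f"Task: {task}\n"
        "Reply with one skill name per line, in execution order."
    )


def parse_plan(llm_reply: str, skills: Dict[str, int]) -> List[int]:
    """Map a reply such as '1. walk\\n2. kick' to skill IDs, rejecting unknown skills."""
    plan: List[int] = []
    for line in llm_reply.splitlines():
        # Drop optional list numbering like "1." or "2)" and normalize case.
        name = re.sub(r"^\s*\d+[.)]\s*", "", line).strip().lower()
        if not name:
            continue
        if name not in skills:
            raise ValueError(f"planner proposed an unknown skill: {name!r}")
        plan.append(skills[name])
    return plan


def run_plan(plan: Sequence[int], policy: Callable[[int], str]) -> List[str]:
    """Execute the skills in order with a single skill-conditioned policy."""
    return [policy(skill_id) for skill_id in plan]


if __name__ == "__main__":
    print(build_prompt("walk forward, turn left, then kick the ball", SKILLS))
    reply = "1. walk\n2. turn_left\n3. kick"  # stands in for a real LLM response
    plan = parse_plan(reply, SKILLS)
    print(plan)  # [0, 2, 4]
    print(run_plan(plan, lambda sid: f"skill {sid} done"))
```

This sketch deliberately rejects unknown skill names instead of guessing. A planner restricted to the library cannot ask the policy for a behavior it never learned.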

Taxonomy

Topics: Human Pose and Action Recognition