From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"
Takahide Yoshida, Atsushi Masumori, Takashi Ikegami

TL;DR
This paper presents Alter3, a humanoid robot that uses GPT-4 to generate spontaneous poses and motion sequences without explicit programming for each body part. It demonstrates zero-shot motion generation and pose adjustment through verbal feedback.
Contribution
It introduces a novel method for grounding GPT-4 in humanoid robot control, enabling natural motion generation without task-specific training.
Findings
Alter3 can adopt various poses like 'selfie' and 'ghost' without explicit programming.
The robot demonstrates zero-shot learning capabilities in motion generation.
Verbal feedback effectively adjusts poses without fine-tuning.
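One way to read the last finding is that corrections are carried in the prompt rather than in model weights. The following is a minimal sketch of that idea, not the authors' implementation: `PoseMemory`, `build_prompt`, and the `generate` callable (a stand-in for a call to an LLM) are hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, List


def build_prompt(instruction: str, feedback_history: List[str]) -> str:
    """Compose a prompt from the instruction plus all feedback given so far."""
    lines = [f"Instruction: {instruction}"]
    for i, note in enumerate(feedback_history, start=1):
        lines.append(f"Feedback {i}: {note}")
    lines.append("Write the revised motion.")
    return "\n".join(lines)


class PoseMemory:
    """Keeps verbal feedback per instruction and regenerates motions from it.

    Hypothetical helper: `generate` is any function mapping a prompt string
    to generated motion text, for example a wrapper around an LLM API.
    No model weights are updated; only the prompt changes.
    """

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._feedback: Dict[str, List[str]] = {}
        self._motions: Dict[str, str] = {}

    def request(self, instruction: str) -> str:
        """Generate (or regenerate) a motion using the stored feedback."""
        prompt = build_prompt(instruction, self._feedback.get(instruction, []))
        motion = self._generate(prompt)
        self._motions[instruction] = motion
        return motion

    def give_feedback(self, instruction: str, note: str) -> str:
        """Record a verbal correction and return the regenerated motion."""
        self._feedback.setdefault(instruction, []).append(note)
        return self.request(instruction)
```

With a real model behind `generate`, a call such as `memory.give_feedback("take a selfie", "raise the arm higher")` would re-prompt with the correction included, which is the sense in which no fine-tuning is involved in this sketch.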
Abstract
We report the development of Alter3, a humanoid robot capable of generating spontaneous motion using a Large Language Model (LLM), specifically GPT-4. This achievement was realized by integrating GPT-4 into our proprietary android, Alter3, thereby effectively grounding the LLM with Alter's bodily movement. Typically, low-level robot control is hardware-dependent and falls outside the scope of LLM corpora, presenting challenges for direct LLM-based robot control. However, in the case of humanoid robots like Alter3, direct control is feasible by mapping the linguistic expressions of human actions onto the robot's body through program code. Remarkably, this approach enables Alter3 to adopt various poses, such as a 'selfie' stance or 'pretending to be a ghost,' and generate sequences of actions over time without explicit programming for each body part. This demonstrates the robot's…
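The abstract describes mapping linguistic descriptions of human actions onto the robot's body through program code. A minimal sketch of the receiving side of such a pipeline might look like the following. It assumes the LLM emits lines of the form `set_axis(axis, value)`; that command name, the axis count, and the value range are illustrative assumptions, not details given in the text above.

```python
import re
from typing import List, Tuple

# Assumed for illustration only: the text above does not state the command
# name, the number of axes, or the valid value range.
NUM_AXES = 43
VALUE_MIN, VALUE_MAX = 0, 255

# Matches hypothetical commands such as "set_axis(17, 120)".
COMMAND_RE = re.compile(r"set_axis\(\s*(\d+)\s*,\s*(-?\d+)\s*\)")


def parse_motion_commands(llm_output: str) -> List[Tuple[int, int]]:
    """Extract (axis, value) pairs from LLM-generated text.

    Commands naming an axis outside 1..NUM_AXES are dropped, and values
    are clamped to [VALUE_MIN, VALUE_MAX], so malformed output cannot
    request an out-of-range position.
    """
    commands = []
    for match in COMMAND_RE.finditer(llm_output):
        axis, value = int(match.group(1)), int(match.group(2))
        if not 1 <= axis <= NUM_AXES:
            continue  # unknown axis: skip rather than guess
        value = max(VALUE_MIN, min(VALUE_MAX, value))
        commands.append((axis, value))
    return commands


# Hypothetical LLM output for a "selfie" instruction.
example_output = """
# lift one arm as if holding a phone
set_axis(17, 300)
set_axis(18, 90)
set_axis(99, 10)
"""
selfie_pose = parse_motion_commands(example_output)
```

Validating and clamping before anything reaches the hardware is a design choice of this sketch: it treats the generated code as untrusted text rather than executing it directly.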
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
