Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots

Gang Zhang

arXiv:2512.17183·cs.RO·December 22, 2025

TL;DR

This paper introduces a system that generates semantically meaningful co-speech gestures from speech and executes them on a humanoid robot in real time, giving the robot more natural non-verbal communication.

Contribution

The paper presents an end-to-end framework that couples speech-driven gesture synthesis with real-time robot control, integrating large language models (LLMs), an autoregressive Motion-GPT model, and a robust motion retargeting method.

Findings

01. Gestures are semantically appropriate and expressive.

02. The robot accurately executes complex, synchronized motions.

03. The system operates in real time with high fidelity.

Abstract

We present an innovative end-to-end framework for synthesizing semantically meaningful co-speech gestures and deploying them in real-time on a humanoid robot. This system addresses the challenge of creating natural, expressive non-verbal communication for robots by integrating advanced gesture generation techniques with robust physical control. Our core innovation lies in the meticulous integration of a semantics-aware gesture synthesis module, which derives expressive reference motions from speech input by leveraging a generative retrieval mechanism based on large language models (LLMs) and an autoregressive Motion-GPT model. This is coupled with a high-fidelity imitation learning control policy, the MotionTracker, which enables the Unitree G1 humanoid robot to execute these complex motions dynamically and maintain balance. To ensure feasibility, we employ a robust General Motion…
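The data flow described in the abstract (speech in, reference motion out, retargeting for feasibility, then tracking on the robot) can be pictured with a minimal Python sketch. Everything here is a hypothetical placeholder and not the authors' code: the `MotionFrame` type, the function names, the single `right_elbow` joint, and the joint limits are assumptions chosen for illustration, and simple clamping stands in for the paper's retargeting method, whose details are not given in this excerpt.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MotionFrame:
    """One motion frame: joint name -> angle in radians (hypothetical format)."""
    joints: Dict[str, float]


def synthesize_gesture(transcript: str) -> List[MotionFrame]:
    # Stand-in for the semantics-aware synthesis stage (LLM-based retrieval
    # plus Motion-GPT in the paper). Here: one dummy frame per word, with a
    # growing elbow angle so the later stages have something to act on.
    words = transcript.split()
    return [MotionFrame({"right_elbow": 0.5 * i}) for i in range(len(words))]


def retarget(frames: List[MotionFrame],
             limits: Dict[str, Tuple[float, float]]) -> List[MotionFrame]:
    # Stand-in for retargeting: clamp each joint to assumed robot limits.
    # This is a swapped-in simplification, not the paper's method.
    out = []
    for frame in frames:
        clamped = {
            name: min(max(angle, limits[name][0]), limits[name][1])
            for name, angle in frame.joints.items()
        }
        out.append(MotionFrame(clamped))
    return out


def track(frames: List[MotionFrame]) -> List[Dict[str, float]]:
    # Stand-in for the control policy (MotionTracker in the paper):
    # pass each reference frame through unchanged as a joint target.
    return [dict(frame.joints) for frame in frames]


def run_pipeline(transcript: str) -> List[Dict[str, float]]:
    limits = {"right_elbow": (-1.0, 1.0)}  # assumed limits, radians
    reference = synthesize_gesture(transcript)
    feasible = retarget(reference, limits)
    return track(feasible)


commands = run_pipeline("hello there friendly humanoid robot")
```

Each stage only consumes the previous stage's output, so any placeholder could be replaced by a real model without changing the others.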

Taxonomy

Topics: Social Robot Interaction and HRI · Human Motion and Animation · Robot Manipulation and Learning