Do You Have Freestyle? Expressive Humanoid Locomotion via Audio Control
Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Tao Huang, Zhenguo Sun, Yibo Peng, Pengwei Wang, Zhongyuan Wang, Fangzhou Liu, Chang Xu, Shanghang Zhang

TL;DR
This paper introduces RoboPerform, a unified framework that generates expressive humanoid locomotion directly from audio. Robots can perform music-driven dance and co-speech gestures with low latency and high fidelity, without an explicit motion-reconstruction step.
Contribution
According to the authors, RoboPerform is the first framework to map audio directly to humanoid motion without retargeting. It combines a ResMoE teacher policy with a diffusion-based student policy.
Findings
Achieves high physical plausibility in generated motions
Ensures low latency for real-time audio-driven control
Successfully aligns robot movements with audio content
Abstract
Humans intuitively move to sound, but current humanoid robots lack expressive improvisational capabilities, confined to predefined motions or sparse commands. Generating motion from audio and then retargeting it to robots relies on explicit motion reconstruction, leading to cascaded errors, high latency, and disjointed acoustic-actuation mapping. We propose RoboPerform, the first unified audio-to-locomotion framework that can directly generate music-driven dance and speech-driven co-speech gestures from audio. Guided by the core principle of "motion = content + style", the framework treats audio as implicit style signals and eliminates the need for explicit motion reconstruction. RoboPerform integrates a ResMoE teacher policy for adapting to diverse motion patterns and a diffusion-based student policy for audio style injection. This retargeting-free design ensures low latency and high…
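The summary does not describe the internals of the ResMoE teacher policy. As a rough illustration only, assuming "ResMoE" denotes a residual mixture-of-experts (a reading this page does not confirm), the sketch below shows how a shared base mapping can be corrected by softmax-gated experts so that one policy adapts to several motion patterns. The class name `ResidualMoE`, the linear layers, and all dimensions are hypothetical and are not taken from the paper.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ResidualMoE:
    """Hypothetical residual MoE head: shared base plus gated expert corrections."""

    def __init__(self, obs_dim, act_dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.base = rand_matrix(act_dim, obs_dim, rng)
        self.experts = [rand_matrix(act_dim, obs_dim, rng) for _ in range(num_experts)]
        self.gate = rand_matrix(num_experts, obs_dim, rng)

    def gate_weights(self, obs):
        """One mixing weight per expert; the weights sum to 1."""
        return softmax(matvec(self.gate, obs))

    def act(self, obs):
        """Base action plus the gate-weighted sum of expert corrections."""
        action = matvec(self.base, obs)
        for w, expert in zip(self.gate_weights(obs), self.experts):
            correction = matvec(expert, obs)
            action = [a + w * c for a, c in zip(action, correction)]
        return action


# Assumed toy sizes: 6-dim observation, 3-dim action, 4 experts.
policy = ResidualMoE(obs_dim=6, act_dim=3, num_experts=4)
obs = [0.5, -0.2, 0.1, 0.0, 0.3, -0.4]
weights = policy.gate_weights(obs)
action = policy.act(obs)
```

In this reading, the base path carries behavior shared across motions while the gate decides how much each expert contributes for a given observation. A real teacher policy would use trained nonlinear networks and a physics simulator; the linear maps here only keep the sketch short.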
Taxonomy
Topics: Social Robot Interaction and HRI · Music Technology and Sound Studies · Human Motion and Animation
