Learning from Massive Human Videos for Universal Humanoid Pose Control

Jiageng Mao; Siheng Zhao; Siqi Song; Tianheng Shi; Junjie Ye; Mingtong Zhang; Haoran Geng; Jitendra Malik; Vitor Guizilini; Yue Wang

arXiv:2412.14172 · cs.RO · December 19, 2024

TL;DR

This paper presents Humanoid-X, a large-scale dataset of over 20 million humanoid robot poses paired with text-based motion descriptions. The dataset is built from massive human videos and is intended to improve the generalization of humanoid robot control.

Contribution

Introduces the Humanoid-X dataset and the UH-1 model, which leverage internet-sourced human videos for scalable, generalizable humanoid robot control from text instructions.

Findings

01. UH-1 outperforms existing models in generalization tasks
02. Humanoid-X enables effective real-world deployment
03. Scalable training improves adaptability of humanoid robots

Abstract

Scalable learning of humanoid robots is crucial for their deployment in real-world applications. While traditional approaches primarily rely on reinforcement learning or teleoperation to achieve whole-body control, they are often limited by the diversity of simulated environments and the high costs of demonstration collection. In contrast, human videos are ubiquitous and present an untapped source of semantic and motion information that could significantly enhance the generalization capabilities of humanoid robots. This paper introduces Humanoid-X, a large-scale dataset of over 20 million humanoid robot poses with corresponding text-based motion descriptions, designed to leverage this abundant data. Humanoid-X is curated through a comprehensive pipeline: data mining from the Internet, video caption generation, motion retargeting of humans to humanoid robots, and policy learning for…

Code & Models

Models

🤗 USC-PSI-Lab/UH-1 (model)

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Robot Manipulation and Learning