RoboWheel: A Data Engine from Real-World Human Demonstrations for Cross-Embodiment Robotic Learning

Yuhong Zhang; Zihan Gao; Shengpeng Li; Ling-Hao Chen; Kaisheng Liu; Runqing Cheng; Xiao Lin; Junjia Liu; Zhuoheng Li; Jingyi Feng; Ziyan He; Jintian Lin; Zheyan Huang; Zhifang Liu; Haoqian Wang

arXiv:2512.02729 · cs.RO · December 3, 2025

TL;DR

RoboWheel converts human interaction videos into versatile, cross-embodiment robotic training data using a novel reconstruction and retargeting pipeline, validated on vision-language and imitation learning tasks.

Contribution

It introduces a comprehensive end-to-end pipeline for transforming human demonstration videos into scalable, cross-embodiment robotic training data with physical plausibility and domain randomization.

Findings

01. Trajectories are as stable as teleoperation data.

02. Comparable performance gains in imitation learning.

03. First quantitative evidence of HOI as effective supervision.

Abstract

We introduce RoboWheel, a data engine that converts human hand-object interaction (HOI) videos into training-ready supervision for cross-morphology robotic learning. From monocular RGB or RGB-D inputs, we perform high-precision HOI reconstruction and enforce physical plausibility via a reinforcement learning (RL) optimizer that refines hand-object relative poses under contact and penetration constraints. The reconstructed, contact-rich trajectories are then retargeted across embodiments (robot arms with simple end effectors, dexterous hands, and humanoids), yielding executable actions and rollouts. To scale coverage, we build a simulation-augmented framework on Isaac Sim with diverse domain randomization (embodiments, trajectories, object retrieval, background textures, hand motion mirroring), which enriches the distributions of trajectories and observations while preserving spatial…
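
As an illustration of one augmentation named in the abstract, hand motion mirroring, the following is a minimal sketch and not the authors' implementation. It assumes a trajectory is stored as per-frame 3D positions plus 3x3 rotation matrices, and that mirroring means reflecting across the YZ plane (x to -x); the function name `mirror_trajectory`, the data layout, and the choice of plane are all hypothetical.

```python
import numpy as np

# Reflection across the YZ plane (x -> -x). The plane is an assumption
# made for this sketch; the source does not specify it.
MIRROR = np.diag([-1.0, 1.0, 1.0])


def mirror_trajectory(positions, rotations):
    """Mirror a hand trajectory across the YZ plane (hypothetical helper).

    positions: array of shape (T, 3), one 3D position per frame.
    rotations: array of shape (T, 3, 3), one rotation matrix per frame.
    Returns the mirrored positions and rotations with the same shapes.
    """
    positions = np.asarray(positions, dtype=float)
    rotations = np.asarray(rotations, dtype=float)

    # MIRROR is symmetric, so right-multiplying each row vector by it
    # is the same as applying the reflection to each position.
    mirrored_positions = positions @ MIRROR

    # Conjugating by the reflection (M R M) keeps each matrix a proper
    # rotation (determinant +1) while flipping its handedness-dependent axes.
    mirrored_rotations = MIRROR @ rotations @ MIRROR
    return mirrored_positions, mirrored_rotations
```

Conjugating the rotation by the reflection, instead of only negating one coordinate of the position, keeps the mirrored orientation a valid rotation, so the result can in principle be treated as a pose of the opposite hand.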

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications