Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation
Kaipeng Fang, Weiqing Liang, Yuyang Li, Ji Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen

TL;DR
This paper introduces SimHum, a co-training framework that leverages synthetic simulation data and real-world human observations to improve robotic manipulation, achieving higher data efficiency and stronger generalization with less data.
Contribution
The work presents a novel co-training approach combining simulation and human data to enhance robot policy generalization and data efficiency in manipulation tasks.
Findings
Outperforms the baseline by up to 40% under the same data budget
Achieves 62.5% out-of-distribution (OOD) success with only 80 real data points
Outperforms the real-only baseline by 7.1×
Abstract
Synthetic simulation data and real-world human data provide scalable alternatives to circumvent the prohibitive costs of robot data collection. However, these sources suffer from the sim-to-real visual gap and the human-to-robot embodiment gap, respectively, which limit policy generalization to real-world scenarios. In this work, we identify a natural yet underexplored complementarity between these sources: simulation offers the robot actions that human data lacks, while human data provides the real-world observations that simulation struggles to render. Motivated by this insight, we present SimHum, a co-training framework that simultaneously extracts a kinematic prior from simulated robot actions and a visual prior from real-world human observations. Building on these two complementary priors, we achieve data-efficient and generalizable robotic manipulation in real-world tasks. Empirically,…
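The abstract does not say how SimHum mixes its data sources during training. For illustration only, the sketch below shows one common co-training pattern: every minibatch is assembled from several sources (here simulation, human, and a small pool of real robot data) at fixed ratios. The source names, the 0.5/0.3/0.2 weights, and the helpers `batch_composition` and `sample_batch` are all hypothetical. This is a generic data-mixing sketch, not the authors' method.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical mixing ratios; the paper's actual ratios are not given in this summary.
SOURCE_WEIGHTS: Dict[str, float] = {"sim": 0.5, "human": 0.3, "real": 0.2}


def batch_composition(weights: Dict[str, float], batch_size: int) -> Dict[str, int]:
    """Split batch_size across sources in proportion to weights (largest-remainder rounding)."""
    total = sum(weights.values())
    exact = {k: batch_size * w / total for k, w in weights.items()}
    counts = {k: int(v) for k, v in exact.items()}
    leftover = batch_size - sum(counts.values())
    # Hand the remaining slots to the sources with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def sample_batch(
    pools: Dict[str, Sequence],
    weights: Dict[str, float],
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[str, object]]:
    """Draw a mixed batch, sampling each source's share with replacement and tagging its origin."""
    counts = batch_composition(weights, batch_size)
    batch: List[Tuple[str, object]] = []
    for source, n in counts.items():
        batch.extend((source, item) for item in rng.choices(pools[source], k=n))
    rng.shuffle(batch)  # interleave sources within the batch
    return batch


if __name__ == "__main__":
    # Toy pools: a large simulated pool and a tiny real pool (sizes are made up).
    pools = {
        "sim": list(range(1000)),
        "human": list(range(200)),
        "real": list(range(8)),
    }
    rng = random.Random(0)
    batch = sample_batch(pools, SOURCE_WEIGHTS, 32, rng)
    print(batch_composition(SOURCE_WEIGHTS, 32))
```

Fixing per-source counts, rather than sampling uniformly over the pooled data, keeps a scarce source such as real robot data present in every batch, even when it is orders of magnitude smaller than the simulated pool. Each sample carries its source tag so that a training loop could, for example, apply action supervision only where robot actions exist.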
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Multimodal Machine Learning Applications
