Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation

Kaipeng Fang; Weiqing Liang; Yuyang Li; Ji Zhang; Pengpeng Zeng; Lianli Gao; Jingkuan Song; Heng Tao Shen

arXiv:2601.19406·cs.RO·January 28, 2026

Open Access

TL;DR

This paper introduces SimHum, a co-training framework that combines synthetic simulation data with real-world human observations to make robotic manipulation policies more data-efficient and better at generalizing.

Contribution

The work proposes co-training on simulated robot actions, which supply a kinematic prior, and real-world human observations, which supply a visual prior, to improve policy generalization and data efficiency in manipulation tasks.

Findings

01. Outperforms the baseline by up to 40% under the same data budget

02. Achieves a 62.5% out-of-distribution (OOD) success rate with only 80 real data points

03. Outperforms the real-only baseline by 7.1 times

Abstract

Synthetic simulation data and real-world human data provide scalable alternatives to circumvent the prohibitive costs of robot data collection. However, these sources suffer from the sim-to-real visual gap and the human-to-robot embodiment gap, respectively, which limits the policy's generalization to real-world scenarios. In this work, we identify a natural yet underexplored complementarity between these sources: simulation offers the robot action that human data lacks, while human data provides the real-world observation that simulation struggles to render. Motivated by this insight, we present SimHum, a co-training framework to simultaneously extract kinematic prior from simulated robot actions and visual prior from real-world human observations. Based on the two complementary priors, we achieve data-efficient and generalizable robotic manipulation in real-world tasks. Empirically,…
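
To make the co-training idea in the abstract concrete, here is a minimal sketch of how batches might be mixed across data sources. It is not the authors' implementation. The `Sample` type, the helper names, the mixing weights, and the toy dataset sizes are all hypothetical (only the figure of 80 real samples echoes the findings above). The sketch assumes that simulated and real robot data carry action labels while human data carries observations only, so an action loss would be masked out for human samples.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    source: str                    # "sim", "human", or "real" (hypothetical labels)
    observation: List[float]       # placeholder standing in for image features
    action: Optional[List[float]]  # None when the source has no robot action


def make_dataset(source: str, n: int, has_action: bool) -> List[Sample]:
    """Build a toy dataset; the values are placeholders, not real data."""
    return [
        Sample(source, [float(i)], [0.0] if has_action else None)
        for i in range(n)
    ]


def sample_cotraining_batch(
    datasets: Dict[str, List[Sample]],
    weights: Dict[str, float],
    batch_size: int,
    rng: random.Random,
) -> List[Sample]:
    """Pick a source for each slot according to the mixing weights,
    then draw one sample uniformly from that source."""
    sources = list(datasets)
    picks = rng.choices(sources, weights=[weights[s] for s in sources], k=batch_size)
    return [rng.choice(datasets[s]) for s in picks]


def action_loss_mask(batch: List[Sample]) -> List[int]:
    """1 where a robot action label exists (so an action loss applies), else 0."""
    return [1 if s.action is not None else 0 for s in batch]


rng = random.Random(0)
datasets = {
    "sim": make_dataset("sim", 1000, has_action=True),       # robot actions, rendered visuals
    "human": make_dataset("human", 1000, has_action=False),  # real visuals, no robot actions
    "real": make_dataset("real", 80, has_action=True),       # small real-robot set
}
weights = {"sim": 0.4, "human": 0.4, "real": 0.2}  # assumed ratios, not from the paper

batch = sample_cotraining_batch(datasets, weights, batch_size=32, rng=rng)
mask = action_loss_mask(batch)
print(sum(mask), "of", len(batch), "samples carry robot action labels")
```

In a real training loop the mask would gate the action-prediction loss, while every sample, human data included, could still contribute to whatever visual objective the policy uses. How SimHum actually extracts its two priors is not specified in the text shown here.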

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Multimodal Machine Learning Applications