SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework

Tianshu Wu, Xiangqi Kong, Yue Chen, Qize Yu, Hang Ye, Jia Li, Yizhou Wang, and Hao Dong

arXiv:2605.20373·cs.RO·May 21, 2026

TL;DR

SUGAR is a scalable framework that converts human videos into generalizable humanoid loco-manipulation skills without task-specific reward engineering or reference motion conditioning, enabling zero-shot real-world transfer.

Contribution

SUGAR introduces a fully automated pipeline and a physics-based refiner that transform human videos into high-fidelity humanoid skills for diverse tasks.

Findings

01. Outperforms reference-tracking baselines in simulation and real-world tasks.
02. Performance improves with more human video data.
03. Achieves zero-shot transfer with reliable execution and failure recovery.

Abstract

Building humanoid robots capable of generalizable whole-body loco-manipulation in the real world remains a fundamental challenge. Existing methods either rely on laborious task-specific reward engineering, rigidly replay reference motions that fail to generalize, or depend on costly teleoperation that limits scalability. While human videos capture diverse human behaviors, motion priors inferred from them are inherently imperfect, suffering from occlusion, contact artifacts, and retargeting errors that render them unsuitable for direct policy learning. To address this, we present SUGAR, a scalable data-driven framework that converts diverse human videos into deployable humanoid loco-manipulation skills, without any task-specific reward engineering or reference-motion conditioning at inference. SUGAR proceeds in three stages. First, a fully automated pipeline extracts kinematic…

Code & Models

Repositories

https://tianshuwu.github.io/sugar-humanoid
