DemoBot: Efficient Learning of Bimanual Manipulation with Dexterous Hands From Third-Person Human Videos

Yucheng Xu; Xiaofeng Mao; Elle Miller; Xinyu Yi; Yang Li; Zhibin Li; Robert B. Fisher

arXiv:2601.01651·cs.RO·January 6, 2026

Open Access

TL;DR

DemoBot is a framework that learns complex bimanual manipulation skills for dexterous hands from a single unannotated RGB-D human video. It extracts hand and object motion trajectories from the video and uses them as motion priors that reinforcement learning then refines for long-horizon tasks.

Contribution

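The paper integrates video-based motion extraction with reinforcement learning and adds three components for long-horizon tasks: temporal-segment-based RL for temporal alignment with the demonstration, a Success-Gated Reset strategy that balances refining already-acquired skills against exploring later task stages, and an Event-Driven Reward curriculum with adaptive thresholding.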

Findings

01 Successfully learned long-horizon bimanual assembly tasks

02 Achieved synchronous and asynchronous manipulation skills

03 Demonstrated scalability from unannotated human videos

Abstract

This work presents DemoBot, a learning framework that enables a dual-arm, multi-finger robotic system to acquire complex manipulation skills from a single unannotated RGB-D video demonstration. The method extracts structured motion trajectories of both hands and objects from raw video data. These trajectories serve as motion priors for a novel reinforcement learning (RL) pipeline that learns to refine them through contact-rich interactions, thereby eliminating the need to learn from scratch. To address the challenge of learning long-horizon manipulation skills, we introduce: (1) Temporal-segment based RL to enforce temporal alignment of the current state with demonstrations; (2) Success-Gated Reset strategy to balance the refinement of readily acquired skills and the exploration of subsequent task stages; and (3) Event-Driven Reward curriculum with adaptive thresholding to guide the RL…
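The abstract names a Success-Gated Reset strategy but the text shown here does not specify how it works. As a rough illustration only, the sketch below shows one plausible reading: the demonstration is split into ordered segments, and an episode usually starts at the first segment whose recent success rate is still below a gate, while occasionally restarting from an earlier segment so that acquired skills keep being refined. The class name `SuccessGatedReset` and the parameters `gate`, `window`, and `refine_prob` are hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class SuccessGatedReset:
    """Hypothetical sketch, not the paper's implementation.

    Chooses which demonstration segment an episode should start from,
    based on the recent success rate recorded for each segment.
    """

    def __init__(self, num_segments, gate=0.8, window=50, refine_prob=0.2, seed=0):
        self.num_segments = num_segments
        self.gate = gate                # assumed success-rate threshold
        self.refine_prob = refine_prob  # assumed chance of revisiting earlier segments
        self.history = [deque(maxlen=window) for _ in range(num_segments)]
        self.rng = random.Random(seed)

    def record(self, segment, success):
        """Store the outcome of one attempt at the given segment."""
        self.history[segment].append(1 if success else 0)

    def success_rate(self, segment):
        """Mean success over the recent window; 0.0 if nothing is recorded."""
        h = self.history[segment]
        return sum(h) / len(h) if h else 0.0

    def frontier(self):
        """First segment whose recent success rate is below the gate."""
        for s in range(self.num_segments):
            if self.success_rate(s) < self.gate:
                return s
        return self.num_segments - 1  # every segment passed: stay on the last one

    def sample_start(self):
        """Mostly start at the frontier; sometimes refine an earlier segment."""
        f = self.frontier()
        if f > 0 and self.rng.random() < self.refine_prob:
            return self.rng.randrange(f)
        return f


scheduler = SuccessGatedReset(num_segments=3, gate=0.8, window=10)
for _ in range(10):
    scheduler.record(0, True)  # segment 0 is now reliably solved
start_segment = scheduler.sample_start()  # 1 most of the time, occasionally 0
```

In a training loop, the environment would be reset to the demonstration state at the start of `start_segment`, and `record` would be called with the outcome of each segment the policy attempts. The gate keeps exploration from moving to later stages before earlier ones are reliable.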

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Reinforcement Learning in Robotics