DemoBot: Efficient Learning of Bimanual Manipulation with Dexterous Hands From Third-Person Human Videos
Yucheng Xu, Xiaofeng Mao, Elle Miller, Xinyu Yi, Yang Li, Zhibin Li, Robert B. Fisher

TL;DR
DemoBot introduces a scalable framework that learns complex bimanual manipulation skills from a single human video demonstration, combining structured motion extraction with reinforcement learning enhancements for long-horizon tasks.
Contribution
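The paper integrates video-based motion extraction with reinforcement learning and adds three strategies for long-horizon tasks: temporal-segment based RL for temporal alignment with the demonstration, a Success-Gated Reset strategy that balances skill refinement against exploration of later task stages, and an Event-Driven Reward curriculum with adaptive thresholding.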
Findings
Successfully learned long-horizon bimanual assembly tasks
Achieved synchronous and asynchronous manipulation skills
Demonstrated scalability from unannotated human videos
Abstract
This work presents DemoBot, a learning framework that enables a dual-arm, multi-finger robotic system to acquire complex manipulation skills from a single unannotated RGB-D video demonstration. The method extracts structured motion trajectories of both hands and objects from raw video data. These trajectories serve as motion priors for a novel reinforcement learning (RL) pipeline that learns to refine them through contact-rich interactions, thereby eliminating the need to learn from scratch. To address the challenge of learning long-horizon manipulation skills, we introduce: (1) Temporal-segment based RL to enforce temporal alignment of the current state with demonstrations; (2) Success-Gated Reset strategy to balance the refinement of readily acquired skills and the exploration of subsequent task stages; and (3) Event-Driven Reward curriculum with adaptive thresholding to guide the RL…
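The abstract describes the Success-Gated Reset strategy only at a high level, so the following is a minimal sketch of one way such a gate could work, not the authors' implementation. It assumes the task is split into ordered stages, that a per-stage success rate is tracked as a moving average, and that an episode may start at a later stage only once every earlier stage passes a threshold. The class name `SuccessGatedReset` and the parameters `gate`, `revisit_prob`, and `ema` are hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SuccessGatedReset:
    """Hypothetical reset sampler; all names and defaults are illustrative."""
    num_stages: int
    gate: float = 0.8          # assumed success-rate threshold that unlocks the next stage
    revisit_prob: float = 0.3  # assumed chance of replaying an already unlocked stage
    ema: float = 0.1           # assumed smoothing factor for the success-rate estimate
    success_rate: list = field(init=False)

    def __post_init__(self):
        self.success_rate = [0.0] * self.num_stages

    def update(self, stage, succeeded):
        """Fold one episode outcome into the stage's moving-average success rate."""
        old = self.success_rate[stage]
        self.success_rate[stage] = (1 - self.ema) * old + self.ema * float(succeeded)

    def frontier(self):
        """Return the first stage whose success rate is still below the gate."""
        for stage, rate in enumerate(self.success_rate):
            if rate < self.gate:
                return stage
        return self.num_stages - 1

    def sample_reset_stage(self, rng=random):
        """Pick the stage at which the next episode starts."""
        stage = self.frontier()
        if stage > 0 and rng.random() < self.revisit_prob:
            return rng.randrange(stage)  # keep refining an earlier, acquired stage
        return stage                     # otherwise explore the current frontier


sampler = SuccessGatedReset(num_stages=3)
for _ in range(20):
    sampler.update(0, True)  # stage 0 succeeds repeatedly and passes the gate
print(sampler.frontier(), sampler.sample_reset_stage())
```

In this sketch the gate keeps training focused on the earliest unsolved stage, while `revisit_prob` sends a fraction of episodes back to earlier stages so that skills already acquired continue to be refined.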
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Reinforcement Learning in Robotics
