BridgeACT: Bridging Human Demonstrations to Robot Actions via Unified Tool-Target Affordances
Yifan Han, Jianxiang Liu, Haoyu Zhang, Yuqi Gu, Yunhan Guo, Wenzhao Lian

TL;DR
BridgeACT is a novel framework that learns robot manipulation directly from human videos by modeling affordances as an embodiment-agnostic bridge, enabling direct real-world robot deployment without robot demonstration data.
Contribution
BridgeACT introduces an affordance-based approach that bridges human demonstrations and robot actions, supporting diverse tasks and generalizing to unseen objects and scenes.
Findings
Outperforms prior baselines on real-world tasks.
Generalizes to unseen objects, scenes, and viewpoints.
Enables direct deployment on real robots.
Abstract
Learning robot manipulation from human videos is appealing due to the scale and diversity of human demonstrations, but transferring such demonstrations to executable robot behavior remains challenging. Prior work either relies on robot data for downstream adaptation or learns affordance representations that remain at the perception level and do not directly support real-world execution. We present BridgeACT, an affordance-driven framework that learns robotic manipulation directly from human videos without requiring any robot demonstration data. Our key idea is to model affordance as an embodiment-agnostic intermediate representation that bridges human demonstrations and robot actions. BridgeACT decomposes manipulation into two complementary problems: where to grasp and how to move. To this end, BridgeACT first grounds task-relevant affordance regions in the current scene, and then…
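The "where to grasp" and "how to move" split described in the abstract can be pictured as a small data structure that holds both answers independently of any particular robot. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `Affordance`, `region_centroid`, and `to_gripper_targets`, the use of a region centroid as the grasp point, and the 0.10 m approach offset are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, in a shared scene frame


@dataclass
class Affordance:
    """Hypothetical embodiment-agnostic record (not from the paper)."""
    grasp_point: Point      # answers "where to grasp"
    waypoints: List[Point]  # answers "how to move" after the grasp


def region_centroid(region: List[Point]) -> Point:
    """Reduce a grounded affordance region (a set of 3D points) to one point.

    Using the centroid is an assumption made for illustration only.
    """
    if not region:
        raise ValueError("affordance region is empty")
    n = len(region)
    return (
        sum(p[0] for p in region) / n,
        sum(p[1] for p in region) / n,
        sum(p[2] for p in region) / n,
    )


def to_gripper_targets(aff: Affordance, approach_height: float = 0.10) -> List[Point]:
    """Order gripper positions: pre-grasp above the point, grasp, then motion."""
    x, y, z = aff.grasp_point
    pre_grasp = (x, y, z + approach_height)
    return [pre_grasp, aff.grasp_point, *aff.waypoints]


# Example: a three-point region on a table and a single lift waypoint.
region = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
aff = Affordance(grasp_point=region_centroid(region), waypoints=[(1.0, 1.0, 0.5)])
targets = to_gripper_targets(aff)
```

Nothing in `Affordance` refers to a human hand or a specific gripper, which is the sense in which such a representation could sit between human video and robot execution; only `to_gripper_targets` would need to change for a different robot.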
