Ag2x2: Robust Agent-Agnostic Visual Representations for Zero-Shot Bimanual Manipulation
Ziyin Xiong, Yinghan Chen, Puhao Li, Yixin Zhu, Tengyu Liu, Siyuan Huang

TL;DR
Ag2x2 introduces a novel agent-agnostic visual representation framework that enables robust zero-shot bimanual manipulation, outperforming baselines and facilitating imitation learning without expert supervision.
Contribution
The paper presents Ag2x2, a new coordination-aware visual representation method for bimanual manipulation that encodes object and hand motion information while remaining agent-agnostic.
Findings
Achieves a 73.5% success rate across 13 bimanual tasks from Bi-DexHands and PerAct2.
Outperforms baseline methods and surpasses policies trained with expert-engineered rewards.
Enables effective imitation learning without human demonstrations.
Abstract
Bimanual manipulation, fundamental to human daily activities, remains a challenging task due to its inherent complexity of coordinated control. Recent advances have enabled zero-shot learning of single-arm manipulation skills through agent-agnostic visual representations derived from human videos; however, these methods overlook crucial agent-specific information necessary for bimanual coordination, such as end-effector positions. We propose Ag2x2, a computational framework for bimanual manipulation through coordination-aware visual representations that jointly encode object states and hand motion patterns while maintaining agent-agnosticism. Extensive experiments demonstrate that Ag2x2 achieves a 73.5% success rate across 13 diverse bimanual tasks from Bi-DexHands and PerAct2, including challenging scenarios with deformable objects like ropes. This performance outperforms baseline…
