Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control
Chen Wang, Rui Wang, Ajay Mandlekar, Li Fei-Fei, Silvio Savarese, Danfei Xu

TL;DR
This paper introduces Hand-eye Action Networks (HAN), a learnable action space inspired by human hand-eye coordination, enabling visuomotor policies to generalize to new scene configurations in manipulation tasks.
Contribution
The paper proposes HAN, a novel learnable action space that captures hand-eye coordination, improving generalization in visuomotor control beyond training scenarios.
Findings
HAN enables zero-shot generalization to new scene configurations.
Visuomotor policies with HAN outperform baseline methods.
HAN mimics human-like spatial invariance in manipulation tasks.
Abstract
Imitation Learning (IL) is an effective framework to learn visuomotor skills from offline demonstration data. However, IL methods often fail to generalize to new scene configurations not covered by training data. On the other hand, humans can manipulate objects in varying conditions. Key to such capability is hand-eye coordination, a cognitive ability that enables humans to adaptively direct their movements at task-relevant objects and be invariant to the objects' absolute spatial location. In this work, we present a learnable action space, Hand-eye Action Networks (HAN), that can approximate human hand-eye coordination behaviors by learning from human teleoperated demonstrations. Through a set of challenging multi-stage manipulation tasks, we show that a visuomotor policy equipped with HAN is able to inherit the key spatial invariance property of hand-eye coordination and achieve…
Taxonomy
Topics: Robot Manipulation and Learning · Action Observation and Synchronization · Hand Gesture Recognition Systems
