IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI
Xiaoyu Chen, Junliang Guo, Tianyu He, Chuheng Zhang, Pushi Zhang, Derek Cathera Yang, Li Zhao, Jiang Bian

TL;DR
IGOR introduces a unified semantic action space that bridges human and robot activities, enabling knowledge transfer, cross-domain movement migration, and natural language alignment for improved embodied AI control.
Contribution
The paper presents IGOR, a novel method for learning a unified, semantically consistent latent action space across humans and robots, facilitating transfer and control in embodied AI.
Findings
IGOR learns a consistent action space for humans and robots.
IGOR enables movement migration across videos and domains.
IGOR aligns actions with natural language for robot control.
Abstract
We introduce Image-GOal Representations (IGOR), which aim to learn a unified, semantically consistent action space across humans and various robots. Through this unified latent action space, IGOR enables knowledge transfer among large-scale robot and human activity data. We achieve this by compressing the visual changes between an initial image and its goal state into latent actions. IGOR allows us to generate latent action labels for internet-scale video data. This unified latent action space enables training foundation policy and world models across a wide variety of tasks performed by both robots and humans. We demonstrate that: (1) IGOR learns a semantically consistent action space for both humans and robots, characterizing the various possible motions of objects that represent physical interaction knowledge; (2) IGOR can "migrate" the movements of the object in one video to other…
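The core mechanism in the abstract (compress the change from an initial image to its goal image into a latent action, use those latents to label unlabeled video, then replay them from a different starting state to "migrate" a movement) can be illustrated with a toy sketch. Everything here is an assumption for illustration, not IGOR's method: frames are already feature vectors, the "encoder" is a simple goal-minus-initial difference snapped to the nearest entry of a fixed discrete codebook, and the "decoder" adds that code back. The names `ToyLatentActionModel`, `encode`, `decode`, `label_video`, and `migrate` are hypothetical. IGOR itself trains neural networks on raw video.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class ToyLatentActionModel:
    """Hypothetical stand-in for a latent action model (NOT IGOR's architecture)."""

    codebook: List[Vector]  # assumed fixed discrete latent actions; a real model would learn them

    def encode(self, initial: Vector, goal: Vector) -> int:
        # Compress the visual change (goal - initial) into the index of the nearest code.
        delta = sub(goal, initial)
        return min(range(len(self.codebook)), key=lambda i: sq_dist(delta, self.codebook[i]))

    def decode(self, initial: Vector, action: int) -> Vector:
        # Predict the goal features by applying the latent action to the initial state.
        return add(initial, self.codebook[action])

    def label_video(self, frames: Sequence[Vector], stride: int = 1) -> List[int]:
        # Pseudo-label each (frame t, frame t + stride) pair with a latent action.
        return [self.encode(frames[t], frames[t + stride]) for t in range(len(frames) - stride)]

    def migrate(self, actions: Sequence[int], initial: Vector) -> List[Vector]:
        # Replay latent actions extracted from one video starting from another video's state.
        states = [initial]
        for a in actions:
            states.append(self.decode(states[-1], a))
        return states


if __name__ == "__main__":
    model = ToyLatentActionModel(codebook=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    source_video = [[0.0, 0.0], [1.1, 0.1], [1.0, 1.0]]  # "move right", then "move up"
    labels = model.label_video(source_video)
    print(labels)  # [0, 1]
    print(model.migrate(labels, [5.0, 5.0]))  # [[5.0, 5.0], [6.0, 5.0], [6.0, 6.0]]
```

Because the latent action encodes only the change and not the absolute state, the same label sequence can be applied to a new starting frame, which is the intuition behind cross-video movement migration. The discrete codebook is a design choice of this sketch that keeps labels compact and comparable across sources.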
