Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation
Rishabh Jangir, Nicklas Hansen, Sambaran Ghosal, Mohit Jain, Xiaolong Wang

TL;DR
This paper proposes a multi-view setting for robotic manipulation in which a Transformer with cross-view attention fuses egocentric (wrist-mounted) and third-person camera feedback, improving control learned with reinforcement learning (RL).
Contribution
It proposes a cross-view attention mechanism with Transformers to fuse egocentric and third-person visual data, enhancing manipulation performance and transferability to real robots.
Findings
Achieved 75% success in hammer manipulation tasks on real robots.
Outperformed single-view and multi-view baselines in learning efficiency.
Successfully transferred from simulation to real-world scenarios without camera calibration.
Abstract
Learning to solve precision-based manipulation tasks from visual feedback using Reinforcement Learning (RL) could drastically reduce the engineering efforts required by traditional robot systems. However, performing fine-grained motor control from visual inputs alone is challenging, especially with a static third-person camera as often used in previous work. We propose a setting for robotic manipulation in which the agent receives visual feedback from both a third-person camera and an egocentric camera mounted on the robot's wrist. While the third-person camera is static, the egocentric camera enables the robot to actively control its vision to aid in precise manipulation. To fuse visual information from both cameras effectively, we additionally propose to use Transformers with a cross-view attention mechanism that models spatial attention from one view to another (and vice-versa), and…
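The cross-view attention idea in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. This summary does not specify the image encoder, the number of heads or layers, or how the fused feature reaches the RL policy. The sketch therefore makes several assumptions. Each camera image is assumed to be already encoded into a grid of spatial feature tokens. It uses single-head scaled dot-product attention with random stand-in weights, and it fuses the two attended views by mean pooling and concatenation. All names (`cross_attention`, `fuse_views`, `random_matrix`, the `params` keys) are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical stand-in for learned weights or encoder outputs."""
    std = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def cross_attention(query_tokens: Matrix, context_tokens: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Tokens of one view (queries) attend over the spatial tokens of the other view."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(scores)  # shape: (num_query_tokens, num_context_tokens)
    attended = matmul(attn, v)
    # Residual connection keeps the querying view's own features.
    out = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(query_tokens, attended)]
    return out, attn


def mean_pool(tokens: Matrix) -> List[float]:
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def fuse_views(third_person: Matrix, egocentric: Matrix,
               params: Dict[str, Tuple[Matrix, Matrix, Matrix]]):
    """Cross-view attention in both directions, then pool and concatenate (assumed fusion)."""
    tp_out, tp_attn = cross_attention(third_person, egocentric, *params["tp_to_ego"])
    ego_out, ego_attn = cross_attention(egocentric, third_person, *params["ego_to_tp"])
    fused = mean_pool(tp_out) + mean_pool(ego_out)  # list concatenation
    return fused, tp_attn, ego_attn


# Usage: a 4x4 third-person grid and a 3x3 egocentric grid, each token of width D.
rng = random.Random(0)
D = 8
third_person = random_matrix(16, D, rng)
egocentric = random_matrix(9, D, rng)
params = {
    "tp_to_ego": tuple(random_matrix(D, D, rng) for _ in range(3)),
    "ego_to_tp": tuple(random_matrix(D, D, rng) for _ in range(3)),
}
fused, tp_attn, ego_attn = fuse_views(third_person, egocentric, params)
print(len(fused), len(tp_attn), len(tp_attn[0]), len(ego_attn), len(ego_attn[0]))
# prints: 16 16 9 9 16
```

In this sketch each direction has its own query/key/value projections. Third-person tokens can therefore weight egocentric regions, and egocentric tokens can weight third-person regions. The two views also need not have the same number of spatial tokens (16 vs. 9 here).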
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Tactile and Sensory Interactions
