Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions
Ricson Cheng, Arpit Agarwal, Katerina Fragkiadaki

TL;DR
This paper introduces reinforcement learning methods for active vision in manipulation tasks: agents learn to control both their camera and their gripper so they can cope with occlusions from distractor objects, and the resulting policies outperform static-camera setups.
Contribution
It proposes joint hand/eye control with object-centric attention biases built into actor-critic architectures, trained with a curriculum over environment difficulty, for manipulation in cluttered environments.
Findings
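Active vision policies outperform static camera setups across a variety of cluttered environments.
Object-centric attention biases in the actor-critic architecture appear to be key to good manipulation performance.
A curriculum over environment difficulty is important for learning these policies.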
Abstract
We consider artificial agents that learn to jointly control their gripper and camera in order to reinforcement learn manipulation policies in the presence of occlusions from distractor objects. Distractors often occlude the object of interest and cause it to disappear from the field of view. We propose hand/eye controllers that learn to move the camera to keep the object within the field of view and visible, in coordination to manipulating it to achieve the desired goal, e.g., pushing it to a target location. We incorporate structural biases of object-centric attention within our actor-critic architectures, which our experiments suggest to be a key for good performance. Our results further highlight the importance of curriculum with regards to environment difficulty. The resulting active vision / manipulation policies outperform static camera setups for a variety of cluttered environments.
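The abstract mentions a curriculum over environment difficulty but does not say how it is scheduled. As a minimal sketch only, and not the authors' method, one common way to build such a curriculum is to raise the number of distractor objects once the agent's recent success rate passes a threshold. The class name `DifficultyCurriculum` and the parameters `max_distractors`, `window`, and `promote_at` are hypothetical and chosen for illustration.

```python
from collections import deque


class DifficultyCurriculum:
    """Hypothetical scheduler: add a distractor once recent success is high enough."""

    def __init__(self, max_distractors=4, window=20, promote_at=0.8):
        self.level = 0                      # current number of distractor objects (assumed difficulty knob)
        self.max_distractors = max_distractors
        self.promote_at = promote_at        # success rate required to advance
        self.recent = deque(maxlen=window)  # rolling record of episode outcomes

    def record(self, success):
        """Store one episode outcome; advance the level when the full window is good enough."""
        self.recent.append(bool(success))
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and sum(self.recent) / len(self.recent) >= self.promote_at:
            if self.level < self.max_distractors:
                self.level += 1
                self.recent.clear()         # re-measure success at the new difficulty
        return self.level
```

In a training loop, one would call `record(success)` after each episode and use the returned level to decide how many distractors to place in the next environment reset. Whether the paper's curriculum is success-gated, fixed-schedule, or something else is not stated in the text above.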
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Advanced Vision and Imaging
