Entity-Centric Reinforcement Learning for Object Manipulation from Pixels
Dan Haramati, Tal Daniel, Aviv Tamar

TL;DR
This paper introduces a structured visual reinforcement learning approach for multi-object manipulation from pixels, enabling generalization to tasks with more objects than seen during training.
Contribution
It proposes an entity-centric architecture that models object interactions and goal dependencies, improving generalization in multi-object manipulation tasks from raw images.
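To make the idea of an entity-centric model concrete, here is a minimal sketch, not the authors' architecture. It assumes each object is already encoded as a fixed-size vector and applies one self-attention layer with shared weights over the set of entities, followed by mean pooling. All names (`entity_attention_pool`, `Wq`, `Wk`, `Wv`, `DIM`) are hypothetical and the weights are random. It illustrates only that such a layer accepts any number of entities and ignores their order. It says nothing about task performance.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(W, x):
    return [dot(row, x) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entity_attention_pool(entities, Wq, Wk, Wv):
    """Single-head self-attention over a set of entity vectors, then mean pooling.

    The same weight matrices are applied to every entity, so the function
    works for any number of entities and is invariant to their order.
    """
    queries = [matvec(Wq, e) for e in entities]
    keys = [matvec(Wk, e) for e in entities]
    values = [matvec(Wv, e) for e in entities]
    scale = math.sqrt(len(keys[0]))
    out_dim = len(values[0])

    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(out_dim)]
        )

    # Mean pooling collapses the set into one fixed-size summary vector.
    n = len(attended)
    return [sum(a[i] for a in attended) / n for i in range(out_dim)]

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
DIM = 4  # hypothetical per-entity feature size
Wq, Wk, Wv = (random_matrix(rng, DIM, DIM) for _ in range(3))

# Hypothetical entity sets: 3 objects and 10 objects, same feature size.
three = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
ten = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(10)]

out3 = entity_attention_pool(three, Wq, Wk, Wv)
out10 = entity_attention_pool(ten, Wq, Wk, Wv)
out3_reversed = entity_attention_pool(list(reversed(three)), Wq, Wk, Wv)
```

Both calls return a vector of length `DIM`, and reversing the entity order changes the result only by floating-point rounding. A flat concatenation of object features would need a fixed object count and a fixed ordering, which is why set-based processing is a natural fit for scenes with varying numbers of objects.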
Findings
Agents trained with up to 3 objects generalize to similar tasks with over 10 objects
The approach handles goal dependencies between objects
Demonstrates improved generalization in visual RL tasks
Abstract
Manipulating objects is a hallmark of human intelligence, and an important task in domains such as robotics. In principle, Reinforcement Learning (RL) offers a general approach to learn object manipulation. In practice, however, domains with more than a few objects are difficult for RL agents due to the curse of dimensionality, especially when learning from raw image observations. In this work we propose a structured approach for visual RL that is suitable for representing multiple objects and their interaction, and use it to learn goal-conditioned manipulation of several objects. Key to our method is the ability to handle goals with dependencies between the objects (e.g., moving objects in a certain order). We further relate our architecture to the generalization capability of the trained agent, based on a theoretical result for compositional generalization, and demonstrate agents that…
Taxonomy
Topics: Reinforcement Learning in Robotics
