Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio

TL;DR
This paper introduces a decentralized reinforcement learning approach where primitives compete based on information needs, leading to improved generalization without a high-level meta-policy.
Contribution
The work proposes a novel primitive-based policy architecture with decentralized decision-making and information-theoretic competition, eliminating the need for a meta-policy.
Findings
Outperforms flat policies in generalization tasks
Enables primitives to specialize through information regularization
Demonstrates effective decentralized decision-making
Abstract
Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still produce appropriate decisions in all states. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive can decide for itself whether it wishes to act in the current state. We use an information-theoretic mechanism for enabling this decentralized decision: each primitive chooses how much information it needs about the current state…
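The decentralized selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the truncated abstract does not state: each primitive encodes the state as a diagonal Gaussian latent, the "amount of information" is measured as the KL divergence from that encoding to a standard normal prior, and the primitive requesting the most information is favored through a softmax over those KL values. The function names `gaussian_kl_to_standard_normal` and `select_primitive` are hypothetical.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def select_primitive(mus, log_vars, rng=None):
    """Pick one primitive from K candidates without a meta-policy.

    mus, log_vars: arrays of shape (K, latent_dim), one row per primitive,
    holding each primitive's encoding of the current state (assumed Gaussian).
    Returns (winner index, selection weights, per-primitive KL values).
    """
    kl = gaussian_kl_to_standard_normal(mus, log_vars)  # shape (K,)
    # Numerically stable softmax over the information each primitive requests.
    shifted = kl - kl.max()
    weights = np.exp(shifted) / np.exp(shifted).sum()
    if rng is None:
        winner = int(np.argmax(weights))  # greedy choice
    else:
        winner = int(rng.choice(len(weights), p=weights))  # stochastic choice
    return winner, weights, kl


# Three hypothetical primitives with a 2-dimensional latent each.
mus = np.array([[0.0, 0.0], [1.0, -1.0], [0.1, 0.0]])
log_vars = np.zeros((3, 2))
winner, weights, kl = select_primitive(mus, log_vars)
print(winner, weights, kl)
```

In this sketch a primitive whose encoding matches the prior requests no information and receives the smallest weight, so control goes to whichever primitive finds the current state most informative. In a trained system the KL values would also typically appear as a penalty in the objective, which corresponds to the information regularization listed under Findings.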
Taxonomy
Topics: Reinforcement Learning in Robotics · Elevator Systems and Control · Adaptive Dynamic Programming Control
