Bag of Policies for Distributional Deep Exploration
Asen Nachkov, Luchen Li, Giulia Luise, Filippo Valdettaro, and Aldo Faisal

TL;DR
This paper introduces Bag of Policies (BoP), a novel ensemble approach for distributional reinforcement learning that enhances exploration, robustness, and learning speed by maintaining multiple independent policy heads and leveraging posterior uncertainty.
Contribution
The paper proposes BoP, a general ensemble method for distributional RL that improves exploration and robustness by maintaining multiple independent heads and integrating posterior uncertainty analysis.
Findings
BoP improves learning speed on Atari games.
BoP enhances robustness and exploration efficiency.
Ensemble of distributional critics provides better uncertainty estimation.
Abstract
Efficient exploration in complex environments remains a major challenge for reinforcement learning (RL). In contrast to previous Thompson sampling-inspired mechanisms that enable temporally extended exploration, i.e., deep exploration, we focus on deep exploration in distributional RL. We develop here a general-purpose approach, Bag of Policies (BoP), that can be built on top of any return distribution estimator by maintaining a population of its copies. BoP consists of an ensemble of multiple heads that are updated independently. During training, each episode is controlled by only one of the heads, and the collected state-action pairs are used to update all heads off-policy, leading to distinct learning signals for each head, which diversify learning and behaviour. To test whether an optimistic ensemble method can improve distributional RL as it did scalar RL (e.g., Bootstrapped DQN), we…
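The training loop described in the abstract (one head acts for a whole episode, every head learns off-policy from the same transitions) can be illustrated with a minimal tabular sketch. This is not the authors' implementation. The chain environment, the quantile representation of the return distribution, the greedy action rule, uniform head sampling, and all names and hyperparameters (`step`, `make_head`, `update_head`, `N_HEADS`, `LR`, and so on) are assumptions made for illustration only.

```python
import random

# Hypothetical sizes and hyperparameters, chosen only for this sketch.
N_STATES, N_ACTIONS, N_HEADS, N_QUANTILES = 5, 2, 4, 8
GAMMA, LR = 0.9, 0.1


def step(state, action):
    """Toy chain MDP (assumed): action 1 moves right, action 0 moves left.
    Reaching the right end gives reward 1 and ends the episode."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def make_head(rng):
    """One head: head[s][a] holds N_QUANTILES return-quantile estimates."""
    return [[[rng.uniform(0.0, 1.0) for _ in range(N_QUANTILES)]
             for _ in range(N_ACTIONS)] for _ in range(N_STATES)]


def mean(values):
    return sum(values) / len(values)


def greedy(head, state):
    """Pick the action with the highest mean of this head's return quantiles."""
    return max(range(N_ACTIONS), key=lambda a: mean(head[state][a]))


def update_head(head, s, a, r, s2, done):
    """Quantile-regression TD step. Each head bootstraps from its OWN
    estimates, so the same transition yields a different target per head."""
    if done:
        targets = [r]
    else:
        a2 = greedy(head, s2)
        targets = [r + GAMMA * q for q in head[s2][a2]]
    for i in range(N_QUANTILES):
        tau = (i + 0.5) / N_QUANTILES  # quantile midpoint
        q = head[s][a][i]
        grad = mean([tau - (1.0 if t < q else 0.0) for t in targets])
        head[s][a][i] = q + LR * grad


def train(n_episodes=300, max_steps=30, seed=0):
    rng = random.Random(seed)
    heads = [make_head(rng) for _ in range(N_HEADS)]
    for _ in range(n_episodes):
        actor = rng.choice(heads)  # one head controls the whole episode
        s = 0
        for _ in range(max_steps):
            a = greedy(actor, s)
            s2, r, done = step(s, a)
            for head in heads:  # all heads learn from the same transition
                update_head(head, s, a, r, s2, done)
            s = s2
            if done:
                break
    return heads


if __name__ == "__main__":
    heads = train()
    for k, head in enumerate(heads):
        print(k, [greedy(head, s) for s in range(N_STATES)])
```

In this sketch the heads differ only through their random initialisation and their own bootstrapped targets; holding the acting head fixed for a full episode is what makes the resulting exploration temporally extended rather than per-step dithering.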
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
Methods: Dense Connections · Convolution · Q-Learning · Focus · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Deep Q-Network
