SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel

TL;DR
SUNRISE is a unified ensemble framework for off-policy deep reinforcement learning that improves stability and exploration by combining ensemble-based weighted Bellman backups with upper-confidence-bound (UCB) action selection.
Contribution
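The paper introduces SUNRISE, a simple ensemble method compatible with various off-policy RL algorithms. It improves performance in two ways: by re-weighting target Q-values according to uncertainty estimates from a Q-ensemble, and by selecting actions with the highest upper-confidence bounds during exploration.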
Findings
Weighted Bellman backups improve the stability of off-policy RL algorithms.
Selecting actions by upper-confidence bounds makes exploration more efficient.
The combined method achieves better performance on diverse control tasks.
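The upper-confidence-bound selection mentioned in the findings can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a discrete action space, takes the ensemble mean plus a scaled ensemble standard deviation as the bound, and uses a hypothetical function name `ucb_action` and a hypothetical exploration coefficient `lam`.

```python
import numpy as np


def ucb_action(q_values, lam=1.0):
    """Return the index of the action with the highest upper-confidence bound.

    q_values: array of shape (n_members, n_actions), one row of Q-estimates
              per ensemble member for the current state.
    lam:      hypothetical coefficient trading off exploitation (mean)
              against exploration (ensemble disagreement).
    """
    q_values = np.asarray(q_values, dtype=float)
    mean = q_values.mean(axis=0)   # ensemble mean per action
    std = q_values.std(axis=0)     # ensemble disagreement per action
    return int(np.argmax(mean + lam * std))


# Three ensemble members, three actions (illustrative numbers only).
# Action 0 has the highest mean; action 1 has the largest disagreement.
q = np.array([
    [1.1, 0.2, 0.5],
    [1.1, 1.8, 0.5],
    [1.1, 1.0, 0.5],
])

greedy = ucb_action(q, lam=0.0)    # ignores uncertainty
explore = ucb_action(q, lam=1.0)   # rewards uncertain actions
```

With `lam=0.0` the rule reduces to greedy selection on the ensemble mean; a positive `lam` shifts the choice toward actions on which the ensemble members disagree.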
Abstract
Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains. However, standard off-policy RL algorithms can suffer from several issues, such as instability in Q-learning and balancing exploration and exploitation. To mitigate these issues, we present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy RL algorithms. SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration. By enforcing the diversity between agents using Bootstrap with random initialization, we show that these different ideas are largely orthogonal and can be fruitfully integrated, together further improving the performance of…
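Ingredient (a) and the bootstrap mechanism from the abstract can be illustrated with a short sketch. It rests on assumptions not stated in this summary: the weight is taken to be a sigmoid of the negative ensemble standard deviation, shifted into the range (0.5, 1.0], and bootstrap diversity is taken to mean one Bernoulli mask per ensemble member and transition. The names `backup_weights`, `weighted_td_loss`, `bootstrap_masks`, `temperature`, and `beta` are hypothetical.

```python
import numpy as np


def backup_weights(target_q, temperature=10.0):
    """Confidence weights from target Q-ensemble disagreement.

    target_q: array of shape (n_members, batch) with target Q-values.
    Returns an array of shape (batch,) with values in (0.5, 1.0]:
    close to 1.0 where the members agree, close to 0.5 where they do not.
    """
    std = np.asarray(target_q, dtype=float).std(axis=0)
    # sigmoid(-x) written as exp(-log(1 + exp(x))) for numerical stability.
    return np.exp(-np.logaddexp(0.0, std * temperature)) + 0.5


def weighted_td_loss(q_pred, td_target, weights):
    """Mean squared TD error, down-weighted where targets are uncertain."""
    q_pred = np.asarray(q_pred, dtype=float)
    td_target = np.asarray(td_target, dtype=float)
    return float(np.mean(weights * (q_pred - td_target) ** 2))


def bootstrap_masks(rng, n_members, batch, beta=0.5):
    """Binary masks: member i trains on transition j only if mask[i, j] == 1."""
    return rng.binomial(1, beta, size=(n_members, batch))


# Two ensemble members, two transitions (illustrative numbers only).
target_q = np.array([[1.0, 0.0],
                     [1.0, 2.0]])   # members agree on the first, not the second
w = backup_weights(target_q)
loss = weighted_td_loss([1.0, 1.0], [0.0, 0.0], np.array([1.0, 0.5]))
masks = bootstrap_masks(np.random.default_rng(0), n_members=2, batch=4)
```

Keeping the weights above 0.5 means uncertain targets are damped rather than discarded, so every transition still contributes some gradient signal.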
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Dense Connections · Convolution · Double Q-learning · Deep Q-Network · Prioritized Experience Replay · Q-Learning · Noisy Linear Layer · Dueling Network · N-step Returns · Rainbow DQN
