Continuous Control With Ensemble Deep Deterministic Policy Gradients
Piotr Januszewski, Mateusz Olko, Michał Królikowski, Jakub Świątkowski, Marcin Andrychowicz, Łukasz Kuciński, Piotr Miłoś

TL;DR
This paper empirically investigates various components of deep reinforcement learning in continuous control, revealing insights that lead to the development of the ED2 method, which achieves state-of-the-art results with practical simplicity.
Contribution
The paper introduces ED2, a novel ensemble-based approach that combines multiple insights to improve continuous control performance in deep RL.
Findings
Ensembling multiple actors improves performance.
Existing methods are unstable across training runs, training epochs, and evaluation runs.
Posterior-sampling exploration outperforms approximate UCB combined with the weighted Bellman backup.
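The first and third findings can be illustrated with a minimal sketch. It is not the authors' implementation: `LinearActor`, `make_ensemble`, `evaluation_action`, and `sample_exploration_actor` are hypothetical names, the actors are untrained random linear maps standing in for policy networks, and picking one ensemble member per episode is an assumed reading of "posterior sampling" borrowed from bootstrapped-ensemble methods.

```python
import numpy as np


class LinearActor:
    """Hypothetical stand-in for a trained deterministic policy network."""

    def __init__(self, obs_dim, act_dim, rng):
        # Random weights; a real actor would be trained from replay data.
        self.w = rng.normal(scale=0.1, size=(obs_dim, act_dim))

    def __call__(self, obs):
        # tanh keeps each action component in [-1, 1].
        return np.tanh(obs @ self.w)


def make_ensemble(n_actors, obs_dim, act_dim, seed=0):
    """Build n_actors independently initialized actors."""
    rng = np.random.default_rng(seed)
    return [LinearActor(obs_dim, act_dim, rng) for _ in range(n_actors)]


def evaluation_action(actors, obs):
    """Average the ensemble's actions (the 'average of multiple actors')."""
    return np.mean([actor(obs) for actor in actors], axis=0)


def sample_exploration_actor(actors, rng):
    """Assumed posterior-sampling step: draw one actor uniformly at random,
    intended to be used for a whole episode without additive action noise."""
    return actors[rng.integers(len(actors))]
```

In use, one would call `sample_exploration_actor` once at the start of each training episode and act with the returned member, then call `evaluation_action` at test time to act with the ensemble mean.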
Abstract
The growth of deep reinforcement learning (RL) has brought multiple exciting tools and methods to the field. This rapid expansion makes it important to understand the interplay between individual elements of the RL toolbox. We approach this task from an empirical perspective by conducting a study in the continuous control setting. We present multiple insights of fundamental nature, including: an average of multiple actors trained from the same data boosts performance; the existing methods are unstable across training runs, epochs of training, and evaluation runs; a commonly used additive action noise is not required for effective training; a strategy based on posterior sampling explores better than the approximated UCB combined with the weighted Bellman backup; the weighted Bellman backup alone cannot replace the clipped double Q-Learning; the critics' initialization plays the major…
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning
