MEPG: A Minimalist Ensemble Policy Gradient Framework for Deep Reinforcement Learning
Qiang He, Huangyuan Su, Chen Gong, Xinwen Hou

TL;DR
The paper introduces MEPG, a minimalist ensemble policy gradient framework for deep reinforcement learning. MEPG integrates multiple models into a single network using dropout, which improves generalization and robustness without additional computational cost.
Contribution
It proposes a novel ensemble RL framework that maintains ensemble properties with minimal resources through a dropout-based Bellman update, enhancing generalization and robustness.
Findings
MEPG outperforms or matches state-of-the-art ensemble methods in experiments.
It maintains ensemble properties with a single model using dropout.
The framework does not increase computational resource requirements.
Abstract
During the training of a reinforcement learning (RL) agent, the distribution of training data is non-stationary as the agent's behavior changes over time. Therefore, there is a risk that the agent becomes overspecialized to a particular distribution and that its overall performance suffers. Ensemble RL can mitigate this issue by learning a robust policy. However, it suffers from heavy computational resource consumption due to the newly introduced value and policy functions. In this paper, to avoid the notorious resource consumption issue, we design a novel and simple ensemble deep RL framework that integrates multiple models into a single model. Specifically, we propose the Minimalist Ensemble Policy Gradient framework (MEPG), which introduces a minimalist ensemble consistent Bellman update by utilizing a modified dropout operator. MEPG…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics · Machine Learning and Data Classification
Methods: Dropout · Q-Learning · Gaussian Process
