MEPG: A Minimalist Ensemble Policy Gradient Framework for Deep Reinforcement Learning

Qiang He; Huangyuan Su; Chen Gong; Xinwen Hou

arXiv:2109.10552·cs.LG·July 12, 2022·6 cites

TL;DR

The paper introduces MEPG, a minimalist ensemble policy gradient framework for deep reinforcement learning. MEPG uses dropout to integrate multiple models into a single network, aiming to improve generalization and robustness without additional computational cost.

Contribution

It proposes a novel ensemble RL framework that maintains ensemble properties with minimal resources through a dropout-based Bellman update, enhancing generalization and robustness.
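The idea of a dropout-based Bellman update can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes that "consistent" means one dropout mask is shared between the online Q-value and the target Q-value, so that both sides of the Bellman equation see the same sampled sub-network. The two-layer network, the helper names (`init_q_params`, `sample_dropout_mask`, `q_forward`, `dropout_td_error`), and the hyperparameters are all hypothetical.

```python
import numpy as np


def init_q_params(rng, in_dim, hidden_dim):
    """Hypothetical two-layer Q-network parameters (not from the paper)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(hidden_dim, 1)),
        "b2": np.zeros(1),
    }


def sample_dropout_mask(rng, batch_size, hidden_dim, p_drop):
    """Inverted-dropout mask: kept units are scaled by 1 / (1 - p_drop)."""
    keep = rng.random((batch_size, hidden_dim)) >= p_drop
    return keep.astype(np.float64) / (1.0 - p_drop)


def q_forward(params, state, action, mask):
    """Q(s, a) with a caller-supplied dropout mask on the hidden layer."""
    x = np.concatenate([state, action], axis=1)
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU
    h = h * mask  # each mask row selects one sub-network of the "ensemble"
    return (h @ params["W2"] + params["b2"]).squeeze(-1)


def dropout_td_error(params, target_params, batch, mask, gamma=0.99):
    """TD error in which online and target Q-values share one mask.

    Assumption: sharing the mask is this sketch's reading of a
    "consistent" Bellman update; the paper's operator may differ.
    """
    s, a, r, s_next, a_next, done = batch
    q = q_forward(params, s, a, mask)
    q_next = q_forward(target_params, s_next, a_next, mask)
    target = r + gamma * (1.0 - done) * q_next
    return q - target


# Usage on random data (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
B, S, A, H = 4, 3, 2, 16
params = init_q_params(rng, S + A, H)
target_params = {k: v.copy() for k, v in params.items()}
batch = (
    rng.normal(size=(B, S)),  # states
    rng.normal(size=(B, A)),  # actions
    rng.normal(size=B),       # rewards
    rng.normal(size=(B, S)),  # next states
    rng.normal(size=(B, A)),  # next actions (e.g. from a policy)
    np.zeros(B),              # done flags
)
mask = sample_dropout_mask(rng, B, H, p_drop=0.1)
td = dropout_td_error(params, target_params, batch, mask)
loss = float(np.mean(td ** 2))
```

Passing the mask in explicitly, rather than sampling it inside `q_forward`, is what lets both forward passes use the same sub-network. A single set of weights then stands in for an ensemble, so no extra value or policy networks are stored.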

Findings

1. MEPG outperforms or matches state-of-the-art ensemble methods in experiments.

2. It maintains ensemble properties with a single model using dropout.

3. The framework does not increase computational resource requirements.

Abstract

During the training of a reinforcement learning (RL) agent, the distribution of training data is non-stationary because the agent's behavior changes over time. There is therefore a risk that the agent overspecializes to a particular distribution and its overall performance suffers. Ensemble RL can mitigate this issue by learning a robust policy. However, it consumes substantial computational resources because of the additional value and policy functions it introduces. In this paper, to avoid this resource consumption issue, we design a novel and simple ensemble deep RL framework that integrates multiple models into a single model. Specifically, we propose the Minimalist Ensemble Policy Gradient framework (MEPG), which introduces a minimalist ensemble consistent Bellman update by utilizing a modified dropout operator. MEPG…

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics · Machine Learning and Data Classification

Methods: Dropout · Q-Learning · Gaussian Process