EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

Aravind Rajeswaran; Sarvjeet Ghotra; Balaraman Ravindran; Sergey Levine

arXiv:1610.01283·cs.LG·March 7, 2017·144 cites

Open Access

TL;DR

EPOpt trains reinforcement learning policies on an ensemble of simulated source domains, using a form of adversarial training, so that the policies are robust and generalize to a broad range of target domains, including unmodeled effects. The probability distribution over source domains can also be adapted with data.

Contribution

The paper presents EPOpt, an algorithm that combines an ensemble of simulated source domains, adversarial training, and Bayesian adaptation of the distribution over those domains to improve policy robustness and generalization in reinforcement learning.
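As a rough illustration of what Bayesian adaptation over an ensemble can look like, the sketch below reweights a small discrete set of candidate simulator parameters after each observation. It is a simplified stand-in, not the paper's procedure: the Gaussian likelihood, the three candidate values, the observations, and the names `update_weights`, `candidates`, and `noise_std` are all assumptions made for this example.

```python
import numpy as np

def update_weights(weights, predictions, observed, noise_std):
    """One Bayes-rule update over a discrete ensemble (assumed Gaussian likelihood)."""
    log_lik = -0.5 * ((observed - predictions) / noise_std) ** 2
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical ensemble: three candidate values of one simulator parameter.
candidates = np.array([0.5, 1.0, 1.5])
weights = np.full(3, 1.0 / 3.0)  # uniform prior over the candidates

# Hypothetical observations; each candidate is assumed to predict its own value.
for obs in [1.1, 0.9, 1.05]:
    weights = update_weights(weights, candidates, obs, noise_std=0.2)

print(weights)
```

With these made-up observations, most of the probability mass moves to the middle candidate, which is the intended effect: source domains that explain the data better are sampled more often in later training.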

Findings

01. EPOpt achieves robust policies across varied simulated domains.

02. Adapting the distribution over source domains with data improves transfer to the target domain.

03. EPOpt outperforms traditional methods in domain generalization experiments.

Abstract

Sample complexity and safety are major challenges when learning policies with reinforcement learning for real-world tasks, especially when the policies are represented using rich function approximators like deep neural networks. Model-based methods where the real-world target domain is approximated using a simulated source domain provide an avenue to tackle the above challenges by augmenting real data with simulated data. However, discrepancies between the simulated source domain and the target domain pose a challenge for simulated training. We introduce the EPOpt algorithm, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects. Further, the probability distribution over source domains in the ensemble can be adapted using data from…
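One way to read "a form of adversarial training" over an ensemble is to sample many source domains, evaluate the policy in each, and update only on the worst-performing fraction of trajectories. The sketch below shows just that selection step under stated assumptions: the Gaussian domain distribution, the quadratic `toy_return` stand-in for a simulator rollout, and the names `sample_domains`, `worst_fraction`, and `epsilon` are hypothetical and not taken from this page.

```python
import numpy as np

def sample_domains(rng, n, mean, std):
    """Draw n hypothetical simulator parameters (e.g. a mass) from a Gaussian."""
    return rng.normal(mean, std, size=n)

def toy_return(policy_gain, domain_param):
    """Stand-in for a rollout: return is highest when the gain matches the parameter."""
    return -(policy_gain - domain_param) ** 2

def worst_fraction(returns, epsilon):
    """Indices of the lowest-return epsilon fraction of trajectories (at least one)."""
    k = max(1, int(np.ceil(epsilon * len(returns))))
    return np.argsort(returns)[:k]

rng = np.random.default_rng(0)
domains = sample_domains(rng, n=100, mean=1.0, std=0.3)
returns = toy_return(policy_gain=1.0, domain_param=domains)

# Keep only the hardest 10% of sampled domains for the next policy update.
hard = worst_fraction(returns, epsilon=0.1)
print(len(hard), returns[hard].max() <= np.median(returns))
```

In a full training loop, the trajectories indexed by `hard` would be passed to a policy-gradient step. Setting `epsilon=1.0` keeps every trajectory and so recovers plain averaging over the ensemble.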

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning