EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
Aravind Rajeswaran, Sarvjeet Ghotra, Balaraman Ravindran, Sergey Levine

TL;DR
EPOpt is an ensemble-based reinforcement learning method that trains policies on an ensemble of simulated source domains, using adversarial training and adaptive source domain weighting so that the policies are robust and generalize to a broad range of target domains, including unmodeled effects.
Contribution
The paper presents EPOpt, a novel algorithm combining model ensembles, adversarial training, and Bayesian adaptation to improve policy robustness and domain generalization in reinforcement learning.
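As a rough illustration of how an ensemble and adversarial training can fit together, the sketch below samples source-domain parameters, scores one rollout per sample, and keeps only the worst-performing fraction for the next policy update. The worst-percentile selection rule, the `epsilon` parameter, and the `toy_return` stand-in for a simulator rollout are assumptions made for illustration; this summary does not specify them.

```python
import numpy as np


def worst_percentile_indices(returns, epsilon):
    """Indices of rollouts whose return is at or below the epsilon-quantile."""
    returns = np.asarray(returns, dtype=float)
    cutoff = np.quantile(returns, epsilon)
    return np.flatnonzero(returns <= cutoff)


def toy_return(mass, gain):
    # Hypothetical stand-in for a simulator rollout: the return is
    # highest (zero) when the controller gain matches the sampled mass.
    return -(gain * mass - 1.0) ** 2


rng = np.random.default_rng(0)

# Ensemble of source domains: each sample is one set of model parameters
# (here a single scalar "mass", chosen only for illustration).
masses = rng.uniform(0.5, 2.0, size=200)

# One return per sampled domain under a fixed policy (gain = 1.0).
returns = toy_return(masses, gain=1.0)

# Adversarial sub-sampling: keep only the hardest 10% of domains.
hard = worst_percentile_indices(returns, epsilon=0.1)

print("rollouts kept:", hard.size)
print("mean return, all domains:  ", returns.mean())
print("mean return, hardest subset:", returns[hard].mean())
```

A policy gradient step computed only on the `hard` subset pushes the policy to improve where it currently does worst, which is one way to trade average performance for robustness.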
Findings
EPOpt achieves robust policies across varied simulated domains.
Adapting the source domain weighting with data from the target domain improves transfer to that target domain.
EPOpt outperforms traditional methods in domain generalization experiments.
Abstract
Sample complexity and safety are major challenges when learning policies with reinforcement learning for real-world tasks, especially when the policies are represented using rich function approximators like deep neural networks. Model-based methods where the real-world target domain is approximated using a simulated source domain provide an avenue to tackle the above challenges by augmenting real data with simulated data. However, discrepancies between the simulated source domain and the target domain pose a challenge for simulated training. We introduce the EPOpt algorithm, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects. Further, the probability distribution over source domains in the ensemble can be adapted using data from…
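To make the idea of adapting the distribution over source domains more concrete, here is a minimal sketch of one possible Bayesian update: a set of candidate model parameters is reweighted by how well each one explains transitions observed in the target domain. The 1-D dynamics in `predict_next`, the Gaussian noise model, and the fixed grid of candidate masses are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def reweight_particles(prior_weights, log_likelihoods):
    """Posterior weights proportional to prior weight times likelihood."""
    log_w = np.log(prior_weights) + np.asarray(log_likelihoods, dtype=float)
    log_w -= log_w.max()  # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


def predict_next(state, action, mass):
    # Hypothetical 1-D dynamics used only for this example.
    return state + action / mass


rng = np.random.default_rng(0)

# Simulated "target domain" data generated with an unknown true mass.
true_mass = 1.5
noise_std = 0.05
states = rng.normal(size=50)
actions = rng.normal(size=50)
next_states = predict_next(states, actions, true_mass)
next_states = next_states + rng.normal(scale=noise_std, size=50)

# Candidate source-domain parameters with a uniform prior.
particles = np.linspace(0.5, 2.5, 21)
prior = np.full(particles.size, 1.0 / particles.size)


def log_likelihood(mass):
    # Gaussian log-likelihood of the observed transitions (up to a constant).
    residual = next_states - predict_next(states, actions, mass)
    return -0.5 * np.sum((residual / noise_std) ** 2)


posterior = reweight_particles(prior, [log_likelihood(m) for m in particles])
best = particles[np.argmax(posterior)]
print("most probable mass:", best)
```

Sampling source domains from `posterior` instead of `prior` concentrates subsequent simulated training on models that resemble the target domain.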
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
