Bayes-Adaptive Deep Model-Based Policy Optimisation
Tai Hoang, Ngo Anh Vien

TL;DR
This paper presents RoMBRL, a Bayesian deep model-based reinforcement learning method that effectively captures model uncertainty for more sample-efficient policy optimization, outperforming existing methods on control benchmarks.
Contribution
Introduction of RoMBRL, a Bayesian deep model-based RL approach that maintains belief distributions over dynamics models and optimises history-based policies, improving uncertainty handling and sample efficiency.
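As a rough illustration of what a "history-based policy" means, the sketch below conditions a Gaussian policy on the full history through a recurrent hidden state, so two agents in the same current state can act differently depending on what they have already seen. This is an assumption-laden toy, not the authors' network: `HistoryPolicy`, `act_on_history`, the single tanh RNN cell, its sizes and the random untrained weights are all hypothetical, and the recurrent TRPO training step is omitted.

```python
import numpy as np


class HistoryPolicy:
    """Gaussian policy pi(a_t | h_t) whose recurrent state h_t summarises the history.

    Hypothetical stand-in: one tanh RNN cell with random, untrained weights."""

    def __init__(self, obs_dim, act_dim, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + act_dim + 1  # inputs: s_t, a_{t-1}, r_{t-1}
        self.W_in = rng.normal(scale=0.1, size=(hidden_dim, in_dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.b = np.zeros(hidden_dim)
        self.W_out = rng.normal(scale=0.1, size=(act_dim, hidden_dim))
        self.log_std = np.full(act_dim, -0.5)  # state-independent action noise
        self.hidden_dim, self.act_dim = hidden_dim, act_dim

    def step(self, h, obs, prev_action, prev_reward):
        """Fold one transition into h and return (new h, action mean, action std)."""
        x = np.concatenate([obs, prev_action, [prev_reward]])
        h = np.tanh(self.W_in @ x + self.W_h @ h + self.b)
        return h, self.W_out @ h, np.exp(self.log_std)

    def act_on_history(self, observations, rewards):
        """Replay (s_0, r_0), (s_1, r_1), ... and return the action mean after the last state.
        The action fed back at each step is the policy mean, a simplification."""
        if len(observations) == 0:
            raise ValueError("history must contain at least one observation")
        h = np.zeros(self.hidden_dim)
        prev_a, prev_r = np.zeros(self.act_dim), 0.0
        for obs, r in zip(observations, rewards):
            h, mean, _ = self.step(h, np.asarray(obs, dtype=float), prev_a, prev_r)
            prev_a, prev_r = mean, r
        return mean


policy = HistoryPolicy(obs_dim=3, act_dim=2)
a_short = policy.act_on_history([np.zeros(3)], [0.0])
a_long = policy.act_on_history([np.ones(3), np.zeros(3)], [1.0, 0.0])
print(a_short, a_long)  # same final state, different histories
```

Both calls end in the same observation, but only the longer history produces a non-zero recurrent state and hence a different action mean; this is the property that lets a history-conditioned policy act on an implicit belief instead of an explicit one.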
Findings
RoMBRL outperforms existing methods on control benchmarks.
The method achieves higher sample efficiency and task performance than the methods it is compared against.
Uncertainty propagation improves policy optimization.
Abstract
We introduce a Bayesian (deep) model-based reinforcement learning method (RoMBRL) that can capture model uncertainty to achieve sample-efficient policy optimisation. We propose to formulate the model-based policy optimisation problem as a Bayes-adaptive Markov decision process (BAMDP). RoMBRL maintains model uncertainty via belief distributions through a deep Bayesian neural network whose weight samples are generated via stochastic gradient Hamiltonian Monte Carlo. Uncertainty is propagated through simulations controlled by sampled models and history-based policies. As beliefs are encoded in visited histories, we propose a history-based policy network that generalises across the history space and can be trained end-to-end using recurrent Trust-Region Policy Optimisation. We show that RoMBRL outperforms existing approaches on many challenging control benchmark tasks in terms of…
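The sample-then-simulate loop described in the abstract can be sketched at toy scale. This is a minimal sketch under loud assumptions: a one-parameter linear model `s' = theta * s + u + noise` stands in for the deep Bayesian neural network, the step size `eta`, friction `alpha` and minibatch size are illustrative values, and the fixed feedback rule `u = -gain * s` stands in for the learned history-based policy. It draws posterior samples of `theta` with the standard SGHMC update with friction (as in Chen et al., 2014, with the gradient-noise estimate set to zero) and then rolls the same policy out under each sampled model, so the spread of the simulated trajectories reflects model uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system: s' = theta * s + u + noise, with theta unknown.
THETA_TRUE, NOISE_STD, N = 0.9, 0.1, 200
s = rng.uniform(-1.0, 1.0, N)
u = rng.uniform(-0.5, 0.5, N)
s_next = THETA_TRUE * s + u + NOISE_STD * rng.normal(size=N)


def grad_U(theta, idx):
    """Minibatch estimate of dU/dtheta, where U = -log prior - log likelihood.
    Assumed prior theta ~ N(0, 1) and Gaussian noise with known NOISE_STD."""
    resid = s_next[idx] - u[idx] - theta * s[idx]
    grad_nll = -(N / len(idx)) * np.sum(resid * s[idx]) / NOISE_STD**2
    return theta + grad_nll  # theta is the gradient of the N(0, 1) negative log prior


def sghmc(n_steps=5000, burn_in=1000, eta=1e-5, alpha=0.1, batch=32):
    """SGHMC with friction alpha and learning rate eta; gradient-noise estimate set to 0."""
    theta, v, samples = 0.0, 0.0, []
    for t in range(n_steps):
        idx = rng.choice(N, size=batch, replace=False)
        noise = rng.normal(scale=np.sqrt(2.0 * alpha * eta))
        v = v - eta * grad_U(theta, idx) - alpha * v + noise
        theta = theta + v
        if t >= burn_in:
            samples.append(theta)
    return np.array(samples)


def rollout(thetas, s0=1.0, horizon=10, gain=0.3):
    """Run the same feedback policy u = -gain * s inside every sampled model (noise-free)."""
    trajs = np.empty((len(thetas), horizon + 1))
    trajs[:, 0] = s0
    for t in range(horizon):
        st = trajs[:, t]
        trajs[:, t + 1] = thetas * st - gain * st
    return trajs


samples = sghmc()
models = samples[::200]  # thin the chain: 20 sampled models
trajs = rollout(models)
print(f"posterior mean {samples.mean():.3f}, std {samples.std():.3f}")
print(f"final-state spread across models {trajs[:, -1].std():.4f}")
```

Thinning the chain is a cheap way to obtain a small set of roughly decorrelated models. In a full BAMDP treatment the policy would also condition on each simulated history rather than apply a fixed gain, and the model would be a network whose weights, not a single scalar, are sampled.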
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Simulation Techniques and Applications
