Fast Model-based Policy Search for Universal Policy Networks
Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana, and Svetha Venkatesh

TL;DR
This paper introduces a Gaussian Process prior combined with Bayesian Optimization to efficiently select the best policy from a universal policy network for new environments, improving adaptation in reinforcement learning.
Contribution
It presents a novel method that integrates a GP prior with Bayesian Optimization to enhance policy selection from universal policy networks in unseen environments.
Findings
Outperforms baseline methods in continuous control tasks
Efficiently identifies suitable policies for new environments
Applicable to both continuous and discrete control scenarios
Abstract
Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics-based reinforcement learning. Although recent approaches such as universal policy networks partially address this issue by enabling the storage of multiple policies trained in simulation on a wide range of dynamic/latent factors, efficiently identifying the most appropriate policy for a given environment remains a challenge. In this work, we propose a Gaussian Process-based prior, learned in simulation, that captures the likely performance of a policy when transferred to a previously unseen environment. We integrate this prior with a Bayesian Optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network. We empirically evaluate our approach in a range of continuous and discrete control environments, and show…
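Illustrative sketch
The idea in the abstract, a GP prior learned in simulation that guides a Bayesian Optimisation search over candidate policies, can be pictured with a small toy example. This is not the authors' implementation. Everything named here is an assumption made for illustration: the one-dimensional conditioning parameter, the quadratic `toy_return` function, the use of the averaged simulated return as the GP mean function, the RBF kernel settings, and the UCB acquisition rule.

```python
import numpy as np

def rbf_kernel(a, b, length=0.2, variance=0.01):
    """Squared-exponential kernel between two 1-D arrays (hyperparameters are assumed)."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, prior_mean, noise=1e-6):
    """Posterior mean and std of a GP whose mean function is `prior_mean`."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_query, x_obs)
    mean = prior_mean(x_query) + K_s @ np.linalg.solve(K, y_obs - prior_mean(x_obs))
    reduction = np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    var = rbf_kernel(x_query, x_query).diagonal() - reduction
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_return(policy_param, env_param):
    """Hypothetical episode return: highest when the policy's conditioning
    parameter matches the environment's latent parameter."""
    return -(policy_param - env_param) ** 2

def learn_prior_mean(candidates, sim_env_params):
    """Average simulated return per candidate, interpolated into a mean function."""
    table = np.array(
        [[toy_return(c, e) for e in sim_env_params] for c in candidates]
    ).mean(axis=1)
    return lambda x: np.interp(x, candidates, table)

def policy_search(candidates, prior_mean, evaluate, budget=8, beta=2.0):
    """Bayesian Optimisation over a finite candidate set with a UCB acquisition."""
    first = int(np.argmax(prior_mean(candidates)))  # start from the prior's best guess
    visited = [first]
    returns = [evaluate(candidates[first])]
    while len(visited) < budget:
        mean, std = gp_posterior(
            candidates[visited], np.array(returns), candidates, prior_mean
        )
        ucb = mean + beta * std
        ucb[visited] = -np.inf  # never re-evaluate a candidate
        nxt = int(np.argmax(ucb))
        visited.append(nxt)
        returns.append(evaluate(candidates[nxt]))
    best = int(np.argmax(returns))
    return candidates[visited[best]], returns[best], candidates[visited]

# Candidate conditioning parameters for a hypothetical universal policy network.
candidates = np.linspace(0.0, 1.0, 51)
# Prior built from simulated environments that do not include the target (0.7).
prior_mean = learn_prior_mean(candidates, sim_env_params=np.linspace(0.4, 0.8, 9))
best_param, best_return, queried = policy_search(
    candidates, prior_mean, evaluate=lambda p: toy_return(p, 0.7)
)
print(best_param, best_return)
```

In this sketch the prior only decides where the search starts and how the GP mean is shaped before any target-environment rollouts are observed. Each evaluation in the target environment then updates the posterior, so the search can move away from the simulated best guess when the new environment differs from the simulated ones.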
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics · Machine Learning and Data Classification
