Random Actions vs Random Policies: Bootstrapping Model-Based Direct Policy Search
Elias Hanna, Alex Coninx, Stéphane Doncieux

TL;DR
This paper investigates how different initial data collection methods affect the efficiency of model-based policy search, highlighting the importance of bootstrap strategies and potential hybrid approaches for improved learning.
Contribution
It compares initialization methods across two policy search frameworks, providing insights into their impact on model performance and suggesting avenues for hybrid method development.
Findings
Task-dependent factors can negatively affect each method
Probabilistic ensembles are used for dynamics modeling
Hybrid approaches may improve bootstrap efficiency
Abstract
This paper studies the impact of the initial data gathering method on the subsequent learning of a dynamics model. Dynamics models approximate the true transition function of a given task, so that policy search can be performed directly on the model rather than on the costly real system. This study aims to determine how to bootstrap a model as efficiently as possible by comparing the initialization methods employed in two different policy search frameworks from the literature. It focuses on model performance under the episode-based framework of evolutionary methods, using probabilistic ensembles. Experimental results show that various task-dependent factors can be detrimental to each method, which suggests exploring hybrid approaches.
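To make the contrast in the title concrete, here is a minimal sketch of the two bootstrap strategies as they are commonly understood: sampling a fresh random action at every step, versus sampling random policy parameters once per episode and rolling that policy out. This is an illustration under stated assumptions, not the authors' implementation: the 1-D `step` dynamics, the linear-tanh policy, and the function names `collect_random_actions` and `collect_random_policies` are all hypothetical.

```python
import math
import random


def step(state, action):
    """Hypothetical 1-D toy dynamics: the state moves by the clipped action."""
    return state + max(-1.0, min(1.0, action))


def collect_random_actions(n_episodes, horizon, rng):
    """Bootstrap data by drawing an independent uniform action at every step."""
    data = []
    for _ in range(n_episodes):
        s = 0.0
        for _ in range(horizon):
            a = rng.uniform(-1.0, 1.0)
            s_next = step(s, a)
            data.append((s, a, s_next))
            s = s_next
    return data


def collect_random_policies(n_episodes, horizon, rng):
    """Bootstrap data by drawing policy parameters once per episode."""
    data = []
    for _ in range(n_episodes):
        # Assumed policy form: a = tanh(w * s + b), with random w and b.
        w = rng.uniform(-1.0, 1.0)
        b = rng.uniform(-1.0, 1.0)
        s = 0.0
        for _ in range(horizon):
            a = math.tanh(w * s + b)
            s_next = step(s, a)
            data.append((s, a, s_next))
            s = s_next
    return data


rng = random.Random(0)
random_action_data = collect_random_actions(n_episodes=3, horizon=5, rng=rng)
random_policy_data = collect_random_policies(n_episodes=3, horizon=5, rng=rng)
```

Both functions return `(state, action, next_state)` transitions of the kind a dynamics model could be fitted on. In this sketch the only difference is where the randomness enters: per step in the first case, per episode in the second, where each action depends on the current state through the sampled policy.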
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms
