Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation
Syogo Mori, Voot Tangkaratt, Tingting Zhao, Jun Morimoto, and Masashi Sugiyama

TL;DR
This paper introduces a novel model-based reinforcement learning method that combines policy gradients with parameter-based exploration and least-squares conditional density estimation to improve learning efficiency with fewer samples.
Contribution
It proposes combining policy gradients with parameter-based exploration, a model-free policy search method, with a transition model estimated by least-squares conditional density estimation, aiming for more sample-efficient reinforcement learning.
Findings
Demonstrates improved performance in experiments
Achieves better sample efficiency than traditional methods
Validates the approach's practical usefulness
Abstract
The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other hand, the model-based RL approach first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model is accurately learned from a small amount of data, the model-based approach can perform better than the model-free approach. In this paper, we propose a novel model-based RL method by combining a recently proposed model-free policy search method called policy gradients with parameter-based exploration and the…
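As a rough illustration of the policy search component named in the abstract, the following is a minimal sketch of policy gradients with parameter-based exploration (PGPE) in its commonly described form: policy parameters are drawn from a Gaussian with mean `eta` and standard deviation `sigma`, and both are updated by gradient ascent on the expected return. This is not the authors' implementation. The function names, the mean baseline, the learning rate, and `toy_return` (a hypothetical stand-in for the return of a rollout, which in a model-based setting would be generated from the learned transition model) are all assumptions made for the example.

```python
import numpy as np


def pgpe_gradients(thetas, returns, eta, sigma):
    """Monte Carlo PGPE gradient estimates for a Gaussian over policy parameters.

    thetas:  (n_samples, dim) sampled policy parameters
    returns: (n_samples,) return obtained with each sampled parameter vector
    eta, sigma: (dim,) mean and standard deviation of the sampling distribution
    """
    thetas = np.asarray(thetas, dtype=float)
    returns = np.asarray(returns, dtype=float)
    baseline = returns.mean()  # simple mean baseline (an assumption of this sketch)
    adv = (returns - baseline)[:, None]
    diff = thetas - eta
    # Gradients of log N(theta | eta, sigma^2), weighted by the centered return.
    grad_eta = (adv * diff / sigma**2).mean(axis=0)
    grad_sigma = (adv * (diff**2 - sigma**2) / sigma**3).mean(axis=0)
    return grad_eta, grad_sigma


def toy_return(theta):
    """Hypothetical stand-in for a rollout return; maximal at theta = [1, -2]."""
    target = np.array([1.0, -2.0])
    return -np.sum((theta - target) ** 2)


def run_pgpe(n_iters=300, n_samples=20, lr=0.05, seed=0):
    """Gradient ascent on eta and sigma using sampled parameter vectors."""
    rng = np.random.default_rng(seed)
    eta = np.zeros(2)
    sigma = np.ones(2)
    for _ in range(n_iters):
        # Exploration happens in parameter space: sample whole parameter vectors.
        thetas = eta + sigma * rng.standard_normal((n_samples, 2))
        returns = np.array([toy_return(t) for t in thetas])
        g_eta, g_sigma = pgpe_gradients(thetas, returns, eta, sigma)
        eta = eta + lr * g_eta
        # Keep sigma strictly positive (floor value is an arbitrary choice).
        sigma = np.maximum(sigma + lr * g_sigma, 1e-3)
    return eta, sigma


if __name__ == "__main__":
    eta, sigma = run_pgpe()
    print("learned mean:", eta)
    print("learned std:", sigma)
```

In this sketch each sampled parameter vector is evaluated once. Replacing `toy_return` with returns computed from trajectories simulated under an estimated transition model is where a model-based variant would differ, since the sampled parameters could then be evaluated without collecting further data from the real environment.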
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Machine Learning and Data Classification
