Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

Syogo Mori, Voot Tangkaratt, Tingting Zhao, Jun Morimoto, and Masashi Sugiyama

arXiv:1307.5118 · stat.ML · July 22, 2013

TL;DR

This paper introduces a model-based reinforcement learning method that combines policy gradients with parameter-based exploration and least-squares conditional density estimation, with the aim of learning good policies from fewer samples.

Contribution

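It proposes combining policy gradients with parameter-based exploration, a model-free policy search method, with transition-model estimation by least-squares conditional density estimation, so that the policy is learned from the estimated model and fewer real samples are needed.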
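
For reference, a minimal statement of the standard formulation of policy gradients with parameter-based exploration (PGPE), assuming a deterministic policy whose parameters θ are drawn from an independent Gaussian per coordinate. The notation (ρ = (η, τ), trajectory h, return R(h), baseline b) is ours and may differ from the paper's. Because the policy is deterministic given θ, p(h | θ) does not depend on ρ, so the gradient involves only the Gaussian's score function; in the model-based setting the trajectories h_n are simulated with the estimated transition model rather than collected from the environment.

```latex
% Hyper-parameters \rho = (\eta, \tau) of a Gaussian over policy parameters \theta
p(\theta \mid \rho) = \prod_{i} \mathcal{N}\!\left(\theta_i \mid \eta_i, \tau_i^{2}\right)

% Expected return over trajectories h generated by the deterministic policy with parameters \theta
J(\rho) = \iint p(h \mid \theta)\, p(\theta \mid \rho)\, R(h)\, \mathrm{d}h\, \mathrm{d}\theta

% Likelihood-ratio gradient with a baseline b, estimated from N sampled pairs (\theta_n, h_n)
\nabla_{\rho} J(\rho) \approx \frac{1}{N} \sum_{n=1}^{N} \nabla_{\rho} \log p(\theta_n \mid \rho)\,\bigl(R(h_n) - b\bigr)

% Per-coordinate score functions of the Gaussian
\frac{\partial \log p(\theta \mid \rho)}{\partial \eta_i} = \frac{\theta_i - \eta_i}{\tau_i^{2}},
\qquad
\frac{\partial \log p(\theta \mid \rho)}{\partial \tau_i} = \frac{(\theta_i - \eta_i)^{2} - \tau_i^{2}}{\tau_i^{3}}
```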

Findings

01. Demonstrates improved performance in experiments

02. Achieves better sample efficiency than traditional methods

03. Validates the approach's practical usefulness

Abstract

The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other hand, the model-based RL approach first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model is accurately learned from a small amount of data, the model-based approach can perform better than the model-free approach. In this paper, we propose a novel model-based RL method by combining a recently proposed model-free policy search method called policy gradients with parameter-based exploration and the…
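
A toy sketch of the model-based pipeline described above: collect a few real transitions, estimate a transition model, then run policy gradients with parameter-based exploration (PGPE) entirely on rollouts simulated from that model. Everything here is an illustrative assumption rather than the paper's setup: the 1-D environment, the linear policy a = θ·s, the hyperparameters, and all function names are hypothetical. The model is an ordinary least-squares linear-Gaussian fit swapped in for least-squares conditional density estimation, which estimates a nonparametric conditional density and is not implemented here.

```python
import math
import random
import statistics


def true_step(s, a, rng):
    """Hypothetical real environment: s' = s + a + Gaussian noise (std 0.1)."""
    return s + a + rng.gauss(0.0, 0.1)


def collect_transitions(n, rng):
    """Gather a small batch of real (s, a, s') samples with random states and actions."""
    data = []
    for _ in range(n):
        s, a = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        data.append((s, a, true_step(s, a, rng)))
    return data


def fit_linear_gaussian_model(data):
    """Least-squares fit of s' ~ N(w1*s + w2*a, sigma^2).

    Stand-in for LSCDE: a parametric linear-Gaussian model, not a
    nonparametric conditional density estimator.
    """
    sss = sum(s * s for s, _, _ in data)
    saa = sum(a * a for _, a, _ in data)
    ssa = sum(s * a for s, a, _ in data)
    sst = sum(s * t for s, _, t in data)
    sat = sum(a * t for _, a, t in data)
    det = sss * saa - ssa * ssa  # 2x2 normal equations, solved by Cramer's rule
    w1 = (sst * saa - ssa * sat) / det
    w2 = (sss * sat - ssa * sst) / det
    resid = [t - (w1 * s + w2 * a) for s, a, t in data]
    sigma = math.sqrt(sum(r * r for r in resid) / len(resid))
    return w1, w2, sigma


def model_return(theta, model, noise, s0=1.0):
    """Return of the linear policy a = theta * s, simulated in the learned model."""
    w1, w2, sigma = model
    s, ret = s0, 0.0
    for eps in noise:
        a = theta * s
        s = w1 * s + w2 * a + sigma * eps  # sample s' from the estimated model
        ret -= s * s                       # reward: -s'^2
    return ret


def model_based_pgpe(model, rng, iters=200, n=20, horizon=5,
                     lr_eta=0.5, lr_tau=0.2, tau_min=0.05):
    """PGPE on rollouts from the learned model; returns the final (eta, tau)."""
    eta, tau = 0.0, 0.3  # mean and std of the Gaussian over the policy parameter
    for _ in range(iters):
        # One noise sequence shared by all samples (common random numbers).
        noise = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        thetas = [rng.gauss(eta, tau) for _ in range(n)]
        returns = [model_return(th, model, noise) for th in thetas]
        mu = statistics.mean(returns)
        sd = statistics.pstdev(returns) or 1.0
        adv = [(r - mu) / sd for r in returns]  # mean baseline, unit scale
        # Gaussian score functions, each multiplied by tau^2.
        g_eta = statistics.mean([v * (th - eta) for v, th in zip(adv, thetas)])
        g_tau = statistics.mean([v * ((th - eta) ** 2 - tau ** 2) / tau
                                 for v, th in zip(adv, thetas)])
        eta += lr_eta * g_eta
        tau = max(tau + lr_tau * g_tau, tau_min)
    return eta, tau


rng = random.Random(0)
model = fit_linear_gaussian_model(collect_transitions(100, rng))  # 100 real samples
eta, tau = model_based_pgpe(model, rng)
print(f"w1={model[0]:.3f} w2={model[1]:.3f} sigma={model[2]:.3f} "
      f"eta={eta:.3f} tau={tau:.3f}")
```

In this toy task the gain that cancels the learned dynamics is θ = -w1/w2, close to -1, so η is expected to drift from 0 toward that value using only simulated rollouts after the initial 100 real samples. Scaling each score function by τ² keeps the step size proportional to the exploration width. The floor `tau_min`, the shared noise sequence, and the standardized returns are choices made for this sketch: they stop exploration from collapsing before η reaches the optimum and reduce variance when comparing sampled parameters.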

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Machine Learning and Data Classification