Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application

Fabio Amadio; Alberto Dalla Libera; Riccardo Antonello; Daniel Nikovski; Ruggero Carli; Diego Romeres

arXiv:2101.12115·cs.LG·November 29, 2022

TL;DR

This paper introduces MC-PILCO, a model-based reinforcement learning algorithm that models the system dynamics with Gaussian Processes and estimates the policy gradient by Monte Carlo sampling, showing improved data efficiency and control performance in simulation and on real systems.

Contribution

The paper presents MC-PILCO, a novel GP-based MBRL algorithm with structured kernels and Monte Carlo gradient estimation, optimized for real systems with partial observability.

Findings

01

MC-PILCO outperforms state-of-the-art GP-based MBRL algorithms in simulated environments.

02

MC-PILCO achieves effective control on real systems such as a Furuta pendulum and a ball-and-plate system.

03

Structured GP kernels and dropout-based policy optimization improve data efficiency and control performance.

Abstract

In this paper, we present a Model-Based Reinforcement Learning (MBRL) algorithm named Monte Carlo Probabilistic Inference for Learning COntrol (MC-PILCO). The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient. This defines a framework in which we ablate the choice of the following components: (i) the selection of the cost function, (ii) the optimization of policies using dropout, (iii) an improved data efficiency through the use of structured kernels in the GP models. The combination of the aforementioned aspects dramatically affects the performance of MC-PILCO. Numerical comparisons in a simulated cart-pole environment show that MC-PILCO exhibits better data efficiency and control performance w.r.t. state-of-the-art GP-based MBRL algorithms. Finally, we apply MC-PILCO to real systems,…
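
To make the idea of a Monte Carlo policy gradient concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a hypothetical scalar linear-Gaussian model in place of the learned GP, a linear policy u = -k x, and a plain quadratic cost. All names and constants (`mc_cost_and_grad`, `optimize_gain`, `a`, `b`, `sigma`) are illustrative. Particles are rolled out with sampled noise, and the derivative of the cost with respect to the policy gain is propagated along each sampled trajectory by hand, where a real implementation would normally use automatic differentiation.

```python
import random


def mc_cost_and_grad(k, n_particles=200, horizon=20, seed=0):
    """Monte Carlo estimate of the expected cumulative cost and its gradient in k."""
    rng = random.Random(seed)
    # Hypothetical stand-in for a learned one-step model (assumption, not a GP):
    # x_next = a * x + b * u + sigma * eps, with eps ~ N(0, 1).
    a, b, sigma = 1.0, 0.5, 0.05
    total_cost, total_grad = 0.0, 0.0
    for _ in range(n_particles):
        x = rng.gauss(1.0, 0.1)   # sampled initial state of this particle
        dx = 0.0                  # derivative of the particle state w.r.t. k
        for _ in range(horizon):
            u = -k * x            # linear policy (assumption)
            du = -x - k * dx      # chain rule through the policy
            eps = rng.gauss(0.0, 1.0)
            # Noise enters additively, so it does not depend on k and the
            # derivative can be pushed through the sampled transition.
            x, dx = a * x + b * u + sigma * eps, a * dx + b * du
            total_cost += x * x          # quadratic cost (assumption)
            total_grad += 2.0 * x * dx   # derivative of the cost term
    return total_cost / n_particles, total_grad / n_particles


def optimize_gain(k0=1.0, lr=0.5, steps=100):
    """Plain gradient descent on the gain, resampling particles at every step."""
    k = k0
    for i in range(steps):
        _, grad = mc_cost_and_grad(k, seed=i)
        k -= lr * grad
    return k


if __name__ == "__main__":
    print("optimized gain:", optimize_gain())
```

In this toy model the closed-loop factor is a - b k, so the cost is smallest near k = 2, which gives a simple sanity check on the estimator. The sketch leaves out the GP model, the structured kernels, the dropout, and the choice of cost function that the abstract lists as ablated components.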

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Advanced Control Systems Optimization · Control Systems and Identification