Path Integral Policy Improvement with Covariance Matrix Adaptation

Freek Stulp (Ecole Nationale Superieure de Techniques Avancees), Olivier Sigaud (Universite Pierre et Marie Curie)

arXiv:1206.4621 · cs.LG · June 22, 2012 · 158 cites

Open Access

TL;DR

This paper introduces PI2-CMA, a novel reinforcement learning algorithm that combines path integral policy improvement with covariance matrix adaptation, enabling automatic exploration noise tuning and improved performance in continuous control tasks.

Contribution

The paper presents PI2-CMA, an algorithm that combines path integral policy improvement (PI2) with the covariance matrix adaptation of CMA-ES, so that the magnitude of the exploration noise is determined automatically.

Findings

01. PI2-CMA automatically determines exploration noise magnitude.
02. PI2-CMA outperforms existing methods in benchmark tasks.
03. The approach unifies several stochastic optimization techniques.

Abstract

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods that share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI2 to other members of the same family (Cross-Entropy Methods and CMA-ES) at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm, which we call PI2-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise…
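
The "probability-weighted averaging" idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's PI2-CMA algorithm: it is a generic update in the spirit of this family of methods, and the exponential cost-to-weight mapping with constant `h`, the diagonal covariance, the variance floor `min_var`, and the toy `quadratic_cost` are all assumptions made for illustration.

```python
import math
import random


def weighted_average_update(mean, var, cost_fn, n_samples=20, h=10.0,
                            min_var=1e-12, rng=random):
    """One probability-weighted averaging step with a diagonal covariance.

    Illustrative only; all names and defaults here are hypothetical.
    """
    dim = len(mean)
    # Sample perturbed parameter vectors around the current mean.
    samples = [
        [rng.gauss(mean[d], math.sqrt(var[d])) for d in range(dim)]
        for _ in range(n_samples)
    ]
    costs = [cost_fn(s) for s in samples]

    # Map costs to probabilities: lower cost gets a higher weight.
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0  # avoid division by zero if all costs are equal
    raw = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(raw)
    weights = [r / total for r in raw]

    # New mean: weighted average of the sampled parameter vectors.
    new_mean = [sum(w * s[d] for w, s in zip(weights, samples))
                for d in range(dim)]
    # New variance: weighted spread of the samples around the OLD mean,
    # so the exploration magnitude adapts instead of being hand-tuned.
    new_var = [
        max(sum(w * (s[d] - mean[d]) ** 2 for w, s in zip(weights, samples)),
            min_var)
        for d in range(dim)
    ]
    return new_mean, new_var, weights


def quadratic_cost(theta):
    # Toy cost with its minimum at (3, 3, ...); an assumption for the demo.
    return sum((t - 3.0) ** 2 for t in theta)


def optimize(n_updates=50, seed=0):
    rng = random.Random(seed)
    mean, var = [0.0, 0.0], [1.0, 1.0]
    for _ in range(n_updates):
        mean, var, _ = weighted_average_update(mean, var, quadratic_cost, rng=rng)
    return mean, var
```

Calling `optimize()` repeatedly applies the update to the toy cost. The point of the sketch is that the same weights drive both the mean update and the variance update, which is the kind of mechanism that lets the exploration noise be adapted rather than fixed by hand.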

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Simulation Techniques and Applications