Reparameterized Policy Learning for Multimodal Trajectory Optimization

Zhiao Huang; Litian Liang; Zhan Ling; Xuanlin Li; Chuang Gan; Hao Su

arXiv:2307.10710·cs.LG·July 21, 2023

TL;DR

This paper introduces a new multimodal policy parameterization for reinforcement learning that models policies as generative trajectory models, improving exploration and performance in complex, high-dimensional environments.

Contribution

The paper proposes a novel variational framework for multimodal policy modeling, together with the Reparameterized Policy Gradient (RPG) algorithm, which improves exploration and data efficiency in RL.

Findings

1. Outperforms previous methods on various tasks.
2. Effectively escapes local optima in dense-reward environments.
3. Successfully tackles sparse-reward challenges with intrinsic motivation.

Abstract

We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. By conditioning the policy on a latent variable, we derive a novel variational bound as the optimization objective, which promotes exploration of the environment. We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency. Empirical results demonstrate that our method can help agents evade local optima in tasks…
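
The idea of conditioning a policy on a latent variable can be illustrated with a toy sketch. It is not the paper's RPG implementation: the two-valued latent, the per-latent means in `LATENT_MEANS`, the fixed `ACTION_STD`, and all function names are assumptions made up for illustration. Each conditional policy is a single Gaussian, yet the marginal over the latent is a two-mode mixture, which a single Gaussian parameterization cannot represent.

```python
import random

# Hypothetical per-latent action means for a 1-D action (not from the paper).
LATENT_MEANS = {0: -1.0, 1: 1.0}
ACTION_STD = 0.1  # assumed fixed standard deviation


def sample_latent(rng):
    """Draw z from a uniform categorical prior p(z)."""
    return rng.choice(sorted(LATENT_MEANS))


def sample_action(state, z, rng):
    """Reparameterized sample a = mu(s, z) + sigma * eps, eps ~ N(0, 1).

    The toy mean depends only on z; a learned policy would also use `state`.
    """
    eps = rng.gauss(0.0, 1.0)
    return LATENT_MEANS[z] + ACTION_STD * eps


def sample_marginal_actions(state, n, seed=0):
    """Sample actions from the marginal policy by first sampling z, then a."""
    rng = random.Random(seed)
    return [sample_action(state, sample_latent(rng), rng) for _ in range(n)]


actions = sample_marginal_actions(state=0.0, n=1000)
left = sum(a < 0 for a in actions)   # samples near the mode at -1
right = len(actions) - left          # samples near the mode at +1
```

Printing `left` and `right` should show both modes receiving a substantial share of the samples, whereas a single Gaussian fitted to the same data would put its mass near zero, where almost no actions fall.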

Videos

Reparameterized Policy Learning for Multimodal Trajectory Optimization · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Machine Learning and Data Classification