Latent-Variable Advantage-Weighted Policy Optimization for Offline RL

Xi Chen, Ali Ghadirzadeh, Tianhe Yu, Yuan Gao, Jianhao Wang, Wenzhe Li, Bin Liang, Chelsea Finn, Chongjie Zhang

arXiv:2203.08949 · cs.LG · March 18, 2022


TL;DR

This paper introduces LAPO, a novel offline reinforcement learning method using latent-variable policies to better handle heterogeneous datasets and improve policy performance in continuous control tasks.

Contribution

LAPO leverages latent-variable policies to address distribution shift in offline RL, enhancing performance on diverse and biased datasets.

Findings

1. Improves average performance by 49% over the next best-performing offline RL methods on heterogeneous datasets.

2. Improves average performance by 8% on datasets with narrow, biased distributions.

3. Demonstrates effectiveness across locomotion, navigation, and manipulation tasks.

Abstract

Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions. This setting is particularly well-suited for continuous control robotic applications for which online data collection based on trial-and-error is costly and potentially unsafe. In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios, such as data from several human demonstrators or from policies that act with different purposes. Unfortunately, such datasets can exacerbate the distribution shift between the behavior policy underlying the data and the optimal policy to be learned, leading to poor performance. To address this challenge, we propose to leverage latent-variable policies that can represent a broader class of policy distributions, leading to better adherence to the…
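To illustrate why a broader policy class can adhere better to heterogeneous data, here is a toy sketch under stated assumptions: a single state in which two hypothetical demonstrators take opposite actions (near -1.0 and +1.0). A unimodal Gaussian fit averages them into an action close to 0 that neither demonstrator took, while a model with a discrete latent code recovers both modes. The hard-EM clustering is only a stand-in for a latent-variable policy; it is not LAPO's model or training objective, and all function names are hypothetical.

```python
import random
import statistics

# Toy heterogeneous dataset (hypothetical): in one state, demonstrator A
# acts near -1.0 and demonstrator B acts near +1.0.
random.seed(0)
actions = [random.gauss(-1.0, 0.1) for _ in range(200)] + [
    random.gauss(1.0, 0.1) for _ in range(200)
]


def fit_unimodal(xs):
    """Maximum-likelihood mean of a single Gaussian over all actions."""
    return statistics.fmean(xs)


def fit_two_latent_modes(xs, iters=20):
    """Hard-EM (k-means style) fit with a discrete latent code z in {0, 1}.

    Each z indexes one action mode; this is a stand-in for a
    latent-variable policy, not LAPO's actual model.
    """
    centers = [min(xs), max(xs)]
    for _ in range(iters):
        groups = ([], [])
        for x in xs:
            # Hard E-step: assign each action to its nearest mode.
            z = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[z].append(x)
        # M-step: each mode's center becomes the mean of its assigned actions.
        centers = [statistics.fmean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def distance_to_data(a, xs):
    """Distance from action a to the closest action actually in the dataset."""
    return min(abs(a - x) for x in xs)


unimodal = fit_unimodal(actions)
modes = fit_two_latent_modes(actions)
print(f"unimodal action {unimodal:+.2f}, distance to data {distance_to_data(unimodal, actions):.2f}")
for m in modes:
    print(f"latent mode {m:+.2f}, distance to data {distance_to_data(m, actions):.2f}")
```

The unimodal fit lands in the gap between the two demonstrators, i.e. on an action outside the data distribution, while each latent mode stays close to actions that were actually observed.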


Code & Models

Repositories

pcchenxi/lapo-offlienrl (PyTorch)


Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robot Manipulation and Learning