Latent-Variable Advantage-Weighted Policy Optimization for Offline RL
Xi Chen, Ali Ghadirzadeh, Tianhe Yu, Yuan Gao, Jianhao Wang, Wenzhe Li, Bin Liang, Chelsea Finn, Chongjie Zhang

TL;DR
This paper introduces LAPO, a novel offline reinforcement learning method using latent-variable policies to better handle heterogeneous datasets and improve policy performance in continuous control tasks.
Contribution
LAPO leverages latent-variable policies to address distribution shift in offline RL, enhancing performance on diverse and biased datasets.
Findings
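Improves average performance by 49% over the next best-performing offline RL methods on heterogeneous datasets.
Improves average performance by 8% over the same baselines on datasets with narrow, biased distributions.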
Demonstrates effectiveness across locomotion, navigation, and manipulation tasks.
Abstract
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions. This setting is particularly well-suited for continuous control robotic applications for which online data collection based on trial-and-error is costly and potentially unsafe. In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios, such as data from several human demonstrators or from policies that act with different purposes. Unfortunately, such datasets can exacerbate the distribution shift between the behavior policy underlying the data and the optimal policy to be learned, leading to poor performance. To address this challenge, we propose to leverage latent-variable policies that can represent a broader class of policy distributions, leading to better adherence to the…
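To make the abstract's point about heterogeneous data concrete, here is a minimal toy sketch, not LAPO itself. It assumes a one-dimensional action space and two hypothetical demonstrators who act around -1 and +1. A single Gaussian fit to the pooled data places most of its mass between the two modes, while a latent-variable policy (draw a latent z from a prior, then decode an action) stays close to actions that appear in the dataset. The discrete latent, the hand-set decoder means, and all function names are illustrative assumptions; the paper's method learns its latent policy from data.

```python
import random
import statistics


def sample_heterogeneous_actions(n, rng):
    """Toy dataset: two hypothetical demonstrators acting around -1 and +1."""
    return [rng.gauss(rng.choice([-1.0, 1.0]), 0.1) for _ in range(n)]


def fit_gaussian_policy(actions):
    """Unimodal baseline: fit one Gaussian to all pooled actions."""
    return statistics.fmean(actions), statistics.pstdev(actions)


def latent_policy_sample(rng, decoder_means=(-1.0, 1.0), noise=0.1):
    """Latent-variable policy sketch: draw z from a prior, then decode an action."""
    z = rng.randrange(len(decoder_means))      # discrete latent, uniform prior (assumption)
    return rng.gauss(decoder_means[z], noise)  # hand-set decoder, not a learned one


def fraction_near_modes(actions, tol=0.3):
    """Share of actions within tol of either mode present in the data."""
    return sum(min(abs(a - 1.0), abs(a + 1.0)) < tol for a in actions) / len(actions)


rng = random.Random(0)
data = sample_heterogeneous_actions(2000, rng)

# The Gaussian fit averages the two demonstrators: mean near 0, wide spread.
mu, sigma = fit_gaussian_policy(data)
gaussian_actions = [rng.gauss(mu, sigma) for _ in range(2000)]

# The latent-variable policy keeps the two modes separate.
latent_actions = [latent_policy_sample(rng) for _ in range(2000)]

print("Gaussian fit: mean", round(mu, 3), "std", round(sigma, 3))
print("near-mode share, Gaussian policy:", fraction_near_modes(gaussian_actions))
print("near-mode share, latent policy:  ", fraction_near_modes(latent_actions))
```

In this toy setting, actions sampled from the single Gaussian often fall in the gap between the demonstrators, a region the dataset never covers. That is one simple picture of why a more expressive policy class can adhere more closely to heterogeneous offline data.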
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robot Manipulation and Learning
