Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

Tenglong Liu; Yang Li; Yixing Lan; Hao Gao; Wei Pan; Xin Xu

arXiv:2405.19909·cs.LG·July 16, 2024

TL;DR

This paper introduces A2PR, a novel offline reinforcement learning method that adaptively guides policy regularization using advantage estimates and VAE-generated actions, improving performance on suboptimal datasets.

Contribution

A2PR is the first method to adaptively select high-advantage actions for policy regularization, balancing conservatism and policy improvement in offline RL.

Findings

01 Achieves state-of-the-art results on D4RL benchmarks.

02 Effectively mitigates value overestimation issues.

03 Performs well on suboptimal mixed datasets.

Abstract

In offline reinforcement learning, the out-of-distribution (OOD) problem is pronounced. To address it, existing methods often constrain the learned policy through policy regularization. However, these methods tend to be unnecessarily conservative, which hampers policy improvement. This happens because they indiscriminately use all actions from the behavior policy that generated the offline dataset as constraints. The problem becomes particularly noticeable when the dataset is of suboptimal quality. Thus, we propose Adaptive Advantage-guided Policy Regularization (A2PR), which obtains high-advantage actions from an augmented behavior policy combined with a VAE to guide the learned policy. A2PR can select high-advantage actions that differ from those present in the dataset, while still remaining conservative with respect to OOD actions. This is achieved by…
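The abstract is cut off before it describes the mechanism, so the following is only a minimal sketch of one plausible reading of "advantage-guided" regularization, not the authors' implementation. It assumes scalar actions, an advantage estimate of the form A(s, a) = Q(s, a) - V(s), and a per-state choice between the dataset action and a VAE-generated candidate. The names `select_guidance_actions`, `regularization_loss`, `q_fn`, and `v_fn` are hypothetical, and the VAE itself is replaced by a fixed list of candidate actions.

```python
from typing import Callable, List, Sequence

# Hypothetical critic signatures: Q(s, a) and V(s) on scalar states/actions.
QFn = Callable[[float, float], float]
VFn = Callable[[float], float]


def select_guidance_actions(
    states: Sequence[float],
    dataset_actions: Sequence[float],
    generated_actions: Sequence[float],
    q_fn: QFn,
    v_fn: VFn,
) -> List[float]:
    """For each state, keep the candidate with the higher advantage estimate.

    Assumption: advantage is A(s, a) = Q(s, a) - V(s), and the generated
    action stands in for a sample from a VAE-based behavior model.
    """
    chosen = []
    for s, a_data, a_gen in zip(states, dataset_actions, generated_actions):
        adv_data = q_fn(s, a_data) - v_fn(s)
        adv_gen = q_fn(s, a_gen) - v_fn(s)
        # Ties fall back to the dataset action, the more conservative choice.
        chosen.append(a_gen if adv_gen > adv_data else a_data)
    return chosen


def regularization_loss(
    policy_actions: Sequence[float], guidance_actions: Sequence[float]
) -> float:
    """Mean squared distance between policy outputs and guidance actions."""
    n = len(policy_actions)
    return sum((p - g) ** 2 for p, g in zip(policy_actions, guidance_actions)) / n


# Toy critics (assumed for illustration): Q peaks when the action equals the state.
def toy_q(s: float, a: float) -> float:
    return -((a - s) ** 2)


def toy_v(s: float) -> float:
    return 0.0


states = [0.0, 1.0]
dataset_actions = [0.5, 1.0]
generated_actions = [0.1, 0.2]

guidance = select_guidance_actions(
    states, dataset_actions, generated_actions, toy_q, toy_v
)
# guidance is [0.1, 1.0]: the generated action wins for the first state,
# the dataset action wins for the second.
loss = regularization_loss([0.0, 1.0], guidance)
```

In this toy setup the regularization target moves away from a poor dataset action only when the critic rates the alternative higher, which is the trade-off between conservatism and improvement that the abstract describes. A full method would add this term to a standard actor objective; how A2PR weights it is not stated in the text above.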

Code & Models

Repositories

ltlhuuu/a2pr (PyTorch, official)

Taxonomy

Topics: Adaptive Dynamic Programming Control · Elevator Systems and Control · Reinforcement Learning in Robotics