Reducing Conservativeness Oriented Offline Reinforcement Learning

Hongchang Zhang; Jianzhun Shao; Yuhang Jiang; Shuncheng He; Xiangyang Ji

arXiv:2103.00098 · cs.LG · March 2, 2021 · 1 citation

TL;DR

This paper introduces a novel offline reinforcement learning method that reduces conservativeness by focusing on minority samples and providing a tighter value lower bound, improving performance on skewed datasets.

Contribution

The proposed method addresses data imbalance and tightens value bounds, enhancing policy generalization and performance in conservative offline reinforcement learning.

Findings

01. Outperforms state-of-the-art methods on D4RL benchmarks

02. Effectively handles mixed and skewed datasets

03. Improves value function estimation accuracy

Abstract

In offline reinforcement learning, a policy learns to maximize cumulative rewards from a fixed collection of data. To remain conservative, current methods either regularize the policy toward the behavior policy or learn a lower bound of the value function. However, excessive conservativeness tends to impair the policy's generalization ability and degrade its performance, especially on mixed datasets. In this paper, we propose a method for reducing conservativeness in offline reinforcement learning. On the one hand, the policy is trained to pay more attention to the minority samples in the static dataset, which addresses the data imbalance problem. On the other hand, we give a tighter lower bound of the value function than previous methods in order to discover potentially optimal actions. Consequently, our proposed method is able to tackle the skewed distribution of the provided dataset and derive a value function…
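
The first idea in the abstract, paying more attention to minority samples, can be illustrated with a small sketch. The abstract does not say how the paper weights samples, so everything here is an assumption made for illustration: samples are grouped by binned return, each sample is weighted by the inverse of its bin's count, and the names `minority_sampling_weights` and `num_bins` are hypothetical. It is not the authors' procedure.

```python
import numpy as np


def minority_sampling_weights(returns, num_bins=10):
    """Return sampling probabilities that upweight rare return ranges.

    Illustrative assumption: a sample is in the "minority" when few other
    samples share its return bin. This is not taken from the paper.
    """
    returns = np.asarray(returns, dtype=float)
    # Equal-width bin edges spanning the observed returns.
    edges = np.linspace(returns.min(), returns.max(), num_bins + 1)
    # Using only the interior edges yields bin indices in 0..num_bins-1.
    bin_ids = np.digitize(returns, edges[1:-1])
    counts = np.bincount(bin_ids, minlength=num_bins)
    # Inverse-frequency weights: every occupied bin gets equal total mass.
    weights = 1.0 / counts[bin_ids]
    return weights / weights.sum()


# Toy skewed dataset: 90 low-return samples and 10 high-return samples.
toy_returns = np.array([0.0] * 90 + [10.0] * 10)
probs = minority_sampling_weights(toy_returns, num_bins=2)

# Draw a training batch of indices using the rebalanced probabilities.
rng = np.random.default_rng(0)
batch_indices = rng.choice(len(toy_returns), size=32, p=probs)
```

With these weights, each occupied return bin contributes the same total probability mass, so the 10 high-return samples in the toy dataset are drawn about as often as the 90 low-return ones combined. Other rebalancing schemes would fit the abstract's description equally well.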

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Mobile Crowdsensing and Crowdsourcing