Reducing Conservativeness Oriented Offline Reinforcement Learning
Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji

TL;DR
This paper introduces a novel offline reinforcement learning method that reduces conservativeness by focusing on minority samples and providing a tighter value lower bound, improving performance on skewed datasets.
Contribution
The proposed method addresses data imbalance and tightens value bounds, enhancing policy generalization and performance in conservative offline reinforcement learning.
Findings
Outperforms state-of-the-art methods on D4RL benchmarks
Effectively handles mixed and skewed datasets
Improves value function estimation accuracy
Abstract
In offline reinforcement learning, a policy learns to maximize cumulative rewards from a fixed collection of data. To remain conservative, current methods either regularize the learned policy toward the behavior policy or learn a lower bound of the value function. However, excessive conservativeness tends to impair the policy's generalization ability and degrade its performance, especially on mixed datasets. In this paper, we propose reducing conservativeness oriented reinforcement learning. On the one hand, the policy is trained to pay more attention to the minority samples in the static dataset, which addresses the data imbalance problem. On the other hand, we give a tighter lower bound of the value function than previous methods, which helps discover potentially optimal actions. Consequently, our proposed method is able to tackle the skewed distribution of the provided dataset and derive a value function…
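The abstract names two generic ingredients: up-weighting minority samples in a skewed dataset, and using a less pessimistic (tighter) value lower bound. The sketch below illustrates both ideas with stand-in techniques; it is not the authors' method. It assumes each transition carries a hypothetical group tag (for example, which behavior policy produced it) and uses simple inverse-frequency weights. For the lower bound it substitutes a common ensemble lower-confidence bound (mean minus `beta` standard deviations), since the paper's actual bound is not given here. The names `minority_weights`, `weighted_mse`, and `lcb` are illustrative only.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Hashable, List, Sequence


def minority_weights(groups: Sequence[Hashable]) -> List[float]:
    """Inverse-frequency sample weights (illustrative, not the paper's scheme).

    `groups` is a hypothetical per-sample tag. Every group receives the same
    total weight, so samples from rare groups are up-weighted; the weights
    average to 1.0 over the dataset.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


def weighted_mse(preds: Sequence[float], targets: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Squared-error regression loss with each sample scaled by its weight."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights))
    return total / sum(weights)


def lcb(ensemble_q: Sequence[float], beta: float) -> float:
    """Pessimistic value estimate from an ensemble of Q-value predictions.

    Mean minus `beta` population standard deviations. A smaller `beta` yields a
    less conservative estimate, i.e. a tighter bound if it remains a valid one.
    """
    return mean(ensemble_q) - beta * pstdev(ensemble_q)


if __name__ == "__main__":
    # A skewed dataset: 2 "expert" transitions and 8 "random" ones.
    w = minority_weights(["expert"] * 2 + ["random"] * 8)
    print(w[0], w[-1])  # expert samples weigh 2.5, random samples 0.625
    q = [1.0, 2.0, 3.0]
    print(lcb(q, 1.0) < lcb(q, 0.5))  # a smaller beta is less pessimistic
```

The design choice to normalize weights to average 1.0 keeps the loss scale comparable to the unweighted case, so only the relative emphasis between majority and minority samples changes.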
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Mobile Crowdsensing and Crowdsourcing
