Semi-pessimistic Reinforcement Learning

Jin Zhu; Xin Zhou; Jiaang Yao; Gholamali Aminian; Omar Rivasplata; Simon Little; Lexin Li; Chengchun Shi

arXiv:2505.19002 · cs.LG · May 27, 2025

TL;DR

This paper introduces a semi-pessimistic reinforcement learning approach that leverages large amounts of unlabeled data to improve policy learning in offline RL, addressing distributional shift and reward scarcity.

Contribution

The paper proposes a novel semi-pessimistic RL method that simplifies reward estimation, is flexible across algorithms, and guarantees improvement with abundant unlabeled data.

Findings

01 The method outperforms existing solutions both analytically and numerically.

02 It applies to both model-free and model-based RL algorithms.

03 Its effectiveness is demonstrated in a Parkinson's disease application.

Abstract

Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected data. However, it faces challenges of distributional shift, where the learned policy may encounter unseen scenarios not covered in the offline data. Additionally, numerous applications suffer from a scarcity of labeled reward data. Relying on labeled data alone often leads to a narrow state-action distribution, further amplifying the distributional shift, and resulting in suboptimal policy learning. To address these issues, we first recognize that the volume of unlabeled data is typically substantially larger than that of labeled data. We then propose a semi-pessimistic RL method to effectively leverage abundant unlabeled data. Our approach offers several advantages. It considerably simplifies the learning process, as it seeks a lower bound of the reward function, rather than that of the Q-function or…
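
The idea of penalizing the reward rather than the Q-function can be illustrated with a small sketch. The code below is not the paper's algorithm; it is a hypothetical example that assumes a linear reward model, a bootstrap ensemble as the uncertainty quantifier, and a penalty weight `kappa`, none of which come from the source. It fits reward models on a small labeled set and assigns a lower-bound reward (ensemble mean minus `kappa` times ensemble standard deviation) to a larger unlabeled set of state-action features.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression weights for features X and targets y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def pessimistic_rewards(X_lab, r_lab, X_unlab, n_models=20, kappa=1.0, seed=0):
    """Ensemble mean minus kappa * ensemble std on unlabeled features.

    Hypothetical helper: the bootstrap ensemble and kappa are assumptions
    made for illustration, not the paper's uncertainty quantifier.
    """
    rng = np.random.default_rng(seed)
    n = X_lab.shape[0]
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)       # bootstrap resample of labeled data
        w = fit_ridge(X_lab[idx], r_lab[idx])  # one reward model per resample
        preds.append(X_unlab @ w)
    preds = np.stack(preds)                    # shape (n_models, n_unlab)
    mean, std = preds.mean(axis=0), preds.std(axis=0)
    return mean - kappa * std, mean, std

# Toy data: few labeled transitions, many unlabeled ones with wider coverage.
rng = np.random.default_rng(1)
true_w = np.array([1.0, -0.5, 0.25])
X_lab = rng.normal(size=(50, 3))
r_lab = X_lab @ true_w + 0.1 * rng.normal(size=50)
X_unlab = 2.0 * rng.normal(size=(500, 3))

r_lower, r_mean, r_std = pessimistic_rewards(X_lab, r_lab, X_unlab)
```

Under these assumptions, the unlabeled transitions paired with `r_lower` could then be passed to an offline RL learner of one's choice, so that the pessimism enters only through the reward labels.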

Taxonomy

Topics: Evolutionary Algorithms and Applications · Mental Health Research Topics · Innovation Diffusion and Forecasting