Semi-pessimistic Reinforcement Learning
Jin Zhu, Xin Zhou, Jiaang Yao, Gholamali Aminian, Omar Rivasplata, Simon Little, Lexin Li, Chengchun Shi

TL;DR
This paper introduces a semi-pessimistic reinforcement learning approach that leverages large amounts of unlabeled data to improve policy learning in offline RL, addressing distributional shift and reward scarcity.
Contribution
The paper proposes a novel semi-pessimistic RL method that simplifies reward estimation, is flexible across algorithms, and guarantees improvement with abundant unlabeled data.
Findings
The method outperforms existing solutions both analytically and numerically.
Applicable to both model-free and model-based RL algorithms.
Demonstrated effectiveness in a Parkinson's disease application.
Abstract
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected data. However, it faces the challenge of distributional shift, where the learned policy may encounter unseen scenarios not covered in the offline data. Additionally, numerous applications suffer from a scarcity of labeled reward data. Relying on labeled data alone often leads to a narrow state-action distribution, which further amplifies the distributional shift and results in suboptimal policy learning. To address these issues, we first recognize that the volume of unlabeled data is typically substantially larger than that of labeled data. We then propose a semi-pessimistic RL method to effectively leverage abundant unlabeled data. Our approach offers several advantages. It considerably simplifies the learning process, as it seeks a lower bound of the reward function, rather than that of the Q-function or…
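The abstract states the key idea, a pessimistic lower bound on the reward function combined with abundant unlabeled data, only at a high level, and it is truncated here. The following is a minimal illustrative sketch under loudly stated assumptions, not the authors' estimator. It assumes a tabular MDP with rewards in [0, 1], treats "unlabeled" data as (s, a, s_next) transitions without a reward, and uses a simple count-based lower confidence bound (empirical mean minus `beta / sqrt(n)`) as a stand-in for the paper's reward lower bound. Tabular fitted Q-iteration serves as the downstream offline RL algorithm. The function names `reward_lower_bound`, `fitted_q_iteration`, and `semi_pessimistic_policy`, as well as the parameters `beta`, `r_min`, `gamma`, and `iters`, are all hypothetical.

```python
from collections import defaultdict

# Illustrative sketch only (hypothetical names): a count-based reward lower
# bound stands in for the paper's estimator; rewards are assumed in [0, 1].


def reward_lower_bound(labeled, beta=0.5, r_min=0.0):
    """Count-based lower confidence bound on the mean reward of each (s, a).

    labeled: list of (s, a, r, s_next) tuples that carry a reward label.
    Returns a function (s, a) -> pessimistic reward estimate, floored at r_min.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s, a, r, _ in labeled:
        sums[(s, a)] += r
        counts[(s, a)] += 1

    def r_lcb(s, a):
        n = counts.get((s, a), 0)
        if n == 0:
            return r_min  # no reward labels for this pair: assume the worst case
        # Empirical mean minus an uncertainty bonus that shrinks like 1/sqrt(n).
        return max(r_min, sums[(s, a)] / n - beta / n ** 0.5)

    return r_lcb


def fitted_q_iteration(transitions, r_fn, actions, gamma=0.9, iters=100):
    """Tabular fitted Q-iteration on reward-imputed (s, a, s_next) transitions.

    Pairs never seen in the data keep Q = 0, the pessimistic floor when
    rewards are assumed to lie in [0, 1].
    """
    next_states = defaultdict(list)
    for s, a, s_next in transitions:
        next_states[(s, a)].append(s_next)
    q = {}
    for _ in range(iters):
        new_q = {}
        for (s, a), nexts in next_states.items():
            # Bellman target: imputed reward plus discounted greedy value.
            targets = [
                r_fn(s, a) + gamma * max(q.get((sn, b), 0.0) for b in actions)
                for sn in nexts
            ]
            new_q[(s, a)] = sum(targets) / len(targets)
        q = new_q
    return q


def semi_pessimistic_policy(labeled, unlabeled, states, actions, beta=0.5, gamma=0.9):
    """Fit a reward lower bound on labeled data, then learn Q on all transitions."""
    r_lcb = reward_lower_bound(labeled, beta=beta)
    # Reward labels are only needed to fit r_lcb; dynamics use every transition.
    transitions = [(s, a, sn) for s, a, _, sn in labeled] + list(unlabeled)
    q = fitted_q_iteration(transitions, r_lcb, actions, gamma=gamma)
    # Greedy policy; ties go to the first action listed.
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}


if __name__ == "__main__":
    # Toy MDP (hypothetical): action 1 in state 0 moves to state 1,
    # where action 0 earns reward 1. Only the unlabeled data contain that move.
    labeled = [(0, 0, 0.2, 0)] * 4 + [(1, 0, 1.0, 1)] * 4
    unlabeled = [(0, 1, 1)] * 4
    print(semi_pessimistic_policy(labeled, unlabeled, [0, 1], [0, 1]))  # {0: 1, 1: 0}
    print(semi_pessimistic_policy(labeled, [], [0, 1], [0, 1]))  # {0: 0, 1: 0}
```

In this toy example, the labeled data alone never show that action 1 in state 0 leads to the rewarding state, so the greedy policy stays put. Adding the reward-free transitions changes the choice in state 0, even though no new reward labels were supplied. Because pessimism is applied only to the reward, any offline RL routine could replace `fitted_q_iteration`. The count-based bonus is just one convenient way to obtain a reward lower bound.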
Taxonomy
Topics: Evolutionary Algorithms and Applications · Mental Health Research Topics · Innovation Diffusion and Forecasting
