Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering

Yuanhao Chen; Qi Liu; Pengbin Chen; Zhongjian Qiao; Yanjie Li

arXiv:2512.20115 · cs.LG · December 24, 2025

TL;DR

This paper introduces a sample filtering technique for offline deep reinforcement learning that enhances sample efficiency and performance by selecting high-quality transitions based on episode rewards, addressing the distribution shift problem.
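The filtering idea can be illustrated with a small sketch. It is a minimal sketch under stated assumptions, not the authors' implementation. The summary says only that transitions are selected based on episode rewards and does not give the exact criterion. The sketch therefore assumes that each transition carries a hypothetical `episode_id` and that each episode is scored by its undiscounted return. It then keeps every transition from the top `keep_fraction` of episodes, where `keep_fraction` is also a hypothetical parameter.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    episode_id: int  # hypothetical field: which episode this transition came from
    obs: tuple
    action: tuple
    reward: float
    next_obs: tuple
    done: bool


def episode_returns(transitions):
    """Undiscounted return (sum of rewards) of every episode in the dataset."""
    returns = defaultdict(float)
    for t in transitions:
        returns[t.episode_id] += t.reward
    return dict(returns)


def filter_by_episode_return(transitions, keep_fraction=0.5):
    """Keep all transitions from the top `keep_fraction` of episodes ranked by return.

    Assumed selection rule for illustration; the paper's exact criterion may differ.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    returns = episode_returns(transitions)
    ranked = sorted(returns, key=returns.get, reverse=True)  # best episodes first
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    kept = set(ranked[:n_keep])
    # Preserve the original transition order so trajectories stay contiguous.
    return [t for t in transitions if t.episode_id in kept]


if __name__ == "__main__":
    data = [
        Transition(0, (0.0,), (1.0,), 1.0, (0.1,), False),
        Transition(0, (0.1,), (1.0,), 1.0, (0.2,), True),
        Transition(1, (0.0,), (0.0,), 0.0, (0.0,), True),
        Transition(2, (0.0,), (0.5,), 5.0, (0.3,), True),
    ]
    filtered = filter_by_episode_return(data, keep_fraction=0.5)
    print(sorted({t.episode_id for t in filtered}))  # prints [0, 2]
```

Keeping whole episodes, rather than isolated high-reward steps, preserves trajectory context. The trade-off is that a smaller `keep_fraction` yields a cleaner but smaller training set.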

Contribution

It proposes a simple, effective sample filtering method that improves the training process of offline RL algorithms by focusing on high-reward transitions, leading to better performance.

Findings

1. Outperforms baseline methods on benchmark tasks.
2. Improves sample efficiency in offline RL.
3. Enhances final policy performance.

Abstract

Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected return using a given static dataset of transitions. However, offline RL suffers from the distribution shift problem. Policy constraint offline RL methods have been proposed to address distribution shift. During policy constraint offline RL training, it is important to keep the difference between the learned policy and the behavior policy within a given threshold. As a result, the learned policy relies heavily on the quality of the behavior policy. However, existing policy constraint methods have a problem: if the dataset contains many low-reward transitions, the learned policy will be constrained by a suboptimal reference policy, leading to slow learning, low sample efficiency, and inferior performance. This paper shows that the sampling method in policy constraint offline RL that uses all the…
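A toy example can show why a policy constraint ties the learned policy to the behavior policy. The sketch below uses one common form of constraint, a KL-divergence bound KL(π‖β) ≤ ε, in a single state with discrete actions. It is an assumption-laden toy, not the paper's method. The abstract does not say which divergence or threshold is used, and here the behavior policy β and the action values Q are assumed to be known exactly. The policy that maximizes expected Q minus a KL penalty has the closed form π(a) ∝ β(a) exp(Q(a)/τ). The hypothetical helper `constrained_policy` bisects on the temperature τ to find the greediest such policy that still satisfies the bound.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def tilted_policy(behavior, q_values, temperature):
    """pi(a) proportional to behavior(a) * exp(Q(a) / temperature).

    Actions with zero behavior probability keep zero probability, so pi never
    leaves the support of the behavior policy.
    """
    # Shift by the best Q-value within the behavior support for numerical
    # stability; this also guarantees the normalizer is positive.
    m = max(q for b, q in zip(behavior, q_values) if b > 0.0)
    weights = [b * math.exp((q - m) / temperature) for b, q in zip(behavior, q_values)]
    z = sum(weights)
    return [w / z for w in weights]


def expected_q(policy, q_values):
    return sum(p * q for p, q in zip(policy, q_values))


def constrained_policy(behavior, q_values, epsilon, lo=1e-3, hi=1e3, iters=60):
    """Greediest tilted policy with KL(pi || behavior) <= epsilon (hypothetical helper).

    KL shrinks as the temperature grows, so we bisect on the temperature in log space.
    """
    greedy = tilted_policy(behavior, q_values, lo)
    if kl_divergence(greedy, behavior) <= epsilon:
        return greedy
    if kl_divergence(tilted_policy(behavior, q_values, hi), behavior) > epsilon:
        return list(behavior)  # fall back to the behavior policy itself (KL = 0)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if kl_divergence(tilted_policy(behavior, q_values, mid), behavior) > epsilon:
            lo = mid  # too far from behavior: raise the temperature
        else:
            hi = mid  # feasible: try a greedier (lower) temperature
    return tilted_policy(behavior, q_values, hi)


if __name__ == "__main__":
    q = [0.0, 0.5, 1.0]          # assumed action values in a single state
    good_beta = [0.1, 0.2, 0.7]  # behavior policy that mostly takes the best action
    poor_beta = [0.7, 0.2, 0.1]  # behavior policy that mostly takes the worst action
    for name, beta in [("good", good_beta), ("poor", poor_beta)]:
        pi = constrained_policy(beta, q, epsilon=0.1)
        print(name, round(expected_q(pi, q), 3), round(kl_divergence(pi, beta), 3))
```

Both runs use the same Q-values and the same ε. The behavior policy that puts most of its mass on low-value actions yields a constrained policy with a lower expected value. This is the dependence on behavior-policy quality that the abstract describes.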

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adaptive Dynamic Programming Control