Pessimistic Auxiliary Policy for Offline Reinforcement Learning

Fan Zhang; Baoru Huang; Xin Zhang

arXiv:2602.23974 · cs.AI · March 6, 2026

TL;DR

This paper introduces a pessimistic auxiliary policy for offline reinforcement learning. The policy samples actions that maximize a lower confidence bound of the Q-function. This reduces overestimation and error accumulation and improves overall learning performance.

Contribution

The paper proposes a novel pessimistic auxiliary strategy based on lower confidence bounds of the Q-function to enhance offline RL algorithms.
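The source does not say how the lower confidence bound is computed. The block below makes it concrete under one common assumption: uncertainty is estimated from an ensemble of N Q-functions $Q_1, \dots, Q_N$, and their disagreement is penalized. The symbols $\beta$ (pessimism coefficient) and $\epsilon$ (neighbourhood radius around the learned policy $\pi$) are illustrative and are not the authors' notation.

```latex
\bar{Q}(s,a) = \frac{1}{N}\sum_{i=1}^{N} Q_i(s,a), \qquad
\sigma(s,a) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(Q_i(s,a) - \bar{Q}(s,a)\bigr)^2}

\mathrm{LCB}(s,a) = \bar{Q}(s,a) - \beta\,\sigma(s,a), \qquad
\tilde{a}(s) = \arg\max_{a \,:\, \|a - \pi(s)\| \le \epsilon} \mathrm{LCB}(s,a)
```

A larger $\beta$ trades estimated value for lower uncertainty, and $\beta = 0$ recovers greedy selection on the ensemble mean.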

Findings

1. Improves offline RL performance across benchmarks.
2. Reduces overestimation and error accumulation.
3. Enhances existing offline RL methods with the auxiliary strategy.

Abstract

Offline reinforcement learning aims to learn an agent from pre-collected datasets, avoiding unsafe and inefficient real-time interaction. However, the inevitable access to out-of-distribution actions during learning introduces approximation errors, which cause error accumulation and considerable overestimation. In this paper, we construct a new pessimistic auxiliary policy for sampling reliable actions. Specifically, we develop a pessimistic auxiliary strategy by maximizing the lower confidence bound of the Q-function. The pessimistic auxiliary strategy exhibits relatively high value and low uncertainty in the vicinity of the learned policy. This prevents the learned policy from sampling high-value actions with potentially high errors during learning. The smaller approximation error introduced by actions sampled from the pessimistic auxiliary strategy leads to the alleviation of error…
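The following is a minimal sketch of the sampling step described in the abstract. It rests on assumptions the source does not confirm:

- Uncertainty is measured by the standard deviation of a small Q-ensemble.
- The "vicinity of the learned policy" is explored with Gaussian perturbations of the policy's action.
- Actions live in a normalized [-1, 1] box.

The helpers `lower_confidence_bound`, `pessimistic_auxiliary_action` and the toy `make_member` Q-functions are hypothetical names chosen for illustration. They are not the authors' code.

```python
import numpy as np


def lower_confidence_bound(q_values, beta):
    """Per-action LCB from an ensemble of Q estimates.

    q_values: array of shape (n_ensemble, n_actions).
    Returns ensemble mean minus beta times ensemble (population) std,
    with shape (n_actions,).
    """
    return q_values.mean(axis=0) - beta * q_values.std(axis=0)


def pessimistic_auxiliary_action(state, policy_action, q_ensemble, beta=1.0,
                                 noise_std=0.1, n_candidates=32, rng=None):
    """Hypothetical helper: pick the action near policy_action with the highest LCB."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: the vicinity of the learned policy is explored by Gaussian
    # perturbations of its action, clipped to a normalized [-1, 1] action box.
    noise = rng.normal(0.0, noise_std, size=(n_candidates, policy_action.shape[0]))
    candidates = np.clip(policy_action + noise, -1.0, 1.0)
    # Keep the learned policy's own action as one of the candidates.
    candidates = np.vstack([policy_action[None, :], candidates])
    # Each ensemble member scores every candidate: shape (n_ensemble, n_candidates + 1).
    q_values = np.stack([q(state, candidates) for q in q_ensemble])
    lcb = lower_confidence_bound(q_values, beta)
    return candidates[np.argmax(lcb)]


def make_member(bias):
    """Toy Q-function: members agree at a = 0 and disagree more as |a| grows."""
    def q(state, actions):
        return -np.sum((actions - 0.5) ** 2, axis=1) + bias * np.sum(actions ** 2, axis=1)
    return q


ensemble = [make_member(b) for b in (-2.0, 0.0, 2.0)]
action = pessimistic_auxiliary_action(None, np.array([0.5]), ensemble, beta=1.0,
                                      noise_std=0.3, n_candidates=256,
                                      rng=np.random.default_rng(0))
print(action)
```

In this toy ensemble the mean Q-value peaks at a = 0.5, but the members' disagreement grows with a². With beta = 1, the LCB therefore peaks near a ≈ 0.19. The selected action is pulled away from the mean-optimal but uncertain region toward actions the ensemble agrees on, which is the qualitative behaviour the abstract describes.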

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)