Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL

Qin-Wen Luo; Ming-Kun Xie; Ye-Wen Wang; Sheng-Jun Huang

arXiv:2505.19923 · cs.LG · May 27, 2025

TL;DR

This paper introduces a novel state-adaptive regularization technique for offline RL that dynamically adjusts regularization strength based on state quality, improving policy learning by balancing extrapolation error mitigation and policy flexibility.

Contribution

It proposes a selective, state-adaptive regularization method that can be added to existing offline RL algorithms. The method tailors the regularization strength to the data quality at each state, which improves the performance of those algorithms.
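One way to picture this, in our own illustrative notation rather than the paper's, is a TD3+BC-style actor objective (TD3+BC is swapped in here as an example base algorithm, and its Q-normalization factor is omitted for brevity). The single uniform weight $\alpha$ becomes a per-state coefficient $\alpha(s) \ge 0$, optionally gated by an action-quality indicator $m(s,a) \in \{0,1\}$. Both symbols are assumptions; this summary does not specify how the paper computes the coefficients or decides which actions count as high quality.

```latex
% Uniform regularization: one weight for every state
\mathcal{L}_{\text{fixed}}(\pi) = \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\left[ -Q(s,\pi(s)) + \alpha \,\lVert \pi(s) - a \rVert^2 \right]

% State-adaptive, selective regularization (illustrative form)
\mathcal{L}_{\text{adaptive}}(\pi) = \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\left[ -Q(s,\pi(s)) + \alpha(s)\, m(s,a) \,\lVert \pi(s) - a \rVert^2 \right]
```

A small $\alpha(s)$ lets the critic's Bellman-driven values dominate at that state, while a large $\alpha(s)$ pulls the policy toward behavior cloning there; setting $m(s,a) = 0$ switches the regularizer off for dataset actions judged to be low quality.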

Findings

1. Outperforms state-of-the-art methods on D4RL benchmarks
2. Effectively balances regularization to reduce extrapolation errors
3. Improves offline-to-online policy transfer

Abstract

Offline reinforcement learning (RL) aims to learn an effective policy from a static dataset. To alleviate extrapolation errors, existing studies often uniformly regularize the value function or policy updates across all states. However, due to substantial variations in data quality, the fixed regularization strength often leads to a dilemma: Weak regularization strength fails to address extrapolation errors and value overestimation, while strong regularization strength shifts policy learning toward behavior cloning, impeding potential performance enabled by Bellman updates. To address this issue, we propose the selective state-adaptive regularization method for offline RL. Specifically, we introduce state-adaptive regularization coefficients to trust state-level Bellman-driven results, while selectively applying regularization on high-quality actions, aiming to avoid performance…
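The abstract names two ingredients: per-state coefficients that decide how far to trust Bellman-driven results, and regularization applied only to high-quality dataset actions. The following is a hypothetical, pure-Python sketch of how those ideas could combine in a TD3+BC-style actor loss. It is not the authors' implementation (that lives in the qinwenluo/ssar repository). The helper names `adaptive_alpha` and `selective_regularized_actor_loss`, the linear uncertainty-to-coefficient rule, the `alpha_min`/`alpha_max` defaults, and the advantage-threshold test for "high quality" are all assumptions. It operates on plain floats so it runs without a deep learning library; in practice the same arithmetic would run on tensors so gradients flow into the policy.

```python
from typing import List, Sequence


def adaptive_alpha(uncertainty: Sequence[float],
                   alpha_min: float = 0.1,
                   alpha_max: float = 2.5) -> List[float]:
    """Hypothetical rule mapping a per-state uncertainty score to a coefficient.

    Scores are clipped to [0, 1] and interpolated linearly: low uncertainty
    gives weak regularization (trust the Bellman update), high uncertainty
    gives strong regularization (stay close to the data).
    """
    coeffs = []
    for u in uncertainty:
        u = min(max(u, 0.0), 1.0)  # clip the score into [0, 1]
        coeffs.append(alpha_min + u * (alpha_max - alpha_min))
    return coeffs


def selective_regularized_actor_loss(q_pi: Sequence[float],
                                     q_data: Sequence[float],
                                     v: Sequence[float],
                                     bc_dist: Sequence[float],
                                     alpha: Sequence[float],
                                     adv_threshold: float = 0.0) -> float:
    """Batch mean of -Q(s, pi(s)) + alpha(s) * m(s, a) * ||pi(s) - a||^2.

    q_pi[i]    : critic value of the policy's action at state i
    q_data[i]  : critic value of the dataset action at state i
    v[i]       : state-value baseline at state i (assumed given)
    bc_dist[i] : squared distance between policy and dataset actions
    alpha[i]   : per-state regularization coefficient
    m(s, a) is 1 only when q_data[i] - v[i] > adv_threshold, an assumed
    stand-in for "the dataset action is high quality".
    """
    n = len(q_pi)
    if n == 0 or not (len(q_data) == len(v) == len(bc_dist) == len(alpha) == n):
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for i in range(n):
        # Selective mask: regularize only toward actions with positive advantage.
        mask = 1.0 if q_data[i] - v[i] > adv_threshold else 0.0
        total += -q_pi[i] + alpha[i] * mask * bc_dist[i]
    return total / n


if __name__ == "__main__":
    alpha = adaptive_alpha([0.0, 1.0])
    loss = selective_regularized_actor_loss(
        q_pi=[1.0, 2.0], q_data=[1.5, 0.5], v=[1.0, 1.0],
        bc_dist=[0.4, 0.4], alpha=alpha)
    print(alpha, loss)
```

In the usage example, the first state's dataset action has a positive advantage (1.5 - 1.0), so its regularization term is kept; the second state's action has a negative advantage and is masked out, leaving only the Q term for that state. Keeping the coefficient and the mask as separate inputs makes it easy to ablate either ingredient on its own.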

Code & Models

Repositories

qinwenluo/ssar (PyTorch · Official)
