Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
Qin-Wen Luo, Ming-Kun Xie, Ye-Wen Wang, Sheng-Jun Huang

TL;DR
This paper introduces a novel state-adaptive regularization technique for offline RL that adjusts the regularization strength at each state according to the quality of the data there, improving policy learning by balancing extrapolation-error mitigation against policy flexibility.
Contribution
It proposes a selective, state-adaptive regularization method that enhances existing offline RL algorithms by tailoring regularization to state-specific data quality, leading to better performance.
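To make "state-adaptive" and "selective" concrete, here is one generic way to write the change. It assumes a TD3+BC-style actor objective as the base algorithm, which is an assumption: this listing does not give the paper's actual loss. The uniform coefficient $\lambda$ is replaced by a per-state coefficient $\lambda(s)$. The behavior term is applied only when the dataset action belongs to a hypothetical set $\mathcal{A}_{\mathrm{hq}}(s)$ of high-quality actions.

```latex
% Uniform regularization (TD3+BC-style base objective, minimized over \pi)
\mathcal{L}_{\text{uniform}}(\pi) = \mathbb{E}_{(s,a)\sim\mathcal{D}}
  \big[ -Q(s,\pi(s)) + \lambda \,\lVert \pi(s) - a \rVert^2 \big]

% Selective, state-adaptive variant (illustrative form, not the paper's exact loss)
\mathcal{L}_{\text{adaptive}}(\pi) = \mathbb{E}_{(s,a)\sim\mathcal{D}}
  \big[ -Q(s,\pi(s)) + \lambda(s)\, \mathbb{1}\{ a \in \mathcal{A}_{\mathrm{hq}}(s) \}\, \lVert \pi(s) - a \rVert^2 \big]
```

A large $\lambda(s)$ pulls the policy toward the data at state $s$, which moves it closer to behavior cloning. A small $\lambda(s)$ lets the Bellman-trained critic dominate at that state.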
Findings
Outperforms state-of-the-art methods on D4RL benchmarks
Effectively balances regularization to reduce extrapolation errors
Improves offline-to-online policy transfer
Abstract
Offline reinforcement learning (RL) aims to learn an effective policy from a static dataset. To alleviate extrapolation errors, existing studies often uniformly regularize the value function or policy updates across all states. However, due to substantial variations in data quality, the fixed regularization strength often leads to a dilemma: Weak regularization strength fails to address extrapolation errors and value overestimation, while strong regularization strength shifts policy learning toward behavior cloning, impeding potential performance enabled by Bellman updates. To address this issue, we propose the selective state-adaptive regularization method for offline RL. Specifically, we introduce state-adaptive regularization coefficients to trust state-level Bellman-driven results, while selectively applying regularization on high-quality actions, aiming to avoid performance…
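The abstract's two ingredients, per-state coefficients and regularization restricted to high-quality actions, can be sketched in NumPy as a batched actor loss. This is a minimal illustration under stated assumptions, not the authors' implementation. Several parts are hypothetical choices made only for this example:

* the coefficient rule in `state_adaptive_lambda`, a sigmoid of the critic's gap between policy and dataset actions;
* the return-quantile rule in `high_quality_mask`;
* the bounds `lam_min` and `lam_max`.

It also omits TD3+BC's Q normalization. The code only evaluates the loss value. A training implementation would compute it in an autodiff framework.

```python
import numpy as np


def state_adaptive_lambda(q_pi, q_data, lam_min=0.1, lam_max=2.5, temperature=1.0):
    """Per-state regularization coefficient (HYPOTHETICAL rule).

    If the critic rates the policy's action far above the dataset action,
    the gap may be extrapolation error, so regularize harder; if the gap is
    small or negative, trust the Bellman-driven estimate and regularize less.
    """
    gap = (q_pi - q_data) / temperature
    weight = 1.0 / (1.0 + np.exp(-gap))  # sigmoid, values in (0, 1)
    return lam_min + (lam_max - lam_min) * weight


def high_quality_mask(returns, quantile=0.5):
    """1.0 where the dataset action counts as "high quality" (HYPOTHETICAL:
    its trajectory return is at or above the given quantile), else 0.0."""
    threshold = np.quantile(returns, quantile)
    return (returns >= threshold).astype(np.float64)


def actor_loss(pi_actions, data_actions, q_pi, q_data, returns):
    """TD3+BC-style actor loss (to minimize) with the uniform coefficient
    replaced by a selective, per-state one. Illustrative only."""
    lam = state_adaptive_lambda(q_pi, q_data)
    mask = high_quality_mask(returns)
    # Squared distance between policy and dataset actions, one value per state.
    bc_term = np.sum((pi_actions - data_actions) ** 2, axis=1)
    return float(np.mean(-q_pi + lam * mask * bc_term))


# Usage with made-up numbers (a batch of 3 states, 2-D actions).
pi_actions = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]])
data_actions = np.array([[0.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
q_pi = np.array([1.0, 2.0, 0.0])      # Q(s, pi(s))
q_data = np.array([1.0, 0.0, 0.0])    # Q(s, a) for the dataset action
returns = np.array([10.0, 5.0, 1.0])  # trajectory return for each sample
print(actor_loss(pi_actions, data_actions, q_pi, q_data, returns))
```

Replacing `lam` with a constant and `mask` with all ones recovers a uniform, TD3+BC-style penalty. That is the setting the abstract argues is too weak in some states and too strong in others.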
