Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning
Yixiu Mao, Yun Qu, Qi Wang, Xiangyang Ji

TL;DR
This paper introduces an adaptive neighborhood constraint for offline RL that bounds extrapolation errors, is less conservative than density and sample constraints, and achieves state-of-the-art results while remaining robust to noisy or limited data.
Contribution
The paper proposes an adaptive neighborhood constraint that approximates the support constraint without modeling the behavior policy.
Findings
Achieves state-of-the-art results on standard benchmarks.
Demonstrates robustness in noisy or limited data scenarios.
Bounds extrapolation errors and distribution shift under certain conditions.
Abstract
Offline reinforcement learning (RL) suffers from extrapolation errors induced by out-of-distribution (OOD) actions. To address this, offline RL algorithms typically impose constraints on action selection, which can be systematically categorized into density, support, and sample constraints. However, we show that each category has inherent limitations: density and sample constraints tend to be overly conservative in many scenarios, while the support constraint, though least restrictive, faces challenges in accurately modeling the behavior policy. To overcome these limitations, we propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. Theoretically, the constraint not only bounds extrapolation errors and distribution shift under certain conditions, but also approximates the support constraint without…
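To make the idea of restricting the Bellman target to neighborhoods of dataset actions concrete, here is a minimal sketch. It is not the authors' implementation. It assumes an L-infinity ball of a given radius around the dataset's next action, approximates the inner maximization by random sampling, and uses hypothetical names throughout (`neighborhood_target`, `q_fn`, `radius`, `n_candidates`). How the radius is adapted per sample is not specified in this summary, so the caller simply passes it in.

```python
import numpy as np

def neighborhood_target(q_fn, reward, next_state, next_action, radius,
                        gamma=0.99, n_candidates=32, rng=None):
    """Bellman target maximized over a neighborhood of a dataset action.

    Illustrative assumption: the neighborhood is an L-infinity ball of
    size `radius` around `next_action`, searched by random sampling.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Perturbations drawn uniformly from [-radius, radius] per dimension.
    noise = radius * rng.uniform(-1.0, 1.0,
                                 size=(n_candidates, next_action.shape[0]))
    # Always keep the dataset action itself as a candidate, so the target
    # is never lower than the sample-constrained (in-dataset) target.
    candidates = np.vstack([next_action[None, :],
                            next_action[None, :] + noise])
    states = np.repeat(next_state[None, :], candidates.shape[0], axis=0)
    q_values = q_fn(states, candidates)  # shape: (n_candidates + 1,)
    return reward + gamma * np.max(q_values)

# Toy Q-function (hypothetical): prefers actions close to 0.5 per dimension.
toy_q = lambda s, a: -np.sum((a - 0.5) ** 2, axis=1)

state, action = np.zeros(3), np.zeros(2)
sample_target = neighborhood_target(toy_q, 1.0, state, action, radius=0.0)
nbhd_target = neighborhood_target(toy_q, 1.0, state, action, radius=0.2)
```

With `radius=0.0` every candidate collapses onto the dataset action, so the sketch reduces to a sample constraint. A larger radius widens the search and moves the target toward a less conservative one, which is the trade-off the abstract describes.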
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
