The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation
Noah Golowich, Ankur Moitra

TL;DR
This paper investigates offline reinforcement learning with linear function approximation, demonstrating that low inherent Bellman error enables efficient algorithms under single-policy coverage, with suboptimality scaling as the square root of the Bellman error.
Contribution
The paper introduces a computationally efficient algorithm for offline RL under low inherent Bellman error and single-policy coverage. Even when the inherent Bellman error is 0 (linear Bellman completeness), this is the first known guarantee under single-policy coverage.
Findings
The algorithm succeeds under single-policy coverage.
Suboptimality scales with the square root of the inherent Bellman error.
A lower bound shows this scaling cannot be improved.
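The "Bellman error" in these findings is the inherent Bellman error. One common way to write it for a finite-horizon MDP is sketched below. The notation is an assumption made here for illustration, not taken from the paper: feature maps \phi_h, parameter sets \mathcal{B}_h, rewards r_h, and transitions P_h. The paper's exact definition and normalization may differ.

```latex
% Per-step inherent Bellman error: how well the best linear function at step h
% can match the Bellman backup of any linear function at step h+1.
\mathcal{I}_h
  = \sup_{\theta_{h+1} \in \mathcal{B}_{h+1}}
    \inf_{\theta_h \in \mathcal{B}_h}
    \sup_{(s,a)}
    \left|
      \phi_h(s,a)^\top \theta_h
      - \Big( r_h(s,a)
        + \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}
          \max_{a'} \phi_{h+1}(s',a')^\top \theta_{h+1} \Big)
    \right|,
\qquad
\varepsilon_{\mathrm{BE}} = \max_h \mathcal{I}_h .
```

Under this notation, linear Bellman completeness is the case \varepsilon_{\mathrm{BE}} = 0: the backup of every linear value function is again exactly linear.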
Abstract
In this paper, we study the offline RL problem with linear function approximation. Our main structural assumption is that the MDP has low inherent Bellman error, which stipulates that linear value functions have linear Bellman backups with respect to the greedy policy. This assumption is natural in that it is essentially the minimal assumption required for value iteration to succeed. We give a computationally efficient algorithm which succeeds under a single-policy coverage condition on the dataset, namely which outputs a policy whose value is at least that of any policy which is well-covered by the dataset. Even in the setting when the inherent Bellman error is 0 (termed linear Bellman completeness), our algorithm yields the first known guarantee under single-policy coverage. In the setting of positive inherent Bellman error, we show that the…
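To make the link between linear Bellman backups and value iteration concrete, here is a minimal sketch of plain least-squares value iteration on an offline dataset. It is not the authors' algorithm: it includes no mechanism for handling single-policy coverage, and the function names, the data layout, and the `ridge` parameter are all assumptions made for illustration.

```python
import numpy as np


def lsvi_offline(data, horizon, dim, ridge=1e-3):
    """Backward least-squares value iteration with linear features.

    Hypothetical data layout: data[h] is a list of tuples
    (phi_sa, reward, next_phis), where phi_sa has shape (dim,) and
    next_phis has shape (num_actions, dim) and holds the features of
    every action at the observed next state.
    """
    # thetas[horizon] stays zero: no value remains after the last step.
    thetas = [np.zeros(dim) for _ in range(horizon + 1)]
    for h in reversed(range(horizon)):
        X = np.array([phi for phi, _, _ in data[h]])
        # Regression targets: reward plus the greedy value at step h + 1.
        y = np.array([
            r + np.max(nxt @ thetas[h + 1]) for _, r, nxt in data[h]
        ])
        # Ridge regression fits a linear function to the Bellman backup.
        A = X.T @ X + ridge * np.eye(dim)
        thetas[h] = np.linalg.solve(A, X.T @ y)
    return thetas[:horizon]


def greedy_action(theta_h, action_phis):
    """Index of the action that maximizes the linear Q-estimate."""
    return int(np.argmax(action_phis @ theta_h))
```

When the inherent Bellman error is small, each regression target is close to a linear function of the features, so the fit at step h loses little. That is the sense in which the assumption is tied to value iteration succeeding.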
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Adaptive Dynamic Programming Control
