Offline Reinforcement Learning via Linear-Programming with Error-Bound Induced Constraints
Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang

TL;DR
This paper develops a linear programming approach for offline reinforcement learning that achieves optimal sample complexity by incorporating error bounds as constraints, applicable to various MDP settings and function approximation methods.
Contribution
It introduces a novel LP framework with error-bound constraints that attains optimal sample complexity under general conditions, relaxing previous assumptions.
Findings
Achieves $O(1/\sqrt{n})$ sample complexity with partial data coverage.
Handles both infinite-horizon discounted and average-reward MDPs.
Provides state-of-the-art or first sample complexities in tabular offline RL settings.
Abstract
Offline reinforcement learning (RL) aims to find an optimal policy for Markov decision processes (MDPs) using a pre-collected dataset. In this work, we revisit the linear programming (LP) reformulation of Markov decision processes for offline RL, with the goal of developing algorithms with optimal $O(1/\sqrt{n})$ sample complexity, where $n$ is the sample size, under partial data coverage and general function approximation, and with favorable computational tractability. To this end, we derive new \emph{error bounds} for both the dual and primal-dual formulations of the LP, and incorporate them properly as \emph{constraints} in the LP reformulation. We then show that under a completeness-type assumption, $O(1/\sqrt{n})$ sample complexity can be achieved under the standard single-policy coverage assumption, when one properly \emph{relaxes} the occupancy validity constraint in the LP. This…
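For orientation, the textbook LP pair for an infinite-horizon discounted tabular MDP can be written as below. This is a sketch in assumed notation, not the paper's own formulation: the symbols $\gamma$ (discount factor), $\rho$ (initial state distribution), $P$ (transition kernel), $r$ (reward), $V$ (value function) and $\mu$ (normalized state-action occupancy) do not appear in the text above, and the error-bound constraints the paper adds are not shown.

```latex
% Value-function LP (assumed notation; standard textbook form)
\begin{align*}
\min_{V} \quad & (1-\gamma) \sum_{s} \rho(s)\, V(s) \\
\text{s.t.} \quad & V(s) \ge r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s')
  \quad \text{for all } (s,a).
\end{align*}

% Occupancy-measure LP (the LP dual of the program above)
\begin{align*}
\max_{\mu \ge 0} \quad & \sum_{s,a} \mu(s,a)\, r(s,a) \\
\text{s.t.} \quad & \sum_{a} \mu(s,a)
  = (1-\gamma)\, \rho(s) + \gamma \sum_{s',a'} P(s \mid s',a')\, \mu(s',a')
  \quad \text{for all } s.
\end{align*}
```

The equality in the second program is the usual Bellman flow condition, which is presumably what the abstract calls the occupancy validity constraint. In the offline setting $P$ is unknown and the condition can only be estimated from the $n$ samples, which gives some intuition for why the abstract speaks of relaxing it.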
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
