Offline Reinforcement Learning via Linear-Programming with Error-Bound Induced Constraints

Asuman Ozdaglar; Sarath Pattathil; Jiawei Zhang; Kaiqing Zhang

arXiv:2212.13861 · cs.LG · December 11, 2024 · 1 citation

TL;DR

This paper develops a linear programming approach for offline reinforcement learning that achieves optimal sample complexity by incorporating error bounds as constraints, applicable to various MDP settings and function approximation methods.

Contribution

It introduces a novel LP framework whose error-bound constraints let it attain optimal sample complexity under general conditions, relaxing the assumptions required by previous work.

Findings

1. Achieves $O(1/\sqrt{n})$ sample complexity with partial data coverage.

2. Handles both infinite-horizon discounted and average-reward MDPs.

3. Provides state-of-the-art or first sample complexities in tabular offline RL settings.

Abstract

Offline reinforcement learning (RL) aims to find an optimal policy for Markov decision processes (MDPs) using a pre-collected dataset. In this work, we revisit the linear programming (LP) reformulation of Markov decision processes for offline RL, with the goal of developing algorithms with optimal $O(1/\sqrt{n})$ sample complexity, where $n$ is the sample size, under partial data coverage and general function approximation, and with favorable computational tractability. To this end, we derive new error bounds for both the dual and primal-dual formulations of the LP, and incorporate them properly as constraints in the LP reformulation. We then show that under a completeness-type assumption, $O(1/\sqrt{n})$ sample complexity can be achieved under the standard single-policy coverage assumption, when one properly relaxes the occupancy validity constraint in the LP. This…
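For readers unfamiliar with the LP view of MDPs mentioned in the abstract, the following is a minimal sketch of the textbook occupancy-measure formulation for a discounted tabular MDP, not the paper's algorithm. In that formulation one maximizes $\sum_{s,a} \mu(s,a) r(s,a)$ over $\mu \ge 0$ subject to the "occupancy validity" (flow) constraint $\sum_a \mu(s,a) = (1-\gamma)\rho(s) + \gamma \sum_{s',a'} P(s \mid s',a')\,\mu(s',a')$. The toy MDP, the policy, and every name below (`P`, `R`, `RHO`, `PI`, `occupancy`, `flow_residual`, `lp_objective`) are hypothetical and chosen only for illustration. The error-bound constraints and the relaxation studied in the paper are not implemented here.

```python
# Toy 2-state, 2-action discounted MDP. All numbers are made up for illustration.
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2

# P[s][a][s2]: probability of moving to state s2 after taking action a in state s.
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
R = [[1.0, 0.0], [0.0, 2.0]]    # R[s][a]: reward
RHO = [0.6, 0.4]                # initial state distribution
PI = [[0.7, 0.3], [0.2, 0.8]]   # PI[s][a]: a fixed stochastic policy


def occupancy(policy, horizon=2000):
    """Normalized discounted occupancy: mu(s,a) = (1-gamma) * sum_t gamma^t Pr(s_t=s, a_t=a)."""
    mu = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state_dist = list(RHO)
    weight = 1.0 - GAMMA
    for _ in range(horizon):
        next_dist = [0.0] * N_STATES
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                p_sa = state_dist[s] * policy[s][a]
                mu[s][a] += weight * p_sa
                for s2 in range(N_STATES):
                    next_dist[s2] += p_sa * P[s][a][s2]
        state_dist = next_dist
        weight *= GAMMA
    return mu


def flow_residual(mu):
    """Largest violation of the occupancy validity (flow) constraint over all states."""
    worst = 0.0
    for s in range(N_STATES):
        outflow = sum(mu[s])
        inflow = sum(
            P[sp][ap][s] * mu[sp][ap]
            for sp in range(N_STATES)
            for ap in range(N_ACTIONS)
        )
        target = (1.0 - GAMMA) * RHO[s] + GAMMA * inflow
        worst = max(worst, abs(outflow - target))
    return worst


def lp_objective(mu):
    """LP objective: expected reward under the occupancy measure."""
    return sum(mu[s][a] * R[s][a] for s in range(N_STATES) for a in range(N_ACTIONS))


if __name__ == "__main__":
    mu_pi = occupancy(PI)
    print("flow residual of a true occupancy:", flow_residual(mu_pi))
    print("LP objective (normalized return):", lp_objective(mu_pi))
    uniform = [[0.25, 0.25], [0.25, 0.25]]
    print("flow residual of a uniform guess:", flow_residual(uniform))
```

The occupancy measure generated by an actual policy satisfies the flow constraint up to numerical error, while an arbitrary distribution over state-action pairs generally does not. In the offline setting the transition kernel is unknown and must be estimated from data, which is why the exact constraint cannot be enforced as written.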

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning