Offline Reinforcement Learning with Realizability and Single-policy Concentrability
Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

TL;DR
This paper shows that offline reinforcement learning can be sample-efficient under weak assumptions on both the function class and the data coverage. It analyzes a primal-dual algorithm that models the discounted occupancy through a density ratio against the offline data and needs only realizability and single-policy concentrability.
Contribution
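The paper analyzes a simple primal-dual algorithm for offline RL and shows that, with proper regularization, it achieves polynomial sample complexity under only realizability and single-policy concentrability. Earlier work relaxed either the function-class assumption or the data-coverage assumption, but not both.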
Findings
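The regularized primal-dual algorithm achieves polynomial sample complexity.
It weakens both assumptions at once: data coverage (single-policy concentrability instead of all-policy concentrability) and function classes (realizability instead of, e.g., Bellman-completeness).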
Provides alternative analyses under different assumptions.
Abstract
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability). Despite the recent efforts on relaxing these assumptions, existing works are only able to relax one of the two factors, leaving the strong assumption on the other factor intact. As an important open problem, can we achieve sample-efficient offline RL with weak assumptions on both factors? In this paper we answer the question in the positive. We analyze a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables (discounted occupancy) are modeled using a density-ratio function against offline data. With proper regularization, we show that the algorithm enjoys polynomial sample complexity, under only realizability and single-policy…
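A minimal tabular sketch of the primal-dual idea mentioned in the abstract, assuming the standard (unregularized) Lagrangian of the MDP linear program, L(v, w) = (1 - gamma) E_{s ~ mu0}[v(s)] + E_{(s,a) ~ d_data}[w(s,a) (r(s,a) + gamma E_{s'}[v(s')] - v(s))], where w is a density ratio against the data distribution. The function names, the toy MDP, and the choice of a uniform behavior policy to generate d_data are illustrative assumptions, not the paper's algorithm, and the regularization the paper relies on is omitted. The sketch only checks one property: when w is the true ratio d_pi / d_data, the Lagrangian equals the normalized return of pi for every v.

```python
import numpy as np

def occupancy(P, pi, mu0, gamma):
    """Normalized discounted state-action occupancy d^pi(s, a) of a tabular MDP.

    P: (S, A, S) transition tensor, pi: (S, A) policy, mu0: (S,) initial distribution.
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    # Bellman flow equation: d_s = (1 - gamma) * mu0 + gamma * P_pi^T d_s
    d_s = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi

def lagrangian(v, w, d_data, P, r, mu0, gamma):
    """Unregularized Lagrangian L(v, w) with the occupancy written as w * d_data."""
    bellman_residual = r + gamma * (P @ v) - v[:, None]  # shape (S, A)
    return (1 - gamma) * (mu0 @ v) + np.sum(d_data * w * bellman_residual)

# Hypothetical toy MDP, for illustration only.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
r = rng.uniform(size=(S, A))
mu0 = np.full(S, 1.0 / S)
pi = rng.dirichlet(np.ones(A), size=S)      # target policy
behavior = np.full((S, A), 1.0 / A)         # assumed data-collecting policy

d_pi = occupancy(P, pi, mu0, gamma)
d_data = occupancy(P, behavior, mu0, gamma)
w = d_pi / d_data                           # true density ratio
normalized_return = np.sum(d_pi * r)        # (1 - gamma) * J(pi)

# The value of L does not depend on v when w is the true ratio.
for v in (np.zeros(S), rng.normal(size=S)):
    print(np.isclose(lagrangian(v, w, d_data, P, r, mu0, gamma), normalized_return))
```

In an offline setting the expectation over d_data would be replaced by an average over logged transitions, and v and w would be searched over function classes rather than fixed as they are here.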
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing
