Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Wenhao Zhan; Baihe Huang; Audrey Huang; Nan Jiang; Jason D. Lee

arXiv:2202.04634 · cs.LG · June 29, 2022 · 6 cites

TL;DR

This paper demonstrates that offline reinforcement learning can be made sample-efficient under weak assumptions by using a primal-dual algorithm with density-ratio modeling, relaxing previous strong assumptions on data coverage and function classes.

Contribution

The paper analyzes a simple primal-dual algorithm for offline RL and shows that it achieves polynomial sample complexity under only realizability and single-policy concentrability, relaxing the Bellman-completeness and all-policy concentrability assumptions that earlier guarantees relied on.

Findings

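01. The algorithm achieves polynomial sample complexity under only realizability and single-policy concentrability.

02. It relaxes the assumptions on data coverage (all-policy concentrability) and on function classes (Bellman-completeness) at the same time, where earlier work relaxed only one of the two.

03. Provides alternative analyses under different assumptions.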

Abstract

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability). Despite the recent efforts on relaxing these assumptions, existing works are only able to relax one of the two factors, leaving the strong assumption on the other factor intact. As an important open problem, can we achieve sample-efficient offline RL with weak assumptions on both factors? In this paper we answer the question in the positive. We analyze a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables (discounted occupancy) are modeled using a density-ratio function against offline data. With proper regularization, we show that the algorithm enjoys polynomial sample complexity, under only realizability and single-policy…
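To make the primal-dual idea in the abstract concrete, here is a minimal tabular sketch. It is not the paper's algorithm: it omits the regularization and the function classes the abstract mentions, and it uses exact quantities instead of samples. It assumes the standard linear-programming Lagrangian L(v, w) = (1 - γ) E_{μ0}[v(s)] + E_{(s,a)~d^D}[w(s,a) (r(s,a) + γ E[v(s')] - v(s))], with w playing the role of the density ratio against the offline data distribution d^D. The toy MDP, the uniform behavior policy that generates d^D, and the helper names `occupancy` and `lagrangian` are all hypothetical choices made for illustration.

```python
import numpy as np

def occupancy(P, pi, mu0, gamma):
    """Normalized discounted occupancy d^pi(s, a) of policy pi (exact, tabular)."""
    S, A = pi.shape
    # State-to-state transitions under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t].
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Bellman flow equation: d_s = (1 - gamma) * mu0 + gamma * P_pi^T d_s.
    d_s = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi

def lagrangian(v, w, d_data, P, r, mu0, gamma):
    """Assumed form: (1-gamma) E_mu0[v] + E_{d_data}[w * (r + gamma * P v - v)]."""
    residual = r + gamma * P @ v - v[:, None]  # shape (S, A)
    return (1 - gamma) * mu0 @ v + np.sum(d_data * w * residual)

# Hypothetical toy MDP: 4 states, 2 actions, random dynamics and rewards.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a, s'], each row sums to 1
r = rng.uniform(size=(S, A))
mu0 = rng.dirichlet(np.ones(S))             # initial state distribution

behavior = np.full((S, A), 1.0 / A)         # assumed data-collecting policy
target = rng.dirichlet(np.ones(A), size=S)  # policy whose value we evaluate

d_data = occupancy(P, behavior, mu0, gamma)  # stands in for the offline data distribution
d_pi = occupancy(P, target, mu0, gamma)
w = d_pi / d_data                            # density ratio (the modeled dual variable)

value = np.sum(d_pi * r)                     # normalized return (1 - gamma) * J(pi)
v_random = rng.normal(size=S)                # arbitrary primal variable
gap = lagrangian(v_random, w, d_data, P, r, mu0, gamma) - value

print("normalized return:", value)
print("Lagrangian minus return:", gap)
print("max density ratio:", w.max())
```

Because w times d^D is a valid occupancy, the terms involving v cancel and the Lagrangian equals the target policy's normalized return for any v, which is why a density ratio is a natural parameterization of the dual variables. The largest entry of w is the coverage coefficient for this one policy; single-policy concentrability asks that it be bounded only for the comparator policy, not for every policy.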

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing