Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps

Jinglin Chen; Nan Jiang

arXiv:2203.13935·cs.LG·June 16, 2022·1 cites

TL;DR

This paper gives sample-efficiency guarantees for offline reinforcement learning when the dataset covers only the optimal policy and the function classes only realize the optimal value and density-ratio functions, provided an additional gap assumption holds.

Contribution

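Under an additional gap assumption, it analyzes a simple pessimistic algorithm built on a marginalized importance sampling (MIS) version space. The analysis handles realizability-type function approximation and non-exploratory data simultaneously, which prior theory (apart from one concurrent work) addressed only separately.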

Findings

01. Guarantees for offline RL with limited coverage using gap assumptions

02. A simple MIS-based algorithm achieves sample-efficiency under realizability and gap conditions

03. First to identify the utility of gap assumptions in offline RL with weak function approximation

Abstract

We consider a challenging theoretical problem in offline reinforcement learning (RL): obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under only realizability-type assumptions for the function approximators. While the existing theory has addressed learning under realizability and under non-exploratory data separately, no work has been able to address both simultaneously (except for a concurrent work which we compare in detail). Under an additional gap assumption, we provide guarantees to a simple pessimistic algorithm based on a version space formed by marginalized importance sampling (MIS), and the guarantee only requires the data to cover the optimal policy and the function classes to realize the optimal value and density-ratio functions. While similar gap assumptions have been used in other areas of RL theory, our work is the first to identify the…
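The abstract describes the algorithm only at a high level, so the following is a minimal sketch of the general idea rather than the paper's actual procedure. It assumes a tabular setting with finite candidate classes: each candidate value function `q` and each candidate density-ratio weight `w` is an array of shape `(num_states, num_actions)`. The function names, the threshold `eps`, and the use of a single initial state `s0` are all hypothetical choices made for illustration. The version space keeps every `q` whose weighted average Bellman error on the data is small for all `w`, and pessimism then selects the surviving `q` with the lowest initial-state value.

```python
import numpy as np


def avg_bellman_error(q, w, data, gamma):
    """Empirical w-weighted average Bellman (optimality) error of q on the data."""
    s, a, r, s_next = data
    target = r + gamma * q[s_next].max(axis=1)
    return float(np.mean(w[s, a] * (target - q[s, a])))


def pessimistic_select(q_class, w_class, data, s0, gamma, eps):
    """Illustrative sketch (assumed details): MIS-style version space + pessimism.

    Keeps each q whose weighted Bellman error is within eps for every w,
    then returns the q with the smallest greedy value at s0 and its greedy policy.
    """
    version_space = [
        q for q in q_class
        if all(abs(avg_bellman_error(q, w, data, gamma)) <= eps for w in w_class)
    ]
    if not version_space:
        raise ValueError("empty version space; eps is too small for this data")
    q_hat = min(version_space, key=lambda q: q[s0].max())
    return q_hat, q_hat.argmax(axis=1)
```

In a toy problem where the data contains only transitions from the optimal policy, a candidate that overestimates an unseen action can still pass the Bellman-error check, because the data never contradicts it. The pessimistic selection step is what rules such a candidate out in this sketch.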

Taxonomy

Topics: Probability and Risk Models