Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren; Jialian Li; Bo Dai; Simon S. Du; Sujay Sanghavi

arXiv:2103.14077 · stat.ML · February 11, 2022 · 6 cites

TL;DR

This paper establishes nearly horizon-free sample complexity bounds for offline reinforcement learning in episodic time-homogeneous MDPs: when the total reward is bounded by 1, the bounds carry no polynomial dependence on the horizon length $H$.

Contribution

It provides the first nearly horizon-free bounds for offline RL in episodic tabular and linear MDPs, with a novel recursion-based analysis method.

Findings

01. The error bound for offline policy evaluation with the plug-in estimator matches the lower bound up to logarithmic factors.

02. The sub-optimality gap for offline policy optimization approaches the lower bound.

03. A recursion-based method is introduced for bounding variance terms in offline RL.

Abstract

We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes (MDPs). For tabular MDPs with $S$ states and $A$ actions, or linear MDPs with anchor points and feature dimension $d$, given $K$ collected episodes of data with minimum visiting probability $d_m$ of (anchor) state-action pairs, we obtain nearly horizon $H$-free sample complexity bounds for offline reinforcement learning when the total reward is upper bounded by $1$. Specifically: 1. For offline policy evaluation, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{K d_m}}\right)$ error bound for the plug-in estimator, which matches the lower bound up to logarithmic factors and does not have additional dependency on $\mathrm{poly}(H, S, A, d)$ in the higher-order term. 2. For offline policy optimization, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{K d_m}} + \frac{\min(S, d)}{K d_m}\right)$ …
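The plug-in estimator mentioned in the abstract can be illustrated for the tabular case with a minimal sketch. It is not the authors' code: the function name `plug_in_evaluate`, the episode format (lists of `(state, action, reward, next_state)` tuples), the known initial distribution `init_dist`, and the choice to give unvisited state-action pairs zero value are all assumptions made for illustration. Because the MDP is time-homogeneous, the sketch pools transition counts across all time steps, builds the empirical model, and evaluates the target policy on it by backward induction over $H$ steps.

```python
def plug_in_evaluate(episodes, policy, num_states, num_actions, horizon, init_dist):
    """Estimate the value of `policy` from offline episodes (illustrative sketch).

    episodes:  list of episodes, each a list of (s, a, r, s_next) tuples
    policy:    policy[s][a] = probability of taking action a in state s
    init_dist: init_dist[s] = probability of starting in state s (assumed known)
    """
    S, A = num_states, num_actions

    # Pool counts over all time steps: valid because transitions do not depend on the step.
    counts = [[[0] * S for _ in range(A)] for _ in range(S)]
    reward_sum = [[0.0] * A for _ in range(S)]
    for episode in episodes:
        for s, a, r, s_next in episode:
            counts[s][a][s_next] += 1
            reward_sum[s][a] += r

    # Backward induction on the empirical MDP, starting from zero terminal value.
    value = [0.0] * S
    for _ in range(horizon):
        new_value = [0.0] * S
        for s in range(S):
            for a in range(A):
                n = sum(counts[s][a])
                if n == 0:
                    continue  # assumption: unvisited pairs contribute zero value
                r_hat = reward_sum[s][a] / n
                next_v = sum(counts[s][a][t] * value[t] for t in range(S)) / n
                new_value[s] += policy[s][a] * (r_hat + next_v)
        value = new_value

    return sum(init_dist[s] * value[s] for s in range(S))
```

In this sketch the estimate is simply the exact value of the target policy in the empirical MDP; the paper's contribution concerns how accurately such an estimate tracks the true value, not the form of the estimator itself.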

Videos

Nearly Horizon-Free Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms