Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

Yuanzhi Li; Ruosong Wang; Lin F. Yang

arXiv:2111.00633·cs.LG·November 2, 2021

Open Access

TL;DR

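This paper shows that, when the number of states and actions is fixed, a PAC algorithm can learn an $O(1)$-optimal policy from only $O(1)$ episodes, so the sample complexity of RL need not grow with the horizon length $H$. This settles the open question of whether the $\mathrm{polylog}(H)$ dependence in earlier work was necessary.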

Contribution

The authors introduce an algorithm that achieves horizon-independent sample complexity in RL. The analysis rests on a new connection between value functions in discounted and finite-horizon MDPs and on a perturbation analysis technique.

Findings

01 Achieves horizon-independent PAC guarantees in RL

02 Develops a new connection between discounted and finite-horizon MDPs

03 Introduces a novel perturbation analysis technique

Abstract

Recently there has been a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed. It was not known whether the $\mathrm{polylog}(H)$ dependence is necessary. In this work, we resolve this question by developing an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions, completely settling the horizon-dependence of the sample complexity in RL. We achieve this bound by (i) establishing a connection between value functions in discounted and finite-horizon Markov decision processes (MDPs)…
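To make the two objects in step (i) concrete, the sketch below computes optimal value functions for a finite-horizon MDP (by backward induction) and for a discounted MDP (by value iteration) on a tiny tabular example. It is an illustration only and is not the paper's algorithm or its specific connection. The two-state MDP, the per-step reward scaling `1/H`, and the pairing `gamma = 1 - 1/H` are all assumptions chosen for this example, and the function names are hypothetical.

```python
def bellman_backup(P, R, V, gamma=1.0):
    """One optimal Bellman backup.

    P[a][s][t] is the probability of moving from state s to state t under
    action a; R[s][a] is the immediate reward. Returns the list
    V'(s) = max_a ( R[s][a] + gamma * sum_t P[a][s][t] * V[t] ).
    """
    n_states, n_actions = len(R), len(R[0])
    return [
        max(
            R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def finite_horizon_values(P, R, H):
    """Optimal H-step values via backward induction (no discounting)."""
    V = [0.0] * len(R)
    for _ in range(H):
        V = bellman_backup(P, R, V)
    return V


def discounted_values(P, R, gamma, tol=1e-12, max_iter=100_000):
    """Optimal infinite-horizon discounted values via value iteration."""
    V = [0.0] * len(R)
    for _ in range(max_iter):
        V_new = bellman_backup(P, R, V, gamma)
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical toy MDP: action 0 stays put, action 1 switches state.
# State 0 pays 1/H per step, state 1 pays nothing (assumed scaling so the
# total reward over H steps is at most 1).
H = 10
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [[1.0 / H, 1.0 / H], [0.0, 0.0]]

v_fin = finite_horizon_values(P, R, H)
v_disc = discounted_values(P, R, gamma=1.0 - 1.0 / H)
```

In this toy instance both `v_fin` and `v_disc` come out at approximately `[1.0, 0.9]`. That exact agreement is a property of this particular example; in general the two value functions differ.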

Taxonomy

Topics: Reinforcement Learning in Robotics · Gene Regulatory Network Analysis