Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning
Yuanzhi Li, Ruosong Wang, Lin F. Yang

TL;DR
This paper shows that the sample complexity of reinforcement learning need not depend on the horizon length: when the number of states and actions is fixed, a constant number of episodes of environment interaction suffices for the PAC guarantee, which resolves a key open question.
Contribution
The authors introduce an algorithm that achieves horizon-independent sample complexity in RL, built on two new techniques: a connection between value functions in discounted and finite-horizon MDPs, and a perturbation analysis in MDPs.
Findings
Achieves horizon-independent PAC guarantees in RL
Develops a new connection between discounted and finite-horizon MDPs
Introduces a novel perturbation analysis technique
Abstract
Recently there has been a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length H, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an O(1)-optimal policy using polylog(H) episodes of environment interactions when the number of states and actions is fixed. It is yet unknown whether the polylog(H) dependence is necessary or not. In this work, we resolve this question by developing an algorithm that achieves the same PAC guarantee while using only O(1) episodes of environment interactions, completely settling the horizon-dependence of the sample complexity in RL. We achieve this bound by (i) establishing a connection between value functions in discounted and finite-horizon Markov decision processes (MDPs)…
Taxonomy
Topics: Reinforcement Learning in Robotics · Gene Regulatory Network Analysis
