Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Christoph Dann, Emma Brunskill

TL;DR
This paper derives upper and lower PAC bounds on the sample complexity of episodic fixed-horizon reinforcement learning. The two bounds match up to logarithmic factors and one factor of the number of states $|\mathcal S|$, clarifying how many episodes are needed to learn in finite-horizon MDPs.
Contribution
It provides the first lower PAC bound for episodic fixed-horizon MDPs and tightens the upper bound using Bernstein's inequality.
Findings
Upper PAC bound: $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \log\frac{1}{\delta})$
Lower PAC bound: $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \log\frac{1}{\delta + c})$
The improved upper bound reduces the dependence on the horizon $H$ from at least $H^3$ in earlier bounds to $H^2$.
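To get a feel for how the two bounds scale, the following is a minimal Python sketch that evaluates only their leading terms. It drops all constants and the logarithmic factors hidden by the tilde notation, so the numbers are not episode counts, only relative magnitudes. The function names and the example values (including the choice of `c`) are hypothetical and not taken from the paper.

```python
import math


def upper_bound_leading_term(n_states, n_actions, horizon, eps, delta):
    """Leading term |S|^2 |A| H^2 / eps^2 * log(1/delta) of the upper bound.

    Constants and hidden log factors are dropped (assumption for illustration).
    """
    return n_states**2 * n_actions * horizon**2 / eps**2 * math.log(1.0 / delta)


def lower_bound_leading_term(n_states, n_actions, horizon, eps, delta, c):
    """Leading term |S| |A| H^2 / eps^2 * log(1/(delta + c)) of the lower bound.

    `c` stands in for the constant in the bound; its value here is a placeholder.
    Requires delta + c < 1 so that the logarithm is positive.
    """
    return n_states * n_actions * horizon**2 / eps**2 * math.log(1.0 / (delta + c))


if __name__ == "__main__":
    # Hypothetical problem size: 10 states, 4 actions, horizon 20.
    up = upper_bound_leading_term(10, 4, 20, eps=0.1, delta=0.05)
    lo = lower_bound_leading_term(10, 4, 20, eps=0.1, delta=0.05, c=0.0)
    print("upper / lower leading-term ratio:", up / lo)

    # Doubling the horizon multiplies both leading terms by four (H^2 scaling).
    up_2h = upper_bound_leading_term(10, 4, 40, eps=0.1, delta=0.05)
    print("effect of doubling H on the upper term:", up_2h / up)
```

With `c` set to zero, the ratio of the two leading terms is exactly the number of states, which is the gap between the upper and lower bounds stated above.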
Abstract
Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A|…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
