Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Christoph Dann; Emma Brunskill

arXiv:1510.08906·stat.ML·May 12, 2016·33 cites



Open Access

TL;DR

This paper derives upper and lower PAC bounds on the sample complexity of episodic fixed-horizon reinforcement learning. The bounds match up to logarithmic factors and one factor of $|\mathcal{S}|$, clarifying how efficiently finite-horizon MDPs can be learned.

Contribution

It provides the first lower PAC bound for this setting and tightens the upper bound for episodic finite-horizon MDPs using Bernstein's inequality.

Findings

01

Upper PAC bound: $\tilde{O}\left(\frac{|\mathcal{S}|^2 |\mathcal{A}| H^2}{\epsilon^2} \ln \frac{1}{\delta}\right)$

02

Lower PAC bound: $\tilde{\Omega}\left(\frac{|\mathcal{S}| |\mathcal{A}| H^2}{\epsilon^2} \ln \frac{1}{\delta + c}\right)$

03

Improved bounds reduce the dependence on the horizon $H$ from at least $H^3$ to $H^2$.
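
To make the scaling in these findings concrete, the following minimal sketch evaluates the leading terms of both bounds. It is illustrative only: it drops all constants and the logarithmic factors hidden by $\tilde{O}$ and $\tilde{\Omega}$, and it treats the constant $c$ in the lower bound as 0 by default, which is an assumption made here for simplicity. The function names are hypothetical, and the outputs are not episode counts that a real learner would need.

```python
import math


def upper_bound_term(n_states, n_actions, horizon, eps, delta):
    """Leading term |S|^2 |A| H^2 / eps^2 * ln(1/delta), constants dropped."""
    return n_states**2 * n_actions * horizon**2 / eps**2 * math.log(1.0 / delta)


def lower_bound_term(n_states, n_actions, horizon, eps, delta, c=0.0):
    """Leading term |S| |A| H^2 / eps^2 * ln(1/(delta + c)), constants dropped.

    c=0.0 is an assumption for illustration; the listing does not give its value.
    """
    return n_states * n_actions * horizon**2 / eps**2 * math.log(1.0 / (delta + c))


if __name__ == "__main__":
    # Hypothetical problem size, chosen only to show the scaling.
    args = dict(n_states=10, n_actions=4, horizon=20, eps=0.1, delta=0.05)
    upper = upper_bound_term(**args)
    lower = lower_bound_term(**args)
    print("upper / lower:", upper / lower)

    # Doubling the horizon multiplies either term by 2^2.
    doubled = dict(args, horizon=2 * args["horizon"])
    print("upper(2H) / upper(H):", upper_bound_term(**doubled) / upper)
```

With `c` set to 0, the ratio of the two terms is exactly the number of states, which is the remaining gap between the bounds. Doubling `horizon` scales either term by four, reflecting the $H^2$ dependence.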

Abstract

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound $\tilde{O}\left(\frac{|\mathcal S|^{2} |\mathcal A| H^{2}}{\epsilon^{2}} \ln \frac{1}{\delta}\right)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A|…
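
The episode-based notion of sample complexity mentioned in the abstract can be sketched in code. One common formalization, assumed here rather than taken from the text above, counts the episodes in which the learner's policy has expected return more than $\epsilon$ below optimal; a PAC guarantee then says that this count stays below a bound with probability at least $1 - \delta$. The function name and the example numbers are hypothetical, and a single run can only illustrate the count, not verify the probabilistic guarantee.

```python
def count_suboptimal_episodes(optimal_return, episode_returns, eps):
    """Count episodes whose policy is more than eps worse than optimal.

    episode_returns[k] is the expected return of the policy used in
    episode k (assumed known here purely for illustration).
    """
    return sum(1 for r in episode_returns if optimal_return - r > eps)


if __name__ == "__main__":
    # Made-up expected returns of a learner improving over five episodes.
    returns = [0.2, 0.5, 0.85, 0.95, 0.97]
    n_bad = count_suboptimal_episodes(optimal_return=1.0, episode_returns=returns, eps=0.1)
    print("episodes that are not eps-optimal:", n_bad)
```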


Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms