On the Sample Complexity of Reinforcement Learning with a Generative Model

Mohammad Gheshlaghi Azar (Radboud University); Remi Munos (INRIA; Lille); Bert Kappen (Radboud University)

arXiv:1206.6461 · cs.LG · July 3, 2012 · ICML · 41 cites

Open Access

TL;DR

This paper establishes matching upper and lower bounds on the number of samples needed to estimate the optimal action-value function of a discounted MDP when a generative model is available: on the order of N log(N/δ)/((1-γ)^3 ε^2) samples for an MDP with N state-action pairs.

Contribution

To the authors' knowledge, it provides the first matching upper and lower bounds on the sample complexity of estimating the optimal (action-) value function in RL with a generative model. Both bounds scale identically in N, ε, δ, and 1/(1-γ).

Findings

01

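Derived a PAC bound for model-based value iteration with a generative model: O(N log(N/δ)/((1-γ)^3 ε^2)) samples suffice to find an ε-optimal estimate of the action-value function with probability 1-δ.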

02

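Proved a matching lower bound of Θ(N log(N/δ)/((1-γ)^3 ε^2)) on the sample complexity of estimating the optimal action-value function, holding for every RL algorithm.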

03

Showed that the upper and lower bounds share the same 1/(1-γ)^3 dependence on the discount factor γ.
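To get a feel for how the rate in these findings scales, the short sketch below evaluates the term N log(N/δ)/((1-γ)^3 ε^2) with all constants dropped. The function name `sample_bound_order` and the example parameter values are illustrative assumptions, not taken from the paper, and the result is an order-of-magnitude quantity rather than an exact sample count.

```python
import math


def sample_bound_order(n_pairs, gamma, eps, delta):
    """Return N*log(N/delta) / ((1-gamma)^3 * eps^2), constants omitted.

    Hypothetical helper for illustration only: it evaluates the rate that
    appears in the bounds, not a usable sample budget.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if eps <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return n_pairs * math.log(n_pairs / delta) / ((1.0 - gamma) ** 3 * eps ** 2)


# Assumed example values: 100 state-action pairs, eps = 0.1, delta = 0.05.
base = sample_bound_order(100, 0.90, 0.1, 0.05)

# Halving (1 - gamma) from 0.10 to 0.05 multiplies the term by about 2^3 = 8.
ratio_gamma = sample_bound_order(100, 0.95, 0.1, 0.05) / base

# Halving eps multiplies the term by about 2^2 = 4.
ratio_eps = sample_bound_order(100, 0.90, 0.05, 0.05) / base
```

The cubic dependence on 1/(1-γ) dominates for discount factors close to 1, which is why the exponent in that term is the quantity of interest.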

Abstract

We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model, which indicates that for an MDP with N state-action pairs and discount factor γ ∈ [0,1), only O(N log(N/δ)/((1-γ)^3 ε^2)) samples are required to find an ε-optimal estimate of the action-value function with probability 1-δ. We also prove a matching lower bound of Θ(N log(N/δ)/((1-γ)^3 ε^2)) on the sample complexity of estimating the optimal action-value function by every RL algorithm. To the best of our knowledge, this is the first matching result on the sample complexity of estimating the optimal (action-) value function in which the upper bound matches the lower…

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research