Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning

Martin J. Wainwright

arXiv:1905.06265·cs.LG·June 25, 2019·23 cites

TL;DR

This paper establishes sharp non-asymptotic $\ell_\infty$ bounds for $Q$-learning algorithms using stochastic approximation methods based on cone-contractive operators, improving understanding of their sample complexity.

Contribution

It introduces a general framework for analyzing stochastic approximation with cone-contractive operators and derives the sharpest known $\ell_\infty$ bounds for $Q$-learning.

Findings

01

Derived non-asymptotic $\ell_\infty$ bounds for synchronous $Q$-learning over a range of stepsizes.

02

Showed via simulation that the bounds cannot be improved in a worst-case scenario.

03

Revealed $Q$-learning's sample complexity is suboptimal compared to model-based $Q$-iteration.

Abstract

Motivated by the study of $Q$-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We prove a general sandwich relation on the iterate error at each time, and use it to derive non-asymptotic bounds on the error in terms of a cone-induced gauge norm. These results are derived within a deterministic framework, requiring no assumptions on the noise. We illustrate these general bounds in application to synchronous $Q$-learning for discounted Markov decision processes with discrete state-action spaces, in particular by deriving non-asymptotic bounds on the $\ell_\infty$-norm for a range of stepsizes. These results are the sharpest known to date, and we show via simulation that the dependence of our bounds cannot be improved in a…
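To make the setting in the abstract concrete, here is a minimal sketch of synchronous $Q$-learning on a small tabular discounted MDP, with the $\ell_\infty$ error measured against a fixed point obtained by exact $Q$-iteration. It is an illustration under stated assumptions, not the paper's code: the helper names (`random_mdp`, `bellman_optimal`, `q_star`, `synchronous_q_learning`), the random test MDP, and the rescaled linear stepsize $1/(1 + (1-\gamma)k)$ are all choices made for this example.

```python
import numpy as np


def random_mdp(num_states, num_actions, rng):
    """Hypothetical test MDP: Dirichlet transitions, rewards in [0, 1]."""
    # P[s, a, :] is a probability vector over next states.
    P = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))
    r = rng.random((num_states, num_actions))
    return P, r


def bellman_optimal(Q, P, r, gamma):
    """Population Bellman optimality operator applied to a Q-table."""
    return r + gamma * (P @ Q.max(axis=1))


def q_star(P, r, gamma, iters=2000):
    """Exact (model-based) Q-iteration, used here only as a reference."""
    Q = np.zeros_like(r)
    for _ in range(iters):
        Q = bellman_optimal(Q, P, r, gamma)
    return Q


def synchronous_q_learning(P, r, gamma, num_iters, rng):
    """Synchronous Q-learning: every (state, action) pair is updated at each
    iteration from one freshly sampled next state (generative-model setting).
    The stepsize schedule below is an assumption made for this sketch."""
    num_states, num_actions = r.shape
    Q = np.zeros((num_states, num_actions))
    cdf = P.cumsum(axis=2)
    for k in range(1, num_iters + 1):
        step = 1.0 / (1.0 + (1.0 - gamma) * k)
        # Inverse-CDF sampling: one next state per (state, action) pair.
        u = rng.random((num_states, num_actions, 1))
        next_state = np.minimum((u > cdf).sum(axis=2), num_states - 1)
        # Empirical Bellman target built from the sampled next states.
        target = r + gamma * Q.max(axis=1)[next_state]
        Q = (1.0 - step) * Q + step * target
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, r = random_mdp(num_states=4, num_actions=2, rng=rng)
    gamma = 0.5
    Q_ref = q_star(P, r, gamma)
    Q_hat = synchronous_q_learning(P, r, gamma, num_iters=5000, rng=rng)
    print("l_inf error:", np.abs(Q_hat - Q_ref).max())
```

Because each update is a convex combination of the current table and a target in $[0, 1/(1-\gamma)]$, the iterates stay in that range when rewards lie in $[0, 1]$ and the table starts at zero. Rerunning with larger `num_iters` or with $\gamma$ closer to 1 gives a rough empirical feel for how the $\ell_\infty$ error depends on the iteration count and the discount factor.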

Code & Models

Repositories

lx10077/AveQLearning

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods