Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning
Martin J. Wainwright

TL;DR
This paper establishes sharp non-asymptotic $\ell_\infty$ bounds for $Q$-learning by analyzing stochastic approximation procedures based on cone-contractive operators, improving our understanding of the algorithm's sample complexity.
Contribution
It introduces a general framework for analyzing stochastic approximation with cone-contractive operators and derives the sharpest known $\ell_\infty$ bounds for $Q$-learning.
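As a rough illustration of the two operator conditions, the following sketch assumes the cone is the nonnegative orthant with an entrywise-positive order unit $e$. In that case the cone-induced gauge norm is $\|x\|_e = \max_i |x_i| / e_i$, which is exactly the $\ell_\infty$ norm when $e$ is the all-ones vector. The sketch then checks numerically that the Bellman optimality operator of a small random MDP is monotone and a $\gamma$-contraction in that norm. The MDP and the helper names (`random_mdp`, `gauge_norm`, `bellman`, `flat`) are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical helpers for illustration only; nothing here is taken from the paper.

def random_mdp(S, A, rng):
    """Random finite MDP (assumed): rewards in [0, 1), each P[s][a] a probability vector over S states."""
    r = [[rng.random() for _ in range(A)] for _ in range(S)]
    P = []
    for _ in range(S):
        rows = []
        for _ in range(A):
            w = [rng.random() for _ in range(S)]
            total = sum(w)
            rows.append([wi / total for wi in w])
        P.append(rows)
    return r, P

def gauge_norm(x, e):
    """Gauge norm for the nonnegative-orthant cone with order unit e (every e_i > 0):
    ||x||_e = inf{t >= 0 : -t*e <= x <= t*e} = max_i |x_i| / e_i."""
    return max(abs(xi) / ei for xi, ei in zip(x, e))

def bellman(Q, r, P, gamma):
    """(TQ)(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) * max_{a'} Q(s', a')."""
    v = [max(row) for row in Q]
    return [[r[s][a] + gamma * sum(p * vs for p, vs in zip(P[s][a], v))
             for a in range(len(r[s]))] for s in range(len(r))]

def flat(M):
    """Flatten an S x A table into a list of length S * A."""
    return [x for row in M for x in row]

rng = random.Random(0)
S, A, gamma = 5, 3, 0.9
r, P = random_mdp(S, A, rng)
Q1 = [[rng.gauss(0.0, 1.0) for _ in range(A)] for _ in range(S)]
Q2 = [[q + rng.random() for q in row] for row in Q1]   # Q2 >= Q1 entrywise
ones = [1.0] * (S * A)                                  # order unit e = 1 gives the l_inf norm

TQ1, TQ2 = bellman(Q1, r, P, gamma), bellman(Q2, r, P, gamma)
# Monotonicity: Q1 <= Q2 entrywise implies TQ1 <= TQ2 entrywise.
monotone = all(a <= b for a, b in zip(flat(TQ1), flat(TQ2)))
# Contraction: ||TQ1 - TQ2||_e <= gamma * ||Q1 - Q2||_e with e = 1.
diff_T = [a - b for a, b in zip(flat(TQ1), flat(TQ2))]
diff_Q = [a - b for a, b in zip(flat(Q1), flat(Q2))]
print(monotone, gauge_norm(diff_T, ones), gamma * gauge_norm(diff_Q, ones))
```

With $e = 1$ the gauge norm is the largest absolute entry, which is one way to see how a cone-induced gauge norm can specialize to the $\ell_\infty$ norm used for $Q$-learning.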
Findings
Derived non-asymptotic $\ell_\infty$ bounds for $Q$-learning.
Showed via simulation that the dependence in the bounds cannot be improved in a worst-case sense.
Revealed that $Q$-learning's sample complexity is suboptimal relative to model-based $Q$-iteration.
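To make the comparison concrete, one common reading of model-based $Q$-iteration is a plug-in method: estimate each transition row from $n$ sampled next states, then solve the Bellman fixed point of the empirical model by $Q$-iteration. The following sketch follows that reading; the plug-in construction, the helper names (`random_mdp`, `q_iteration`, `empirical_model`, `linf_dist`), the MDP, and the sample sizes are all assumptions for illustration and are not claimed to match the paper's setup.

```python
import random

# Hypothetical helpers for illustration only; the plug-in construction is an assumption.

def random_mdp(S, A, rng):
    """Random finite MDP: rewards in [0, 1), each P[s][a] a probability vector over S states."""
    r = [[rng.random() for _ in range(A)] for _ in range(S)]
    P = []
    for _ in range(S):
        rows = []
        for _ in range(A):
            w = [rng.random() for _ in range(S)]
            total = sum(w)
            rows.append([wi / total for wi in w])
        P.append(rows)
    return r, P

def q_iteration(r, P, gamma, tol=1e-10, max_iter=10_000):
    """Solve Q = r + gamma * P max_a' Q by fixed-point iteration; this converges
    because the Bellman operator is a gamma-contraction in the l_inf norm."""
    S, A = len(r), len(r[0])
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(max_iter):
        v = [max(row) for row in Q]
        Q_new = [[r[s][a] + gamma * sum(p * vs for p, vs in zip(P[s][a], v))
                  for a in range(A)] for s in range(S)]
        gap = max(abs(Q_new[s][a] - Q[s][a]) for s in range(S) for a in range(A))
        Q = Q_new
        if gap < tol:
            break
    return Q

def empirical_model(P, n, rng):
    """Assumed plug-in estimate: n i.i.d. next-state draws per (s, a), replaced by their frequencies."""
    S = len(P)
    P_hat = []
    for s in range(S):
        rows = []
        for a in range(len(P[s])):
            draws = rng.choices(range(S), weights=P[s][a], k=n)
            rows.append([draws.count(t) / n for t in range(S)])
        P_hat.append(rows)
    return P_hat

def linf_dist(Q1, Q2):
    """Largest absolute entrywise difference between two S x A tables."""
    return max(abs(x - y) for row1, row2 in zip(Q1, Q2) for x, y in zip(row1, row2))

rng = random.Random(1)
S, A, gamma = 4, 2, 0.9
r, P = random_mdp(S, A, rng)
Q_star = q_iteration(r, P, gamma)
for n in (10, 100, 1000):
    Q_hat = q_iteration(r, empirical_model(P, n, rng), gamma)
    print(f"n = {n:4d}   l_inf error = {linf_dist(Q_hat, Q_star):.4f}")
```

Comparing these errors with those of synchronous $Q$-learning at a matched sample budget is one informal way to probe the gap described in the findings; the sketch does not reproduce the paper's analysis or experiments.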
Abstract
Motivated by the study of $Q$-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We prove a general sandwich relation on the iterate error at each time, and use it to derive non-asymptotic bounds on the error in terms of a cone-induced gauge norm. These results are derived within a deterministic framework, requiring no assumptions on the noise. We illustrate these general bounds in application to synchronous $Q$-learning for discounted Markov decision processes with discrete state-action spaces, in particular by deriving non-asymptotic bounds on the $\ell_\infty$-norm for a range of stepsizes. These results are the sharpest known to date, and we show via simulation that the dependence of our bounds cannot be improved in a…
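To see the synchronous setting concretely, the following sketch runs synchronous $Q$-learning on a small random MDP: at every iteration each state-action pair receives a fresh next-state sample, the iterate moves toward the resulting one-sample Bellman target, and the $\ell_\infty$ error against $Q^*$ is recorded. The MDP, the helper names (`random_mdp`, `bellman`, `solve_q_star`, `synchronous_q_learning`), and the rescaled linear stepsize $\alpha_k = 1/(1 + (1-\gamma)k)$ are assumptions chosen for illustration; the paper covers a range of stepsizes, and this is only one plausible choice.

```python
import random

# Hypothetical helpers for illustration only; the stepsize schedule is an assumption.

def random_mdp(S, A, rng):
    """Random finite MDP: rewards in [0, 1), each P[s][a] a probability vector over S states."""
    r = [[rng.random() for _ in range(A)] for _ in range(S)]
    P = []
    for _ in range(S):
        rows = []
        for _ in range(A):
            w = [rng.random() for _ in range(S)]
            total = sum(w)
            rows.append([wi / total for wi in w])
        P.append(rows)
    return r, P

def bellman(Q, r, P, gamma):
    """Exact Bellman optimality operator, used only to compute the reference Q*."""
    v = [max(row) for row in Q]
    return [[r[s][a] + gamma * sum(p * vs for p, vs in zip(P[s][a], v))
             for a in range(len(r[s]))] for s in range(len(r))]

def solve_q_star(r, P, gamma, num_iters=1000):
    """Q* by repeated application of the Bellman operator (a gamma-contraction in l_inf)."""
    Q = [[0.0] * len(r[0]) for _ in r]
    for _ in range(num_iters):
        Q = bellman(Q, r, P, gamma)
    return Q

def synchronous_q_learning(r, P, gamma, Q_star, num_iters, rng):
    """Update every (s, a) at every step with its own fresh sample.
    Returns the final iterate and the l_inf error ||Q_k - Q*|| after each step."""
    S, A = len(r), len(r[0])
    Q = [[0.0] * A for _ in range(S)]
    errors = []
    for k in range(num_iters):
        v = [max(row) for row in Q]               # max_a' Q_k(s', a'), frozen during the sweep
        alpha = 1.0 / (1.0 + (1.0 - gamma) * k)   # assumed rescaled linear stepsize
        new_Q = []
        for s in range(S):
            row = []
            for a in range(A):
                s_next = rng.choices(range(S), weights=P[s][a])[0]  # fresh draw s' ~ P(. | s, a)
                target = r[s][a] + gamma * v[s_next]                # one-sample Bellman target
                row.append((1.0 - alpha) * Q[s][a] + alpha * target)
            new_Q.append(row)
        Q = new_Q
        errors.append(max(abs(Q[s][a] - Q_star[s][a]) for s in range(S) for a in range(A)))
    return Q, errors

rng = random.Random(0)
S, A, gamma = 5, 3, 0.9
r, P = random_mdp(S, A, rng)
Q_star = solve_q_star(r, P, gamma)
Q, errors = synchronous_q_learning(r, P, gamma, Q_star, num_iters=2000, rng=rng)
for k in (1, 10, 100, 1000, 2000):
    print(f"k = {k:4d}   ||Q_k - Q*||_inf = {errors[k - 1]:.4f}")
```

For a fixed iterate $Q_k$, each target has expectation $r(s,a) + \gamma \sum_{s'} P(s' \mid s,a) \max_{a'} Q_k(s',a')$, so every synchronous step applies an unbiased one-sample estimate of the Bellman operator to all entries at once; this is what places the update in the stochastic approximation framework.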
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods
