Concentration of Contractive Stochastic Approximation and Reinforcement Learning
Siddharth Chandak, Vivek S. Borkar, Parth Dodhia

TL;DR
This paper derives concentration bounds for stochastic approximation algorithms with contractive maps, applying these results to reinforcement learning methods like asynchronous Q-learning and TD(0).
Contribution
It introduces new concentration bounds for stochastic approximation with contractive maps, applicable to reinforcement learning algorithms.
Findings
Concentration bounds that hold from time n0 on (that is, for all iterates n ≥ n0) for stochastic approximation with contractive maps.
Application to asynchronous Q-learning and TD(0).
Enhanced understanding of the convergence behavior of these algorithms through finite-time, high-probability bounds.
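For orientation, the setting named above can be written schematically as follows. This is a generic sketch of a contractive stochastic approximation iteration and of the shape a bound "from time n0 on" takes. It is not the paper's exact statement: the symbols a(n), F, M_{n+1}, alpha, x^*, epsilon, and delta are notation assumed here, and the Markov noise component and all constants are omitted.

```latex
% Generic contractive stochastic approximation (assumed notation):
% step sizes a(n), contraction F with modulus alpha, martingale difference noise M_{n+1}.
\[
  x_{n+1} = x_n + a(n)\,\bigl( F(x_n) - x_n + M_{n+1} \bigr),
  \qquad
  \| F(x) - F(y) \| \le \alpha \, \| x - y \|, \quad 0 < \alpha < 1 .
\]
% F has a unique fixed point x^* = F(x^*). A bound "from time n_0 on"
% controls every iterate after n_0 at once, schematically:
\[
  P\bigl( \| x_n - x^* \| \le \epsilon \ \text{ for all } n \ge n_0 \bigr) \ge 1 - \delta .
\]
```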
Abstract
Using a martingale concentration inequality, concentration bounds 'from time n0 on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0).
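To make the connection to reinforcement learning concrete, here is a minimal sketch of tabular asynchronous Q-learning viewed as a contractive stochastic approximation: only the visited state-action pair is updated at each step, and the Bellman optimality operator is a contraction with modulus gamma in the sup norm. The function name `async_q_learning`, the uniform exploration policy, and the step size schedule (visit count raised to `-step_exponent`) are illustrative assumptions, not choices taken from the paper.

```python
import random


def async_q_learning(P, R, gamma=0.9, steps=20000, step_exponent=0.6, seed=0):
    """Tabular asynchronous Q-learning along a single trajectory.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    reward for taking action a in state s. The exploration policy and step
    size schedule are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        # Uniform exploration: every pair keeps being visited (assumption).
        a = rng.randrange(n_actions)
        probs = [p for p, _ in P[s][a]]
        nexts = [s2 for _, s2 in P[s][a]]
        s_next = rng.choices(nexts, weights=probs)[0]

        # Asynchronous update: only the visited component (s, a) changes.
        visits[s][a] += 1
        step = visits[s][a] ** -step_exponent
        # Sampled Bellman optimality target; its conditional mean is the
        # contraction F applied to Q, and the remainder is the noise term.
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += step * (target - Q[s][a])
        s = s_next
    return Q
```

A small way to try it is a two-state chain in which action 0 stays put, action 1 switches state, and only staying in state 1 pays reward 1. The fixed point of the Bellman operator can then be computed by hand and compared with the returned table.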
Taxonomy
Topics: Neural Networks and Applications · Advanced Thermodynamics and Statistical Mechanics
Methods: Q-Learning
