Non-asymptotic Analysis of Biased Stochastic Approximation Scheme
Belhal Karimi, Blazej Miasojedow, Eric Moulines, Hoi-To Wai

TL;DR
This paper provides a non-asymptotic analysis of a general stochastic approximation scheme that relaxes common assumptions, applicable to complex tasks like reinforcement learning with biased and non-gradient updates.
Contribution
It extends stochastic approximation analysis to non-convex, biased, and Markov-dependent settings, covering algorithms like online EM and policy-gradient methods.
Findings
Analyzes SA with biased, non-gradient, and Markov-dependent updates.
Applies to online EM and reinforcement learning algorithms.
Provides convergence guarantees under relaxed assumptions.
Abstract
Stochastic approximation (SA) is a key method used in statistical learning. Recently, its non-asymptotic convergence analysis has been considered in many papers. However, most of the prior analyses are made under restrictive assumptions such as unbiased gradient estimates and convex objective function, which significantly limit their applications to sophisticated tasks such as online and reinforcement learning. These restrictions are all essentially relaxed in this work. In particular, we analyze a general SA scheme to minimize a non-convex, smooth objective function. We consider update procedure whose drift term depends on a state-dependent Markov chain and the mean field is not necessarily of gradient type, covering approximate second-order method and allowing asymptotic bias for the one-step updates. We illustrate these settings with the online EM algorithm and the policy-gradient…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
