Asymptotic Bias of Stochastic Gradient Search

Vladislav B. Tadic; Arnaud Doucet

arXiv:1709.00291·math.ST·September 4, 2017·CDC/ECC

Open Access

TL;DR

This paper analyzes the long-term bias of stochastic gradient algorithms with biased estimators, providing bounds and insights applicable to high-dimensional nonlinear methods like reinforcement learning and Monte Carlo sampling.

Contribution

It introduces a theoretical framework, based on dynamical systems theory and differential geometry, that bounds the asymptotic bias of stochastic gradient algorithms with biased gradient estimators under mild conditions.

Findings

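01

Derived tight bounds on the asymptotic bias of the iterates generated by stochastic gradient algorithms with biased gradient estimators.

02

Applied the results to policy-gradient reinforcement learning and adaptive population Monte Carlo sampling.

03

Analyzed the asymptotic behavior of recursive maximum split-likelihood estimation in hidden Markov models.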

Abstract

The asymptotic behavior of the stochastic gradient algorithm with a biased gradient estimator is analyzed. Relying on arguments from dynamical systems theory (chain-recurrence) and differential geometry (Yomdin theorem and Lojasiewicz inequality), tight bounds on the asymptotic bias of the iterates generated by such an algorithm are derived. The obtained results hold under mild conditions and cover a broad class of high-dimensional nonlinear algorithms. Using these results, the asymptotic properties of policy-gradient (reinforcement) learning and adaptive population Monte Carlo sampling are studied. The same results are also used to analyze the asymptotic behavior of recursive maximum split-likelihood estimation in hidden Markov models.
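To make the setting in the abstract concrete, here is a minimal toy sketch of a stochastic gradient recursion driven by a biased gradient estimator. It is not taken from the paper: the quadratic objective f(theta) = theta^2 / 2, the constant bias term, the Gaussian noise, the 1/n step size, and the function name `biased_sgd` are all assumptions chosen for illustration.

```python
import random


def biased_sgd(bias, steps=20000, seed=0):
    """Run a scalar stochastic gradient recursion with a biased estimator.

    Toy assumptions (not from the paper): f(theta) = theta**2 / 2, so the
    true gradient is theta; the estimator adds a constant `bias` and
    zero-mean Gaussian noise; the step size is 1/n.
    """
    rng = random.Random(seed)
    theta = 5.0  # arbitrary starting point
    for n in range(1, steps + 1):
        step = 1.0 / n
        noise = rng.gauss(0.0, 1.0)
        grad_estimate = theta + bias + noise  # true gradient + bias + noise
        theta -= step * grad_estimate
    return theta


if __name__ == "__main__":
    for b in (0.0, 0.1, 0.5):
        theta = biased_sgd(b)
        # For this objective the gradient at theta is theta itself.
        print(f"bias={b:.1f}  final theta={theta:.3f}  |gradient|={abs(theta):.3f}")
```

In this toy case the iterates settle near `-bias` rather than at the true minimizer 0, so the gradient magnitude at the limit is roughly the size of the estimator bias. The paper's bounds concern this kind of relationship in far more general, high-dimensional nonlinear settings.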

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Target Tracking and Data Fusion in Sensor Networks · Control Systems and Identification