Optimal Matrix Momentum Stochastic Approximation and Applications to Q-learning

Adithya M. Devraj; Ana Bušić; Sean Meyn

arXiv:1809.06277·math.OC·February 7, 2019·5 cites

TL;DR

This paper introduces two new stochastic approximation algorithms, PolSA and NeSA, with matrix momentum for root finding, demonstrating their optimal asymptotic variance properties and applications to reinforcement learning, especially Q-learning.

Contribution

The paper presents novel algorithms PolSA and NeSA with matrix momentum, extending stochastic approximation methods and achieving optimal asymptotic variance in reinforcement learning applications.

Findings

1. PolSA couples with stochastic Newton-Raphson (SNR) in non-linear Q-learning settings.

2. PolSA achieves the optimal asymptotic covariance.

3. Numerical results confirm the coupling in non-ideal models.

Abstract

Acceleration is an increasingly common theme in the stochastic optimization literature. The two most common examples are Nesterov's method and Polyak's momentum technique. In this paper, two new algorithms are introduced for root finding problems: (1) PolSA is a root finding algorithm with specially designed matrix momentum, and (2) NeSA can be regarded as a variant of Nesterov's algorithm, or a simplification of PolSA. The PolSA algorithm is new even in the context of optimization (when cast as a root finding problem). The research surveyed in this paper is motivated by applications to reinforcement learning. It is well known that most variants of TD- and Q-learning may be cast as SA (stochastic approximation) algorithms, and the tools from general SA theory can be used to investigate convergence and bounds on convergence rate. In particular, the asymptotic variance is a common metric…
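
To make the root finding setting in the abstract concrete, the following is a minimal sketch of a stochastic approximation recursion with momentum. It is not the paper's PolSA or NeSA: it uses a scalar Polyak-style (heavy-ball) momentum coefficient rather than matrix momentum, and it assumes a toy linear problem f(θ) = b − Mθ observed with additive Gaussian noise. The function name `sa_root_find`, its parameters, and the step size 1/n are all illustrative assumptions.

```python
import numpy as np


def sa_root_find(M, b, n_iter=20000, momentum=0.0, noise_std=0.1, seed=0):
    """Estimate the root of f(theta) = b - M @ theta from noisy evaluations.

    Recursion (scalar heavy-ball momentum, NOT the paper's matrix momentum):
        theta_{n+1} = theta_n + a_{n+1} * f_{n+1}(theta_n)
                      + momentum * (theta_n - theta_{n-1})
    with the assumed step size a_n = 1 / n.
    """
    rng = np.random.default_rng(seed)
    theta_prev = np.zeros_like(b, dtype=float)
    theta = theta_prev.copy()
    for n in range(1, n_iter + 1):
        # Noisy observation of f at the current estimate.
        noisy_f = b - M @ theta + noise_std * rng.standard_normal(b.shape)
        # Stochastic approximation step plus the momentum term.
        theta_next = theta + noisy_f / n + momentum * (theta - theta_prev)
        theta_prev, theta = theta, theta_next
    return theta
```

Calling `sa_root_find(M, b, momentum=0.5)` with a positive definite `M` (for example `np.array([[2.0, 0.5], [0.5, 1.0]])`) and `b = np.array([1.0, 1.0])` returns an estimate that can be compared against `np.linalg.solve(M, b)`. Setting `momentum=0.0` recovers the plain recursion without momentum.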

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms

Methods: Q-Learning