Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning

Georgios Kotsalis; Guanghui Lan; Tianjiao Li

arXiv:2011.08434·math.OC·August 17, 2021

Open Access

TL;DR

This paper introduces simple, optimal stochastic algorithms for variational inequalities with Markovian noise. For policy evaluation in reinforcement learning in particular, it improves on the convergence rates of earlier finite-time TD analyses and makes the benefit of parallel implementation explicit.

Contribution

The paper develops new TD learning type algorithms with non-asymptotic analysis for stochastic variational inequalities, improving on the convergence rates of prior methods and clarifying the benefit of parallel implementation.

Findings

01. Improved analysis of standard TD algorithm with parallel benefits

02. Introduction of conditional TD (CTD) with reduced bias and better complexity

03. Development of fast TD (FTD) with optimal convergence rate

Abstract

The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior investigations in the literature focused on temporal difference (TD) learning by employing nonsmooth finite-time analysis motivated by stochastic subgradient descent, leading to certain limitations. These include the need to analyze a modified TD algorithm that involves projection onto an a priori defined Euclidean ball, a non-optimal convergence rate, and no clear way of deriving the beneficial effects of parallel implementation. Our approach remedies these shortcomings in the broader context of stochastic VIs and in particular when it comes to stochastic policy evaluation. We developed a variety of simple TD learning type algorithms motivated by…
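
For orientation, the policy evaluation setting described in the abstract can be illustrated with a minimal sketch of the textbook TD(0) update with linear features, run along a single Markov trajectory (so consecutive samples are correlated, i.e. the noise is Markovian) and without any projection step. This is not the paper's CTD or FTD algorithm. The function names `td0_linear` and `td_fixed_point`, the three-state chain, the rewards, the tabular features, and the step size are all assumptions chosen for illustration.

```python
import numpy as np


def td0_linear(P, r, Phi, gamma, num_steps, step_size, seed=0):
    """Plain TD(0) with linear features along ONE Markov trajectory.

    P   : (n, n) transition matrix of the chain induced by the fixed policy
    r   : (n,) expected one-step reward in each state
    Phi : (n, d) feature matrix, row s is the feature vector of state s
    No projection onto a ball is applied (illustrative choice).
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    theta = np.zeros(Phi.shape[1])
    s = 0  # arbitrary initial state (assumption)
    for _ in range(num_steps):
        # The next state depends on the current one: Markovian, not i.i.d.
        s_next = rng.choice(n_states, p=P[s])
        td_error = r[s] + gamma * (Phi[s_next] @ theta) - Phi[s] @ theta
        theta = theta + step_size * td_error * Phi[s]
        s = s_next
    return theta


def td_fixed_point(P, r, Phi, gamma):
    """Solve the linear system whose solution TD(0) approaches in expectation.

    With D = diag(stationary distribution), the target theta* satisfies
    Phi^T D (Phi - gamma P Phi) theta* = Phi^T D r.
    """
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()  # normalise (also fixes the sign of the eigenvector)
    D = np.diag(pi)
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


# Hypothetical 3-state chain, used only to exercise the functions above.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
r = np.array([1.0, 0.0, 2.0])
Phi = np.eye(3)  # tabular features: theta is the value function itself
gamma = 0.9

theta_star = td_fixed_point(P, r, Phi, gamma)
theta_td = td0_linear(P, r, Phi, gamma, num_steps=60_000, step_size=0.01)

rel_err = np.linalg.norm(theta_td - theta_star) / np.linalg.norm(theta_star)
print("TD(0) estimate:", theta_td)
print("fixed point   :", theta_star)
print("relative error:", rel_err)
```

With a constant step size the iterate does not settle exactly on the fixed point but fluctuates around it, which is why the sketch reports a relative error instead of checking for equality.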

Taxonomy

Topics: Optimization and Search Problems · Diffusion and Search Dynamics · Reinforcement Learning in Robotics