Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning
Georgios Kotsalis, Guanghui Lan, Tianjiao Li

TL;DR
This paper introduces simple, optimal stochastic algorithms for variational inequalities with Markovian noise, significantly improving convergence rates and enabling effective parallelization, especially for policy evaluation in reinforcement learning.
Contribution
It develops new TD learning algorithms with non-asymptotic analysis that improve on the convergence rates of prior methods and make the benefits of parallel implementation explicit, within the broader setting of stochastic variational inequalities.
Findings
Improved analysis of standard TD algorithm with parallel benefits
Introduction of conditional TD (CTD) with reduced bias and better complexity
Development of fast TD (FTD) with optimal convergence rate
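For orientation, the standard TD algorithm named in the findings can be sketched as textbook TD(0) with linear function approximation, run on a single Markov trajectory so that consecutive samples are correlated (the Markovian noise setting). This is a minimal illustration and not the paper's CTD or FTD method; the three-state chain, rewards, constant step size, and tail averaging are all assumptions chosen for the example.

```python
import numpy as np


def td0_linear(P, r, Phi, gamma, alpha, n_steps, seed=0):
    """Textbook TD(0) with linear features along one Markov trajectory.

    P: (n, n) transition matrix of the chain induced by the fixed policy.
    r: (n,) expected reward per state.  Phi: (n, d) feature matrix.
    Returns the average of the iterates over the second half of the run.
    """
    rng = np.random.default_rng(seed)
    n_states, dim = Phi.shape
    theta = np.zeros(dim)
    theta_avg = np.zeros(dim)
    n_avg = 0
    s = 0  # arbitrary (assumed) initial state
    for t in range(n_steps):
        # Next state depends on the current one: samples are not i.i.d.
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error for the observed transition.
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]
        # Running mean over the tail of the trajectory to damp noise.
        if t >= n_steps // 2:
            n_avg += 1
            theta_avg += (theta - theta_avg) / n_avg
        s = s_next
    return theta_avg


def exact_values(P, r, gamma):
    """Solve the Bellman equation V = r + gamma * P V directly."""
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)


# Hypothetical 3-state Markov reward process (illustration only).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.3, 0.0, 0.7]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.eye(3)  # tabular features, so Phi @ theta is the value function
gamma = 0.5

theta = td0_linear(P, r, Phi, gamma, alpha=0.02, n_steps=50_000)
v_exact = exact_values(P, r, gamma)
print("TD estimate:", theta)
print("exact value:", v_exact)
```

With tabular features the TD fixed point coincides with the exact value function, which makes the direct linear solve a convenient sanity check for the estimate.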
Abstract
The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior investigations in the literature focused on temporal difference (TD) learning by employing nonsmooth finite time analysis motivated by stochastic subgradient descent, leading to certain limitations. These include the need to analyze a modified TD algorithm that involves projection onto an a priori defined Euclidean ball, a non-optimal convergence rate, and no clear way of deriving the beneficial effects of parallel implementation. Our approach remedies these shortcomings in the broader context of stochastic VIs and in particular when it comes to stochastic policy evaluation. We developed a variety of simple TD learning type algorithms motivated by…
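As background for the abstract, a variational inequality and its link to policy evaluation are commonly written as follows. The notation here is assumed for illustration and is not taken from this page: $X$ is a closed convex feasible set, $F$ an operator, $\Phi$ a feature matrix, $D$ the diagonal matrix of the stationary distribution, $P$ the transition matrix under the evaluated policy, $r$ the reward vector, and $\gamma$ the discount factor.

```latex
% Variational inequality: find a point where F makes no descent direction in X.
\[
  \text{find } x^* \in X \quad \text{such that} \quad
  \langle F(x^*),\, x - x^* \rangle \ge 0 \quad \forall x \in X .
\]
% Policy evaluation with linear value approximation V_\theta = \Phi\theta:
\[
  F(\theta) = \Phi^\top D \,\bigl(\Phi\theta - r - \gamma P \Phi\theta\bigr),
  \qquad X = \mathbb{R}^d ,
\]
% With X the whole space, the VI reduces to the root-finding problem
\[
  F(\theta^*) = 0 .
\]
```

In this form a TD update can be read as a stochastic step along $-F$, with the operator estimated from a single observed transition of the Markov chain.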
Taxonomy
Topics: Optimization and Search Problems · Diffusion and Search Dynamics · Reinforcement Learning in Robotics
