Closing the gap between SVRG and TD-SVRG with Gradient Splitting
Arsenii Mustafin, Alex Olshevsky, Ioannis Ch. Paschalidis

TL;DR
This paper fuses TD learning with SVRG by interpreting TD as gradient splitting, and obtains a geometric convergence bound identical to the one available for SVRG in convex optimization, supported by theoretical analysis and experiments.
Contribution
It presents a new method that simplifies and fuses TD learning with SVRG, attaining a geometric convergence rate matching that of SVRG in convex settings.
Findings
Achieves a geometric convergence rate with a fixed, predetermined learning rate of 1/8
Theoretical convergence bound matches that of SVRG in convex optimization
Experimental results support the theoretical claims
Abstract
Temporal difference (TD) learning is a policy evaluation method in reinforcement learning whose performance can be enhanced by variance reduction methods. Recently, multiple works have sought to fuse TD learning with the Stochastic Variance Reduced Gradient (SVRG) method to achieve a geometric rate of convergence. However, the resulting convergence rate is significantly weaker than what is achieved by SVRG in the setting of convex optimization. In this work we utilize a recent interpretation of TD learning as the splitting of the gradient of an appropriately chosen function, thus simplifying the algorithm and fusing TD with SVRG. Our main result is a geometric convergence bound with a predetermined learning rate of 1/8, which is identical to the convergence bound available for SVRG in the convex setting. Our theoretical findings are supported by a set of experiments.
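To make the idea of fusing TD with SVRG concrete, here is a minimal sketch of an SVRG-style TD(0) loop on a fixed batch of transitions. It is an illustration under stated assumptions, not the authors' implementation: the tabular (one-hot) features, the toy dataset, the inner-loop length, the number of epochs, and the choice to use the average of the inner iterates as the next snapshot are all hypothetical. Only the step size 1/8 is taken from this summary.

```python
import random

def td_update(theta, transition, gamma):
    """TD(0) update direction for one transition, with one-hot (tabular) features."""
    s, r, s_next = transition
    delta = r + gamma * theta[s_next] - theta[s]  # TD error
    g = [0.0] * len(theta)
    g[s] = delta  # one-hot feature of s scaled by the TD error
    return g

def mean_td_update(theta, data, gamma):
    """Average TD update direction over the whole dataset (the SVRG 'full' term)."""
    total = [0.0] * len(theta)
    for tr in data:
        total = [a + b for a, b in zip(total, td_update(theta, tr, gamma))]
    return [t / len(data) for t in total]

def td_svrg(data, n_states, gamma, alpha=1 / 8, inner_steps=200, epochs=40, seed=0):
    """SVRG-style TD sketch. inner_steps, epochs and seed are illustrative choices."""
    rng = random.Random(seed)
    snapshot = [0.0] * n_states
    for _ in range(epochs):
        full = mean_td_update(snapshot, data, gamma)  # computed once per epoch
        theta = list(snapshot)
        running_sum = [0.0] * n_states
        for _ in range(inner_steps):
            tr = rng.choice(data)
            g_cur = td_update(theta, tr, gamma)
            g_snap = td_update(snapshot, tr, gamma)
            # Variance-reduced step: sampled update, corrected by the snapshot terms.
            theta = [t + alpha * (gc - gs + f)
                     for t, gc, gs, f in zip(theta, g_cur, g_snap, full)]
            running_sum = [a + t for a, t in zip(running_sum, theta)]
        # Assumption: the next snapshot is the average of the inner iterates.
        snapshot = [a / inner_steps for a in running_sum]
    return snapshot

# Hypothetical toy dataset: (state, reward, next_state) triples from a closed
# trajectory over 3 states, so each state occurs equally often as start and as next.
transitions = [(0, 1.0, 1), (1, 0.0, 2), (2, 2.0, 0),
               (0, 1.0, 2), (2, 2.0, 1), (1, 0.0, 0)]
theta = td_svrg(transitions, n_states=3, gamma=0.5)
residual = max(abs(x) for x in mean_td_update(theta, transitions, 0.5))
print(theta, residual)
```

For this toy dataset the empirical TD fixed point can be worked out by hand as (2, 1.2, 2.8), so the returned estimate can be compared against it. The residual printed at the end is the largest entry of the mean TD update at the final estimate, which should shrink toward zero as the iterates approach that fixed point.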
Taxonomy
Topics: Reinforcement Learning in Robotics
