Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint
Nithia Vijayan, Prashanth L. A.

TL;DR
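This paper introduces two off-policy reinforcement learning algorithms built on smoothed functional (SF) gradient estimation, each with non-asymptotic convergence guarantees. The first converges at a rate comparable to off-policy REINFORCE; the second adds SVRG-style variance reduction and attains an improved rate.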
Contribution
The paper presents two policy gradient algorithms for off-policy RL, both using smoothed functional gradient estimation and both analyzed non-asymptotically. The second algorithm additionally incorporates SVRG-inspired variance reduction in its update.
Findings
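The first algorithm converges at a rate comparable to REINFORCE in the off-policy setting.
The second algorithm, which uses variance reduction, exhibits an improved rate of convergence.
Both algorithms have non-asymptotic bounds that establish convergence to an approximate stationary point.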
Abstract
We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning (RL) context. Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme. The first algorithm is a straightforward combination of importance sampling-based off-policy evaluation with SF-based gradient estimation. The second algorithm, inspired by the stochastic variance-reduced gradient (SVRG) algorithm, incorporates variance reduction in the update iteration. For both algorithms, we derive non-asymptotic bounds that establish convergence to an approximate stationary point. From these results, we infer that the first algorithm converges at a rate that is comparable to the well-known REINFORCE algorithm in an off-policy RL context, while the second algorithm exhibits an improved rate of convergence.
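To make the smoothed functional idea concrete, here is a minimal sketch of a two-sided Gaussian SF gradient estimate. It is an illustration under stated assumptions, not the paper's algorithm: the names `sf_gradient` and `toy_objective` are hypothetical, the smoothing parameter `mu` and the sample count are arbitrary, and a simple deterministic function stands in for the importance-sampled off-policy objective estimate that the abstract describes.

```python
import numpy as np


def sf_gradient(f, theta, mu=0.05, n_samples=2000, rng=None):
    """Two-sided Gaussian smoothed functional estimate of the gradient of f at theta.

    Each sample perturbs theta along a random Gaussian direction u and uses
    the difference f(theta + mu*u) - f(theta - mu*u) to form a gradient guess.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        u = rng.standard_normal(theta.shape)  # random perturbation direction
        diff = f(theta + mu * u) - f(theta - mu * u)
        grad += diff / (2.0 * mu) * u
    return grad / n_samples


def toy_objective(theta):
    # Hypothetical stand-in for a policy's value; in the off-policy setting
    # this would be an importance-sampled estimate from behavior-policy data.
    return -np.sum((theta - 1.0) ** 2)


theta0 = np.array([0.0, 2.0])
estimate = sf_gradient(toy_objective, theta0)
true_grad = -2.0 * (theta0 - 1.0)  # analytic gradient of the toy objective
print("SF estimate:", estimate)
print("true gradient:", true_grad)
```

The estimator needs only function evaluations, never derivatives of the objective, which is what makes it attractive when the objective is itself a noisy estimate. Averaging over many perturbations reduces the variance of the estimate at the cost of more evaluations.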
Taxonomy
Methods: REINFORCE
