Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint

Nithia Vijayan; Prashanth L. A

arXiv:2101.02137·cs.LG·June 25, 2024

TL;DR

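This paper proposes two off-policy policy gradient algorithms built on smoothed functional (SF) gradient estimation. Both come with non-asymptotic convergence guarantees: the first converges at a rate comparable to REINFORCE in the off-policy setting, and the second, which adds SVRG-inspired variance reduction, attains an improved rate.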

Contribution

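It presents two new policy gradient algorithms for off-policy RL, each with a non-asymptotic analysis. The first combines importance sampling-based off-policy evaluation with SF-based gradient estimation; the second adds SVRG-inspired variance reduction to the update iteration.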

Findings

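01. The first algorithm converges at a rate comparable to REINFORCE in an off-policy RL context.

02. The second algorithm, which uses variance reduction, exhibits an improved rate of convergence.

03. Both algorithms have non-asymptotic bounds that establish convergence to an approximate stationary point.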

Abstract

We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning (RL) context. Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme. The first algorithm is a straightforward combination of importance sampling-based off-policy evaluation with SF-based gradient estimation. The second algorithm, inspired by the stochastic variance-reduced gradient (SVRG) algorithm, incorporates variance reduction in the update iteration. For both algorithms, we derive non-asymptotic bounds that establish convergence to an approximate stationary point. From these results, we infer that the first algorithm converges at a rate that is comparable to the well-known REINFORCE algorithm in an off-policy RL context, while the second algorithm exhibits an improved rate of convergence.
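
The combination described for the first algorithm can be illustrated with a minimal sketch. It is not the authors' exact estimator: the two-point perturbation form, the unit-sphere directions, the state-free softmax policy, the toy reward, and all function and variable names (`softmax_policy`, `is_value_estimate`, `sf_gradient`) are assumptions made for illustration. The sketch estimates the value of a target policy from trajectories collected under a fixed behavior policy via importance sampling, then forms an SF gradient estimate from that value estimate alone.

```python
import numpy as np

def softmax_policy(theta):
    """Action probabilities of a state-free softmax policy (toy assumption)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def is_value_estimate(theta, actions, rewards, behavior_probs):
    """Importance-sampling estimate of the expected return under pi_theta.

    actions: (N, H) int array of actions drawn from the behavior policy.
    rewards: (N, H) float array of rewards observed along each trajectory.
    behavior_probs: (A,) action probabilities of the behavior policy.
    """
    pi = softmax_policy(theta)
    # Per-trajectory likelihood ratio: product over time of pi(a_t) / b(a_t).
    ratios = np.prod(pi[actions] / behavior_probs[actions], axis=1)
    returns = rewards.sum(axis=1)
    return float(np.mean(ratios * returns))

def sf_gradient(theta, value_fn, mu, num_directions, rng):
    """Two-point smoothed functional gradient estimate of value_fn at theta.

    Only evaluations of value_fn are needed, not its analytic gradient.
    """
    d = theta.size
    grad = np.zeros(d)
    for _ in range(num_directions):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)  # random direction on the unit sphere
        diff = value_fn(theta + mu * v) - value_fn(theta - mu * v)
        grad += (d / (2.0 * mu)) * diff * v
    return grad / num_directions

# Toy off-policy data: uniform behavior policy, reward 1 whenever action 1 is taken.
rng = np.random.default_rng(0)
num_traj, horizon = 500, 3
behavior_probs = np.array([0.5, 0.5])
actions = rng.integers(0, 2, size=(num_traj, horizon))
rewards = actions.astype(float)

theta = np.zeros(2)
value_fn = lambda th: is_value_estimate(th, actions, rewards, behavior_probs)
grad = sf_gradient(theta, value_fn, mu=0.1, num_directions=20, rng=rng)
theta_next = theta + 0.1 * grad  # one gradient-ascent step on the estimated value
```

In this sketch the same batch of behavior-policy trajectories is reused for every perturbed parameter, which is what makes the scheme off-policy: no new rollouts are needed to evaluate `theta + mu * v`. The smoothing parameter `mu`, the number of directions, and the step size are arbitrary illustrative values.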

Taxonomy

Methods: REINFORCE