Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods

Xin Guo; Anran Hu; Junzi Zhang

arXiv:2109.06362·cs.LG·September 15, 2021

TL;DR

This paper provides the first theoretical convergence guarantees for fictitious discount algorithms in episodic reinforcement learning, shows that the fictitious discount reduces bias, and establishes global convergence of policy gradient methods in finite-horizon MDPs.

Contribution

It introduces the first non-asymptotic convergence analysis for fictitious discount algorithms and connects different MDP settings to prove global convergence.

Findings

01 Fictitious discount reduces bias in advantage estimation.

02 Algorithms achieve non-asymptotic convergence guarantees.

03 First global convergence proof for policy gradients in finite-horizon RL.

Abstract

When designing algorithms for finite-time-horizon episodic reinforcement learning problems, a common approach is to introduce a fictitious discount factor and use stationary policies for approximations. Empirically, it has been shown that the fictitious discount factor helps reduce variance, and stationary policies serve to save the per-iteration computational cost. Theoretically, however, there is no existing work on convergence analysis for algorithms with this fictitious discount recipe. This paper takes the first step towards analyzing these algorithms. It focuses on two vanilla policy gradient (VPG) variants: the first being a widely used variant with discounted advantage estimations (DAE), the second with an additional fictitious discount factor in the score functions of the policy gradient estimators. Non-asymptotic convergence guarantees are established for both algorithms, and…
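
The two variants named in the abstract can be illustrated with a minimal Python sketch. It is a generic reading of the "fictitious discount recipe", not the paper's exact estimators, which this summary does not reproduce. The function names, the constant `baseline`, and the use of discounted returns-to-go in place of a learned advantage estimate are all assumptions made for illustration. Each returned coefficient is the scalar that would multiply the score function (the gradient of the log-policy) at step t of one finite-horizon episode.

```python
from typing import List


def discounted_returns_to_go(rewards: List[float], gamma: float) -> List[float]:
    """G_t = sum_{k >= t} gamma^(k - t) * r_k over one finite-horizon episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step reuses the tail sum already computed.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gradient_coefficients(
    rewards: List[float],
    gamma: float,
    discount_scores: bool,
    baseline: float = 0.0,  # hypothetical constant baseline, for illustration only
) -> List[float]:
    """Per-step scalar weights on the score functions.

    discount_scores=False: discounted advantage estimates only (first variant).
    discount_scores=True: an extra gamma^t factor on the score function at
    step t (an assumed form of the second variant).
    """
    returns = discounted_returns_to_go(rewards, gamma)
    coefficients = []
    for t, g in enumerate(returns):
        score_weight = gamma ** t if discount_scores else 1.0
        coefficients.append(score_weight * (g - baseline))
    return coefficients


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    print(gradient_coefficients(episode_rewards, gamma=0.5, discount_scores=False))
    print(gradient_coefficients(episode_rewards, gamma=0.5, discount_scores=True))
```

With `gamma` set to 1 the two variants coincide and reduce to undiscounted returns-to-go, which is the actual finite-horizon objective; any `gamma` below 1 is the fictitious part.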

Taxonomy

Topics: Reinforcement Learning in Robotics · Electric Vehicles and Infrastructure · Adaptive Dynamic Programming Control