Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods
Xin Guo, Anran Hu, Junzi Zhang

TL;DR
This paper provides the first theoretical convergence guarantees for fictitious discount algorithms in episodic reinforcement learning, shows that fictitious discounting reduces bias, and establishes global convergence of policy gradient methods in finite-horizon MDPs.
Contribution
It introduces the first non-asymptotic convergence analysis for fictitious discount algorithms and connects different MDP settings to prove global convergence.
Findings
Fictitious discount reduces bias in advantage estimation.
Both VPG variants studied come with non-asymptotic convergence guarantees.
First global convergence proof for policy gradients in finite-horizon RL.
Abstract
When designing algorithms for finite-time-horizon episodic reinforcement learning problems, a common approach is to introduce a fictitious discount factor and use stationary policies for approximations. Empirically, it has been shown that the fictitious discount factor helps reduce variance, and stationary policies serve to save the per-iteration computational cost. Theoretically, however, there is no existing work on convergence analysis for algorithms with this fictitious discount recipe. This paper takes the first step towards analyzing these algorithms. It focuses on two vanilla policy gradient (VPG) variants: the first being a widely used variant with discounted advantage estimations (DAE), the second with an additional fictitious discount factor in the score functions of the policy gradient estimators. Non-asymptotic convergence guarantees are established for both algorithms, and…
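The two VPG variants described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "discounted advantage estimation" means weighting each score vector by the discounted reward-to-go with no baseline, and that the "additional fictitious discount factor in the score functions" means a further gamma**t weight on the score at step t. The function names, the discount_scores flag, and the array layout are hypothetical.

```python
import numpy as np


def discounted_rewards_to_go(rewards, gamma):
    """Return G_t = sum_{k >= t} gamma**(k - t) * r_k for one finite episode."""
    out = np.zeros(len(rewards))
    running = 0.0
    # Accumulate backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def pg_estimate(scores, rewards, gamma, discount_scores=False):
    """Single-episode policy gradient estimate (illustrative, assumed form).

    scores:  array of shape (T, d); row t is grad_theta log pi(a_t | s_t).
    rewards: array of shape (T,).
    discount_scores=False: score_t is weighted by G_t only (assumed DAE variant).
    discount_scores=True:  score_t is additionally weighted by gamma**t
                           (assumed second variant).
    """
    scores = np.asarray(scores, dtype=float)
    g = discounted_rewards_to_go(np.asarray(rewards, dtype=float), gamma)
    if discount_scores:
        weights = gamma ** np.arange(len(g))
    else:
        weights = np.ones(len(g))
    # Weighted sum of score vectors over the episode.
    return (weights * g) @ scores
```

With a horizon of 3, unit rewards, gamma = 0.5, and one-hot scores, the two calls differ only in the per-step gamma**t weight, which makes the effect of the extra discount factor easy to inspect.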
Taxonomy
Topics: Reinforcement Learning in Robotics · Electric Vehicles and Infrastructure · Adaptive Dynamic Programming Control
